AvocadoDB query language (DRAFT!)

AvocadoDB query language
Jan Steemann (triAGENS)

© 2012 triAGENS GmbH | 2012-04-13 1

Database query languages / paradigms

 There are many database query languages and paradigms around
 Some examples:
 SQL
declarative query language for relational databases, well-known and popular
 UNQL
declarative query language for document databases, SQL-syntax like,
embeds JSON
 graph query languages (Cypher, Gremlin, ...)
declarative languages focusing on graph queries
 fluent query languages/interfaces
e.g. db.user.find(...).sort(...)
 map/reduce
imperative query formulation/programming
 ...


AvocadoDB query language: status quo

 There is a query language in AvocadoDB
 The language syntax is very similar to SQL / UNQL
 The language currently supports reading data from collections (i.e.
equivalent to an SQL/UNQL SELECT query)
 Some complex access patterns (e.g. joins using multiple
collections) are also supported
 There are some special features, such as creating inline lists from a list of
documents (named LIST JOIN)



 Syntax example:
SELECT
{ "user": u, "friends": f }
FROM
users u
LIST JOIN
friends f
ON (u.id == f.uid)
WHERE
u.type == 1
ORDER BY
u.name


Language problems

 The current query language has the problem that
some queries cannot be expressed very well with it
 This might be due to the query language being based on SQL,
and SQL being a query language for relational databases
 AvocadoDB is mainly a document-oriented database and its
object model only partly overlaps with the SQL object model:

SQL (relational)       AvocadoDB (document-oriented)
tables                 collections
(homogeneous) rows     (inhomogeneous) documents
columns                attributes
scalars                scalars
-                      lists
references             edges


Language problems: multi-valued attributes

 Attributes in AvocadoDB can and shall be stored denormalised
(multi-valued attributes, lists, ...):
{ "user":
{ "name": "Fred",
"likes": [ "Fishing", "Hiking", "Swimming" ]
}
}
 In an SQL database, this storage model would be an anti-pattern
 Problem: SQL is not designed to access multi-valued attributes/lists
but in AvocadoDB we want to support them via the language
 UNQL addresses this partly, but does not go far enough


Language problems: graph queries

 AvocadoDB also supports querying graphs
 Neither SQL nor UNQL offers any "natural" graph traversal facilities
 Instead, there are:
 SQL language extensions: e.g. CONNECT BY, which is proprietary
 SQL stored procedures: e.g. PL/SQL, whose imperative code does not match
well with the declarative nature of SQL
 Neither SQL nor UNQL is the language of choice for graph
queries, but we want to support graph queries in AvocadoDB


AvocadoDB query language, version 2

 During the past few weeks we thought about moving
AvocadoDB's query language from the current SQL-/
UNQL-based syntax to something else
 We did not find an existing query language that addresses
our problems well enough
 So we tried to define a syntax for a new query language



 The new AvocadoDB query language should
 have an easy-to-understand syntax for the end user
 offer a way to declaratively express queries
 avoid ASCII art queries
 still allow more complex queries (joins, sub-queries etc.)
 allow accessing lists and list elements more naturally
 be usable with the different data models AvocadoDB supports
(e.g. document-oriented, graph, "relational")
 be consistent and easy to process
 have one syntax regardless of the underlying client language



 A draft of the new language version is presented on the following slides
 It is not yet finalized and not yet implemented
 Your feedback on it is highly appreciated
 Slides will be uploaded to http://www.avocadodb.org/


Data types

 The language has the following data types:
 absence of a value:
null
 boolean truth values:
false, true
 numbers (signed double precision):
1, -34.24
 strings, e.g.
"John", "goes fishing"
 lists (with elements accessible by their position), e.g.
[ "one", "two", false, -1 ]
 documents (with elements accessible by their name), e.g.
{ "user": { "name": "John", "age": 25 } }

Note: names of document attributes can also be used without surrounding quotes


Bind parameters

 Queries can be parametrized using bind parameters
 This allows separation of query text and actual query values
 Any literal value, including lists and documents, can be bound
 Collection names can also be bound
 Bind parameters can be accessed in the query using the @ prefix
 Example:
@age
u.name == @name
u.state IN @states
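
The separation of query text and values can be illustrated with a minimal Python sketch. It is not part of AvocadoDB: the helper names `find_bind_parameters` and `missing_bindings` are hypothetical, and the simple pattern used here would also match an @ inside a string literal, which a real parser would have to exclude.

```python
import re

# Hypothetical helper: collect the names of all @-prefixed bind parameters.
def find_bind_parameters(query):
    return sorted(set(re.findall(r"@([A-Za-z_][A-Za-z0-9_]*)", query)))

# Hypothetical helper: list the parameter names that have no bound value.
def missing_bindings(query, values):
    return [name for name in find_bind_parameters(query) if name not in values]

# The query text contains only placeholders ...
query = "FOR u IN users FILTER u.name == @name && u.state IN @states RETURN u"
# ... while the actual values (scalars or lists) are supplied separately.
values = {"name": "John", "states": ["CA", "NY"]}

print(find_bind_parameters(query))
print(missing_bindings(query, values))
```

Keeping the values out of the query text means the same query string can be reused with different values.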


Operators

 The language has the following operators:
 logical: will return a boolean value or an error
&& || !
 arithmetic: will return a numeric value or an error
+ - * / %
 relational: will return a boolean value or an error
== != < <= > >= IN
 ternary: will return the true or the false part
? :
 String concatenation will be provided via a function


Type casts

 Typecasts can be achieved by explicitly calling typecast functions
 No implicit type cast will be performed
 Performing an operation with invalid/inappropriate types
will result in an error
 When performing an operation that does not have a valid
or defined result, the outcome will be an error:
1 / 0 => error
1 + "John" => error
 Depending on settings, errors are either caught and converted to null
within the query, or they bubble up to the top and abort the query
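
The "no implicit casts, invalid operations are errors" rule can be modelled with a short Python sketch. All names (`QueryError`, `add`, `divide`, `evaluate`, the `errors_to_null` switch) are assumptions made for illustration; the draft does not define how the setting is exposed.

```python
class QueryError(Exception):
    """Raised when an operation has no valid or defined result."""

def is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(value, (int, float)) and not isinstance(value, bool)

def add(a, b):
    # No implicit type cast: both operands must already be numbers.
    if not (is_number(a) and is_number(b)):
        raise QueryError("invalid operand types for +")
    return a + b

def divide(a, b):
    if not (is_number(a) and is_number(b)):
        raise QueryError("invalid operand types for /")
    if b == 0:
        raise QueryError("division by zero")
    return a / b

def evaluate(operation, *args, errors_to_null=False):
    # errors_to_null models the setting mentioned above (assumed name).
    try:
        return operation(*args)
    except QueryError:
        if errors_to_null:
            return None   # error is caught and converted to null
        raise             # error bubbles up and aborts the query

print(evaluate(add, 1, 2))
print(evaluate(divide, 1, 0, errors_to_null=True))
```

With `errors_to_null=False`, `evaluate(add, 1, "John")` raises `QueryError` instead of returning a value.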


Null

 When referring to something non-existing (e.g. a non-existing
attribute of a document), the result will be null:
users.nammme => null
 Using the comparison operators, null can be compared to other
values and also to null itself. The result will be a boolean
(not null as in SQL)


Type comparisons

 When comparing two values, the following algorithm is used
 If the types of the compared values are not equal,
the compare result is as follows:
null < boolean < number < string < list < document
 Examples:
null < false        0 != null
false < 0           null != false
true < 0            false != ""
true < [ 0 ]        "" != [ ]
true < [ ]          null != [ ]
0 < [ ]
[ ] < { }


Type comparisons

 If the types are equal, the actual values are compared
 For boolean values, the order is:
false < true
 For numeric values, the order is determined by the numeric value
 For string values, the order is determined by bytewise comparison
of the strings' characters
 Note: at some point, collations will need to be introduced for
string comparisons


Type comparisons

 For list values, the elements from both lists are compared at each
position. For each list element value, the described comparisons
will be done recursively:
[ 1 ] > [ 0 ]
[ 2, 0 ] > [ 1, 2 ]
[ 99, 4 ] > [ 99, 3 ]
[ 23 ] > [ true ]
[ [ 1 ] ] > 99
[ ] > 1
[ true ] > [ ]
[ null ] > [ ]
[ true, 0 ] > [ true ]


Type comparisons

 For document values, the attribute names from both documents
are collected and sorted. The sorted attribute names are then
checked individually: if one of the documents does not have the
attribute, it will be considered "smaller". If both documents have the
attribute, a value comparison will be done recursively:
{ } < { "age": 25 }
{ "age": 25 } < { "age": 26 }
{ "age": 25 } > { "name": "John" }
{ "name": "John", "age": 25 } == { "age": 25, "name": "John" }
{ "age": 25 } < { "age": 25, "name": "John" }

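
The comparison rules from the "Type comparisons" slides (type order first, then value comparison per type, recursively for lists and documents) can be sketched in Python as follows. This is a model of the described rules, not AvocadoDB code; `type_rank` and `compare` are hypothetical names, and Python's None, list and dict stand in for null, lists and documents.

```python
def type_rank(value):
    # null < boolean < number < string < list < document
    if value is None:
        return 0
    if isinstance(value, bool):          # must be checked before numbers
        return 1
    if isinstance(value, (int, float)):
        return 2
    if isinstance(value, str):
        return 3
    if isinstance(value, list):
        return 4
    if isinstance(value, dict):
        return 5
    raise TypeError("unsupported type: %r" % type(value))

def compare(a, b):
    """Return -1, 0 or 1 if a is smaller than, equal to or greater than b."""
    rank_a, rank_b = type_rank(a), type_rank(b)
    if rank_a != rank_b:
        return -1 if rank_a < rank_b else 1
    if a is None:
        return 0
    if isinstance(a, list):
        # Compare position by position; a missing element is smaller.
        for i in range(max(len(a), len(b))):
            if i >= len(a):
                return -1
            if i >= len(b):
                return 1
            result = compare(a[i], b[i])
            if result != 0:
                return result
        return 0
    if isinstance(a, dict):
        # Collect and sort attribute names; a missing attribute is smaller.
        for name in sorted(set(a) | set(b)):
            if name not in a:
                return -1
            if name not in b:
                return 1
            result = compare(a[name], b[name])
            if result != 0:
                return result
        return 0
    if isinstance(a, str):
        # Bytewise comparison (UTF-8 encoding is an assumption here).
        a, b = a.encode("utf-8"), b.encode("utf-8")
    return (a > b) - (a < b)

print(compare([23], [True]))
print(compare({"age": 25}, {"name": "John"}))
```

The sketch returns -1/0/1 so that the same function can back all of ==, !=, <, <=, > and >=.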

Base building block: lists

 A good part of the query language is about processing lists
 There are several types of lists:
 statically declared lists, e.g.
[ { "user": { "name": "Fred" } },
{ "user": { "name": "John" } } ]
 lists of documents from collections, e.g.
 users
 locations
 result lists from filters/queries, e.g.
 NEAR(locations, [ 43, 10 ], 100)


FOR: List iteration

 The FOR keyword can be used to iterate over all elements from a list
 Example (collection-based, collection "users"):
FOR
u IN users
 A result document (named: u) is produced on each iteration
 The above example produces the following result list:
[ u1, u2, u3, ..., un ]
 Note: this is comparable to the following SQL:
SELECT * FROM users u
 In each iteration, the individual element is accessible via its name (u)


FOR: List iteration

 Nesting of multiple FOR blocks is possible
 Example: cross product of users and locations (u x l):
FOR
u IN users
FOR
l IN locations
 A result document containing both variables (u, l) is produced on each
iteration of the inner loop
 Note: this is equivalent to the following SQL queries:
SELECT * FROM users u, locations l
SELECT * FROM users u INNER JOIN locations l
ON (1=1)


FOR: List iteration

 Example: cross product of years & quarters (non collection-based):
FOR
year IN [ 2011, 2012, 2013 ]
FOR
quarter IN [ 1, 2, 3, 4 ]
 Note: this is equivalent to the following SQL query:
SELECT * FROM
(SELECT 2011 UNION SELECT 2012 UNION
SELECT 2013) year,
(SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION
SELECT 4) quarter
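
For readers more familiar with general-purpose languages, nested FOR blocks behave like nested loops. The following Python sketch models the years and quarters example; the attribute names "year" and "quarter" in the result documents are an assumption based on the variable names.

```python
from itertools import product

years = [2011, 2012, 2013]
quarters = [1, 2, 3, 4]

# Nested FOR blocks: one result document per iteration of the inner loop.
nested = [{"year": y, "quarter": q} for y in years for q in quarters]

# The same cross product written with itertools.product.
via_product = [{"year": y, "quarter": q} for y, q in product(years, quarters)]

print(len(nested))
print(nested[0])
```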


FILTER: results filtering

 The FILTER keyword can be used to restrict the results to
elements that match some definable condition
 Example: retrieve all users that are active
FOR
u IN users
FILTER
u.active == true
 The variable name u, introduced by FOR, gives access to the individual
list elements
 Note: this is equivalent to the following SQL:
SELECT * FROM users u WHERE u.active = true



 The FILTER keyword in combination with nested FOR blocks
can be used to perform joins
 Example: retrieve all users that have matching locations
FOR
u IN users
FOR
l IN locations
FILTER
u.a == l.b
 The individual list elements are accessed using the variable names (u, l)
 Note: this is equivalent to the following SQL queries:
SELECT * FROM users u, locations l
WHERE u.a = l.b
SELECT * FROM users u (INNER) JOIN locations l
ON u.a = l.b
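
The join above can be read as two nested loops plus a condition. A minimal Python sketch follows; the sample data and the `join` function name are made up for illustration, only the attributes a and b come from the slide.

```python
# Hypothetical sample data; only the attributes "a" and "b" appear on the slide.
users = [{"name": "John", "a": 1}, {"name": "Tina", "a": 2}, {"name": "Bob", "a": 3}]
locations = [{"city": "cgn", "b": 1}, {"city": "ffm", "b": 2}, {"city": "ddf", "b": 2}]

def join(users, locations):
    result = []
    for u in users:                  # FOR u IN users
        for l in locations:          # FOR l IN locations
            if u["a"] == l["b"]:     # FILTER u.a == l.b
                result.append({"u": u, "l": l})
    return result

for row in join(users, locations):
    print(row["u"]["name"], row["l"]["city"])
```

Users without a matching location (Bob in this sample) do not appear in the result, as with an SQL inner join.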


Base building block: scopes

 The query language is scoped
 Variables can only be used after they have been declared
 Example:
FOR
u IN users
FOR
l IN locations
FILTER
u.a == l.b
 The outer FOR introduces u, the inner FOR introduces l
 The FILTER can use both u and l

 Scopes can be made explicit using brackets (will be shown later)



 Thanks to scopes, the FILTER keyword can be used
wherever SQL needs multiple keywords:
 ON
 WHERE
 HAVING



 That means: in AvocadoDB you would use FILTER
FOR
u IN users
FOR
l IN locations
FILTER
u.a == l.b
 whereas in SQL you would use either ON:
SELECT * FROM users u (INNER) JOIN locations l
ON u.a = l.b
 or WHERE:
SELECT * FROM users u, locations l
WHERE u.a = l.b



 FILTER can be used to model both an SQL ON and
an SQL WHERE in one go:
FOR
u IN users
FOR
l IN locations
FILTER
u.active == 1 && u.a == l.b
 This is equivalent to the following SQL query:
SELECT * FROM users u (INNER) JOIN locations l
ON u.a = l.b WHERE u.active = 1



 More than one FILTER condition is allowed per query
 The following three queries are all equivalent
 It is the optimizer's job to figure out the best positions for applying FILTERs
 Query 1:
FOR
u IN users
FILTER
u.c == 1
FOR
l IN locations
FILTER
l.d == 2
FILTER
u.a == l.b
 Query 2:
FOR
u IN users
FOR
l IN locations
FILTER
u.c == 1 &&
l.d == 2 &&
u.a == l.b
 Query 3:
FOR
u IN users
FOR
l IN locations
FILTER
l.d == 2 &&
u.a == l.b
FILTER
u.c == 1
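
That the placement of FILTER conditions does not change the result can be checked with a small Python model. The sample data and the function names are hypothetical; each function mirrors one of the three FILTER placements.

```python
# Hypothetical sample data using the attribute names from the slide.
users = [{"a": 1, "c": 1}, {"a": 2, "c": 0}, {"a": 3, "c": 1}]
locations = [{"b": 1, "d": 2}, {"b": 3, "d": 1}, {"b": 3, "d": 2}]

def filters_as_early_as_possible():
    result = []
    for u in users:
        if u["c"] != 1:              # FILTER u.c == 1, before the inner FOR
            continue
        for l in locations:
            if l["d"] != 2:          # FILTER l.d == 2
                continue
            if u["a"] == l["b"]:     # FILTER u.a == l.b
                result.append((u, l))
    return result

def single_combined_filter():
    return [(u, l) for u in users for l in locations
            if u["c"] == 1 and l["d"] == 2 and u["a"] == l["b"]]

def two_filters_after_inner_for():
    result = []
    for u in users:
        for l in locations:
            if l["d"] == 2 and u["a"] == l["b"]:   # first FILTER
                if u["c"] == 1:                    # second FILTER
                    result.append((u, l))
    return result

print(filters_as_early_as_possible() == single_combined_filter()
      == two_filters_after_inner_for())
```

The first variant skips the inner loop entirely for non-matching users, which is the kind of placement an optimizer would prefer.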


RETURN: results projection

 The RETURN keyword produces the end result documents
from the intermediate results produced by the query
 Comparable to the SELECT part in an SQL query
 RETURN part is mandatory at the end of a query
(and at the end of each subquery)
 RETURN is partly left out in this presentation for space reasons



 Example:
FOR
u IN users
RETURN {
"name" : u.name,
"likes" : u.likes,
"numFriends": LENGTH(u.friends)
}
 Produces one such document for each u found



 To return all documents as they are in the original list, there
is the following variant:
FOR
u IN users
RETURN
u
 Would produce:
[ { "name": "John", "age": 25 },
{ "name": "Tina", "age": 29 },
... ]
 Note: this is similar to SQL's SELECT u.*



 To return just the names for all users, the following query would do:
FOR
u IN users
RETURN
u.name
 Would produce:
[ "John", "Tina", ... ]
 Note: this is similar to SQL's SELECT u.name



 To return a hierarchical result (e.g. data from multiple collections),
the following query could be used:
FOR
u IN users
FOR
l IN locations
RETURN { "user": u, "location" : l }
 Would produce:
[ { "user": { "name": "John", "age": 25 },
"location": { "x": 1, "y": -1 } },
{ "user": { "name": "Tina", "age": 29 },
"location": { "x": -2, "y": 3 } },
... ]



 To return a flat result from hierarchical data (e.g. data from multiple
collections), the MERGE() function can be employed:
FOR
u IN users
FOR
l IN locations
RETURN MERGE(u, l)
 Would produce:
[ { "name": "John", "age": 25,
"x": 1, "y": -1 },
{ "name": "Tina", "age": 29,
"x": -2, "y": 3 },
... ]
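
The effect of MERGE() can be sketched in Python as combining the attributes of several documents into one. How name clashes are resolved is not stated in the draft; letting the later document win is an assumption of this sketch.

```python
def merge(*documents):
    # Assumption: on attribute name clashes, the later document wins.
    merged = {}
    for document in documents:
        merged.update(document)
    return merged

u = {"name": "John", "age": 25}
l = {"x": 1, "y": -1}

print(merge(u, l))
```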


SORT: Sorting

 The SORT keyword will force a sort of the list of intermediate
results according to one or multiple criteria
 Example (sort by first name, then last name, then location id in descending order):
FOR
u IN users
FOR
l IN locations
SORT
u.first, u.last, l.id DESC
 This is very similar to ORDER BY in SQL


LIMIT: Result set slicing

 The LIMIT keyword allows slicing the list of result documents using
an offset and a count
 Example for top 3 (offset = 0, count = 3):
FOR
u IN users
SORT
u.first, u.last
LIMIT
0, 3
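
SORT followed by LIMIT corresponds to sorting a list and then slicing it. A minimal Python sketch, with made-up sample data and a hypothetical function name:

```python
# Hypothetical sample data using the attribute names from the example.
users = [
    {"first": "Tina", "last": "Smith"},
    {"first": "John", "last": "Miller"},
    {"first": "John", "last": "Adams"},
    {"first": "Bob", "last": "Jones"},
]

def sort_and_limit(documents, offset, count):
    # SORT u.first, u.last
    ordered = sorted(documents, key=lambda u: (u["first"], u["last"]))
    # LIMIT offset, count
    return ordered[offset:offset + count]

top3 = sort_and_limit(users, 0, 3)
print([u["first"] + " " + u["last"] for u in top3])
```

Because LIMIT is applied after SORT, the slice is taken from the ordered list, which is what makes "top 3" meaningful.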


LET: variable creation

 The LET keyword can be used to create a variable
using data from a subexpression (e.g. a FOR expression)
 Example (will populate variable t with the result of the FOR):
LET t = (
FOR
u IN users
)
 The parentheses mark explicit scope bounds
 This will populate t with
[ u1, u2, u3, u4, ... un ]



 The results created using LET can be filtered afterwards
using the FILTER keyword
 This is then similar to the behaviour of HAVING in SQL
 Example using a single collection (users):
FOR
u IN users
LET friends = (
FOR
f IN u.friends
)
FILTER
LENGTH(friends) > 5
 The inner FOR iterates over an attribute ("friends") of each u
 LENGTH() is a function to retrieve the length of a list



 Example using two collections (users, friends):
FOR
u IN users
LET friends = (
FOR
f IN friends
FILTER
u.id == f.uid
)
FILTER
LENGTH(friends) > 5
 Differences from the previous single-collection example:
 replaced f IN u.friends with just f IN friends
 added inner filter condition
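
The two-collection example can be modelled in Python as a per-user sub-list followed by a length check. The sample data and the function name `users_with_many_friends` are hypothetical; the result shape is also an assumption, since the example query has no RETURN part.

```python
# Hypothetical sample data: John has six friend records, Tina has one.
users = [{"id": 1, "name": "John"}, {"id": 2, "name": "Tina"}]
friends = [{"uid": 1, "name": "friend%d" % i} for i in range(6)]
friends.append({"uid": 2, "name": "friend6"})

def users_with_many_friends(users, friends, minimum=5):
    result = []
    for u in users:                                         # FOR u IN users
        # LET friends = ( FOR f IN friends FILTER u.id == f.uid )
        own = [f for f in friends if u["id"] == f["uid"]]
        # FILTER LENGTH(friends) > 5
        if len(own) > minimum:
            result.append({"user": u, "friends": own})
    return result

for row in users_with_many_friends(users, friends):
    print(row["user"]["name"], len(row["friends"]))
```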



 SQL approach:
SELECT u.*, GROUP_CONCAT(f.uid) AS friends
FROM users u (INNER) JOIN friends f
ON u.id = f.uid
GROUP BY u.id HAVING COUNT(f.uid) > 5
 Notes:
 we are using 2 different tables now
 the GROUP_CONCAT() aggregate function will create the
friend list as a comma-separated string
 need to use GROUP BY to aggregate
 non-portable: GROUP_CONCAT is not standard SQL (available in MySQL, for example)



 More complex example (selecting users along with logins and group membership):
FOR
u IN users
LET logins = (
FOR
l IN logins_2012
FILTER
u.id == l.uid
)
LET groups = (
FOR
g IN group_memberships
FILTER
u.id == g.uid
)
RETURN {
"user": u, "logins": logins, "groups": groups
}
 For each user, all of the user's logins are put into variable "logins"
 For each user, all group memberships are put into variable "groups"
 logins and groups are independent of each other


COLLECT: grouping

 The COLLECT keyword can be used to group a list by
one or multiple group criteria
 Difference to SQL: in AvocadoDB COLLECT performs grouping,
but no aggregation
 Aggregation can be performed later using LET or RETURN
 The result of COLLECT is a (grouped/hierarchical) list of
documents, containing one document for each group
 This document contains the group criteria values
 The list of documents for the group can optionally be retrieved
by using the INTO keyword


COLLECT: grouping

 Example: retrieve the users per city (non-aggregated):
FOR
u IN users
COLLECT
city = u.city
INTO g
RETURN { "c": city, "u": g }
 city = u.city is the group criterion (name: "city", value: u.city)
 INTO g captures the group values into variable g; g contains all group
members
 Produces the following result:
[ { "c": "cgn",
"u": [ { "u": {..} }, { "u": {..} }, { "u": {..} } ] },
{ "c": "ffm",
"u": [ { "u": {..} }, { "u": {..} } ] },
{ "c": "ddf",
"u": [ { "u": {..} } ] } ]


COLLECT: grouping

 Example: retrieve the number of users per city (aggregated):
FOR
u IN users
COLLECT
city = u.city
INTO g
RETURN { "c": city, "numUsers": LENGTH(g) }
 Produces the following result:
[ { "c": "cgn", "numUsers": 3 },
{ "c": "ffm", "numUsers": 2 },
{ "c": "ddf", "numUsers": 1 } ]
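
Grouping without aggregation, followed by a separate aggregation step, can be sketched in Python like this. The sample users and the helper name `collect_into` are hypothetical; the order of the groups follows first appearance here, which the draft does not specify.

```python
# Hypothetical sample data matching the group sizes in the result above.
users = [
    {"name": "A", "city": "cgn"}, {"name": "B", "city": "cgn"},
    {"name": "C", "city": "ffm"}, {"name": "D", "city": "cgn"},
    {"name": "E", "city": "ffm"}, {"name": "F", "city": "ddf"},
]

def collect_into(documents, criterion, variable):
    """Model of COLLECT ... INTO: group only, no aggregation."""
    groups = {}
    for document in documents:
        # Each group member is wrapped under the loop variable's name.
        groups.setdefault(criterion(document), []).append({variable: document})
    return groups

groups = collect_into(users, lambda u: u["city"], "u")

# Aggregation happens afterwards, in the RETURN part: LENGTH(g)
result = [{"c": city, "numUsers": len(g)} for city, g in groups.items()]
print(result)
```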


Aggregate functions

 Query language should provide some aggregate functions, e.g.
 MIN()
 MAX()
 SUM()
 LENGTH()
 Input to aggregate functions is a list of values to process. Example:
[ { "user": { "type": 1, "rating": 1 } },
{ "user": { "type": 1, "rating": 4 } },
{ "user": { "type": 1, "rating": 3 } } ]
 Problem: how to access the "user.rating" attribute of each value
inside the aggregate function?


Aggregate functions

 Solution 1: use the "access to all list members" shortcut:
FOR
u IN [ { "user": { "type": 1, "rating": 1 } },
{ "user": { "type": 1, "rating": 4 } },
{ "user": { "type": 1, "rating": 3 } } ]
COLLECT
type = u.user.type
INTO g
RETURN {
"type": type,
"maxRating": MAX(g[*].u.user.rating)
}
 g[*] will iterate over all elements in g and return each element's
u.user.rating attribute


Aggregate functions

 Solution 2: use a FOR sub-expression to iterate over the group elements
FOR
u IN users
COLLECT
city = u.city
INTO g
RETURN {
"c" : city,
"numUsers" : LENGTH(g),
"maxRating": MAX((FOR
e IN g
RETURN
e.user.rating))
}
 INTO g captures the group values; g is a variable containing all group members
 The FOR sub-expression iterates over all elements in the group
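
Both solutions can be compared side by side in a small Python model. It uses the three sample documents from the "Aggregate functions" slide for both variants and groups them by the type attribute of each user; `collect_into` and `expand` are hypothetical helper names, with `expand` standing in for the g[*] shortcut.

```python
# Sample documents from the slides: each has a nested "user" attribute.
documents = [
    {"user": {"type": 1, "rating": 1}},
    {"user": {"type": 1, "rating": 4}},
    {"user": {"type": 1, "rating": 3}},
]

def collect_into(docs, criterion, variable):
    # Group only; each member is wrapped under the loop variable's name.
    groups = {}
    for doc in docs:
        groups.setdefault(criterion(doc), []).append({variable: doc})
    return groups

def expand(group, path):
    """Model of g[*].path: follow the attribute path in every element."""
    values = []
    for element in group:
        value = element
        for name in path:
            value = value[name]
        values.append(value)
    return values

groups = collect_into(documents, lambda u: u["user"]["type"], "u")

# Solution 1: shortcut access to all list members
solution_1 = [{"type": t, "maxRating": max(expand(g, ["u", "user", "rating"]))}
              for t, g in groups.items()]

# Solution 2: explicit sub-iteration over the group elements
solution_2 = [{"type": t, "maxRating": max(e["u"]["user"]["rating"] for e in g)}
              for t, g in groups.items()]

print(solution_1)
print(solution_2)
```

In both variants the aggregate function simply receives a plain list of values.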


Unions and intersections

 Unions and intersections can be created by invoking functions on
lists:
 UNION(list1, list2)
 INTERSECTION(list1, list2)
 There will not be special keywords as in SQL
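
As a rough model, list-based UNION and INTERSECTION could behave like the Python functions below. The draft does not say whether duplicates are removed or how order is determined; set-like semantics with first-appearance order are assumptions of this sketch. Note also that Python's == is used for equality here, which differs from the draft language in some corners (for example, Python treats True and 1 as equal).

```python
def union(list1, list2):
    # Assumption: duplicates are removed, order of first appearance is kept.
    result = []
    for element in list1 + list2:
        if element not in result:
            result.append(element)
    return result

def intersection(list1, list2):
    # Assumption: keeps the elements of list1 that also occur in list2.
    result = []
    for element in list1:
        if element in list2 and element not in result:
            result.append(element)
    return result

print(union([1, 2], [2, 3]))
print(intersection([{"a": 1}, {"a": 2}], [{"a": 2}]))
```

Membership tests on lists are used instead of Python sets so that documents (dicts), which are not hashable, can be handled too.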


Graph queries

 In AvocadoDB, relations between documents can be stored
using graphs
 Graphs can be used to model tree structures, networks etc.
 Popular use cases:
 find friends of friends
 find similarities
 find recommendations


Graph queries

 In AvocadoDB, a graph is composed of
 vertices: the nodes in the graph
 edges: the relations between nodes in the graph
 Vertices are stored as documents in regular collections
 Edges are stored as documents in special edge collections, with
each edge having the following attributes:
 _from id of linked vertex (incoming relation)
 _to id of linked vertex (outgoing relation)
 Additionally, all documents have an _id attribute
 The _id values are used for linking in the edge collections


Graph queries

 Task: find direct friends of users
 Data: users are related (friend relationships) to other users
 Example data (vertex collection "users"):
[ { "_id": 123, "name": "John", "age": 25 },
{ "_id": 456, "name": "Tina", "age": 29 },
{ "_id": 235, "name": "Bob", "age": 15 },
{ "_id": 675, "name": "Phil", "age": 12 } ]
 Example data (edge collection "relations"):
[ { "_id": 1, "_from": 123, "_to": 456 },
{ "_id": 2, "_from": 123, "_to": 235 },
{ "_id": 3, "_from": 456, "_to": 123 },
{ "_id": 4, "_from": 456, "_to": 235 },
{ "_id": 5, "_from": 235, "_to": 456 },
{ "_id": 6, "_from": 235, "_to": 675 } ]


Graph queries

 To traverse the graph, the PATHS function can be used
 It traverses a graph's edges defined in an edge collection and
produces a list of paths found
 Each path object will have the following properties:
 _from id of vertex the path started at
 _to id of vertex the path ended with
 _edges edges visited along the path
 _vertices vertices visited along the path


Graph queries

 Example:
FOR
u IN users
LET friends = (
FOR
p IN PATHS(relations, OUTBOUND, 1)
FILTER
p._from == u._id
)
 PATHS arguments: edge collection (relations), direction (OUTBOUND),
max path length (1)
 p is the path variable name
 The FILTER only considers paths starting at the current user (using the
user's _id attribute)


Graph queries

 Produces:
[ { "u": { "_id": 123, "name": "John", "age": 25 },
"p": [ { "_from": 123, "_to": 456, ... },
{ "_from": 123, "_to": 235, ... } ] },
{ "u": { "_id": 456, "name": "Tina", "age": 29 },
"p": [ { "_from": 456, "_to": 123, ... },
{ "_from": 456, "_to": 235, ... } ] },
{ "u": { "_id": 235, "name": "Bob", "age": 15 },
"p": [ { "_from": 235, "_to": 456, ... },
{ "_from": 235, "_to": 675, ... } ] },
{ "u": { "_id": 675, "name": "Phil", "age": 12 },
"p": [ ] }
]
 Note: _edges and _vertices attributes for each p left out
for space reasons
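
How such a result could come about is sketched below in Python, using the example vertex and edge data from the slides. The function `paths_outbound` is a hypothetical stand-in for PATHS(relations, OUTBOUND, n) with n >= 1; storing vertex ids (rather than full documents) in _vertices and never reusing an edge within a path are assumptions, since the draft does not specify either.

```python
users = [
    {"_id": 123, "name": "John", "age": 25},
    {"_id": 456, "name": "Tina", "age": 29},
    {"_id": 235, "name": "Bob", "age": 15},
    {"_id": 675, "name": "Phil", "age": 12},
]
relations = [
    {"_id": 1, "_from": 123, "_to": 456},
    {"_id": 2, "_from": 123, "_to": 235},
    {"_id": 3, "_from": 456, "_to": 123},
    {"_id": 4, "_from": 456, "_to": 235},
    {"_id": 5, "_from": 235, "_to": 456},
    {"_id": 6, "_from": 235, "_to": 675},
]

def paths_outbound(edges, max_length):
    """Hypothetical model of PATHS(edges, OUTBOUND, max_length), max_length >= 1."""
    result = []

    def extend(path_edges):
        result.append({
            "_from": path_edges[0]["_from"],
            "_to": path_edges[-1]["_to"],
            "_edges": path_edges,
            # Assumption: vertices are recorded by id only.
            "_vertices": [path_edges[0]["_from"]] + [e["_to"] for e in path_edges],
        })
        if len(path_edges) < max_length:
            for edge in edges:
                # Follow edges in their stored direction; never reuse an edge.
                if edge["_from"] == path_edges[-1]["_to"] and edge not in path_edges:
                    extend(path_edges + [edge])

    for edge in edges:
        extend([edge])
    return result

# FOR u IN users LET friends = ( FOR p IN PATHS(...) FILTER p._from == u._id )
friends = [
    {"u": u, "p": [p for p in paths_outbound(relations, 1) if p["_from"] == u["_id"]]}
    for u in users
]

for row in friends:
    print(row["u"]["name"], [p["_to"] for p in row["p"]])
```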


Summary: main keywords

Keyword               Use case
FOR ... IN            List iteration
FILTER                Results filtering
RETURN                Results projection
SORT                  Sorting
LIMIT                 Result set slicing
LET                   Variable creation
COLLECT ... INTO      Grouping


Q&A

 Your feedback on the draft is highly appreciated
 Please let us know what you think:
 m.schoenert@triagens.de
f.celler@triagens.de
j.steemann@triagens.de
#AvocadoDB

 And please try out AvocadoDB:
 http://www.avocadodb.org/
 https://github.com/triAGENS/AvocadoDB
