AvocadoDB query language (DRAFT!)
2. Database query languages / paradigms
There are many database query languages and paradigms around
Some examples:
SQL
declarative query language for relational databases, well-known and popular
UNQL
declarative query language for document databases, SQL-syntax like,
embeds JSON
graph query languages (Cypher, Gremlin, ...)
declarative languages focusing on graph queries
fluent query languages/interfaces
e.g. db.user.find(...).sort(...)
map/reduce
imperative query formulation/programming
...
© 2012 triAGENS GmbH | 2012-04-13 2
3. AvocadoDB query language: status quo
There is a query language in AvocadoDB
The language syntax is very similar to SQL / UNQL
The language currently supports reading data from collections (i.e.
equivalent to an SQL/UNQL SELECT query)
Some complex access patterns (e.g. joins using multiple
collections) are also supported
There are some special features, such as creating inline lists from a list of
documents (named: LIST JOIN)
5. AvocadoDB query language: status quo
Syntax example:
SELECT
{ "user": u, "friends": f }
FROM
users u
LIST JOIN
friends f
ON (u.id == f.uid)
WHERE
u.type == 1
ORDER BY
u.name
6. Language problems
The current query language has the problem that
some queries cannot be expressed very well with it
This might be due to the query language being based on SQL,
and SQL being a query language for relational databases
AvocadoDB is mainly a document-oriented database and its
object model only partly overlaps with the SQL object model:
SQL (relational)        AvocadoDB (document-oriented)
tables                  collections
(homogeneous) rows      (inhomogeneous) documents
columns                 attributes
scalars                 scalars
                        lists
references              edges
7. Language problems: multi-valued attributes
Attributes in AvocadoDB can and shall be stored denormalised
(multi-valued attributes, lists, ...):
{ "user":
{ "name": "Fred",
"likes": [ "Fishing", "Hiking", "Swimming" ]
}
}
In an SQL database, this storage model would be an anti-pattern
Problem: SQL is not designed to access multi-valued attributes/lists
but in AvocadoDB we want to support them via the language
UNQL addresses this partly, but does not go far enough
8. Language problems: graph queries
AvocadoDB also supports querying graphs
Neither SQL nor UNQL offer any „natural“ graph traversal facilities
Instead, there are:
SQL language extensions: e.g. CONNECT BY, proprietary
SQL stored procedures: e.g. PL/SQL imperative code, does not match
well with the declarative nature of SQL
Neither SQL nor UNQL are the languages of choice for graph
queries, but we want to support graph queries in AvocadoDB
9. AvocadoDB query language, version 2
During the past few weeks we thought about moving
AvocadoDB's query language from the current SQL-/
UNQL-based syntax to something else
We did not find an existing query language that addresses
the problems we had well enough
So we tried to define a syntax for a new query language
10. AvocadoDB query language, version 2
The new AvocadoDB query language should
have an easy-to-understand syntax for the end user
offer a way to declaratively express queries
avoid ASCII art queries
still allow more complex queries (joins, sub-queries etc.)
allow accessing lists and list elements more naturally
be usable with the different data models AvocadoDB supports
(e.g. document-oriented, graph, „relational“)
be consistent and easy to process
have one syntax regardless of the underlying client language
11. AvocadoDB query language, version 2
A draft of the new language version is presented on the following slides
It is not yet finalized and not yet implemented
Your feedback on it is highly appreciated
Slides will be uploaded to http://www.avocadodb.org/
12. Data types
The language has the following data types:
absence of a value:
null
boolean truth values:
false, true
numbers (signed double precision):
1, -34.24
strings, e.g.
"John", "goes fishing"
lists (with elements accessible by their position), e.g.
[ "one", "two", false, -1 ]
documents (with elements accessible by their name), e.g.
{ "user": { "name": "John", "age": 25 } }
Note: names of document attributes can also be used without surrounding quotes
13. Bind parameters
Queries can be parametrized using bind parameters
This allows separation of query text and actual query values
Any literal values, including lists and documents can be bound
Collection names can also be bound
Bind parameters can be accessed in the query using the @ prefix
Example:
@age
u.name == @name
u.state IN @states
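The separation of query text and query values can be sketched in Python. This is only an illustration: check_bind_parameters is a hypothetical helper, not part of AvocadoDB, and it uses a simple regular expression that would also match an @ inside a string literal.

```python
import re


def check_bind_parameters(query_text, bind_values):
    """Return the bind parameter names used in query_text, in order of first use.

    Hypothetical helper: raises KeyError if a referenced parameter has no value.
    The query text itself is never modified; values travel separately.
    """
    names = []
    for name in re.findall(r"@(\w+)", query_text):
        if name not in names:
            names.append(name)
    missing = [n for n in names if n not in bind_values]
    if missing:
        raise KeyError("missing bind parameters: " + ", ".join(missing))
    return names


# Query text in the draft syntax, with the values supplied on the side.
query = "FOR u IN users FILTER u.name == @name && u.state IN @states RETURN u"
values = {"name": "John", "states": ["active", "pending"]}
used = check_bind_parameters(query, values)
```

Lists such as the value of @states can be bound in the same way as scalar values.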
14. Operators
The language has the following operators:
logical: will return a boolean value or an error
&& || !
arithmetic: will return a numeric value or an error
+ - * / %
relational: will return a boolean value or an error
== != < <= > >= IN
ternary: will return the true or the false part
? :
String concatenation will be provided via a function
15. Type casts
Typecasts can be achieved by explicitly calling typecast functions
No implicit type cast will be performed
Performing an operation with invalid/inappropriate types
will result in an error
When performing an operation that does not have a valid
or defined result, the outcome will be an error:
1 / 0 => error
1 + "John" => error
Errors might be caught and converted to null in a query
or bubble up to the top, aborting the query.
This depends on settings
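A minimal Python sketch of this behaviour, assuming hypothetical names (QueryError, add, divide, evaluate) and a hypothetical errors_to_null setting. The draft does not define how the setting is configured, so this only models the two outcomes described above.

```python
class QueryError(Exception):
    """Hypothetical error type for operations without a valid result."""


def is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly:
    # the draft treats booleans and numbers as different types.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def add(a, b):
    if not (is_number(a) and is_number(b)):
        raise QueryError("operands of + must be numbers")
    return a + b


def divide(a, b):
    if not (is_number(a) and is_number(b)):
        raise QueryError("operands of / must be numbers")
    if b == 0:
        raise QueryError("division by zero")
    return a / b


def evaluate(operation, *args, errors_to_null=False):
    """Run an operation; either convert an error to null (None) or let it bubble up."""
    try:
        return operation(*args)
    except QueryError:
        if errors_to_null:
            return None
        raise
```

With errors_to_null=True, evaluate(divide, 1, 0) yields None; without it, the QueryError propagates and would abort the query.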
16. Null
When referring to something non-existing (e.g. a non-existing
attribute of a document), the result will be null:
users.nammme => null
Using the comparison operators, null can be compared to other
values and also null itself. The result will be a boolean
(not null as in SQL)
17. Type comparisons
When comparing two values, the following algorithm is used
If the types of the compared values are not equal,
the compare result is as follows:
null < boolean < number < string < list < document
Examples:
null < false 0 != null
false < 0 null != false
true < 0 false != ""
true < [ 0 ] "" != [ ]
true < [ ] null != [ ]
0 < [ ]
[ ] < { }
18. Type comparisons
If the types are equal, the actual values are compared
For boolean values, the order is:
false < true
For numeric values, the order is determined by the numeric value
For string values, the order is determined by bytewise comparison
of the strings' characters
Note: at some point, collations will need to be introduced for
string comparisons
19. Type comparisons
For list values, the elements from both lists are compared at each
position. For each list element value, the described comparisons
will be done recursively:
[ 1 ] > [ 0 ]
[ 2, 0 ] > [ 1, 2 ]
[ 99, 4 ] > [ 99, 3 ]
[ 23 ] > [ true ]
[ [ 1 ] ] > 99
[ ] > 1
[ true ] > [ ]
[ null ] > [ ]
[ true, 0 ] > [ true ]
20. Type comparisons
For document values, the attribute names from both documents
are collected and sorted. The sorted attribute names are then
checked individually: if one of the documents does not have the
attribute, it will be considered „smaller“. If both documents have the
attribute, a value comparison will be done recursively:
{ } < { "age": 25 }
{ "age": 25 } < { "age": 26 }
{ "age": 25 } > { "name": "John" }
{ "name": "John", "age": 25 } == { "age": 25, "name": "John" }
{ "age": 25 } < { "age": 25, "name": "John" }
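The comparison rules from the type comparison slides can be modelled in a few lines of Python. This is a sketch, not AvocadoDB code: the names TYPE_ORDER, type_name and compare are made up, strings are compared as UTF-8 bytes, and a list that is a prefix of a longer list is assumed to be the smaller one (which matches the examples [ true ] > [ ] and [ true, 0 ] > [ true ]).

```python
# Type order from the draft: null < boolean < number < string < list < document
TYPE_ORDER = {"null": 0, "boolean": 1, "number": 2, "string": 3, "list": 4, "document": 5}


def type_name(value):
    if value is None:
        return "null"
    if isinstance(value, bool):  # must come before the number check (bool is an int in Python)
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "list"
    if isinstance(value, dict):
        return "document"
    raise TypeError("unsupported value: %r" % (value,))


def compare(a, b):
    """Return -1, 0 or 1 if a is smaller than, equal to or greater than b."""
    ta, tb = type_name(a), type_name(b)
    if ta != tb:
        return -1 if TYPE_ORDER[ta] < TYPE_ORDER[tb] else 1
    if ta == "null":
        return 0
    if ta in ("boolean", "number"):
        return (a > b) - (a < b)
    if ta == "string":
        ea, eb = a.encode("utf-8"), b.encode("utf-8")  # bytewise comparison
        return (ea > eb) - (ea < eb)
    if ta == "list":
        for x, y in zip(a, b):  # position by position, recursively
            result = compare(x, y)
            if result != 0:
                return result
        return (len(a) > len(b)) - (len(a) < len(b))
    # documents: walk the sorted union of attribute names
    for name in sorted(set(a) | set(b)):
        if name not in a:
            return -1  # the document lacking the attribute is smaller
        if name not in b:
            return 1
        result = compare(a[name], b[name])
        if result != 0:
            return result
    return 0
```

For example, compare({"age": 25}, {"name": "John"}) returns 1, because "age" sorts first and only the left document has it.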
21. Base building block: lists
A good part of the query language is about processing lists
There are several types of lists:
statically declared lists, e.g.
[ { "user": { "name": "Fred" } },
{ "user": { "name": "John" } } ]
lists of documents from collections, e.g.
users
locations
result lists from filters/queries, e.g.
NEAR(locations, [ 43, 10 ], 100)
22. FOR: List iteration
The FOR keyword can be used to iterate over all elements from a list
Example (collection-based, collection „users“):
FOR
u IN users
A result document (named: u) is produced on each iteration
The above example produces the following result list:
[ u1, u2, u3, ..., un ]
Note: this is comparable to the following SQL:
SELECT * FROM users u
In each iteration, the individual element is accessible via its name (u)
23. FOR: List iteration
Nesting of multiple FOR blocks is possible
Example: cross product of users and locations (u x l):
FOR
u IN users
FOR
l IN locations
A result document containing both variables (u, l) is produced on each
iteration of the inner loop
The result document contains both u and l
Note: this is equivalent to the following SQL queries:
SELECT * FROM users u, locations l
SELECT * FROM users u INNER JOIN locations l
ON (1=1)
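For readers more familiar with general-purpose languages, the nested FOR blocks behave like nested loops. A rough Python equivalent, using made-up sample data, might look like this:

```python
# Made-up sample data standing in for the two collections.
users = [{"name": "John"}, {"name": "Tina"}]
locations = [{"x": 1, "y": -1}, {"x": -2, "y": 3}]

# FOR u IN users
#   FOR l IN locations
# One result document containing both u and l per iteration of the inner loop.
result = [{"u": u, "l": l} for u in users for l in locations]
```

With two users and two locations, the result list has four entries, one per (u, l) pair.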
24. FOR: List iteration
Example: cross product of years & quarters (non collection-based):
FOR
year IN [ 2011, 2012, 2013 ]
FOR
quarter IN [ 1, 2, 3, 4 ]
Note: this is equivalent to the following SQL query:
SELECT * FROM
(SELECT 2011 UNION SELECT 2012 UNION
SELECT 2013) year,
(SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION
SELECT 4) quarter
25. FILTER: results filtering
The FILTER keyword can be used to restrict the results to
elements that match some definable condition
Example: retrieve all users that are active
FOR
u IN users
FILTER
u.active == true
Access to the individual list elements in the FOR list is via the variable name u
Note: this is equivalent to the following SQL:
SELECT * FROM users u WHERE u.active = true
26. FILTER: results filtering
The FILTER keyword in combination with nested FOR blocks
can be used to perform joins
Example: retrieve all users that have matching locations
FOR
u IN users
FOR
l IN locations
FILTER
u.a == l.b
Access to the individual list elements is via their variable names (u, l)
Note: this is equivalent to the following SQL queries:
SELECT * FROM users u, locations l
WHERE u.a = l.b
SELECT * FROM users u (INNER) JOIN locations l
ON u.a = l.b
27. Base building block: scopes
The query language is scoped
Variables can only be used after they have been declared
Example:
FOR
u IN users
FOR
l IN locations
FILTER
u.a == l.b
The first FOR introduces u, the second FOR introduces l, and the FILTER can use both u and l
Scopes can be made explicit using brackets (will be shown later)
28. FILTER: results filtering
Thanks to scopes, the FILTER keyword can be used
wherever SQL needs multiple keywords:
ON
WHERE
HAVING
29. FILTER: results filtering
That means: in AvocadoDB you would use FILTER
FOR
u IN users
FOR
l IN locations
FILTER
u.a == l.b
whereas in SQL you would use either ON:
SELECT * FROM users u (INNER) JOIN locations l
ON u.a = l.b
or WHERE:
SELECT * FROM users u, locations l
WHERE u.a = l.b
30. FILTER: results filtering
FILTER can be used to model both an SQL ON and
an SQL WHERE in one go:
FOR
u IN users
FOR
l IN locations
FILTER
u.active == 1 && u.a == l.b
This is equivalent to the following SQL query:
SELECT * FROM users u (INNER) JOIN locations l
ON u.a = l.b WHERE u.active = 1
31. FILTER: results filtering
More than one FILTER condition allowed per query
The following queries are all equivalent
Optimizer's job is to figure out best positions for applying FILTERs
Variant 1:
FOR
u IN users
FILTER
u.c == 1
FOR
l IN locations
FILTER
l.d == 2
FILTER
u.a == l.b
Variant 2:
FOR
u IN users
FOR
l IN locations
FILTER
u.c == 1 &&
l.d == 2 &&
u.a == l.b
Variant 3:
FOR
u IN users
FOR
l IN locations
FILTER
l.d == 2 &&
u.a == l.b
FILTER
u.c == 1
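That the position of the FILTER conditions does not change the result can be checked with a small Python model. The sample data and the function names are made up for illustration; the three functions mirror an early-filter form, a single combined filter, and a form where the condition on u is applied last.

```python
# Made-up sample data for the two collections.
users = [{"a": 1, "c": 1}, {"a": 2, "c": 1}, {"a": 3, "c": 0}]
locations = [{"b": 1, "d": 2}, {"b": 2, "d": 5}, {"b": 3, "d": 2}]


def early_filters():
    # Each condition is checked as soon as its variables are available.
    out = []
    for u in users:
        if u["c"] != 1:
            continue
        for l in locations:
            if l["d"] != 2:
                continue
            if u["a"] == l["b"]:
                out.append((u, l))
    return out


def single_filter():
    # All conditions combined with && in one FILTER.
    return [(u, l) for u in users for l in locations
            if u["c"] == 1 and l["d"] == 2 and u["a"] == l["b"]]


def late_user_filter():
    # Join conditions first, the condition on u in a second FILTER.
    joined = [(u, l) for u in users for l in locations
              if l["d"] == 2 and u["a"] == l["b"]]
    return [(u, l) for (u, l) in joined if u["c"] == 1]
```

All three functions return the same pairs; only the amount of intermediate work differs, which is what an optimizer would try to minimise.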
32. RETURN: results projection
The RETURN keyword produces the end result documents
from the intermediate results produced by the query
Comparable to the SELECT part in an SQL query
RETURN part is mandatory at the end of a query
(and at the end of each subquery)
RETURN is partly left out in this presentation for space reasons
33. RETURN: results projection
Example:
FOR
u IN users
RETURN {
"name" : u.name,
"likes" : u.likes,
"numFriends": LENGTH(u.friends)
}
Produces one such document for each u found
34. RETURN: results projection
To return all documents as they are in the original list, there
is the following variant:
FOR
u IN users
RETURN
u
Would produce:
[ { "name": "John", "age": 25 },
{ "name": "Tina", "age": 29 },
... ]
Note: this is similar to SQL's SELECT u.*
35. RETURN: results projection
To return just the names for all users, the following query would do:
FOR
u IN users
RETURN
u.name
Would produce:
[ "John", "Tina", ... ]
Note: this is similar to SQL's SELECT u.name
36. RETURN: results projection
To return a hierarchical result (e.g. data from multiple collections),
the following query could be used:
FOR
u IN users
FOR
l IN locations
RETURN { "user": u, "location" : l }
Would produce:
[ { "user": { "name": "John", "age": 25 },
"location": { "x": 1, "y": -1 } },
{ "user": { "name": "Tina", "age": 29 },
"location": { "x": -2, "y": 3 } },
... ]
37. RETURN: results projection
To return a flat result from hierarchical data (e.g. data from multiple
collections), the MERGE() function can be employed:
FOR
u IN users
FOR
l IN locations
RETURN MERGE(u, l)
Would produce:
[ { "name": "John", "age": 25,
"x": 1, "y": -1 },
{ "name": "Tina", "age": 29,
"x": -2, "y": 3 },
... ]
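A possible reading of MERGE() in Python terms is sketched below. The draft does not say what happens when both documents contain the same attribute name, so the "later document wins" rule here is an assumption, and merge is a hypothetical stand-in for the draft function.

```python
def merge(*documents):
    """Hypothetical model of MERGE(): combine all attributes into one flat document."""
    merged = {}
    for document in documents:
        # Assumption: on an attribute name clash, the later document wins.
        merged.update(document)
    return merged


user = {"name": "John", "age": 25}
location = {"x": 1, "y": -1}
flat = merge(user, location)
```

Here flat contains the attributes name, age, x and y at the top level, as in the result list above.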
38. SORT: Sorting
The SORT keyword will force a sort of the list of intermediate
results according to one or multiple criteria
Example (sort by first and last name first, then by id):
FOR
u IN users
FOR
l IN locations
SORT
u.first, u.last, l.id DESC
This is very similar to ORDER BY in SQL
39. LIMIT: Result set slicing
The LIMIT keyword allows slicing the list of result documents using
an offset and a count
Example for top 3 (offset = 0, count = 3):
FOR
u IN users
SORT
u.first, u.last
LIMIT
0, 3
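In Python terms, SORT followed by LIMIT corresponds roughly to sorting a list and then taking a slice. The sample data below is made up, and the sketch only covers ascending order.

```python
# Made-up sample data standing in for the users collection.
users = [
    {"first": "John", "last": "Smith"},
    {"first": "Tina", "last": "Moss"},
    {"first": "John", "last": "Adams"},
    {"first": "Bob", "last": "Lee"},
]

# SORT u.first, u.last
ordered = sorted(users, key=lambda u: (u["first"], u["last"]))

# LIMIT 0, 3  (offset = 0, count = 3)
offset, count = 0, 3
top3 = ordered[offset:offset + count]
```

The slice is taken after sorting, so the result is the first three users in sort order, not the first three stored in the collection.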
40. LET: variable creation
The LET keyword can be used to create a variable
using data from a subexpression (e.g. a FOR expression)
Example (will populate variable t with the result of the FOR):
LET t = (
FOR
u IN users
)
The brackets are explicit scope bounds
This will populate t with
[ u1, u2, u3, u4, ... un ]
41. LET: variable creation
The results created using LET can be filtered afterwards
using the FILTER keyword
This is then similar to the behaviour of HAVING in SQL
Example using a single collection (users):
FOR
u IN users
LET friends = (
FOR
f IN u.friends
)
FILTER
LENGTH(friends) > 5
The inner FOR iterates over an attribute („friends“) of each u
LENGTH() is a function to retrieve the length of a list
42. LET: variable creation
Example using two collections (users, friends):
FOR
u IN users
LET friends = (
FOR
f IN friends
FILTER
u.id == f.uid
)
FILTER
LENGTH(friends) > 5
Differences from the previous single-collection example:
replaced f IN u.friends with just f IN friends
added inner filter condition
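The two-collection example can be traced with a short Python model. The data is made up, and the output shape (a document with "user" and "friends") is an assumption, since the slide leaves out the RETURN part.

```python
# Made-up sample data: John has six friend records, Tina has one.
users = [{"id": 1, "name": "John"}, {"id": 2, "name": "Tina"}]
friends_collection = [{"uid": 1, "name": "f%d" % i} for i in range(6)]
friends_collection.append({"uid": 2, "name": "g0"})

result = []
for u in users:
    # LET friends = ( FOR f IN friends FILTER u.id == f.uid )
    friends = [f for f in friends_collection if u["id"] == f["uid"]]
    # FILTER LENGTH(friends) > 5
    if len(friends) > 5:
        result.append({"user": u, "friends": friends})
```

The sub-expression is evaluated once per user, and the outer FILTER then works on its result, which is what makes it comparable to HAVING in SQL.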
43. LET: variable creation
SQL approach:
SELECT u.*, GROUP_CONCAT(f.uid) AS friends
FROM users u (INNER) JOIN friends f
ON u.id = f.uid
GROUP BY u.id HAVING COUNT(f.uid) > 5
Notes:
we are using 2 different tables now
the GROUP_CONCAT() aggregate function will create the
friend list as a comma-separated string
need to use GROUP BY to aggregate
non-portable: GROUP_CONCAT is not part of standard SQL (MySQL provides it)
44. LET: variable creation
More complex example (selecting users along with logins and group membership):
FOR
u IN users
LET logins = (
FOR
l IN logins_2012
FILTER
u.id == l.uid
)
LET groups = (
FOR
g IN group_memberships
FILTER
u.id == g.uid
)
RETURN {
"user": u, "logins": logins, "groups": groups
}
For each user, all of the user's logins are put into variable „logins“
For each user, all group memberships are put into variable „groups“
logins and groups are independent of each other
45. COLLECT: grouping
The COLLECT keyword can be used to group a list by
one or multiple group criteria
Difference from SQL: in AvocadoDB COLLECT performs grouping,
but no aggregation
Aggregation can be performed later using LET or RETURN
The result of COLLECT is a (grouped/hierarchical) list of
documents, containing one document for each group
This document contains the group criteria values
The list of documents for the group can optionally be retrieved
by using the INTO keyword
46. COLLECT: grouping
Example: retrieve the users per city (non-aggregated):
FOR
u IN users
COLLECT
city = u.city
INTO g
RETURN { "c": city, "u": g }
city = u.city is the group criterion (name: „city“, value: u.city)
INTO g captures the group values into variable g, so g contains all group members
Produces the following result:
[ { "c": "cgn",
"u": [ { "u": {..} }, { "u": {..} }, { "u": {..} } ] },
{ "c": "ffm",
"u": [ { "u": {..} }, { "u": {..} } ] },
{ "c": "ddf",
"u": [ { "u": {..} } ] } ]
47. COLLECT: grouping
Example: retrieve the number of users per city (aggregated):
FOR
u IN users
COLLECT
city = u.city
INTO g
RETURN { "c": city, "numUsers": LENGTH(g) }
Produces the following result:
[ { "c": "cgn", "numUsers": 3 },
{ "c": "ffm", "numUsers": 2 },
{ "c": "ddf", "numUsers": 1 } ]
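Both COLLECT examples (non-aggregated and aggregated) can be modelled with a dictionary of groups in Python. The user data is made up, and the order of the groups (order of first appearance) is an assumption, since the draft does not specify it.

```python
# Made-up sample data: three users in cgn, two in ffm, one in ddf.
users = [
    {"name": "A", "city": "cgn"},
    {"name": "B", "city": "ffm"},
    {"name": "C", "city": "cgn"},
    {"name": "D", "city": "ddf"},
    {"name": "E", "city": "cgn"},
    {"name": "F", "city": "ffm"},
]

# COLLECT city = u.city INTO g
groups = {}
for u in users:
    # Each group member is wrapped under the name of its loop variable ("u").
    groups.setdefault(u["city"], []).append({"u": u})

# RETURN { "c": city, "u": g }  (grouping only, no aggregation)
non_aggregated = [{"c": city, "u": g} for city, g in groups.items()]

# RETURN { "c": city, "numUsers": LENGTH(g) }  (aggregation done in RETURN)
aggregated = [{"c": city, "numUsers": len(g)} for city, g in groups.items()]
```

The grouping step is identical in both cases; only the RETURN expression decides whether the group members or an aggregate over them is reported.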
48. Aggregate functions
Query language should provide some aggregate functions, e.g.
MIN()
MAX()
SUM()
LENGTH()
Input to aggregate functions is a list of values to process. Example:
[ { "user": { "type": 1, "rating": 1 } },
{ "user": { "type": 1, "rating": 4 } },
{ "user": { "type": 1, "rating": 3 } } ]
Problem: how to access the „user.rating“ attribute of each value
inside the aggregate function?
49. Aggregate functions
Solution 1: use „access to all list members“ shortcut:
FOR
u IN [ { "user": { "type": 1, "rating": 1 } },
{ "user": { "type": 1, "rating": 4 } },
{ "user": { "type": 1, "rating": 3 } } ]
COLLECT
type = u.user.type
INTO g
RETURN {
"type": type,
"maxRating": MAX(g[*].u.user.rating)
}
g[*] will iterate over all elements in g and return each element's u.user.rating attribute
50. Aggregate functions
Solution 2: use FOR sub-expression to iterate over group elements
FOR
u IN users
COLLECT
city = u.city
INTO g
RETURN {
"c" : city,
"numUsers" : LENGTH(g),
"maxRating": MAX((FOR
e IN g
RETURN
e.user.rating))
}
INTO g captures the group values: g is a variable containing all group members
The FOR sub-expression iterates over all elements in the group
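Both proposed solutions come down to iterating over the members of each group and feeding one attribute per member into the aggregate function. A Python sketch using the three rating documents from the aggregate functions slide, and assuming that the grouping key is the type nested under "user":

```python
docs = [
    {"user": {"type": 1, "rating": 1}},
    {"user": {"type": 1, "rating": 4}},
    {"user": {"type": 1, "rating": 3}},
]

# COLLECT by type INTO g: each member is stored under its loop variable name "u".
groups = {}
for u in docs:
    groups.setdefault(u["user"]["type"], []).append({"u": u})

# MAX over the rating of every group member, written as an explicit iteration.
result = [
    {"type": t, "maxRating": max(e["u"]["user"]["rating"] for e in g)}
    for t, g in groups.items()
]
```

The generator expression inside max() plays the role of either the g[*] shortcut or the FOR sub-expression; the two draft notations differ in syntax, not in what is computed.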
51. Unions and intersections
Unions and intersections can be created by invoking functions on
lists:
UNION(list1, list2)
INTERSECTION(list1, list2)
There will not be special keywords as in SQL
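As a rough Python model of the two functions: the draft does not state whether duplicates are removed, so this sketch assumes that UNION keeps them and that INTERSECTION reports each common value once. Both function names are stand-ins for the draft functions, not an existing API.

```python
def union(list1, list2):
    """Hypothetical model of UNION(): all elements of both lists.

    Assumption: duplicates are kept.
    """
    return list1 + list2


def intersection(list1, list2):
    """Hypothetical model of INTERSECTION(): values that occur in both lists.

    Assumption: each common value is reported once, in list1 order.
    """
    result = []
    for item in list1:
        if item in list2 and item not in result:
            result.append(item)
    return result
```

Because both are plain functions on lists, their results can be used anywhere a list is expected, for example as the list of a FOR.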
52. Graph queries
In AvocadoDB, relations between documents can be stored
using graphs
Graphs can be used to model tree structures, networks etc.
Popular use cases:
find friends of friends
find similarities
find recommendations
53. Graph queries
In AvocadoDB, a graph is a composition of
vertices: the nodes in the graph
edges: the relations between nodes in the graph
Vertices are stored as documents in regular collections
Edges are stored as documents in special edge collections, with
each edge having the following attributes:
_from id of linked vertex (incoming relation)
_to id of linked vertex (outgoing relation)
Additionally, all documents have an _id attribute
The _id values are used for linking in the edges collections
54. Graph queries
Task: find direct friends of users
Data: users are related (friend relationships) to other users
Example data (vertex collection „users“):
[ { "_id": 123, "name": "John", "age": 25 },
{ "_id": 456, "name": "Tina", "age": 29 },
{ "_id": 235, "name": "Bob", "age": 15 },
{ "_id": 675, "name": "Phil", "age": 12 } ]
Example data (edge collection „relations“):
[ { "_id": 1, "_from": 123, "_to": 456 },
{ "_id": 2, "_from": 123, "_to": 235 },
{ "_id": 3, "_from": 456, "_to": 123 },
{ "_id": 4, "_from": 456, "_to": 235 },
{ "_id": 5, "_from": 235, "_to": 456 },
{ "_id": 6, "_from": 235, "_to": 675 } ]
55. Graph queries
To traverse the graph, the PATHS function can be used
It traverses a graph's edges defined in an edge collection and
produces a list of paths found
Each path object will have the following properties:
_from id of vertex the path started at
_to id of vertex the path ended with
_edges edges visited along the path
_vertices vertices visited along the path
56. Graph queries
Example:
FOR
u IN users
LET friends = (
FOR
p IN PATHS(relations, OUTBOUND, 1)
FILTER
p._from == u._id
)
Arguments of PATHS: edge collection (relations), direction (OUTBOUND), max path length (1)
p is the path variable name
The FILTER only considers paths starting at the current user (using the user's _id attribute)
57. Graph queries
Produces:
[ { "u": { "_id": 123, "name": "John", "age": 25 },
"p": [ { "_from": 123, "_to": 456, ... },
{ "_from": 123, "_to": 235, ... } ] },
{ "u": { "_id": 456, "name": "Tina", "age": 29 },
"p": [ { "_from": 456, "_to": 123, ... },
{ "_from": 456, "_to": 235, ... } ] },
{ "u": { "_id": 235, "name": "Bob", "age": 15 },
"p": [ { "_from": 235, "_to": 456, ... },
{ "_from": 235, "_to": 675, ... } ] },
{ "u": { "_id": 675, "name": "Phil", "age": 12 },
"p": [ ] }
]
Note: _edges and _vertices attributes for each p left out
for space reasons
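The traversal can be reproduced with the example data in Python. The function paths_outbound is a hypothetical model of PATHS(edges, OUTBOUND, max_length): it assumes that paths of every length from 1 up to max_length are returned, that a path never revisits a vertex, and that _vertices holds vertex ids. None of these details are fixed by the draft.

```python
users = [
    {"_id": 123, "name": "John", "age": 25},
    {"_id": 456, "name": "Tina", "age": 29},
    {"_id": 235, "name": "Bob", "age": 15},
    {"_id": 675, "name": "Phil", "age": 12},
]
relations = [
    {"_id": 1, "_from": 123, "_to": 456},
    {"_id": 2, "_from": 123, "_to": 235},
    {"_id": 3, "_from": 456, "_to": 123},
    {"_id": 4, "_from": 456, "_to": 235},
    {"_id": 5, "_from": 235, "_to": 456},
    {"_id": 6, "_from": 235, "_to": 675},
]


def paths_outbound(edges, max_length):
    """Hypothetical model of PATHS(): follow edges from _from to _to."""
    def extend(path, edge):
        return {
            "_from": path["_from"],
            "_to": edge["_to"],
            "_edges": path["_edges"] + [edge],
            "_vertices": path["_vertices"] + [edge["_to"]],
        }

    # Paths of length 1: one per edge.
    frontier = [
        {"_from": e["_from"], "_to": e["_to"], "_edges": [e],
         "_vertices": [e["_from"], e["_to"]]}
        for e in edges
    ]
    found = list(frontier)
    # Longer paths: append one more edge, never revisiting a vertex.
    for _ in range(max_length - 1):
        frontier = [
            extend(p, e)
            for p in frontier
            for e in edges
            if e["_from"] == p["_to"] and e["_to"] not in p["_vertices"]
        ]
        found.extend(frontier)
    return found


# FOR u IN users, collecting the paths of length 1 that start at u
result = []
for u in users:
    friends = [p for p in paths_outbound(relations, 1) if p["_from"] == u["_id"]]
    result.append({"u": u, "p": friends})
```

For John (_id 123) this yields the two paths ending at 456 and 235, and for Phil an empty list, matching the result shown above. Raising max_length to 2 would additionally return friends of friends.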
58. Summary: main keywords
Keyword             Use case
FOR ... IN          List iteration
FILTER              Results filtering
RETURN              Results projection
SORT                Sorting
LIMIT               Result set slicing
LET                 Variable creation
COLLECT ... INTO    Grouping
59. Q&A
Your feedback on the draft is highly appreciated
Please let us know what you think:
m.schoenert@triagens.de
f.celler@triagens.de
j.steemann@triagens.de
#AvocadoDB
And please try out AvocadoDB:
http://www.avocadodb.org/
https://github.com/triAGENS/AvocadoDB