The openCypher Project - An Open Graph Query Language

We want to present the openCypher project, whose purpose is to make Cypher available to everyone – every data store, every tooling provider, every application developer. openCypher is a continual work in progress. Over the next few months, we will move more and more of the language artifacts over to GitHub to make them available for everyone.

openCypher is an open source project that delivers four key artifacts released under a permissive license: (i) the Cypher reference documentation, (ii) a Technology Compatibility Kit (TCK), (iii) a reference implementation (a fully functional implementation of key parts of the stack needed to support Cypher inside a data platform or tool) and (iv) the Cypher language specification.

We also aim to make the process of specifying and evolving the Cypher query language as open as possible, and we actively seek comments and suggestions on how to improve the language.

The purpose of this talk is to provide more details regarding the above-mentioned aspects.

  1. The openCypher project
     Michael Hunger
  2. Topics
     • Property Graph Model
     • Cypher - A language for querying graphs
     • Cypher History
     • Cypher Demo
     • Current implementation in Neo4j
     • User Feedback
     • Opening up - The openCypher project
     • Governance, Contribution Process
     • Planned Deliverables
  3. The Property-Graph-Model
     You know it, right?
  4. Labeled Property Graph Model Components
     Nodes
     • The objects in the graph
     • Can have name-value properties
     • Can be labeled
     Relationships
     • Relate nodes by type and direction
     • Can have name-value properties
     [Figure: two PERSON nodes (name: “Dan”, born: May 29, 1970, twitter: “@dan”; name: “Ann”, born: Dec 5, 1975) and a CAR node (brand: “Volvo”, model: “V70”), with LOVES, LOVES and LIVES WITH relationships; one relationship carries the property since: Jan 10, 2011]
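
The node and relationship components on this slide map onto two small record types. The Python sketch below is only an illustration and says nothing about how Neo4j stores data. The class names `Node` and `Relationship`, the integer ids, the helper `outgoing`, and the choice to attach `since` to the LIVES_WITH relationship are all assumptions (the transcript does not say which relationship carries that property).

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set


@dataclass
class Node:
    """An object in the graph: labels plus name-value properties."""
    id: int
    labels: Set[str]
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    """Relates two nodes by type and direction (start -> end)."""
    start: int  # id of the start node
    end: int    # id of the end node
    type: str
    properties: Dict[str, Any] = field(default_factory=dict)


dan = Node(1, {"Person"}, {"name": "Dan", "born": "1970-05-29", "twitter": "@dan"})
ann = Node(2, {"Person"}, {"name": "Ann", "born": "1975-12-05"})
car = Node(3, {"Car"}, {"brand": "Volvo", "model": "V70"})

relationships: List[Relationship] = [
    Relationship(dan.id, ann.id, "LOVES"),
    Relationship(ann.id, dan.id, "LOVES"),
    # Assumption: the slide does not say which relationship holds `since`.
    Relationship(dan.id, ann.id, "LIVES_WITH", {"since": "2011-01-10"}),
]


def outgoing(node: Node, rel_type: str) -> List[Relationship]:
    """Relationships of the given type that start at `node`."""
    return [r for r in relationships if r.start == node.id and r.type == rel_type]
```

Direction is part of the data: `outgoing(dan, "LOVES")` and `outgoing(ann, "LOVES")` return different relationships even though both connect the same pair of nodes.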
  5. Relational Versus Graph Models
     [Figure: in the relational model, Person rows (ANDREAS, DELIA, TOBIAS, MICA) are linked through a Person-Friend join table; in the graph model, the same people are nodes connected directly by KNOWS relationships]
  6. Cypher Query Language
     Why, How, When?
  7. Why Yet Another Query Language (YAQL)?
     • SQL and SPARQL hurt our brains
     • Our brains crave patterns
     • It's all about patterns
     • Creating a query language is fun (and hard work)
  8. What is Cypher?
     • A graph query language that allows for expressive and efficient querying of graph data
     • Intuitive, powerful and easy to learn
     • Write graph queries by describing patterns in your data
     • Focus on your domain, not the mechanics of data access
     • Designed to be a human-readable query language
     • Suitable for developers and operations professionals
  9. What is Cypher?
     • Cypher is declarative, which means it lets users express what data to retrieve
     • The guiding principle behind Cypher is to make simple things easy and complex things possible
     • A humane query language
     • Stolen from SQL (common keywords), SPARQL (pattern matching), Python and Haskell (collection semantics)
  10. Why Cypher?
      Compared to:
      • SPARQL (Cypher came from real-world use, not academia)
      • Gremlin (declarative vs imperative)
      • SQL (graph-specific vs set-specific)
      (Cypher)-[:LOVES]->(ASCII Art)
      A language should be readable, not just writable. You will read your code dozens more times than you write it. Regular expressions, for example, are write-only.
  11. Querying the Graph
      Some Examples With Cypher
  12. Basic Query: Who do people report to?
      MATCH (:Employee {firstName: "Steven"})-[:REPORTS_TO]->(:Employee {firstName: "Andrew"})
      [Figure: the Steven node points to the Andrew node via REPORTS_TO; callouts mark each NODE, LABEL and PROPERTY in the pattern]
  13. Basic Query Comparison: Who do people report to?
      SQL:
      SELECT *
      FROM Employee AS e
      JOIN Employee_Report AS er ON (e.id = er.manager_id)
      JOIN Employee AS sub ON (er.sub_id = sub.id)
      Cypher:
      MATCH (e:Employee)-[:REPORTS_TO]->(mgr:Employee)
      RETURN *
  14. Basic Query: Who do people report to?
  15. Basic Query: Who do people report to?
  16. Cypher Syntax
      Only Tip of the Iceberg
  17. Syntax: Patterns
      ( )-->( )
      (node:Label {key:value})
      (node1)-[rel:REL_TYPE {key:value}]->(node2)
      (node1)-[:REL_TYPE1]->(node2)<-[:REL_TYPE2]-(node3)
      (node1)-[:REL_TYPE*m..n]->(node2)
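
The last pattern on this slide, `(node1)-[:REL_TYPE*m..n]->(node2)`, matches paths that follow between m and n relationships of one type. As a rough illustration of that idea (not Neo4j's actual matcher), the Python sketch below enumerates such paths over a list of edges. The function name `expand`, the edge tuple layout, and the sample data (Robert reporting to Steven) are assumptions made for this example; only "Steven reports to Andrew" comes from the slides.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str, str]  # (start node, relationship type, end node)


def expand(edges: List[Edge], start: str, rel_type: str,
           min_hops: int, max_hops: int) -> List[List[str]]:
    """Return node paths from `start` that follow `rel_type` for min_hops..max_hops hops."""
    paths: List[List[str]] = []

    def walk(node: str, path: List[str], used: Set[int]) -> None:
        hops = len(path) - 1
        if hops >= min_hops:
            paths.append(list(path))
        if hops == max_hops:
            return
        for index, (source, kind, target) in enumerate(edges):
            # Each relationship is used at most once per path.
            if source == node and kind == rel_type and index not in used:
                walk(target, path + [target], used | {index})

    walk(start, [start], set())
    return paths


# Hypothetical reporting chain for illustration.
edges = [
    ("Robert", "REPORTS_TO", "Steven"),
    ("Steven", "REPORTS_TO", "Andrew"),
]
chains = expand(edges, "Robert", "REPORTS_TO", 1, 2)
```

A lower bound of 0 also yields the single-node path, which is why `*0..3` in a query can bind both ends of the pattern to the same node.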
  18. Patterns are used in
      • (OPTIONAL) MATCH
      • CREATE, MERGE
      • shortestPath()
      • Predicates
      • Expressions
      • (Comprehensions)
  19. Syntax: Structure
      (OPTIONAL) MATCH <patterns>
      WHERE <predicates>
      RETURN <expression> AS <name>
      ORDER BY <expression>
      SKIP <offset> LIMIT <size>
  20. Syntax: Automatic Aggregation
      MATCH <patterns>
      RETURN <expr>, collect([distinct] <expression>) AS <name>,
             count(*) AS freq
      ORDER BY freq DESC
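
"Automatic" here means there is no GROUP BY clause: the non-aggregated expressions in RETURN act as the grouping key for the aggregates next to them. The Python sketch below mimics the slide's template on plain tuples, assuming each input row has already been reduced to a `(expr, expression)` pair. The function name `aggregate` and the sample rows are made up for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, List, Tuple


def aggregate(rows: List[Tuple[Hashable, Any]]) -> List[Dict[str, Any]]:
    """Group by the first column, then collect(distinct ...) and count(*) per group."""
    groups: Dict[Hashable, List[Any]] = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)

    result = []
    for key, values in groups.items():
        distinct = list(dict.fromkeys(values))  # de-duplicate, keep first-seen order
        result.append({"expr": key, "name": distinct, "freq": len(values)})

    # ORDER BY freq DESC
    result.sort(key=lambda row: row["freq"], reverse=True)
    return result


rows = [("a", 1), ("b", 2), ("a", 1), ("a", 3)]
summary = aggregate(rows)
```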
  21. DataFlow: WITH
      WITH <expression> AS <name>, ....
      • controls data flow between query segments
      • separates reads from writes
      • can also
        • aggregate
        • sort
        • paginate
      • replacement for HAVING
      • as many WITHs as you like
  22. Structure: Writes
      CREATE <pattern>
      MERGE <pattern> ON CREATE ... ON MATCH ...
      (DETACH) DELETE <entity>
      SET <property,label>
      REMOVE <property,label>
  23. Data Import
      [USING PERIODIC COMMIT <count>]
      LOAD CSV [WITH HEADERS] FROM "URL" AS row
      ... any Cypher clauses, mostly match + updates ...
  24. Collections
      UNWIND (range(1,10) + [11,12,13]) AS x
      WITH collect(x) AS coll
      WHERE any(x IN coll WHERE x % 2 = 0)
      RETURN size(coll), coll[0], coll[1..-1],
             reduce(a = 0, x IN coll | a + x),
             extract(x IN coll | x*x),
             filter(x IN coll WHERE x > 10),
             [x IN coll WHERE x > 10 | x*x]
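
For readers more familiar with Python, the collection functions on this slide have close counterparts in list operations, which fits the earlier remark that Cypher borrowed its collection semantics from Python and Haskell. The sketch below mirrors the slide's query expression by expression; the variable names are chosen for this example only.

```python
from functools import reduce

# range(1,10) in Cypher includes both ends, so it yields 1..10.
coll = list(range(1, 11)) + [11, 12, 13]

passes_where = any(x % 2 == 0 for x in coll)      # any(x IN coll WHERE x % 2 = 0)
size = len(coll)                                  # size(coll)
first = coll[0]                                   # coll[0]
middle = coll[1:-1]                               # coll[1..-1]
total = reduce(lambda a, x: a + x, coll, 0)       # reduce(a = 0, x IN coll | a + x)
squares = [x * x for x in coll]                   # extract(x IN coll | x*x)
large = [x for x in coll if x > 10]               # filter(x IN coll WHERE x > 10)
large_squares = [x * x for x in coll if x > 10]   # [x IN coll WHERE x > 10 | x*x]
```

The final list comprehension combines what `filter` and `extract` do separately, which is the same trade-off as in the Cypher version.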
  25. Maps & Entities
      WITH {age:42, name: "John", male:true} AS data
      WHERE exists(data.name) AND data["age"] = 42
      CREATE (n:Person)
      SET n += data
      RETURN [k IN keys(n) WHERE k CONTAINS "a" | {key: k, value: n[k]}]
  26. Optional Schema
      CREATE INDEX ON :Label(property)
      CREATE CONSTRAINT ON (n:Label) ASSERT n.property IS UNIQUE
      CREATE CONSTRAINT ON (n:Label) ASSERT exists(n.property)
      CREATE CONSTRAINT ON (:Label)-[r:REL]->(:Label2) ASSERT exists(r.property)
  27. And much more ...
      neo4j.com/docs/stable/cypher-refcard
  28. More Examples
  29. Express Complex Queries Easily with Cypher
      Find all direct reports and how many people they manage, each up to 3 levels down
      MATCH (sub)-[:REPORTS_TO*0..3]->(boss),
            (report)-[:REPORTS_TO*1..3]->(sub)
      WHERE boss.firstName = 'Andrew'
      RETURN sub.firstName AS Subordinate, count(report) AS Total;
      [Figure: the Cypher query shown side by side with the equivalent SQL query]
  30. Who is in Robert’s (direct, upwards) reporting chain?
      MATCH path=(e:Employee)<-[:REPORTS_TO*]-(sub:Employee)
      WHERE sub.firstName = 'Robert'
      RETURN path;
  31. Who is in Robert’s (direct, upwards) reporting chain?
  32. Product Cross-Sell
      MATCH (choc:Product {productName: 'Chocolade'})<-[:ORDERS]-(:Order)<-[:SOLD]-(employee),
            (employee)-[:SOLD]->(o2)-[:ORDERS]->(other:Product)
      RETURN employee.firstName, other.productName, count(distinct o2) AS count
      ORDER BY count DESC
      LIMIT 5;
  33. Product Cross-Sell
  34. Neo4j's Cypher Implementation
  35. History of Cypher
      • 1.4 - Cypher initially added to Neo4j
      • 1.6 - Cypher becomes part of REST API
      • 1.7 - Collection functions, global search, pattern predicates
      • 1.8 - Write operations
      • 1.9 - Type System, Traversal Matcher, Caches, String functions, more powerful WITH, Laziness, Profiling, Execution Plan
      • 2.0 - Label support, label based indexes and constraints, MERGE, transactional HTTP endpoint, literal maps, slices, new parser, OPTIONAL MATCH
      • 2.1 - LOAD CSV, COST Planner, reduce eagerness, UNWIND, versioning
      • 2.2 - COST Planner default, EXPLAIN, PROFILE, query plan visualization, IDP
      • 2.3 -
  36. Try it out!
  37. APIs
      • Embedded
        graphDb.execute(query, params);
      • HTTP – transactional Cypher endpoint
        :POST /db/data/transaction[/commit]
        {statements:[{statement: "query", parameters: params, resultDataContents:["row"], includeStats:true},....]}
      • Bolt – binary protocol
        Driver driver = GraphDatabase.driver("bolt://localhost");
        Session session = driver.session();
        Result rs = session.run("CREATE (n) RETURN n");
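
The request body for the transactional HTTP endpoint is ordinary JSON, so it can be assembled with any JSON library. The Python sketch below builds a body with the fields shown on the slide (`statements`, `statement`, `parameters`, `resultDataContents`, `includeStats`) and stops short of sending it. The helper names, the example query, and the `$name` placeholder are assumptions; the placeholder syntax in particular depends on the Neo4j version in use.

```python
import json
from typing import Any, Dict, List, Optional


def build_statement(query: str,
                    parameters: Optional[Dict[str, Any]] = None,
                    include_stats: bool = True) -> Dict[str, Any]:
    """One entry of the `statements` array, using the field names from the slide."""
    return {
        "statement": query,
        "parameters": parameters or {},
        "resultDataContents": ["row"],
        "includeStats": include_stats,
    }


def build_request_body(statements: List[Dict[str, Any]]) -> str:
    """Serialize the statements into the JSON body to POST to the endpoint."""
    return json.dumps({"statements": statements})


# Hypothetical query; the placeholder syntax is an assumption (see above).
body = build_request_body([
    build_statement(
        "MATCH (e:Employee {firstName: $name}) RETURN e",
        {"name": "Steven"},
    )
])
```

Passing values through `parameters` rather than concatenating them into the query string keeps the statement text constant across calls.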
  38. Cypher Today - Neo4j Implementation
      • Convert the input query into an abstract syntax tree (AST)
      • Optimise and normalise the AST (alias expansion, constant folding etc.)
      • Create a query graph - a high-level, abstract representation of the query - from the normalised AST
      • Create a logical plan, consisting of logical operators, from the query graph, using the statistics store to calculate the cost. The cheapest logical plan is selected using IDP (iterative dynamic programming)
      • Create an execution plan from the logical plan by choosing a physical implementation for logical operators
      • Execute the query
      http://neo4j.com/blog/introducing-new-cypher-query-optimizer/
  39. Cypher Today - Neo4j Implementation
  40. Neo4j Query Planner
      Cost based Query Planner since Neo4j 2.2
      • Uses database stats to select best plan
      • Currently for Read Operations
      • Query Plan Visualizer, finds
        • Non optimal queries
        • Cartesian Product
        • Missing Indexes, Global Scans
        • Typos
        • Massive Fan-Out
  41. openCypher
      An open graph query language
  42. Why?
      We love Cypher! Our users love Cypher.
      We want to make everyone happy through using it.
      And have Cypher run on their data(base).
      We want to collaborate with community and industry partners to create the best graph query language possible!
  43. We love the love
  44. Future of (open)Cypher
      • Decouple the language from Neo4j
      • Open up and make the language design process transparent
      • Encourage use within databases, tools, highlighters, etc.
      • Delivery of language docs, tools and implementation
      • Governed by the Cypher Language Group (CLG)
  45. CIP (Cypher Improvement Proposal)
      • A CIP is a semi-formal specification providing a rationale for new language features and constructs
      • Contributions are welcome: submit either a CIP (as a pull request) or a feature request (as an issue) at the openCypher GitHub repository
      • See "Resources" for
        • accepted CIPs
        • Contribution Process
        • Template
      github.com/opencypher/openCypher
  46. CIP structure
      • Sections include:
        • motivation,
        • background,
        • proposal (including the syntax and semantics),
        • alternatives,
        • interactions with existing features,
        • benefits,
        • drawbacks
      • Example of the “STARTS WITH / ENDS WITH / CONTAINS” CIP
  47. Deliverables
      ✔ Improvement Process
      ✔ Governing Body
      ✔ Language grammar (Jan-2016)
      • Technology Compatibility Kit (TCK)
      • Cypher Reference Documentation
      • Cypher language specification
      • Reference implementation (under Apache 2.0)
      • Cypher style guide
      • Opening up the CLG
  48. Cypher language specification
      • EBNF Grammar
      • Railroad diagrams
      • Semantic specification
      • Licensed under a Creative Commons license
  49. Language Grammar (RELEASED Jan-30-2016)
      …
      Match = ['OPTIONAL', SP], 'MATCH', SP, Pattern, {Hint}, [Where] ;
      Unwind = 'UNWIND', SP, Expression, SP, 'AS', SP, Variable ;
      Merge = 'MERGE', SP, PatternPart, {SP, MergeAction} ;
      MergeAction = ('ON', SP, 'MATCH', SP, SetClause)
                  | ('ON', SP, 'CREATE', SP, SetClause) ;
      ...
      github.com/opencypher/openCypher/blob/master/grammar.ebnf
  50. Technology Compatibility Kit (TCK)
      ● Validates a Cypher implementation
      ● Certifies that it complies with a given version of Cypher
      ● Based on given dataset
      ● Executes a set of queries and
      ● Verifies expected outputs
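
The TCK workflow described on this slide (run a set of queries against an implementation, then verify the outputs) can be pictured as a small test harness. The Python sketch below is a loose illustration of that loop and not the format or tooling of the actual openCypher TCK. The names `run_tck`, `toy_engine` and the canned results are invented for this example, and comparing rows without regard to order is a simplification.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple
Scenario = Tuple[str, List[Row]]  # (query text, expected rows)


def run_tck(execute: Callable[[str], List[Row]],
            scenarios: List[Scenario]) -> Dict[str, bool]:
    """Run every scenario against `execute` and record pass/fail per query."""
    report: Dict[str, bool] = {}
    for query, expected in scenarios:
        try:
            actual = execute(query)
        except Exception:
            # An implementation that cannot run the query fails the scenario.
            report[query] = False
            continue
        # Simplification: row order is ignored.
        report[query] = sorted(actual) == sorted(expected)
    return report


# A stand-in "implementation" that only knows two queries (hypothetical).
canned = {
    "RETURN 1 AS x": [(1,)],
    "UNWIND [1, 2] AS x RETURN x": [(2,), (1,)],
}


def toy_engine(query: str) -> List[Row]:
    return canned[query]


scenarios: List[Scenario] = [
    ("RETURN 1 AS x", [(1,)]),
    ("UNWIND [1, 2] AS x RETURN x", [(1,), (2,)]),
    ("RETURN 2 AS x", [(2,)]),
]
report = run_tck(toy_engine, scenarios)
```

Because the harness only needs a callable that maps a query string to rows, the same scenarios could be pointed at any engine that claims Cypher support.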
  51. Cypher Reference Documentation
      • Style Guide
      • User documentation describing the use of Cypher
      • Example datasets with queries
      • Tutorials
      • GraphGists
  52. Style Guide
      • Labels are CamelCase
      • Properties and functions are lowerCamelCase
      • Keywords and Relationship-Types are ALL_CAPS
      • Patterns should be complete and left to right
      • Put anchored nodes first
      • .... to be released ...
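
The three naming rules on this slide are simple enough to check mechanically. The Python sketch below encodes them as regular expressions. It is a rough approximation written for this transcript, not an official linter, and the regexes, the `kind` names and the function `follows_style` are assumptions (for instance, the label rule cannot tell `CamelCase` from an all-uppercase word).

```python
import re

# Approximate patterns for the naming rules listed on the slide.
STYLE_RULES = {
    "label": re.compile(r"^[A-Z][A-Za-z0-9]*$"),               # CamelCase
    "property": re.compile(r"^[a-z][A-Za-z0-9]*$"),            # lowerCamelCase
    "rel_type": re.compile(r"^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*$"),  # ALL_CAPS
}


def follows_style(kind: str, name: str) -> bool:
    """Check a label, property or relationship-type name against its rule."""
    return STYLE_RULES[kind].match(name) is not None


checks = {
    "Employee": follows_style("label", "Employee"),
    "firstName": follows_style("property", "firstName"),
    "REPORTS_TO": follows_style("rel_type", "REPORTS_TO"),
    "reports_to": follows_style("rel_type", "reports_to"),
}
```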
  53. Reference implementation (ASL 2.0)
      • A fully functional implementation of key parts of the stack needed to support Cypher inside a platform or tool
      • First deliverable: parser taking a Cypher statement and parsing it into an AST (abstract syntax tree)
      • Future deliverables:
        • Rule-based query planner
        • Query runtime
      • Distributed under the Apache 2.0 license
      • Can be used as an example or as an implementation foundation
  54. The Cypher Language Group (CLG)
      • The steering committee for language evolution
      • Reviews feature requests and proposals (CIP)
      • Caretakers of the language
      • Focus on guiding principles
      • Long term focus, no quick fixes & hacks
      • Currently a group of Cypher authors, developers and users
      • Publish Meeting Minutes -> opencypher.github.io/meeting-minutes/
  55. Some people like it
      “Graph processing is becoming an indispensable part of the modern big data stack. Neo4j’s Cypher query language has greatly accelerated graph database adoption. We are looking forward to bringing Cypher’s graph pattern matching capabilities into the Spark stack, making it easier for masses to access query graph processing.” - Ion Stoica, CEO & Founder Databricks
      “Lots of software systems could be improved by using a graph datastore. One thing holding back the category has been the lack of a widely supported, standard graph query language. We see the appearance of openCypher as an important step towards the broader use of graphs across the industry.” - Rebecca Parsons, ThoughtWorks, CTO
  56. And support openCypher
  57. Resources
      • http://www.opencypher.org/
      • https://github.com/opencypher/openCypher
      • https://github.com/opencypher/openCypher/blob/master/CONTRIBUTING.adoc
      • https://github.com/opencypher/openCypher/tree/master/cip
      • https://github.com/opencypher/openCypher/pulls
      • http://groups.google.com/group/openCypher
      • @openCypher
  58. Please contribute
      Feedback, Ideas, Proposals
      Implementations
      Thank You!
      Questions?
