
Cypher and Apache Spark: Multiple Graphs and More in openCypher


Multiple Graphs and More in openCypher.

Published in: Technology

  1. Cypher and Apache Spark: Multiple graphs and more in openCypher. Stefan Plantikow, Martin Junghanns, Max Kießling, Petra Selmer
  2. (:openCypher)-[:IS_GOING]->(:Places)
  3. openCypher in 2017. openCypher is a community effort to evolve Cypher, the standard graph query language.
     - Implementers: SAP, Redis, Agens Graph, Cypher.PL, Neo4j, ...
     - Events: Implementers meeting, Summer of Syntax
     - Process: Cypher Improvement Requests and Proposals (CIRs/CIPs)
     - Releases: Frontend, Grammar, TCK
     - Research: Formal semantics (U Edinburgh), stream processing, …
     - Standards: LDBC, ISO SQL PG Ad-Hoc, …
     - Features: Multiple graphs, subqueries, path patterns, ...
  4. Cypher for Big Data. Cypher was originally conceived for OLTP workloads at Neo4j. Beyond OLTP, many Neo4j customers have a data lake and use Apache Spark for:
     - Big Data analytical processing
     - Data integration (wrangling)
     - Today's big data applications:
       ○ Collect data from user interactions on a website
       ○ Combine it with data from various departments (billing, marketing, ...)
       ○ Combine it with ontological data
       ○ Analyze it to better target customers, optimize supply chains, detect fraud, ...
     Is Cypher ready to be used in a big data lake context?
  5. Distilling Graphs from the Data Lake
     - Data integration
       ○ Use multiple, large-scale data sets
       ○ Retain and reuse intermediary results
       ○ Integrate multiple data sources
       ○ Shape and handle heterogeneous data
     - Complex execution
       ○ Compose complex workflows from building blocks
       ○ Use machine learning, AI, graph algorithms, domain-specific business logic
       ○ Distributed query execution in a cluster
     - Common framework: Apache Spark over Hadoop (+ Neo4j)
  6. (:Cypher)-[:FOR]->(:Apache:Spark™)
  7. Spark package for executing Cypher on Apache Spark
     - Execute Cypher queries on multiple large, distributed graphs
     - Integrate Cypher into your Spark analytical pipeline
     - Integrate multiple data sources (Neo4j, HDFS, local FS, ...)
     - Handle heterogeneous data
     - Compose Cypher queries
  8. CAPS
     - Made by Neo4j, donated to the openCypher community
     - Alpha release of the source code under APL2 on GitHub, available now: github.com/openCypher/cypher-for-apache-spark
     - Release 1.0: targeted for the first half of 2018
     - Commercial extension for integrating more sophisticated data sources
     - Innovations:
       ○ Technical architecture for executing Cypher on a big-data analytics system
       ○ Composable queries for working with multiple graphs
  9. MATCH (n:Person)-[:LOVES]->(s1:System {title: "Neo4j"}) OPTIONAL MATCH (n)-[:LOVES]->(s2:System {title: "Spark"}) RETURN n, s1, s2
     Layers the query passes through:
     - openCypher Frontend: based on the proven Neo4j Cypher parser; parsing, rewriting, optimization
     - CAPS: data import and export; schema and type handling; query translation to DataFrame operations
     - Spark Catalyst Optimizer: rule-based query optimization
     - Spark Runtime: distributed execution
  10. Data in Spark. Spark's core: transform tabular data.
      - SparkSQL: Table => Table => Table
      - Cypher 9: (Single) Graph => Table
      How to handle multiple graphs?
  11. (:Cypher)-[:WITH]->(:Multiple:Graphs)
  12. Graph Transformation
  13. Why Multiple Graphs? (Graph management, graph modeling)
      - Combining and transforming graphs from multiple sources
      - Versioning, snapshotting, computing difference graphs
      - Graph views for access control
      - Provisioning applications with tailored graph views
      - Shaping and integrating heterogeneous graph data
      - Roll-up and drill-down at different levels of detail
  14. Cypher today: Single graph model. Diagram: a Graph Database System (e.g. a cluster) holding the (single) graph; Application Server; Client 1, Client 2, Client 3.
  15. Cypher: Multiple graphs model. Diagram: a Graph Database System (e.g. a cluster) holding a Graph Space; Application Server; Client 1, Client 2, Client 3.
  16. Tables from graphs... It's easy to construct tables from a graph, but what's the inverse? MATCH (a)-->(b) WITH a, b ...
  17. ...graphs from tables. A graph is a set of pattern matches! WITH a, r, b RETURN GRAPH OF (a)-[r]->(b) AS foo
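The idea that a graph is a set of pattern matches can be illustrated in plain Python. This is only a sketch of the semantics, not CAPS code: the row layout, the sample data and the helper name `graph_of` are assumptions made for illustration.

```python
# Hypothetical sample: each row is one match of the pattern (a)-[r]->(b).
rows = [
    {"a": "Alice", "r": ("KNOWS", 0), "b": "Bob"},
    {"a": "Bob", "r": ("KNOWS", 1), "b": "Carol"},
    {"a": "Alice", "r": ("KNOWS", 0), "b": "Bob"},  # the same match twice
]

def graph_of(rows):
    """Collect every node and relationship bound in the rows into one graph."""
    nodes = set()
    rels = set()
    for row in rows:
        nodes.add(row["a"])
        nodes.add(row["b"])
        rels.add((row["a"], row["r"], row["b"]))
    return {"nodes": nodes, "rels": rels}

foo = graph_of(rows)
```

Because nodes and relationships are collected into sets, a match that appears in several rows contributes to the graph only once.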
  18. Cypher queries with multiple graphs
  19. Cypher query pipeline composition
  20. Current CAPS multiple graphs syntax (ongoing work in CIP2017-06-18: Multiple Graphs):
        FROM GRAPH graph_A AT "bolt://.../people"
        MATCH (a:Person)-[:KNOWS]-(b:Person)
        FROM GRAPH graph_B AT "hdfs://.../products"
        MATCH (:Customer {name: a.name})-[:BOUGHT]->(p:Product)
        RETURN GRAPH OF (b)-[:SHOULD_BUY]->(p)
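What the two-graph query on slide 20 computes can be approximated with two in-memory edge lists. This is a sketch under assumptions: the sample data and the function name `should_buy` are made up, and it models only the result, not how CAPS executes the query.

```python
# Hypothetical contents of graph_A (people) and graph_B (products).
knows = [("Alice", "Bob"), ("Bob", "Carol")]            # (:Person)-[:KNOWS]-(:Person)
bought = [("Alice", "Product X"), ("Carol", "Product Y")]  # (:Customer)-[:BOUGHT]->(:Product)

def should_buy(knows, bought):
    """Recommend to b every product bought by a customer named like a person a that b knows."""
    # The KNOWS pattern is undirected, so each edge matches in both orientations.
    pairs = set()
    for x, y in knows:
        pairs.add((x, y))
        pairs.add((y, x))
    recommendations = set()
    for a, b in pairs:
        for customer, product in bought:
            if customer == a:  # models (:Customer {name: a.name})
                recommendations.add((b, "SHOULD_BUY", product))
    return recommendations

result_graph = should_buy(knows, bought)
```

With this sample data Bob is the only person who receives recommendations, because he knows both customers.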
  21. Cypher support for multiple graphs
      - Graphs are addressed using URIs
      - Graphs and tabular data are passed into and returned from a query
      Extensions:
      - Set operations and subqueries over multiple graphs
      - Updating graphs (DML)
      - Managing graph persistence (Move, Snapshot, Version, ...)
      - Creating views
      - Schema and constraint definitions for multiple graphs
      ...
      => Join the openCypher MG Task Force
  22. (:Cypher)-[:ON]->(:Relational:Engine)
  23. Challenge: Graph engine vs. relational engine
      |                 | Neo4j                                 | Apache Spark            |
      | Graph format    | Native (i.e. optimized for graph ops) | DataFrame (i.e. tables) |
      | Query operators | Native (e.g. Expand, VarExpand)       | Relational operators    |
      | Schema          | Schema optional                       | Fixed schema            |
      | Data types      | Cypher type system                    | Spark SQL type system   |
  24. Node labels:
      - :Employee with name: STRING
      - :Person with name: STRING, yob: INTEGER (nullable)
      - :System with title: STRING
      Relationship types:
      - :KNOWS with since: INTEGER
      - :LOVES
      Implied labels: :Employee -> :Person
      Example graph: :Employee:Person { name : Alice }, :Person { name : Bob, yob : 1984 }, :System { title : Spark }, :System { title : Neo4j }, one :KNOWS { since : 2017 } relationship, three :LOVES relationships
      ● A schema is required for Spark DataFrames
        • Explicitly defined (e.g. for an HDFS data source)
        • Implicitly inferred (e.g. for a Neo4j data source)
      ● Requires a type mapping from Cypher types to Spark types
  25. Logical view: :Employee:Person { name : Alice }, :Person { name : Bob, yob : 1984 }, :System { title : Spark }, :System { title : Neo4j }, one :KNOWS { since : 2017 } relationship, three :LOVES relationships
      Physical view (DataFrame):
      NodeScan(Person)
        | n | n:Employee | n.name | n.yob |
        | 0 | true       | Alice  | null  |
        | 1 | false      | Bob    | 1984  |
      NodeScan(System)
        | n | n.title |
        | 2 | Spark   |
        | 3 | Neo4j   |
      RelScan(KNOWS)
        | src(r) | r | trgt(r) | r.since |
        | 0      | 0 | 1       | 2017    |
      RelScan(LOVES)
        | src(r) | r | trgt(r) |
        | 0      | 1 | 2       |
        | 0      | 2 | 3       |
        | 1      | 3 | 3       |
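The mapping from labelled nodes to fixed-width scan tables can be sketched in plain Python. The function `node_scan` and the dictionary layout are hypothetical; the sketch only mirrors the column layout of the slide (one boolean column per additional label, one nullable column per property) and is not the CAPS implementation.

```python
# The four example nodes from the slide, as plain dictionaries.
nodes = [
    {"id": 0, "labels": {"Employee", "Person"}, "props": {"name": "Alice"}},
    {"id": 1, "labels": {"Person"}, "props": {"name": "Bob", "yob": 1984}},
    {"id": 2, "labels": {"System"}, "props": {"title": "Spark"}},
    {"id": 3, "labels": {"System"}, "props": {"title": "Neo4j"}},
]

def node_scan(nodes, label, other_labels, keys):
    """Return one row with a fixed set of columns per node that carries `label`."""
    table = []
    for node in nodes:
        if label not in node["labels"]:
            continue
        row = {"n": node["id"]}
        for other in other_labels:
            row["n:" + other] = other in node["labels"]  # label membership flag
        for key in keys:
            row["n." + key] = node["props"].get(key)  # None models a null cell
        table.append(row)
    return table

person_table = node_scan(nodes, "Person", ["Employee"], ["name", "yob"])
system_table = node_scan(nodes, "System", [], ["title"])
```

Every row of a scan has the same columns, which is what a fixed-schema engine needs; a missing property simply becomes a null.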
  26. Logical view: MATCH (n:Person)-[:LOVES]->(s1:System {title: "Neo4j"}) OPTIONAL MATCH (n)-[:LOVES]->(s2:System {title: "Spark"}) RETURN n, s1, s2
      Physical view (DataFrame operations): NodeScan(Person), RelScan(LOVES), NodeScan(System) => APPLY MAGIC HERE => Result
  27. Result of the same query on the example graph (:Employee:Person { name : Alice }, :Person { name : Bob, yob : 1984 }, :System { title : Spark }, :System { title : Neo4j }, one :KNOWS { since : 2017 }, three :LOVES):
      | n | n.name | n.yob | n:Person | s1 | s1.title | s1:System | s2   | s2.title | s2:System |
      | 0 | Alice  | null  | true     | 3  | Neo4j    | true      | 2    | Spark    | true      |
      | 1 | Bob    | 1984  | true     | 3  | Neo4j    | true      | null | null     | null      |
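One way to read the "magic" step is as a chain of joins over the scan tables: an inner join for MATCH and a left outer join for OPTIONAL MATCH. The following plain-Python sketch reproduces the result rows under that assumption. The table layout and the helper `loved` are hypothetical, and the actual CAPS plan may differ.

```python
# Scan tables reduced to the columns needed here (ids follow the physical view).
persons = [{"n": 0, "n.name": "Alice"}, {"n": 1, "n.name": "Bob"}]
systems = {2: "Spark", 3: "Neo4j"}   # node id -> title
loves = [(0, 2), (0, 3), (1, 3)]     # (source id, target id)

def loved(person_id, title):
    """Ids of the systems with the given title that the person LOVES."""
    return [t for s, t in loves if s == person_id and systems[t] == title]

result = []
for p in persons:
    # MATCH behaves like an inner join: persons without a match are dropped.
    for s1 in loved(p["n"], "Neo4j"):
        # OPTIONAL MATCH behaves like a left outer join: keep the row, pad with null.
        for s2 in loved(p["n"], "Spark") or [None]:
            result.append((p["n.name"], systems[s1], systems.get(s2)))
```

Bob has no :LOVES relationship to Spark, so his row survives with a null in the `s2` position, as in the result table.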
  28. • Programmatic, high-level API (similar to Spark's DataFrame API)
      • Central entry point: CAPSSession
        val sparkSession = SparkSession.builder().master("local[*]").appName("caps-example").getOrCreate()
        val capsSession = CAPSSession.create(sparkSession)
        val graph = capsSession.graphAt("hdfs://localhost:9000/path/to/graph")
        val result = graph.cypher("MATCH (n:Person)-[:LOVES]->(s:System) RETURN n.name, s.title")
        result.print
      Output:
        +---------+---------+
        | n.name  | s.title |
        +---------+---------+
        | 'Alice' | 'Neo4j' |
        | 'Alice' | 'Spark' |
        | 'Bob'   | 'Neo4j' |
        +---------+---------+
        (3 rows)
  29. • Mount graphs from multiple sources
      • Store graphs in session-local graph storage
        capsSession.mountGraphAt("hdfs+csv://localhost:9000/path/to/graph", "/my-hdfs-graph")
        capsSession.mountGraphAt(
          "bolt://localhost:7687&MATCH (n) RETURN n;MATCH ()-[r]->() RETURN r",
          "/my-neo-graph"
        )
        val result = capsSession.cypher("""
          FROM GRAPH AT 'session://my-hdfs-graph' MATCH (e:Employee)
          FROM GRAPH AT 'session://my-neo-graph' MATCH (p:Person)
          WHERE e.email = p.email
          RETURN GRAPH result OF (e)-[:SAME_AS]->(p)
        """).graphs("result")
        result.cypher("MATCH ()-[e]->() RETURN COUNT(e)")
  30. (:Cypher)-[:FOR]->(:Apache:Spark™) Demo
  31. • Target specific customers in selected metropolitan areas as part of a marketing campaign
      • Combine multi-region social network data with product data to derive recommendations
      • The social network is partitioned by region (SN_NA, SN_EU) and stored in separate Neo4j instances
      • Product data is stored in HDFS using a CAPS-specific CSV format
      Social Network (SN): :Person { name : Bob, email : bob@gmail.com }, :Person { name : Alice, email : alice@gmail.com }, :Interest { name : Graphs }, :City { name : New York }; relationships :KNOWS, :LIKES, :LIVES_IN, :LIVES_IN
      Products (PROD): :Customer { email : alice@gmail.com }, :Product { name : Graph Databases }, :Category { name : Books }; relationships :BOUGHT { rating : 5, votes : 10, helpful : 6 }, :BELONGS_TO
  32. 1. Load data from the corresponding data sources (i.e. Neo4j and HDFS)
      2. Extract metropolitan subgraphs from the social networks (e.g. people from NY / SFO for SN_NA)
      3. Merge social network data with product data using identifying properties (i.e. email)
      4. Compute recommendations based on friends' interests and bought products
      The diagram repeats the Social Network (SN) and Products (PROD) example data from the previous slide and adds an :IS relationship.
  33. (:Let)-[:MAGIC*]->(:Happen)
  34. (:openCypher)-[:EVOLVES]->(:Cypher)
  35. How does a feature make it into Cypher?
      CIR = Cypher Improvement Request
      - Ideas & suggestions, topics for discussion, …
      - Raise a GitHub issue at https://github.com/opencypher/openCypher
      CIP = Cypher Improvement Proposal
      - Response to a CIR
      - Full description of behaviour and syntax
      - Create a pull request at https://github.com/opencypher/openCypher
  36. openCypher Implementers Group (oCIG)
      - Evolve Cypher through an open process
      - Comprises vendors, researchers, implementers, interested parties
      Face-to-face and virtual meetings to present, discuss and agree upon new features:
      - Germany (February)
      - UK (May)
      - France (November)
  37. (:Cypher)-[:WITH]->(:Subqueries)
  38. Why?
  39. Why?
      Queries are easier to:
      - construct
      - maintain
      - read
      Subqueries enable:
      - composition of query pipelines
      - post-processing of results
      - multiple write actions for each record
  40. Example: Post-UNION processing
        MATCH {
          // authored tweets
          MATCH (me:User {name: 'Alice'})-[:FOLLOWS]->(user:User),
                (user)<-[:AUTHORED]-(tweet:Tweet)
          RETURN tweet, tweet.time AS time, user.country AS country
          UNION
          // favorited tweets
          MATCH (me:User {name: 'Alice'})-[:FOLLOWS]->(user:User),
                (user)<-[:HAS_FAVOURITE]-(favorite:Favorite)-[:TARGETS]->(tweet:Tweet)
          RETURN tweet, favorite.time AS time, user.country AS country
        }
        WHERE country = 'se'
        RETURN DISTINCT tweet
        ORDER BY time DESC
        LIMIT 10
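The shape of post-UNION processing (combine two branches, then filter, order and limit the combined rows) can be sketched in plain Python. The sample rows and the helper names `union` and `post_process` are made up for illustration. The sketch also assumes that ordering is applied before de-duplicating tweets, which the proposal may define differently.

```python
# Made-up sample rows standing in for the two branches of the UNION.
authored = [
    {"tweet": "t1", "time": 3, "country": "se"},
    {"tweet": "t2", "time": 5, "country": "us"},
]
favorited = [
    {"tweet": "t3", "time": 4, "country": "se"},
    {"tweet": "t1", "time": 3, "country": "se"},  # same row as in `authored`
]

def union(left, right):
    """UNION semantics: concatenate both inputs and drop duplicate rows."""
    seen, out = set(), []
    for row in left + right:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

def post_process(rows, country, limit):
    """Filter by country, order by time descending, de-duplicate tweets, apply the limit."""
    kept = sorted(
        (r for r in rows if r["country"] == country),
        key=lambda r: r["time"],
        reverse=True,
    )
    tweets = []
    for r in kept:
        if r["tweet"] not in tweets:
            tweets.append(r["tweet"])
    return tweets[:limit]

top_tweets = post_process(union(authored, favorited), "se", 10)
```

Without a subquery, the filter and the ordering would have to be repeated in each UNION branch, and the limit could not be applied to the combined result at all.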
  41. Types of subqueries
      - Nested: run any complete read-only Cypher query; incoming variables remain in scope (correlated subquery); arbitrary depth
      - Existential: returns true if at least one match is found, false otherwise
      - Scalar: the result is a single value in a single row
      - List: the result is the list formed by collecting the values of all rows (single value per row)
      - Updating: simple and conditional updates, executed once per incoming row
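The difference between existential, scalar and list subqueries lies in how the rows of the inner query are turned into a result. A plain-Python sketch shows this, with made-up data and hypothetical function names. It models only the read-only kinds, and the scalar case assumes a count aggregation as the single value.

```python
# Made-up data: who follows whom.
follows = {"Alice": ["Bob", "Carol"], "Bob": [], "Carol": ["Alice"]}

def nested(user):
    """Inner query, correlated on the incoming variable `user`: one row per followed user."""
    return [{"followed": other} for other in follows[user]]

def existential(user):
    """True if the inner query yields at least one row."""
    return len(nested(user)) > 0

def scalar(user):
    """A single value in a single row; here the number of rows of the inner query."""
    return len(nested(user))

def list_subquery(user):
    """Collect the single value of every inner row into one list."""
    return [row["followed"] for row in nested(user)]

report = {u: (existential(u), scalar(u), list_subquery(u)) for u in follows}
```

All three kinds are evaluated once per incoming row, which is what makes them correlated.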
  42. (:Cypher)-[:WITH]->(:Path:Pattern:Queries)
  43. Why? Find complex connections
      - Repetitions of patterns: ( likes.hates )+
      - Alternatives between patterns rather than just a single relationship type: ( drinks | eats )*
      - Express patterns directly, rather than resorting to UNION
  44. Example: a sad state of affairs... Find a chain of unreciprocated lovers with a named path predicate:
        PATH PATTERN unreciprocated_love = (a)-[:LOVES]->(b) WHERE NOT EXISTS { (b)-[:LOVES]->(a) }
        MATCH (you)-/~unreciprocated_love*/->(someone)
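What the unreciprocated-love query on slide 44 asks for can be approximated in plain Python: first keep only the :LOVES edges with no edge in the opposite direction, then follow those edges zero or more times. The data and function names are hypothetical. The sketch computes only the reachable end nodes and does not enumerate paths or apply Cypher's path-matching rules.

```python
# Made-up :LOVES edges as (source, target) pairs; b and c love each other.
loves = {("a", "b"), ("b", "c"), ("c", "b"), ("c", "d")}

def unreciprocated(loves):
    """Keep the edges (x)-[:LOVES]->(y) for which no (y)-[:LOVES]->(x) exists."""
    return {(x, y) for (x, y) in loves if (y, x) not in loves}

def reachable(start, edges):
    """Nodes reachable from `start` over zero or more edges (the `*` repetition)."""
    seen = {start}
    frontier = [start]
    while frontier:
        node = frontier.pop()
        for x, y in edges:
            if x == node and y not in seen:
                seen.add(y)
                frontier.append(y)
    return seen

sad_edges = unreciprocated(loves)
from_a = reachable("a", sad_edges)
```

The chain starting at `a` stops at `b`, because the love between `b` and `c` is mutual and therefore not part of the named pattern.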
  45. - Relationship type predicate: ()-/:FOO/-()
      - Node predicates: ()-/(:Alpha {beta:'gamma'})/-()
      - Alternation: ()-/:FOO | :BAR | :BAZ/-()
      - Sequence: ()-/:FOO :BAR :BAZ/-()
      - Grouping: ()-/:FOO | [:BAR :BAZ]/-()
      - Direction: ()-/<:FOO :BAR <:BAZ>/->()
      - Repetition: ()-/:FOO? :BAR+ :BAZ* :FOO*3.. :BAR*1..5/-()
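Several of these constructs behave like regular expressions over the sequence of relationship types along a path. As a rough analogy only, the following sketch encodes a path as its type sequence and checks it with Python's `re` module. The encoding, the constant names and the function `matches` are assumptions; direction and node predicates are not modelled.

```python
import re

def matches(pattern, types):
    """Check whether the relationship types along a path match a regex analogue."""
    text = "".join(t + "," for t in types)  # e.g. ["FOO", "BAR"] -> "FOO,BAR,"
    return re.fullmatch(pattern, text) is not None

# Regex analogues of the path patterns on the slide (hypothetical encoding).
ALTERNATION = r"(FOO,|BAR,|BAZ,)"         # :FOO | :BAR | :BAZ
SEQUENCE = r"FOO,BAR,BAZ,"                # :FOO :BAR :BAZ
GROUPING = r"(FOO,|(BAR,BAZ,))"           # :FOO | [:BAR :BAZ]
REPETITION = r"(FOO,)?(BAR,)+(BAZ,)*"     # :FOO? :BAR+ :BAZ*

checks = {
    "alternation": matches(ALTERNATION, ["BAR"]),
    "sequence": matches(SEQUENCE, ["FOO", "BAR", "BAZ"]),
    "grouping": matches(GROUPING, ["BAR", "BAZ"]),
    "repetition": matches(REPETITION, ["BAR", "BAR"]),
}
```

For example, `matches(REPETITION, ["FOO"])` is false, because `:BAR+` requires at least one BAR step.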
  46. (:Cypher)-[:IS]->(:Everywhere)
  47. openCypher: Summer of Syntax
      - Multiple graphs
      - Subqueries
      - Path pattern queries (complex pattern matching)
      - Aggregation and grouping
      - MANDATORY MATCH
      - Configurable pattern matching
      - Cypher versioning & Cypher 9
  48. Want to find out more? Join us at the openCypher Meetup! Wednesday, 25 October, 5:30pm - 8pm, WeWork Park South at 110 East 28th Street, NY
      Agenda:
      - Multiple graphs, subqueries, path pattern queries
      - Connecting research in graph processing to industrial technologies
      - Property graphs with time
  49. (:Cypher)-[:IS]->(:Everywhere)
      - CAPS core alpha source release with multiple graphs is out now; a production-ready release follows next year
      - Plus a commercial release (from Neo4j): data lake integration and other sophisticated graph data sources
      - Also: Cypher over Gremlin is in the works! => Cypher everywhere
      - openCypher continues to evolve. Get involved! openCypher.org
      Upcoming:
      - openCypher booth here at GraphConnect NYC
      - openCypher meetup tomorrow: opencypher.org/event/2017/10/25/event-oc-meetup/
      - Third openCypher implementers meeting: opencypher.org/event/2017/11/13/ocim3/
  50. (:Thank)-[:-]->(:You) stefan.plantikow@neo4j.com, martin.junghanns@neo4j.com, max.kiessling@neo4j.com, petra.selmer@neo4j.com
