Extended Property Graphs and Cypher on Gradoop
1st openCypher Implementers Meeting
8 February 2017
Walldorf, Germany
Martin Junghanns
University of Leipzig – Database Research Group
Outline: Gradoop, Extended Property Graphs, Cypher on Gradoop, Conclusion
Gradoop
"An open-source graph dataflow system for
declarative analytics of heterogeneous graph data."
Gradoop
Distributed Graph Storage (Apache HDFS)
Apache Flink Operator Implementation
Distributed Operator Execution (Apache Flink)
Extended Property Graph Model (EPGM)
Graph Dataflow Operators
I/O
Extended Property Graphs
• Vertices and directed Edges
• Logical Graphs
• Identifiers
• Type Labels
• Properties
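[Figure: example extended property graph with five vertices, five edges, and two logical graphs]
Vertices:
• Hobbit (name: Samwise)
• Orc (name: Azog)
• Clan (name: Tribes of Moria, founded: 1981)
• Orc (name: Bolg)
• Hobbit (name: Frodo, yob: 2968)
Edges:
• leaderOf (since: 2790)
• memberOf (since: 2013)
• hates (since: 2301)
• hates
• knows (since: 2990)
Logical graphs:
• Area (title: Mordor)
• Area (title: Shire)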
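The model elements listed on this slide (identifiers, type labels, properties, and membership in logical graphs) could be represented in memory roughly as follows. This is a minimal sketch and not Gradoop's implementation: the class names, the numeric ids, the assignment of vertices to the two Area graphs, and the endpoints of the sample edge are all assumptions made for illustration, since the figure's topology is not recoverable from the transcript.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory sketch of EPGM elements; these are not Gradoop's classes. */
final class EpgmSketch {

    /** Shared shape of vertices, edges and graph heads: id, type label, properties. */
    static class Element {
        final long id;
        final String label;
        final Map<String, Object> properties;
        /** Ids of the logical graphs containing this element (empty for graph heads). */
        final Set<Long> graphIds;

        Element(long id, String label, Map<String, Object> properties, Set<Long> graphIds) {
            this.id = id;
            this.label = label;
            this.properties = properties;
            this.graphIds = graphIds;
        }
    }

    /** A directed edge additionally stores its source and target vertex ids. */
    static final class Edge extends Element {
        final long sourceId;
        final long targetId;

        Edge(long id, String label, long sourceId, long targetId,
             Map<String, Object> properties, Set<Long> graphIds) {
            super(id, label, properties, graphIds);
            this.sourceId = sourceId;
            this.targetId = targetId;
        }
    }

    /** Returns the elements that belong to the logical graph with the given id. */
    static <T extends Element> List<T> membersOf(long graphId, List<T> elements) {
        List<T> result = new ArrayList<>();
        for (T element : elements) {
            if (element.graphIds.contains(graphId)) {
                result.add(element);
            }
        }
        return result;
    }

    /** Vertices from the slide; ids and graph membership (1 = Shire, 2 = Mordor) are assumed. */
    static List<Element> sampleVertices() {
        return List.of(
            new Element(1, "Hobbit", Map.of("name", "Frodo", "yob", 2968), Set.of(1L)),
            new Element(2, "Hobbit", Map.of("name", "Samwise"), Set.of(1L)),
            new Element(3, "Orc", Map.of("name", "Azog"), Set.of(2L)),
            new Element(4, "Orc", Map.of("name", "Bolg"), Set.of(2L)),
            new Element(5, "Clan", Map.of("name", "Tribes of Moria", "founded", 1981), Set.of(2L)));
    }

    public static void main(String[] args) {
        // Assumed endpoints: the slide does not say which vertices this edge connects.
        Edge leaderOf = new Edge(1, "leaderOf", 3, 5, Map.of("since", 2790), Set.of(2L));
        System.out.println(leaderOf.label + ": " + leaderOf.sourceId + " -> " + leaderOf.targetId);
        for (Element vertex : membersOf(2L, sampleVertices())) {
            System.out.println(vertex.label + " " + vertex.properties);
        }
    }
}
```

Storing the graph ids on each element, rather than storing element lists per graph, lets one vertex or edge belong to several logical graphs at once.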
Extended Property Graphs
Graph Operators/Transformations
LogicalGraph
• Unary: Aggregation, Pattern Matching, Transformation, Grouping, Call, Subgraph
• Binary: Equality, Combination, Overlap, Exclusion, Fusion
GraphCollection
• Unary: Limit, Selection, Pattern Matching, Distinct, Apply, Reduce, Call
• Binary: Equality, Union, Intersection, Difference
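The binary logical-graph operators Combination, Overlap and Exclusion can be read as set operations on vertex and edge sets. The sketch below illustrates that reading with plain Java collections. The `Graph` type and its methods are hypothetical helpers, and the rule that Exclusion keeps only edges whose endpoints both survive is an assumption about the intended semantics.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Set-based sketch of three binary operators; not Gradoop's implementation. */
final class BinaryOperatorSketch {

    /** A logical graph reduced to vertex ids and edges (edge id -> {sourceId, targetId}). */
    static final class Graph {
        final Set<Long> vertices = new HashSet<>();
        final Map<Long, long[]> edges = new HashMap<>();

        Graph vertex(long id) {
            vertices.add(id);
            return this;
        }

        Graph edge(long id, long source, long target) {
            edges.put(id, new long[] {source, target});
            return this;
        }
    }

    /** Combination: union of the vertex sets and of the edge sets. */
    static Graph combination(Graph a, Graph b) {
        Graph g = new Graph();
        g.vertices.addAll(a.vertices);
        g.vertices.addAll(b.vertices);
        g.edges.putAll(a.edges);
        g.edges.putAll(b.edges);
        return g;
    }

    /** Overlap: vertices and edges that occur in both graphs. */
    static Graph overlap(Graph a, Graph b) {
        Graph g = new Graph();
        g.vertices.addAll(a.vertices);
        g.vertices.retainAll(b.vertices);
        for (Map.Entry<Long, long[]> e : a.edges.entrySet()) {
            if (b.edges.containsKey(e.getKey())) {
                g.edges.put(e.getKey(), e.getValue());
            }
        }
        return g;
    }

    /** Exclusion: vertices only in a, plus the edges of a between those vertices. */
    static Graph exclusion(Graph a, Graph b) {
        Graph g = new Graph();
        g.vertices.addAll(a.vertices);
        g.vertices.removeAll(b.vertices);
        for (Map.Entry<Long, long[]> e : a.edges.entrySet()) {
            long source = e.getValue()[0];
            long target = e.getValue()[1];
            if (g.vertices.contains(source) && g.vertices.contains(target)) {
                g.edges.put(e.getKey(), e.getValue());
            }
        }
        return g;
    }

    public static void main(String[] args) {
        Graph a = new Graph().vertex(1).vertex(2).vertex(3).edge(10, 1, 2).edge(11, 2, 3);
        Graph b = new Graph().vertex(3).vertex(4).edge(12, 3, 4);
        System.out.println("combination vertices: " + combination(a, b).vertices);
        System.out.println("overlap vertices: " + overlap(a, b).vertices);
        System.out.println("exclusion edges: " + exclusion(a, b).edges.keySet());
    }
}
```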
Extended Property Graphs
Subgraph
[Figure: input graph 3 and the resulting subgraph, graph 4]
UDF:
LogicalGraph graph3 = readFromHDFS();
LogicalGraph graph4 = graph3.subgraph(
    vertex -> vertex.getLabel().equals("Green"),
    edge -> edge.getLabel().equals("orange"));
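What the two user-defined functions do can be shown without Flink or Gradoop. The following is a minimal sketch over plain Java lists. The `Vertex` and `Edge` classes and both helper methods are hypothetical, and dropping edges whose endpoints were filtered out is an assumption rather than documented operator behaviour.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Plain-Java sketch of predicate-based subgraph extraction; not the Gradoop operator. */
final class SubgraphSketch {

    static final class Vertex {
        final long id;
        final String label;

        Vertex(long id, String label) {
            this.id = id;
            this.label = label;
        }
    }

    static final class Edge {
        final long id;
        final String label;
        final long source;
        final long target;

        Edge(long id, String label, long source, long target) {
            this.id = id;
            this.label = label;
            this.source = source;
            this.target = target;
        }
    }

    /** Keeps the vertices accepted by the vertex UDF. */
    static List<Vertex> keepVertices(List<Vertex> vertices, Predicate<Vertex> filter) {
        return vertices.stream().filter(filter).collect(Collectors.toList());
    }

    /** Keeps edges accepted by the edge UDF whose endpoints both survived (assumption). */
    static List<Edge> keepEdges(List<Edge> edges, Predicate<Edge> filter, List<Vertex> kept) {
        Set<Long> keptIds = new HashSet<>();
        for (Vertex v : kept) {
            keptIds.add(v.id);
        }
        return edges.stream()
            .filter(filter)
            .filter(e -> keptIds.contains(e.source) && keptIds.contains(e.target))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Vertex> vertices = List.of(
            new Vertex(1, "Green"), new Vertex(2, "Green"), new Vertex(3, "Orange"));
        List<Edge> edges = List.of(
            new Edge(10, "orange", 1, 2), new Edge(11, "orange", 2, 3), new Edge(12, "blue", 1, 2));
        List<Vertex> kept = keepVertices(vertices, v -> v.label.equals("Green"));
        List<Edge> keptEdges = keepEdges(edges, e -> e.label.equals("orange"), kept);
        System.out.println(kept.size() + " vertices, " + keptEdges.size() + " edges kept");
    }
}
```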
Extended Property Graphs
Pattern Matching (Single Graph Input)
[Figure: input graph 3, the pattern, and the resulting graph collection]
GraphCollection collection = graph3.match("(:Green)-[:orange]->(:Orange)");
Cypher on Gradoop
"Which two clan leaders hate each other and one
of them knows Frodo over one to ten hops?"
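One way to phrase this question in Cypher is sketched below, held in a Java string constant. The query text is a reconstruction from the variables, labels and bounds that appear in the query plan in this talk (c1, c2, o1, o2, h, leaderOf, hates, knows, 1..10); the exact original wording, the placement of the inequality predicates and the RETURN clause are assumptions.

```java
/** Holds a reconstructed Cypher query; the original query text is not in the transcript. */
final class ClanLeaderQuery {

    /** Assumed query text, derived from the plan's variables, labels and path bounds. */
    static final String QUERY =
        "MATCH (c1:Clan)<-[:leaderOf]-(o1:Orc)-[:hates]->(o2:Orc)-[:leaderOf]->(c2:Clan),\n"
        + "      (o2)-[:knows*1..10]->(h:Hobbit)\n"
        + "WHERE h.name = 'Frodo Baggins' AND c1 <> c2 AND o1 <> o2\n"
        + "RETURN *";

    public static void main(String[] args) {
        System.out.println(QUERY);
    }
}
```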
Cypher on Gradoop
PlanTableEntry | type: GRAPH | all-vars: [...] | proc-vars: [...] | attr-vars: [] | est-card: 23 | prediates: () | Plan :
|-FilterEmbeddingsNode{filterPredicate=((c1 != c2) AND (o1 != o2))}
|.|-JoinEmbeddingsNode{joinVariables=[o2], vertexMorphism=H, edgeMorphism=I}
|.|.|-JoinEmbeddingsNode{joinVariables=[o1], vertexMorphism=H, edgeMorphism=I}
|.|.|.|-JoinEmbeddingsNode{joinVariables=[c1], vertexMorphism=H, edgeMorphism=I}
|.|.|.|.|-FilterAndProjectVerticesNode{vertexVar=c1, filterPredicate=((c1.label = Clan)), projectionKeys=[]}
|.|.|.|.|-FilterAndProjectEdgesNode{sourceVar='o1', edgeVar='_e0', targetVar='c1', filterPredicate=((_e0.label = leaderOf)), projectionKeys=[]}
|.|.|.|-JoinEmbeddingsNode{joinVariables=[o1], vertexMorphism=H, edgeMorphism=I}
|.|.|.|.|-FilterAndProjectVerticesNode{vertexVar=o1, filterPredicate=((o1.label = Orc)), projectionKeys=[]}
|.|.|.|.|-FilterAndProjectEdgesNode{sourceVar='o1', edgeVar='_e1', targetVar='o2', filterPredicate=((_e1.label = hates)), projectionKeys=[]}
|.|.|-JoinEmbeddingsNode{joinVariables=[o2], vertexMorphism=H, edgeMorphism=I}
|.|.|.|-JoinEmbeddingsNode{joinVariables=[h], vertexMorphism=H, edgeMorphism=I}
|.|.|.|.|-FilterAndProjectVerticesNode{vertexVar=h, filterPredicate=((h.label = Hobbit) AND (h.name = Frodo Baggins)), projectionKeys=[]}
|.|.|.|.|-ExpandEmbeddingsNode={startVar='o2', pathVar='_e3', endVar='h', lb=1, ub=10, direction=OUT, vertexMorphism=H, edgeMorphism=I}
|.|.|.|.|.|-FilterAndProjectVerticesNode{vertexVar=o2, filterPredicate=((o2.label = Orc)), projectionKeys=[]}
|.|.|.|.|.|-FilterAndProjectEdgesNode{sourceVar='o2', edgeVar='_e3', targetVar='h', filterPredicate=((_e3.label = knows)), projectionKeys=[]}
|.|.|.|-JoinEmbeddingsNode{joinVariables=[c2], vertexMorphism=H, edgeMorphism=I}
|.|.|.|.|-FilterAndProjectVerticesNode{vertexVar=c2, filterPredicate=((c2.label = Clan)), projectionKeys=[]}
|.|.|.|.|-FilterAndProjectEdgesNode{sourceVar='o2', edgeVar='_e2', targetVar='c2', filterPredicate=((_e2.label = leaderOf)), projectionKeys=[]}
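The talk's conclusion describes the optimizer as greedy and cost-based. A plan of nested joins like the one above could be produced along the following lines: start with one entry per leaf, then repeatedly join the pair of entries that shares a variable and has the smallest estimated result size. This is a toy sketch, not Gradoop's planner; the `Entry` type and the cardinality estimate (the smaller input bounds the join result) are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy greedy join-order planner; the cost model is an illustrative assumption. */
final class GreedyPlannerSketch {

    /** A partial plan: textual plan, covered query variables, estimated cardinality. */
    static final class Entry {
        final String plan;
        final Set<String> vars;
        final long card;

        Entry(String plan, Set<String> vars, long card) {
            this.plan = plan;
            this.vars = vars;
            this.card = card;
        }
    }

    /** Assumed estimate: the join result is no larger than the smaller input. */
    static long estimateJoin(Entry a, Entry b) {
        return Math.min(a.card, b.card);
    }

    /** Greedily merges the cheapest joinable pair until a single plan remains. */
    static Entry plan(List<Entry> leaves) {
        if (leaves.isEmpty()) {
            throw new IllegalArgumentException("no leaf entries");
        }
        List<Entry> table = new ArrayList<>(leaves);
        while (table.size() > 1) {
            Entry bestA = null;
            Entry bestB = null;
            long best = Long.MAX_VALUE;
            for (int i = 0; i < table.size(); i++) {
                for (int j = i + 1; j < table.size(); j++) {
                    Entry a = table.get(i);
                    Entry b = table.get(j);
                    // Only entries that share a variable can be joined.
                    if (Collections.disjoint(a.vars, b.vars)) {
                        continue;
                    }
                    long cost = estimateJoin(a, b);
                    if (cost < best) {
                        best = cost;
                        bestA = a;
                        bestB = b;
                    }
                }
            }
            if (bestA == null) {
                throw new IllegalArgumentException("pattern is not connected");
            }
            table.remove(bestA);
            table.remove(bestB);
            Set<String> vars = new TreeSet<>(bestA.vars);
            vars.addAll(bestB.vars);
            table.add(new Entry("Join(" + bestA.plan + ", " + bestB.plan + ")", vars, best));
        }
        return table.get(0);
    }

    public static void main(String[] args) {
        Entry result = plan(List.of(
            new Entry("A", Set.of("c1", "o1"), 10),
            new Entry("B", Set.of("o1", "o2"), 100),
            new Entry("C", Set.of("o2", "h"), 5)));
        System.out.println(result.plan + " est-card: " + result.card);
    }
}
```

A greedy strategy only ever looks one join ahead, which keeps planning cheap but can miss the globally best order; the DP-Planner item in the outlook points at that trade-off.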
Cypher on Gradoop
Embedding: data structure used for intermediate results
• Identifiers (columns 0 to 9): 1, 37, 5, 3, 7, 8, 45, 99, 12, 3
• Properties (columns 0 to 2): Frodo Baggins, 1.22, Saruman
• Paths (column 0): 45: [4,1,33]
EmbeddingMetaData: stores information about the embedding content
• Mapping: Variable -> ID Column {h: 0, e1: 1, o2: 5, ...}
• Mapping: Variable.Property -> Property Column {h.name: 0, h.height: 1, c1.name: 2, ...}
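The split between an embedding (raw columns) and its metadata (variable-to-column mappings) can be sketched as follows, using the values from the slide. This is not Gradoop's actual class layout. Two details are assumptions: that a path is looked up by the value stored in its id column (here 45), and the hypothetical path variable `p` at id column 6.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of an embedding plus the metadata needed to interpret its columns. */
final class EmbeddingSketch {

    /** Maps query variables and variable.property keys to column indexes. */
    static final class MetaData {
        final Map<String, Integer> idColumns = new HashMap<>();
        final Map<String, Integer> propertyColumns = new HashMap<>();
    }

    /** One intermediate result row: identifiers, property values, and paths. */
    static final class Embedding {
        final long[] ids;
        final Object[] properties;
        /** Assumption: a path is keyed by the value stored in its id column. */
        final Map<Long, long[]> paths;

        Embedding(long[] ids, Object[] properties, Map<Long, long[]> paths) {
            this.ids = ids;
            this.properties = properties;
            this.paths = paths;
        }

        long id(MetaData meta, String variable) {
            return ids[meta.idColumns.get(variable)];
        }

        Object property(MetaData meta, String key) {
            return properties[meta.propertyColumns.get(key)];
        }

        long[] path(MetaData meta, String variable) {
            return paths.get(id(meta, variable));
        }
    }

    /** Mappings from the slide, plus the hypothetical path variable "p". */
    static MetaData sampleMetaData() {
        MetaData meta = new MetaData();
        meta.idColumns.put("h", 0);
        meta.idColumns.put("e1", 1);
        meta.idColumns.put("o2", 5);
        meta.idColumns.put("p", 6); // assumed: not named on the slide
        meta.propertyColumns.put("h.name", 0);
        meta.propertyColumns.put("h.height", 1);
        meta.propertyColumns.put("c1.name", 2);
        return meta;
    }

    /** Column values as shown on the slide. */
    static Embedding sampleEmbedding() {
        return new Embedding(
            new long[] {1, 37, 5, 3, 7, 8, 45, 99, 12, 3},
            new Object[] {"Frodo Baggins", 1.22, "Saruman"},
            Map.of(45L, new long[] {4, 1, 33}));
    }

    public static void main(String[] args) {
        MetaData meta = sampleMetaData();
        Embedding embedding = sampleEmbedding();
        System.out.println("o2 = " + embedding.id(meta, "o2"));
        System.out.println("h.name = " + embedding.property(meta, "h.name"));
    }
}
```

Keeping the mappings outside the row means every embedding in a data set shares one metadata object and the rows themselves carry only values.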
Cypher on Gradoop
Filter and Project: Hobbit(name=Frodo Baggins)
FlatMap(Vertex -> Embedding)

Input DataSet<Vertex>:
id   Properties
1    {…}
2    {…}
3    {…}
…    …

Example vertex properties:
name: Frodo Baggins
height: 1.22m
gender: male
city: Bag End

Filter: $\sigma_{\mathit{Label}=\mathit{Hobbit} \wedge \mathit{name}=\mathit{Frodo}}(V)$
h.id   h.name   h.height   …
31     Frodo    1.22       …

Project [ ]: $\pi_{h.\mathit{Id}}(V')$

Output DataSet<Embedding>:
h.id
32
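The slide maps filter and projection to a single FlatMap from vertices to embeddings: a vertex that fails the predicate yields nothing, and a matching vertex yields one embedding holding only the projected columns. The sketch below mimics that with Java streams instead of Flink. The `Vertex` class and the method signature are hypothetical, and an embedding is simplified to a `long[]` of ids.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Stream-based sketch of a combined filter-and-project step; not Flink code. */
final class FilterProjectSketch {

    static final class Vertex {
        final long id;
        final String label;
        final Map<String, String> properties;

        Vertex(long id, String label, Map<String, String> properties) {
            this.id = id;
            this.label = label;
            this.properties = properties;
        }
    }

    /**
     * Emits one single-column embedding {id} per vertex that has the given label
     * and property value, and nothing for all other vertices.
     */
    static List<long[]> filterAndProject(List<Vertex> vertices, String label,
                                         String key, String value) {
        return vertices.stream()
            .flatMap(v -> label.equals(v.label) && value.equals(v.properties.get(key))
                ? Stream.of(new long[] {v.id})
                : Stream.<long[]>empty())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Vertex> vertices = List.of(
            new Vertex(31, "Hobbit", Map.of("name", "Frodo Baggins", "city", "Bag End")),
            new Vertex(32, "Hobbit", Map.of("name", "Samwise")),
            new Vertex(33, "Orc", Map.of("name", "Azog")));
        List<long[]> embeddings = filterAndProject(vertices, "Hobbit", "name", "Frodo Baggins");
        System.out.println(embeddings.size() + " embedding(s), h.id = " + embeddings.get(0)[0]);
    }
}
```

Doing both steps in one pass avoids materialising the full property maps of vertices that the rest of the query never reads.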
Cypher on Gradoop
JoinEmbeddings
Left: (c1:Clan)<-[:hasLeader]-(o1:Orc)
Right: (o1:Orc)-[:hates]->(o2:Orc)
$L \bowtie_{o1.id} R$
FlatJoin(lhs, rhs -> combine(lhs, rhs))
Combine: check for vertex/edge isomorphism, remove duplicate entries

Left input, DataSet<Embedding>:
c.id   _e1.id   o1.id
51     11       2
52     12       3
…      …        …

Right input, DataSet<Embedding>:
o1.id   _e2.id   o2.id
2       13       5
3       14       3
…       …        …

Output, DataSet<Embedding>:
c.id   _e1.id   o1.id   _e2.id   o2.id
51     11       2       13       5
52     12       3       14       3
…      …        …       …        …
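The join and its combine step can be sketched as an in-memory hash join over id rows, using the example values from the slide. This is a simplified stand-in for the Flink FlatJoin, not Gradoop code. Embeddings are plain `long[]` rows, the duplicate join column is dropped from the right side, and the optional isomorphism check assumes that ids are unique across vertices and edges.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hash-join sketch for embeddings represented as id rows; not Gradoop's operator. */
final class JoinEmbeddingsSketch {

    /**
     * Joins left and right rows on left[leftCol] == right[rightCol].
     * If distinctIds is true, combined rows with a repeated id are dropped
     * (isomorphism); otherwise they are kept (homomorphism).
     */
    static List<long[]> join(List<long[]> left, int leftCol,
                             List<long[]> right, int rightCol, boolean distinctIds) {
        Map<Long, List<long[]>> index = new HashMap<>();
        for (long[] r : right) {
            index.computeIfAbsent(r[rightCol], k -> new ArrayList<>()).add(r);
        }
        List<long[]> out = new ArrayList<>();
        for (long[] l : left) {
            for (long[] r : index.getOrDefault(l[leftCol], List.of())) {
                long[] row = combine(l, r, rightCol);
                if (!distinctIds || allDistinct(row)) {
                    out.add(row);
                }
            }
        }
        return out;
    }

    /** Appends the right row to the left row, skipping the duplicate join column. */
    static long[] combine(long[] left, long[] right, int rightCol) {
        long[] row = new long[left.length + right.length - 1];
        System.arraycopy(left, 0, row, 0, left.length);
        int pos = left.length;
        for (int i = 0; i < right.length; i++) {
            if (i != rightCol) {
                row[pos++] = right[i];
            }
        }
        return row;
    }

    /** True if no id occurs twice in the row. */
    static boolean allDistinct(long[] row) {
        Set<Long> seen = new HashSet<>();
        for (long id : row) {
            if (!seen.add(id)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<long[]> left = List.of(new long[] {51, 11, 2}, new long[] {52, 12, 3});
        List<long[]> right = List.of(new long[] {2, 13, 5}, new long[] {3, 14, 3});
        System.out.println("homomorphism rows: " + join(left, 2, right, 0, false).size());
        System.out.println("isomorphism rows: " + join(left, 2, right, 0, true).size());
    }
}
```

In the slide's data the second output row binds o1 and o2 to the same vertex (id 3), so it survives only when the vertex check is relaxed to homomorphism.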
Cypher on Gradoop
ExpandEmbeddings
Left: (o2:Orc)
Edge: (o2)-[:knows*1..10]->(h)
Initial join: $L \bowtie_{o2.id = \_e3.sid} E$
Join inside the BulkIteration: $E' \bowtie_{e.tid = \_e3.sid} E$
FlatJoin(lhs, rhs -> combine(lhs, rhs))
Combine: check for vertex/edge isomorphism

Left input, DataSet<Embedding>:
o2.id
5

Edge input, DataSet<Embedding>:
_e3.sid   _e3.id   _e3.tid
5         26       31
31        27       32
32        28       33

Output, DataSet<Embedding>:
o2.id   _e3.id               h.id
3       [26]                 31
3       [26,31,27]           32
3       [26,31,27,32,28]     33
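The expansion of a variable-length path can be sketched as a bounded loop that extends every path in the current frontier by one edge per round, which is the role the bulk iteration plays on the slide. The code below is a single-machine illustration, not the Flink operator. The `Path` class is hypothetical, the path column stores alternating edge and intermediate vertex ids as in the slide's output, and only edge reuse is checked (an assumed reading of edge isomorphism).

```java
import java.util.ArrayList;
import java.util.List;

/** Iterative sketch of variable-length path expansion; not Gradoop's operator. */
final class ExpandSketch {

    /** A path from start to end; via alternates edge ids and intermediate vertex ids. */
    static final class Path {
        final long start;
        final List<Long> via;
        final long end;

        Path(long start, List<Long> via, long end) {
            this.start = start;
            this.via = via;
            this.end = end;
        }
    }

    /** Edge ids sit at the even positions of the via list. */
    static boolean usesEdge(List<Long> via, long edgeId) {
        for (int i = 0; i < via.size(); i += 2) {
            if (via.get(i) == edgeId) {
                return true;
            }
        }
        return false;
    }

    /** Expands from each start vertex along edges {sid, id, tid} for lb..ub hops (lb >= 1). */
    static List<Path> expand(List<Long> starts, List<long[]> edges, int lb, int ub) {
        if (lb < 1 || ub < lb) {
            throw new IllegalArgumentException("expected 1 <= lb <= ub");
        }
        List<Path> results = new ArrayList<>();
        List<Path> frontier = new ArrayList<>();
        for (long s : starts) {
            frontier.add(new Path(s, List.of(), s));
        }
        for (int hop = 1; hop <= ub && !frontier.isEmpty(); hop++) {
            List<Path> next = new ArrayList<>();
            for (Path p : frontier) {
                for (long[] e : edges) {
                    // Join condition: the edge must start where the path currently ends.
                    if (e[0] != p.end || usesEdge(p.via, e[1])) {
                        continue;
                    }
                    List<Long> via = new ArrayList<>(p.via);
                    if (!p.via.isEmpty()) {
                        via.add(p.end); // previous end becomes an intermediate vertex
                    }
                    via.add(e[1]);
                    next.add(new Path(p.start, via, e[2]));
                }
            }
            if (hop >= lb) {
                results.addAll(next);
            }
            frontier = next;
        }
        return results;
    }

    public static void main(String[] args) {
        List<long[]> edges = List.of(
            new long[] {5, 26, 31}, new long[] {31, 27, 32}, new long[] {32, 28, 33});
        for (Path p : expand(List.of(5L), edges, 1, 10)) {
            System.out.println(p.start + " " + p.via + " " + p.end);
        }
    }
}
```

The loop stops early once the frontier is empty, so the upper bound of ten hops costs nothing when the data only supports three.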
Conclusion
• Implement Cypher Technology Compatibility Kit (TCK) integration tests
• Benchmarking
  • Implement and evaluate LDBC benchmarking queries
• Optimizations
  • DP-Planner
  • Improve cost model (more statistics, Flink optimizer hints)
  • Reuse of intermediate results
  • Consider graph partitioning
• Support more Cypher features
  • e.g. aggregation and functions
• Introduce new Cypher features
  • e.g. regular path queries
Conclusion
• Gradoop on Apache Flink
  • Extended Property Graph abstraction on Apache Flink
  • Schema-flexible: Type Labels and Properties
  • Logical Graphs / Graph Collections
  • Graph Transformations for Graphs and Graph Collections
• Cypher on Gradoop
  • Covers many Cypher features (variable-length paths, predicates)
  • Query execution engine incl. greedy cost-based optimizer
  • Physical operators mapped to Flink transformations
www.gradoop.com
[1] Junghanns, M.; Petermann, A.; Rahm, E., "Distributed Grouping of Property Graphs with Gradoop", Proc. BTW Conf., 2017.
[2] Petermann, A.; Junghanns, M.; Kemper, S.; Gomez, K.; Teichmann, N.; Rahm, E., "Graph Mining for Complex Data Analytics", Proc. ICDM Conf. (Demo), 2016.
[3] Junghanns, M.; Petermann, A.; Teichmann, N.; Gomez, K.; Rahm, E., "Analyzing Extended Property Graphs with Apache Flink", Int. Workshop on Network Data Analytics (NDA), SIGMOD, 2016.
[4] Petermann, A.; Junghanns, M., "Scalable Business Intelligence with Graph Collections", it - Special Issue on Big Data Analytics, 2016.
[5] Petermann, A.; Junghanns, M.; Müller, M.; Rahm, E., "Graph-based Data Integration and Business Intelligence with BIIIG", Proc. VLDB Conf. (Demo), 2014.
