3. GRAPH EXPLORATION
Efficiently extracting knowledge from graph data
even if we do not know exactly what we are looking for
Graph Exploration: From Users to Large Graphs.
CIKM 2016, SIGMOD 2017, KDD 2018
6. Graph Exploration in Biology - Complex Graphs
3.27 × 10^9 base pairs
http://jcs.biologists.org/content/joces/118/21/4947/F3.large.jpg
7. Graph Exploration in Biology - Status Quo
MATCH (p1:Phenotype)-[:HAS]-(a1:Association)-[:HAS]-(snp:Snp)-[:HAS]-(a2:Association)-[:HAS]-(p2:Phenotype)
WHERE p1.name = 'foo1' AND p2.name = 'foo2' AND a1.p < 0.01 AND a2.p < 0.01
WITH DISTINCT snp ORDER BY snp.sid
RETURN collect(snp.sid)
MATCH (p1:Phenotype)-[:HAS]-(a1:Association)-[:HAS]-(snp:Snp)-[:HAS]-(a2:Association)-[:HAS]-(p2:Phenotype)
WHERE p1.name = 'foo1'
AND p2.name = 'foo2'
AND a1.p < 0.01 AND a2.p < 0.01
WITH DISTINCT snp
MATCH (snp)-[:IN]-(pw:PositionWindow)<-[:IN]-(l:Locus)--(g:Gene)
WHERE l.feature = 'gene'
RETURN collect(DISTINCT g.name)
MATCH (p1:Phenotype)-[:HAS]-(a1:Association)-[:HAS]-(snp:Snp)-[:HAS]-(a2:Association)-[:HAS]-(p2:Phenotype)
WHERE p1.name = 'foo1'
AND p2.name = 'foo2'
AND a1.p < 0.01 AND a2.p < 0.01
WITH DISTINCT snp
MATCH (snp)-[:IN]-(pw:PositionWindow)<-[:IN]-(l:Locus)--(g:Gene)
WHERE l.feature = 'gene'
WITH DISTINCT g ORDER BY g.name
MATCH (g)-[:CODES]-(:Transcript)-[:CODES]-(p:Protein)-[:MEMBER]-(go:Goterm)
WHERE go.namespace = 'biological_process'
WITH DISTINCT go, p
RETURN go.name, count(p) ORDER BY count(p) DESC
LIMIT 10
MATCH (p1:Phenotype)-[:HAS]-(a1:Association)-[:HAS]-(snp:Snp)-[:HAS]-(a2:Association)-[:HAS]-(p2:Phenotype)
WHERE p1.name = 'foo1'
AND p2.name = 'foo2'
AND a1.p < 0.01 AND a2.p < 0.01
WITH DISTINCT snp
MATCH (snp)-[:IN]-(pw:PositionWindow)<-[:IN]-(l:Locus)--(g:Gene)
WHERE l.feature = 'gene'
WITH DISTINCT g ORDER BY g.name
MATCH (g)-[:CODES]-(:Transcript)-[:IS]-(ps:Probeset)-[:SIG]-(s:Sample)
WHERE s.name = 'mustafavi'
RETURN DISTINCT g.name
9. Problem
● Given two node sets:
How similar are they, according to my understanding?
● Example
→ Set of movies I like
→ Set of movies I don’t know
→ Will I like the movies I don’t know?
[Figure: movie nodes "As Good As It Gets", "Hell or High Water", "Pulp Fiction", "The Matrix", "Skyfall", "Avatar"; the similarity between the sets is marked "?"]
10. What is a Knowledge Graph?
● (directed) graph G : ⟨V, E, φ, ψ⟩, where
○ V is a set of nodes,
○ E ⊆ V × V is a set of edges,
○ φ : V → L_V is a node labeling function, and
○ ψ : E → L_E is an edge labeling function.
We refer to the elements of L_V and L_E as node labels and edge labels, respectively.
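As a minimal sketch of this definition, the graph can be held in two dictionaries: one playing the role of φ (node to node label) and one playing the role of ψ (edge to edge label). The class name `KnowledgeGraph`, the node `alice`, and the edge label `LIKES` are illustrative assumptions, not names from the talk.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    """Directed labeled graph G = <V, E, phi, psi> (illustrative sketch)."""
    node_label: dict = field(default_factory=dict)  # phi: node -> node label
    edge_label: dict = field(default_factory=dict)  # psi: (source, target) -> edge label

    def add_node(self, node, label):
        self.node_label[node] = label

    def add_edge(self, source, target, label):
        # E is a subset of V x V, so both endpoints must already be nodes.
        if source not in self.node_label or target not in self.node_label:
            raise KeyError("both endpoints must be added as nodes first")
        self.edge_label[(source, target)] = label

    @property
    def nodes(self):
        return set(self.node_label)

    @property
    def edges(self):
        return set(self.edge_label)


# Hypothetical toy data: one person who likes one movie.
g = KnowledgeGraph()
g.add_node("alice", "Person")
g.add_node("The Matrix", "Movie")
g.add_edge("alice", "The Matrix", "LIKES")
```

Keying edges by `(source, target)` mirrors E ⊆ V × V, which allows at most one edge per ordered node pair; a graph database such as Neo4j allows parallel edges, so this is a simplification.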
11. What are Meta-Paths?
[Figure: Graph and Path; Graph Schema and Meta-Path]
A meta-path for a path ⟨n_1, ..., n_t⟩, n_i ∈ V, 1 ≤ i ≤ t, is a sequence
P : ⟨φ(n_1), ψ(n_1, n_2), ..., ψ(n_{t−1}, n_t), φ(n_t)⟩
that alternates node and edge types along the path.
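The definition can be illustrated with a short sketch that maps a concrete node path to its meta-path by alternating node labels (φ) and edge labels (ψ). The function name `meta_path` and the toy nodes and labels are assumptions made for illustration only.

```python
def meta_path(path, node_label, edge_label):
    """Return [phi(n1), psi(n1, n2), phi(n2), ..., phi(nt)] for a node path."""
    if not path:
        return []
    result = [node_label[path[0]]]
    # Walk over consecutive node pairs and append edge label, then node label.
    for source, target in zip(path, path[1:]):
        result.append(edge_label[(source, target)])
        result.append(node_label[target])
    return result


# Hypothetical toy graph: two people connected through one movie.
node_label = {"alice": "Person", "The Matrix": "Movie", "bob": "Person"}
edge_label = {
    ("alice", "The Matrix"): "LIKES",
    ("The Matrix", "bob"): "LIKED_BY",
}

example = meta_path(["alice", "The Matrix", "bob"], node_label, edge_label)
```

Here `example` is the type-level sequence Person, LIKES, Movie, LIKED_BY, Person: many different concrete paths share this one meta-path.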
12. Motivating Example
Q: How famous is Diane Kruger in America?
MATCH (n:Person)
WHERE n.name = 'Diane Kruger'
RETURN n

MATCH (m:Movie)
WHERE m.location = 'America'
RETURN m
[Figure: Diane Kruger and the movies "As Good As It Gets", "Stand By Me", "Top Gun", "Pulp Fiction", "A Few Good Men", "The Matrix", "Up"]
13. How similar are they?
● Similarity depends on
○ expert knowledge
○ connections among nodes
15. Approximate Meta-Paths
Problem: How to compute all meta-paths fast?
Approx. Solution: Mine meta-paths using the graph’s schema!
Meta-Paths Computation
Compute schema
16. Approximate Meta-Paths
Meta-path approximation
● Extract real and some non-existent meta-paths via DFS on the schema
Classifier
● Real meta-paths as training data
● Predict the probability of a meta-path
Exhaustive meta-path computation
● Extract real meta-paths via DFS
● Early termination
[Diagram labels: Training data, Inference data]
Neo4j Graph Algorithm Procedures
Use procedures for schema and meta-paths computation
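The schema-based approximation can be sketched as a depth-first search over the schema rather than over the data graph. This is only an illustration in Python, not the Neo4j procedure itself: the function name `schema_meta_paths`, the `max_edges` cutoff (standing in for early termination), and the undirected toy schema built from the labels in the Cypher queries are all assumptions.

```python
def schema_meta_paths(schema, start_label, max_edges):
    """Enumerate candidate meta-paths by depth-first search over the schema.

    schema maps a node label to a list of (edge_type, neighbor_label) pairs.
    Every result is valid on the schema, but may not occur in the data graph.
    """
    results = []

    def dfs(current_label, path):
        if len(path) > 1:
            results.append(tuple(path))
        edges_used = (len(path) - 1) // 2
        if edges_used == max_edges:
            return  # stop extending: length limit reached
        for edge_type, next_label in schema.get(current_label, []):
            dfs(next_label, path + [edge_type, next_label])

    dfs(start_label, [start_label])
    return results


# Hypothetical schema fragment; edges are treated as undirected here.
schema = {
    "Phenotype": [("HAS", "Association")],
    "Association": [("HAS", "Phenotype"), ("HAS", "Snp")],
    "Snp": [("HAS", "Association")],
}

paths = schema_meta_paths(schema, "Phenotype", max_edges=2)
```

Because the search runs on the schema, it is cheap but over-generates: a candidate such as Phenotype, HAS, Association, HAS, Snp exists on the schema whether or not any matching path exists in the data, which is why the slide pairs this step with a classifier.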
17. Learning a Meta-Path Embedding
Problem: Vector representation required for active
learning and preference prediction.
[Figure: Meta-Paths Embedding: meta-path → ? → (3 5 1)^T]
18. Learning a Meta-Path Embedding
Solution: Embed meta-paths
→ Similar meta-paths should have similar vectors.
Challenges:
- Multiple similarity interpretations for meta-paths.
- Large number of meta-paths.
- Shared parts between meta-paths.
Our method: a modified word2vec (and paragraph vectors) method.
[Figure: Meta-Paths Embedding, e.g. (3 5 1)^T]
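The slide does not spell out the modification to word2vec, so the following is only a sketch of the standard skip-gram preprocessing step, under the assumption that each meta-path is treated like a sentence whose "words" are node and edge types. The function name `skipgram_pairs`, the `window` parameter, and the `ACTED_IN` edge type are illustrative.

```python
def skipgram_pairs(meta_paths, window=2):
    """Build (center, context) training pairs from meta-paths.

    Each meta-path is treated as a sentence; every type is paired with the
    types at most `window` positions away from it.
    """
    pairs = []
    for path in meta_paths:
        for i, center in enumerate(path):
            start = max(0, i - window)
            stop = min(len(path), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    pairs.append((center, path[j]))
    return pairs


# Hypothetical meta-path with an assumed edge type.
training_pairs = skipgram_pairs([("Person", "ACTED_IN", "Movie")], window=1)
```

Meta-paths that share sub-sequences produce overlapping pairs, which is one way shared parts between meta-paths could lead to nearby vectors.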
19. Learn the Domain Value of all Meta-Paths
- Problem: Users don’t want to rate all meta-paths
→ too many
→ time-consuming
→ tedious and boring
- Solution: Label only a few, but very informative, meta-paths
→ explore the space of all meta-paths
→ sequentially query the most uncertain data point
⇒ use an active learning strategy
[Figure: Active Learning (icons: ✓ ✗ ◎)]
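Querying the most uncertain data point is commonly done with uncertainty sampling. The sketch below assumes a model that outputs, for each meta-path, a predicted rating between 0 and 1, and picks the unlabeled meta-path closest to 0.5. The function name `most_uncertain` and the toy predictions are assumptions; the talk does not state which uncertainty measure is used.

```python
def most_uncertain(predictions, labeled):
    """Return the unlabeled meta-path whose predicted rating is closest to 0.5.

    predictions maps a meta-path to a predicted rating in [0, 1] (assumed).
    labeled is the set of meta-paths the user has already rated.
    """
    candidates = {mp: p for mp, p in predictions.items() if mp not in labeled}
    if not candidates:
        return None
    return min(candidates, key=lambda mp: abs(candidates[mp] - 0.5))


# Hypothetical predictions for four meta-paths, one of them already rated.
predictions = {
    ("Person", "ACTED_IN", "Movie"): 0.9,
    ("Person", "DIRECTED", "Movie"): 0.55,
    ("Movie", "HAS_GENRE", "Genre"): 0.1,
    ("Movie", "SHOT_IN", "Location"): 0.5,
}
labeled = {("Movie", "SHOT_IN", "Location")}

next_query = most_uncertain(predictions, labeled)
```

After the user rates `next_query`, the model would be retrained and the selection repeated, so only a few ratings are requested in total.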
21. Use Learned Preferences for Graph Exploration
Result Explanation
[Diagram: Graph (with meta-paths) and Domain Knowledge ("What is important in the graph?") → Personalized Exploration Tool → Similarity Measure, Related Nodes, Stats]
Icons made by Eucalyp from www.flaticon.com, licensed under CC 3.0 BY
22. Personalized Exploration Tool
Result Explanation
→ Transform nodes to vectors (graph embedding, precomputed)
→ Adapt vectors using domain knowledge
→ Personalized vector space
23. Personalized Exploration Tool
Result Explanation
Problem: Suggest similar nodes with regard to given node sets and user preferences.
Solution: Learn a personalized similarity metric between nodes → similar nodes should have similar vector space embeddings.
Challenges:
- Different levels of abstraction between meta-path preferences and node instances.
- High computational costs, but online adaptation of the embedding space is needed.
Our method: a modified node2vec that applies a bias to already pretrained embeddings.
24. Personalized Exploration Tool
Result Explanation
● What nodes are close to my selection?
● How close are my sets?
● Find clusters!
● What are outliers?
Personalized Vector Space
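Two of these questions can be sketched with plain vector arithmetic once nodes have embeddings. The example assumes cosine similarity and set centroids, which the talk does not confirm; the function names and the two-dimensional toy vectors are made up for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def closest_nodes(embeddings, selection, k=3):
    """Rank nodes outside the selection by similarity to the selection centroid."""
    center = centroid([embeddings[n] for n in selection])
    others = [n for n in embeddings if n not in selection]
    return sorted(others, key=lambda n: cosine(embeddings[n], center), reverse=True)[:k]


def set_similarity(embeddings, set_a, set_b):
    """Similarity of two node sets, measured between their centroids."""
    center_a = centroid([embeddings[n] for n in set_a])
    center_b = centroid([embeddings[n] for n in set_b])
    return cosine(center_a, center_b)


# Toy embeddings with invented values; real ones would come from the personalized space.
embeddings = {
    "Pulp Fiction": [1.0, 0.0],
    "The Matrix": [0.9, 0.1],
    "Skyfall": [0.8, 0.3],
    "Avatar": [0.0, 1.0],
}
liked = {"Pulp Fiction", "The Matrix"}
unknown = {"Skyfall", "Avatar"}

ranking = closest_nodes(embeddings, liked, k=2)
similarity = set_similarity(embeddings, liked, unknown)
```

With these toy vectors, `ranking` places Skyfall ahead of Avatar. Comparing centroids is a coarse choice: it hides clusters and outliers inside a set, which would need per-node distances instead.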
25. System Architecture - How does it work with Neo4j?
● Neo4j Graph Database
○ Neo4j Graph Algorithm Procedures, containing Meta-Paths Computation
● Python Backend Server
○ Meta-Path Embedding
○ Active Learning
○ Explanation
● ReactJS Frontend
○ Node selection
○ Meta-Path ordering
○ Result visualization
26. What about neo4j?
● Easy to get your code running in neo4j.
● Neo4j-graph-algorithms: efficiency vs convenience.
● Sometimes no stack-trace when an error occurs.
● Great support and community. Always available.
● Cypher: Easy to begin with, hard to master.
(hpi)-[:LIKES]->(neo4j)
27. Outlook
4 research topics:
● Efficient meta-paths computation
● Sensible meta-paths embedding
● Precise active learning of meta-paths ranking
● Explanation and exploration using ranked meta-paths
Open Source!
● Meta-paths computation in neo4j graph algos
● MetaExp Repository
● You can use and improve it as well!
https://hpi.de/mueller/metaexp/