© 2022 Neo4j, Inc. All rights reserved.
Workshop:
Neo4j Graph Data Science
Neo4j is a Native Graph Database
Relational VS Graph models
[Figure: the same friendships between Andreas, Tobias, Mica and Delia shown twice: in the relational model as Person, Friend and Person-Friend tables, and in the graph model as nodes connected by KNOWS relationships]
Labeled property graph model components
● Nodes
- Represent objects in the graph
● Relationships
- Relate nodes by type and direction
● Properties
- Name-value pairs that can go
on nodes and relationships
- Can have indexes and composite indexes
(types: String, Number, Long, Date, Spatial, byte
and arrays of those)
● Labels
- Group nodes
- Shape the domain
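[Figure: example labeled property graph]
PERSON (name: “Dan”, born: May 29, 1970, twitter: “@dan”)
PERSON (name: “Ann”, born: Dec 5, 1975)
CAR (brand: “Volvo”, model: “V70”)
Relationships: LOVES, LOVES, LIVES WITH, DRIVES, OWNS (one of them carries the property since: Jan 10, 2011)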
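To make the four components concrete, here is a minimal Python sketch of a labeled property graph. It is not how Neo4j stores data; the class names, the `outgoing` helper and the choice of which node sits on which end of each relationship are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                     # labels group nodes, e.g. {"PERSON"}
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node                     # relationships have a direction...
    type: str                       # ...and exactly one type
    end: Node
    properties: dict = field(default_factory=dict)


dan = Node({"PERSON"}, {"name": "Dan", "twitter": "@dan"})
ann = Node({"PERSON"}, {"name": "Ann"})
car = Node({"CAR"}, {"brand": "Volvo", "model": "V70"})

# Which node is on which end of each relationship is assumed for illustration.
relationships = [
    Relationship(dan, "LOVES", ann),
    Relationship(ann, "LOVES", dan),
    Relationship(dan, "DRIVES", car),
]


def outgoing(node, rel_type):
    """Return the end nodes of all relationships of rel_type starting at node."""
    return [r.end for r in relationships if r.start is node and r.type == rel_type]


print([n.properties["name"] for n in outgoing(dan, "LOVES")])
```

Properties live on both nodes and relationships, which is why both classes carry a `properties` dictionary.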
What is data science?
“Data science is an interdisciplinary
field that uses scientific methods,
processes, algorithms and systems to
extract knowledge and insights from
structured and unstructured data.” -
Wikipedia
Domain Knowledge
What is Graph data science?
Graph Data Science is a science-
driven approach to gain knowledge
from the relationships and structures
in data, typically to power predictions.
Graph data scientists use
relationships to answer
questions.
So, When Do I Need Graph Algorithms?
● Query (Cypher): local patterns
- Real-time, local decisioning and pattern matching
- You know what you’re looking for and making a decision
● Graph Algorithms: global computation
- Global analysis and iterations
- You’re learning the overall structure of a network, updating data, and predicting
Graph Algorithm Categories
● Pathfinding & Search
- Finds optimal paths or evaluates route availability and quality
● Centrality & Importance
- Determines the importance of distinct nodes in the network
● Community Detection
- Detects group clustering or partition
● Similarity
- Evaluates how alike nodes are by neighbours and relationships
● Heuristic Link Prediction
- Estimates the likelihood of nodes forming a future relationship
● Node Embeddings & ML
- Compute low-dimensional vector representations of nodes in a graph, and allow you to train supervised machine learning models
https://neo4j.com/docs/graph-data-science/current/
60+ Graph Data Science Techniques in Neo4j
Pathfinding & Search
• Shortest Path
• Single-Source Shortest Path
• All Pairs Shortest Path
• A* Shortest Path
• Yen’s K Shortest Path
• Minimum Weight Spanning Tree
• K-Spanning Tree (MST)
• Random Walk
• Breadth & Depth First Search
Centrality & Importance
• Degree Centrality
• Closeness Centrality
• Harmonic Centrality
• Betweenness Centrality & Approx.
• PageRank
• Personalized PageRank
• ArticleRank
• Eigenvector Centrality
• Hyperlink Induced Topic Search (HITS)
• Influence Maximization (Greedy, CELF)
Community Detection
• Triangle Count
• Local Clustering Coefficient
• Connected Components (Union Find)
• Strongly Connected Components
• Label Propagation
• Louvain Modularity
• K-1 Coloring
• Modularity Optimization
• Speaker Listener Label Propagation
Supervised Machine Learning
• Node Classification
• Link Prediction
• Node Regression
Heuristic Link Prediction
• Adamic Adar
• Common Neighbors
• Preferential Attachment
• Resource Allocations
• Same Community
• Total Neighbors
Similarity
• Node Similarity
• K-Nearest Neighbors (KNN)
• Jaccard Similarity
• Cosine Similarity
• Pearson Similarity
• Euclidean Distance
• Approximate Nearest Neighbors (ANN)
Graph Embeddings
• Node2Vec
• FastRP
• FastRPExtended
• GraphSAGE
… and more!
• Synthetic Graph Generation
• Scale Properties
• Collapse Paths
• One Hot Encoding
• Split Relationships
• Graph Export
• Pregel API (write your own algos)
How can they be used?
Stand Alone Solution
Find significant patterns and optimal
structures
Use community detection and
similarity scores for recommendations
Machine Learning Pipeline
Use the measures as features to train
an ML model
1st node | 2nd node | Common neighbors | Preferential attachment | Label
1        | 2        | 4                | 15                      | 1
3        | 4        | 7                | 12                      | 1
5        | 6        | 1                | 1                       | 0
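As a rough illustration of how such a feature table could be produced, the Python sketch below computes common neighbors and preferential attachment for a pair of nodes in a tiny undirected graph. The edge list and function names are made up for the example, and unlike a real link-prediction pipeline it does not hold out the labelled relationship before computing the features.

```python
# Hypothetical toy graph, undirected.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]


def build_adjacency(edge_list):
    """Map every node to the set of its neighbours."""
    adjacency = {}
    for u, v in edge_list:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def common_neighbors(adjacency, u, v):
    """Number of nodes adjacent to both u and v."""
    return len(adjacency[u] & adjacency[v])


def preferential_attachment(adjacency, u, v):
    """Product of the two node degrees."""
    return len(adjacency[u]) * len(adjacency[v])


def feature_row(adjacency, u, v):
    """One row of the table: both nodes, two features, and the 0/1 label."""
    label = 1 if v in adjacency[u] else 0
    return (u, v, common_neighbors(adjacency, u, v),
            preferential_attachment(adjacency, u, v), label)


adjacency = build_adjacency(edges)
for pair in [("a", "b"), ("a", "d")]:
    print(feature_row(adjacency, *pair))
```

Each row can then be handed to any tabular ML library as a training example.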
Access & deploy GDS
● Besides the Neo4j Browser, the GDS library can also be accessed through the Neo4j Drivers
What, Where & Who?
Which of the colored nodes would be considered the most
‘important'?
D has the highest valence (degree)
This is the most connected individual in the network. If importance is how well you are personally known, you would pick D.
G has the highest closeness centrality (0.52)
Information will disperse through the network quickest through this individual. If you need to get a message out rapidly, you would choose G.
I has the highest betweenness centrality (0.59)
This element is an efficient connector to other elements. The risk of disruption is higher if you lose I.
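Two of these measures are easy to sketch in plain Python. The graph below is a made-up toy network, not the one on the slide, and the closeness formula used here (reachable nodes divided by the sum of shortest-path distances) is one common normalisation; other definitions exist.

```python
from collections import deque

# Hypothetical undirected toy graph as an adjacency map.
graph = {
    "A": {"B"},
    "B": {"A", "C", "D"},
    "C": {"B"},
    "D": {"B", "E"},
    "E": {"D"},
}


def degree(graph, node):
    """Degree (valence): the number of direct neighbours."""
    return len(graph[node])


def closeness(graph, node):
    """(reachable nodes) / (sum of shortest-path distances), via breadth-first search."""
    distance = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    total = sum(distance.values())
    return (len(distance) - 1) / total if total else 0.0


for name in sorted(graph):
    print(name, degree(graph, name), round(closeness(graph, name), 3))
```

In this small graph the same node scores highest on both measures; in larger networks, as on the slide, different measures can single out different nodes.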
I'm in it for the money
Who will most likely get the highest pay rise?
It's the bridging employees.
Where - Horizontal
What are the Graph Data Science sweet spots?
Fraud
Detection
Disambiguation &
Segmentation
Personalized
Recommendations
Churn
Prediction
Search &
Master Data Mgmt.
Predictive
Maintenance
Cybersecurity
Where - Finance
● Synthetic identity fraud
● Fraud rings
● Money laundering
● Recommendations
● Customer segmentation
● Churn prediction
● ...
Where - Healthcare
● Drug repurposing
● Patient journey
● Contact tracing
● Regulatory compliance
● ...
Where - Retail
● Logistics & Routing
● Supply chain
● Recommendations
● Customer segmentation
● ...
Who - References
Meredith: Marketing to the Anonymous
• Mostly anonymous users across devices and sites with ever changing cookies
• 4.4 TB: +14 Bn nodes, +20 Bn relationships
• +160 Mn rich, unique profiles created
• 612% increase in visits per profile
Top 10 Bank: Financial Fraud Detection & Recovery
• Almost 70% of credit card fraud was missed
• Synthetic identities were the biggest challenge
• +1B nodes and +1B relationships to analyse
• Graph analytics with queries & algorithms help find $10’s of millions of fraud in 1st year
AstraZeneca: Patient Journeys
• Early intervention project with 3 yrs of visits, tests & diagnosis with 10’s of Bn of records
• Finding similarities in patient journeys
• Graph algorithms for identifying communities & best intervention points
Interacting moving parts
Describing the problem
Graph theory has been around for a while. So have a lot of the graph
algorithms. What you'll find is that the majority of them only work on a
very specific shape of graph ...
Multipartite
• Multiple Node types
• Multiple Relationship types
• Most common graph
• (what we’ve seen so far)
[Figure: multipartite example graph with node labels Client, Phone, Email, NI Number, Transaction, Merchant and Bank, and relationship types HAS_PHONE, HAS_EMAIL, HAS_NI_NUMBER, PERFORMED, FIRST_TX, LAST_TX, NEXT and TO]
Bipartite
• Contains nodes that can be
divided into two sets
◦ Such that relationships only
exist between sets but not
within each set.
• Node similarity relies on this
type of graph
[Figure: bipartite example graph in which Client nodes connect to Phone, Email and NI Number nodes through HAS_PHONE, HAS_EMAIL and HAS_NI_NUMBER relationships]
Monopartite
• Contains one node label and one relationship type
• Most Graph Data Science
algorithms rely on this type
of graph
[Figure: monopartite example graph with Client nodes connected by TRANSFER_TO relationships]
Why can’t I run my algorithm on a multipartite graph?
What if I try to run an algorithm on this graph?
• How many relationships does each person have? 1 or 2
• How many relationships does each book have? 5 or 6
• What is the direction of the relationships in this graph? Person-[:APPEARED_IN]->Book
• Can I reach a person node from another person node, following the directed relationships? No!
Why can’t I run my algorithm on a multipartite graph?
What if I try to run an algorithm on this graph?
• What would an algorithm that used the number of edges each node has to calculate centrality conclude? Books are more important than people
• What would an algorithm that followed directed relationships to find communities conclude? There are seven communities?
Why can’t I run my algorithm on a multipartite graph?
If you want to find out:
• What person is the most important
• How many communities of people are there,
across all the books?
You need to reshape your graph!
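One way to picture the reshaping is the Python sketch below, which turns Person-[:APPEARED_IN]->Book pairs into a person-to-person graph weighted by the number of shared books. The names and data are hypothetical, and this is only a conceptual illustration; in GDS the reshaping is done with graph projections.

```python
from collections import Counter
from itertools import combinations

# Hypothetical bipartite data: (person, book) pairs.
appeared_in = [
    ("Alice", "Book1"),
    ("Bob", "Book1"),
    ("Bob", "Book2"),
    ("Carol", "Book2"),
    ("Dave", "Book3"),
]


def project_co_appearance(pairs):
    """Collapse person->book pairs into weighted person-person edges."""
    people_by_book = {}
    for person, book in pairs:
        people_by_book.setdefault(book, set()).add(person)
    weights = Counter()
    for people in people_by_book.values():
        # Every pair of people sharing a book gets one unit of weight.
        for first, second in combinations(sorted(people), 2):
            weights[(first, second)] += 1
    return dict(weights)


print(project_co_appearance(appeared_in))
```

The result has a single node type and a single relationship type, so centrality and community algorithms can now answer questions about people rather than books.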
Graph Catalog
Procedures (part of the GDS library) that let you reshape and subset your
transactional graph so you have the right data in the right shape to run
analytical algorithms.
Mutable in-memory
Workspace
Graph Algorithms
Creating the graph projection
The projection will be loaded into memory
CALL gds.graph.project('GraphProjection', 'Character', {
  INTERACTS_WITH: {
    type: 'INTERACTS_WITH',
    properties: {count: {property: 'count'}}
  }
}) YIELD graphName, nodeCount, relationshipCount, projectMillis;
This is a native projection. It is very efficient, but the graph must already exist with the same structure in the
database!
Calling an Algorithm Procedure
Good news! All algorithms in GDS follow the same syntax:
CALL gds[.<tier>].<algorithm>.<execution-mode>[.<estimate>](
graphName: STRING,
configuration: MAP
)
Tiers of Support
Product supported: Supported by product engineering, tested for
stability, scale, fully optimized
CALL gds.<algorithm>.<execution-mode>[.<estimate>]
Beta: Candidate for product supported tier
CALL gds.beta.<algorithm>.<execution-mode>[.<estimate>]
Alpha: Experimental implementation, may be changed in future.
CALL gds.alpha.<algorithm>.<execution-mode>[.<estimate>]
CALL gds[.<tier>].<algorithm>.<execution-mode>[.<estimate>](
graphName: STRING,
configuration: MAP
)
Execution Modes
Stream: Stream your results back as Cypher result rows. Generally node id(s) and scores.
CALL gds[.<tier>].<algorithm>.stream[.<estimate>]
Write: Write your results back to Neo4j as node or relationship properties, or new
relationships. Must specify writeProperty
CALL gds[.<tier>].<algorithm>.write[.<estimate>]
Mutate: update the in-memory graph with the results of the algorithm
CALL gds[.<tier>].<algorithm>.mutate[.<estimate>]
Stats: Returns statistics about the algorithm output - percentiles, counts
CALL gds[.<tier>].<algorithm>.stats[.<estimate>]
CALL gds[.<tier>].<algorithm>.<execution-mode>[.<estimate>](
graphName: STRING,
configuration: MAP
)
Estimation
Estimate lets you estimate the memory requirements for running your
algorithm with the specified configuration -- just like .estimate with
graph catalog operations.
CALL gds.<algorithm>.<execution-mode>.estimate
Note: Only production quality algorithms support
.stats and .estimate
CALL gds[.<tier>].<algorithm>.<execution-mode>[.<estimate>](
graphName: STRING,
configuration: MAP
)
Common Configuration Parameters
CALL gds[.<tier>].<algorithm>.<execution-mode>[.<estimate>](
graphName: STRING,
configuration: MAP
)
Key | Meaning | Default
concurrency | How many concurrent threads can be used when executing the algo? | 4
readConcurrency | How many concurrent threads can be used when reading data? | concurrency
writeConcurrency | How many concurrent threads can be used when writing results? | concurrency
relationshipWeightProperty | Property containing the weight (must be numeric) | null
writeProperty | Property name to write back to | n/a
Graph Embeddings
and Graph Native ML
Node Embedding
What are node embeddings?
How?
The representation of nodes as low-dimensional vectors that
summarize their graph position, the structure of their local graph
neighborhood as well as any possible node features
Encoder - Decoder Framework
Encode nodes such that similarity in
the embedding space, i.e. cosine
similarity, approximates similarity in
the graph
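Cosine similarity itself is a short formula: the dot product of two vectors divided by the product of their lengths. The Python sketch below uses made-up three-dimensional embeddings and a hypothetical helper name; real embeddings usually have far more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their Euclidean norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


# Hypothetical embeddings: the first two nodes sit close together in the graph.
node_a = [0.9, 0.1, 0.0]
node_b = [0.8, 0.2, 0.1]
node_c = [0.0, 0.1, 0.9]

print(cosine_similarity(node_a, node_b) > cosine_similarity(node_a, node_c))
```

A good encoder places nodes that are similar in the graph at a small angle to each other, so their cosine similarity is close to 1.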
Graph Embeddings in Neo4j
Node2Vec
Random walk based embedding
that can encode structural similarity
or topological proximity.
Easy to understand, interpretable
parameters, plenty of examples
GraphSAGE
Inductive embedding that encodes
properties of neighboring nodes
when learning topology.
Generalizes to unseen graphs, first
method to incorporate properties
FastRP
A super fast linear algebra based
approach to embeddings that can
encode topology or properties.
75,000x faster than Node2Vec
extended to encode properties
GraphSAGE (SAmpling and AggreGatE)
[Figure: GraphSAGE in three stages for a node A: SAMPLE, AGGREGATE, PREDICT]
● Assumes that nodes in the same neighborhood should have similar representations
● Uses node properties in addition to relationships
● Inductive approach that learns a function to calculate an embedding
Some final thoughts ...
Data Science is COMPLICATED
● Data Modeling
- How do we shape data into a graph in the first place?
● Which Algorithms?
- Dozens of libraries, hundreds of algos & no docs!
● Learn Syntax
- We’ve picked a library...good luck learning the syntax
● Reshape Data
- What? We have to build the entire ETL pipeline for this?
● What Now?
- Are the results right? How do we get into production?
SIMPLIFY your experience
● Data Modeling
- With Neo4j it’s already a graph
● Which Algorithms?
- Dozens of libraries, hundreds of algos & no docs!
- We have validated algos, clear docs, & tutorials
● Learn Syntax
- We’ve picked a library...good luck learning the syntax
- Neo4j syntax is standardized and simplified
● Reshape Data
- What? We have to build the entire ETL pipeline for this?
- Seamlessly reshape data with 1 command
● What Now?
- Are the results right? How do we get into production?
- Simply write results to Neo4j & move to production
Eurovision Song Contest
Why that dataset?
● Relatively easy to find
● The domain is generally understood
● The results of our queries and algorithms can be verified
● There are a lot of myths to debunk / confirm … almost everybody in
Europe has at least one of them in their heads.
Model
That's a monopartite that is!
Couple of points
● This is an instance model rather than a classical database model. As we
don't have a schema to generate, we can just as well show some sample
data.
● You could argue that the year should also be a property of the
relationship, rather than part of the type. However, most of the analysis
we'll do today will be year-based.
● The dataset contains data from 1975 to 2018. That data was the easiest
to normalize (the voting system has changed a lot over the years) and
stays clear of recent controversy. Feel free to go 1956 to 2022 afterwards
though, it's really fun.
Cypher
Hands-on
Follow along
How this is going to work is that I am going to avoid flipping back and forth
between slides and executing syntax. Instead you are going to execute
syntax!
In the virtual environment https://milano-summit.graphdatabase.ninja:7473/ ...
execute the following guide in the Neo4j Browser:
:play https://metis.graphdatabase.ninja/summit/cypher.html
:play https://metis.graphdatabase.ninja/summit/gds.html
You will find the syntax labeled with numbers, exactly as on the slides. So do
follow along!
Taking it from the top
In Cypher you MATCH a pattern and then RETURN a result
MATCH (c:Country {name: "Finland"})
RETURN c;
001
Filtering is done with WHERE (this statement does exactly the same)
MATCH (c:Country)
WHERE c.name = "Finland"
RETURN c;
002
Using patterns to answer questions
Who won in 1975?
MATCH (c:Country)<-[vote:VOTE_1975_JURY|VOTE_1975_PUBLIC]-()
RETURN c.name, sum(vote.weight) as score
ORDER BY score DESC LIMIT 10;
003
● The Netherlands (with Ding-a-Dong) did. You can check at
https://eurovisionworld.com/eurovision/1975 that the data is correct.
● Please take a moment to note down the positions of Finland, Sweden and
Ireland (7, 8, 9), this is going to be useful in a bit.
One more of those
Who won in 2006?
MATCH (c:Country)<-[vote:VOTE_2006_JURY|VOTE_2006_PUBLIC]-()
RETURN c.name, sum(vote.weight) as score
ORDER BY score DESC LIMIT 10;
004
Finland (Hard Rock Hallelujah) did
(https://eurovisionworld.com/eurovision/2006) … just in case you wondered
what the music was about.
Let's up the ante
Does country-X almost always give country-Y points?
That clearly requires a couple of definitions:
● almost always → at least 80% of the time
● a minimum of 15 entries for country-Y (otherwise it's not really significant
… sorry Australia)
● in order to keep the complexity limited the splitting and renaming of
countries is not taken into account (but you could if you wanted to)
● only jury votes are considered
● …
Let's up the ante
Does country-X almost always give country-Y points?
The approach then becomes:
● First you determine how many times a country competed.
● You keep that result with an intermediate projection (WITH) and filter out
based on the number of entries
● You then determine how many times the other countries voted for that
country
● Use another intermediate projection to filter based on the percentage
● Project the result ordered by relevance
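The same five steps can be sketched outside the database. The Python below works on a made-up list of (year, voter, recipient) tuples and uses hypothetical country codes W, X, Y and Z; it mirrors the thresholds from the definitions (more than 15 entries, more than 80% of the time) but is only an illustration of the logic, not a replacement for the Cypher query.

```python
from collections import defaultdict


def loyal_voters(votes, min_entries=15, threshold=0.80):
    """Return (voter, recipient, votes, entries) rows for near-constant voters."""
    years_by_target = defaultdict(set)   # step 1: how often did a country compete?
    votes_by_pair = defaultdict(int)     # step 3: how often did X vote for Y?
    for year, source, target in votes:
        years_by_target[target].add(year)
        votes_by_pair[(source, target)] += 1
    rows = []
    for (source, target), count in votes_by_pair.items():
        entries = len(years_by_target[target])
        # steps 2 and 4: filter on number of entries, then on percentage
        if entries > min_entries and count > entries * threshold:
            rows.append((source, target, count, entries))
    # step 5: order by relevance
    rows.sort(key=lambda row: row[2] + row[3], reverse=True)
    return rows


# Hypothetical data: Y competes 20 times; W always votes for Y, X 18 times, Z 5 times.
votes = []
for index, year in enumerate(range(2000, 2020)):
    votes.append((year, "W", "Y"))
    if index < 18:
        votes.append((year, "X", "Y"))
    if index < 5:
        votes.append((year, "Z", "Y"))

for row in loyal_voters(votes):
    print(row)
```

The two dictionaries play the role of the two intermediate WITH projections: each one aggregates first and filters afterwards.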
Let's up the ante
Does country-X almost always give country-Y points?
MATCH (target:Country)<-[r]-()
WHERE NOT type(r) IN ['SPLIT_INTO','WAS_RENAMED']
AND NOT type(r) CONTAINS 'PUBLIC'
WITH target, count(DISTINCT type(r)) AS totalentries
WHERE totalentries > 15
MATCH (target)<-[r]-(source:Country)
WHERE NOT type(r) IN ['SPLIT_INTO','WAS_RENAMED']
AND NOT type(r) CONTAINS 'PUBLIC'
WITH target, source, count(r) as votes, totalentries
WHERE votes > totalentries * 0.80
RETURN source.name AS `country-X`, target.name as `country-Y`, votes,
totalentries ORDER BY totalentries+votes DESC;
005
Let's up the ante - Conclusions
Does country-X almost always give country-Y points?
● It does happen
● But it's not as common as some of the myths would have you believe.
Biting off more than we can chew
Are there blocks of countries (cliques/cohorts … whatever you want to call
them) that keep votes amongst themselves?
This is much harder to determine
● It requires reciprocity (it's not good enough that X always votes for Y, it
has to go the other way too)
● It needs quite a few countries to collaborate before you see the impact.
● …
It is a long standing myth (?) that the Scandinavian countries do exactly this.
Let's find out …
Biting off more than we can chew
You can do this with Cypher.
It would get pretty hairy though. If, however, you reduce the problem to its
essence, what you want to do is find out if there are voting communities
that persist over time …
I wonder if there are GDS algorithms that can determine communities …
Graph Data Science
Hands-on
… at last …
Follow along
How this is going to work is that I am going to avoid flipping back and forth
between slides and executing syntax. Instead you are going to execute
syntax!
In the virtual environment https://summit.graphdatabase.ninja:7473/ ... execute
the following guide in the Neo4j Browser:
:play https://metis.graphdatabase.ninja/summit/gds.html
You will find the syntax labeled with numbers, exactly as on the slides. So do
follow along. And oh yes ... one last thing ...
There will be questions!
Best practice
A typical run of a graph algorithm has the following steps:
1. Know your data. Run some statistics. This will help determine if the
results make sense. Run some estimates. Do you have enough memory?
2. Project the necessary data into the in-memory workspace.
3. Run the algorithm in estimate mode. Run it in stats mode. See 1. for the
reason.
4. Run the algorithm. Handle the results.
5. Remove the projection if it is no longer needed.
Best practice
In this session we will focus on 2. and 4. (to save time and reduce complexity)
but please do not forget the other steps once you are doing this on your own.
Using algorithms to answer questions
Who won in 1975?
This question is asking about the importance of countries in our voting graph.
That's a centrality problem, and the best-known algorithm for it is
PageRank, so let's apply that!
Using algorithms to answer questions
Project the relevant data into the in-memory workspace
CALL gds.graph.project("eurosong1975",
"Country",
"VOTE_1975_JURY",
{ relationshipProperties: "weight" }
) YIELD graphName, nodeCount, relationshipCount
RETURN graphName, nodeCount, relationshipCount;
001
Something is not quite right, check https://eurovisionworld.com/eurovision/1975
again, how many countries participated?
Using algorithms to answer questions
Show an overview of the projections
CALL gds.graph.list();
002
Clean up the projection
CALL gds.graph.drop("eurosong1975");
003
Using algorithms to answer questions
And try it in a different way …
CALL gds.graph.project.cypher("eurosong1975",
"MATCH (c:Country) WHERE EXISTS ((c)-[:VOTE_1975_JURY]-())
RETURN id(c) as id, labels(c) as labels",
"MATCH (s:Country)-[r:VOTE_1975_JURY]->(t:Country) RETURN
id(s) as source, id(t) as target, type(r) as type, r.weight
as weight"
) YIELD graphName, nodeCount, relationshipCount
RETURN graphName, nodeCount, relationshipCount;
004
Using algorithms to answer questions
Native projection VERSUS Cypher projection
● Native projection is very efficient, scales to huge graphs
● Native projection requires that your original graph is completely tailored to
the problems
● Cypher projection is less efficient
● Cypher projection gives you full flexibility (you can even project things
that aren't there)
For our hands-on we'll go with Cypher projections, but do keep the above in mind!
Using algorithms to answer questions
Streaming the results for 1975
CALL gds.pageRank.stream("eurosong1975", {
maxIterations: 20,
dampingFactor: 0.85,
relationshipWeightProperty: "weight"
}) YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC, name ASC LIMIT 10;
005
Does anybody notice something strange about positions 7, 8 and 9?
A bit of a rant
Why aren't Finland, Ireland and Sweden in the correct order? Is PageRank
giving us information that a plain score cannot? Yes and no.
The way PageRank works is that incoming votes are only part of the story. A
vote gets more importance if it comes from a node (originally a web page, here a
country) that itself has a high score.
Ireland got votes from The Netherlands. The others did not.
The lesson here is that you
● Need to understand your data
● Need to understand the algorithms
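The effect is easy to reproduce with a textbook weighted PageRank in plain Python. This is a simplified variant written for illustration, not the GDS implementation (which scales scores differently), and the node names and points below are invented. X and Y receive the same number of points, but X's points come from the node that everybody else votes for.

```python
def pagerank(edges, damping=0.85, iterations=20):
    """Textbook weighted PageRank; scores always sum to 1."""
    nodes = sorted({n for source, target, _ in edges for n in (source, target)})
    out_weight = {n: 0.0 for n in nodes}
    for source, _, weight in edges:
        out_weight[source] += weight
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Score held by nodes without outgoing edges is spread evenly.
        dangling = sum(score[n] for n in nodes if out_weight[n] == 0)
        base = (1 - damping) / len(nodes) + damping * dangling / len(nodes)
        updated = {n: base for n in nodes}
        for source, target, weight in edges:
            updated[target] += damping * score[source] * weight / out_weight[source]
        score = updated
    return score


# Hypothetical votes as (voter, recipient, points).
edges = [
    ("A", "Winner", 10),
    ("B", "Winner", 10),
    ("C", "Winner", 5),
    ("C", "Y", 5),
    ("Winner", "X", 5),
]

scores = pagerank(edges)
for name in sorted(scores, key=scores.get, reverse=True):
    print(name, round(scores[name], 4))
```

A plain sum of points would rank X and Y equally; PageRank puts X ahead because its vote comes from the highest-scoring node.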
Let's up the ante
What were the voting communities in 1975?
CALL gds.louvain.stream("eurosong1975", {
relationshipWeightProperty: "weight"
}) YIELD nodeId, communityId
RETURN collect(gds.util.asNode(nodeId).name) AS members,
communityId
ORDER BY communityId DESC
006
Nice, but without looking over all the years there's no way to bust the Scandinavian
myth …
Let's up the ante
Project the remaining years without televoting
UNWIND range(1976,2015,1) as year
CALL {
  WITH year
  CALL gds.graph.project.cypher("eurosong" + year,
    "MATCH (c:Country) WHERE EXISTS ((c)-[:VOTE_" + year + "_JURY]-()) RETURN id(c) as id, labels(c) as labels",
    "MATCH (s:Country)-[r:VOTE_" + year + "_JURY]->(t:Country) RETURN id(s) as source, id(t) as target, type(r) as type, r.weight as weight"
  ) YIELD graphName
  RETURN graphName
}
RETURN year, graphName;
007
Let's up the ante
Project the remaining years with televoting
UNWIND range(2016,2018,1) as year
CALL {
  WITH year
  CALL gds.graph.project.cypher("eurosong" + year,
    "MATCH (c:Country) WHERE EXISTS ((c)-[:VOTE_" + year + "_JURY]-()) RETURN id(c) as id, labels(c) as labels",
    "MATCH (s:Country)-[r:VOTE_" + year + "_JURY|VOTE_" + year + "_PUBLIC]->(t:Country) RETURN id(s) as source, id(t) as target, type(r) as type, r.weight as weight"
  ) YIELD graphName
  RETURN graphName
}
RETURN year, graphName;
008
Let's up the ante
Run Louvain in bulk and mutate the in-memory projection
UNWIND range(1975,2018,1) as year
CALL {
WITH year
CALL gds.louvain.mutate("eurosong" + year, {
relationshipWeightProperty: "weight",
mutateProperty: "louvain" + year
}) YIELD nodePropertiesWritten
RETURN nodePropertiesWritten
}
RETURN year, nodePropertiesWritten;
009
Mutatis mutandis
There are three main modes (ignoring stats and estimate) to run an algorithm
stream - streams (duh) the results and is typically either used as a test run
(with visual inspection of the results) or when you want to use the results
outside of Neo4j (in a machine learning pipeline for example)
write - modifies the original graph, which can be very useful if you want to
combine analytics with real time use cases
mutate - modifies the in-memory projection, which is typically done when you
have a chain of algorithms where one has to feed into the next
Let's talk about embeddings
Before we can finally confirm or debunk the Viking conspiracy, there's an image
we saw earlier that I'm betting none of you questioned …
Machine Learning Pipeline
How does that work? An ML pipeline eats features, not graphs. Enter
embeddings …
Let's talk about embeddings
An embedding is a vector, a list of numbers, that represents (part of) a
graph. In Neo4j these are currently node embeddings: a node and its place
in the graph are represented as a list of numbers, which an ML pipeline can
totally ingest!
There are currently three algorithms that can create node-embeddings
● Fast Random Projection
● GraphSAGE
● Node2Vec
Let's talk about embeddings
So … you are going to ignore all three and create your own …
In the in-memory projections (one per year) the Country nodes now have an
additional property, louvainXXXX (with XXXX the year) that holds their
community.
I would argue that a node's community is a pretty good indication of the
structure around a node. Combining all of them (for all years) into one list
gives us … a pretty decent embedding (not to mention one that's human
interpretable). Let's do it!
Let's up the ante
Create the embedding
UNWIND range(1975,2018,1) as year
CALL gds.graph.streamNodeProperty("eurosong" + year, "louvain" +
year) YIELD nodeId, propertyValue
WITH nodeId, propertyValue, year
WITH nodeId, toInteger(toString(year) + toString(propertyValue)) as
embeddingvalue
WITH nodeId, collect(embeddingvalue) as embedding
MATCH (c:Country) WHERE id(c) = nodeId
SET c.embedding = embedding;
010
Let's up the ante
Verify the embedding
MATCH (c:Country)
RETURN c.name, c.embedding;
011
Let's up the ante
Cleanup, the in-memory projections have served their purpose
UNWIND range(1975,2018,1) as year
CALL {
WITH year
CALL gds.graph.drop("eurosong" + year) YIELD graphName
RETURN graphName
}
RETURN "dropped " + graphName;
012
Just building up the suspense btw … I could totally have ignored this …
Let's up the ante
Compare the embeddings and infer a SIMILAR relationship
MATCH (c1:Country),(c2:Country)
WHERE id(c1) > id(c2)
AND c1.embedding IS NOT NULL
AND c2.embedding IS NOT NULL
AND gds.similarity.jaccard(c1.embedding, c2.embedding) > 0.60
AND size(c1.embedding) > 1
AND size(c2.embedding) > 1
MERGE (c1)-[:SIMILAR {score: gds.similarity.jaccard(c1.embedding,
c2.embedding)}]->(c2);
013
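To see why this comparison works, the Python sketch below rebuilds the home-made embedding (year and community id glued together into one number) for two hypothetical countries and compares them with a set-based Jaccard index. The community ids are invented, and the function here is a plain set implementation written for illustration; it is not the code behind gds.similarity.jaccard.

```python
def build_embedding(community_by_year):
    """Concatenate year and community id, e.g. 1975 and 3 become 19753."""
    return [int(f"{year}{community}")
            for year, community in sorted(community_by_year.items())]


def jaccard(first, second):
    """Size of the intersection divided by the size of the union."""
    set_a, set_b = set(first), set(second)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical Louvain communities per year for two countries.
country_x = build_embedding({1975: 3, 1976: 1, 1977: 4})
country_y = build_embedding({1975: 3, 1976: 1, 1977: 2})

print(country_x, country_y, jaccard(country_x, country_y))
```

Prefixing the year keeps community 3 of 1975 distinct from community 3 of any other year, so two countries only overlap when they share a community in the same contest.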
Confirmed or Debunked?
Check the results
MATCH p=(:Country)-[r:SIMILAR]->(:Country) RETURN p;
014
Yes, there is some collusion, but remember, you'd need quite the cluster to
actually influence the results significantly. And it would seem it's not the
Scandinavian countries that have that at the moment.
What's going on between San Marino and Georgia though?
Thank you!
Contact us at
sales@neo4j.com

Workshop - Neo4j Graph Data Science

  • 1. © 2022 Neo4j, Inc. All rights reserved. © 2022 Neo4j, Inc. All rights reserved. 1 Workshop: Neo4j Graph Data Science
  • 2. Neo4j, Inc. All rights reserved 2022 Neo4j is a Native Graph Database 2
  • 3. Neo4j, Inc. All rights reserved 2022 Relational VS Graph models 3 Relational Model Graph Model KNOWS KNOWS KNOWS ANDREAS TOBIAS MICA DELIA Person Friend Person-Friend ANDREAS DELIA TOBIAS MICA
  • 4. Neo4j, Inc. All rights reserved 2022 Labeled property graph model components ● Nodes - Represent objects in the graph ● Relationships - Relate nodes by type and direction ● Properties - Name-value pairs that can go on nodes and relationships - Can have indexes and composite indexes (types: String, Number, Long, Date, Spatial, byte and arrays of those) ● Labels - Group nodes - Shape the domain 4 CAR DRIVES name: “Dan” born: May 29, 1970 twitter: “@dan” name: “Ann” born: Dec 5, 1975 since: Jan 10, 2011 brand: “Volvo” model: “V70” LOVES LIVES WITH O W N S PERSON PERSON LOVES
  • 5. © 2022 Neo4j, Inc. All rights reserved. 5 What is data science? “Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data.” - Wikipedia Domain Knowledge
  • 6. © 2022 Neo4j, Inc. All rights reserved. 6 What is Graph data science? Graph Data Science is a science- driven approach to gain knowledge from the relationships and structures in data, typically to power predictions. Graph data scientists use relationships to answer questions.
  • 7. © 2022 Neo4j, Inc. All rights reserved. 7 So, When Do I Need Graph Algorithms? Query (Cypher) Real-time, local decisioning and pattern matching Graph Algorithms Global analysis and iterations You know what you’re looking for and making a decision You’re learning the overall structure of a network, updating data, and predicting Local Patterns Global Computation
  • 8. © 2022 Neo4j, Inc. All rights reserved. 8 Graph Algorithm Categories Determines the importance of distinct nodes in the network Finds optimal paths or evaluates route availability and quality Detects group clustering or partition Evaluates how alike nodes are by neighbours and relationships Pathfinding & Search Centrality & Importance Community Detection Similarity Heuristic Link Prediction Estimates the likelihood of nodes forming a future relationship Node Embeddings & ML Compute low-dimensional vector representations of nodes in a graph, and allow you to train supervised machine learning models https://neo4j.com/docs/graph-data-science/current/
  • 9. © 2022 Neo4j, Inc. All rights reserved. 9 60+ Graph Data Science Techniques in Neo4j Pathfinding & Search • Shortest Path • Single-Source Shortest Path • All Pairs Shortest Path • A* Shortest Path • Yen’s K Shortest Path • Minimum Weight Spanning Tree • K-Spanning Tree (MST) • Random Walk • Breadth & Depth First Search Centrality & Importance • Degree Centrality • Closeness Centrality • Harmonic Centrality • Betweenness Centrality & Approx. • PageRank • Personalized PageRank • ArticleRank • Eigenvector Centrality • Hyperlink Induced Topic Search (HITS) • Influence Maximization (Greedy, CELF) Community Detection • Triangle Count • Local Clustering Coefficient • Connected Components (Union Find) • Strongly Connected Components • Label Propagation • Louvain Modularity • K-1 Coloring • Modularity Optimization • Speaker Listener Label Propagation Supervised Machine Learning • Node Classification • Link Prediction • Node Regression … and more! Heuristic Link Prediction • Adamic Adar • Common Neighbors • Preferential Attachment • Resource Allocations • Same Community • Total Neighbors Similarity • Node Similarity • K-Nearest Neighbors (KNN) • Jaccard Similarity • Cosine Similarity • Pearson Similarity • Euclidean Distance • Approximate Nearest Neighbors (ANN) Graph Embeddings • Node2Vec • FastRP • FastRPExtended • GraphSAGE • Synthetic Graph Generation • Scale Properties • Collapse Paths • One Hot Encoding • Split Relationships • Graph Export • Pregel API (write your own algos)
  • 10. © 2022 Neo4j, Inc. All rights reserved. 10 How can they be used? Stand Alone Solution Find significant patterns and optimal structures Use community detection and similarity scores for recommendations Machine Learning Pipeline Use the measures as features to train an ML model 1st node 2nd node Common neighbors Preferential attachment Label 1 2 4 15 1 3 4 7 12 1 5 6 1 1 0 10
  • 11. © 2022 Neo4j, Inc. All rights reserved. 11 Access & deploy GDS ● In addition to the Neo4j Browser, access to the GDS library can be done using the Neo4j Drivers
  • 12. © 2022 Neo4j, Inc. All rights reserved. 12 What, Where & Who?
  • 13. © 2022 Neo4j, Inc. All rights reserved. 13 Which of the colored nodes would be considered the most ‘important'?
  • 14. © 2022 Neo4j, Inc. All rights reserved. 14 Which of the colored nodes would be considered the most ‘important'? D has the highest valence This is the most connected individual in the network. If importance is how well you are personally known, you would pick D. G has the highest closeness centrality (0,52) Information will disperse through the network quicker through this individual. If you need to get a message out rapidly, you would choose G. I has the highest betweenness centrality (0,59) This element is an efficient connector to other elements. Risk of disruption is higher if you lose I.
  • 15. © 2022 Neo4j, Inc. All rights reserved. I'm in it for the money Who will most likely get the highest pay rise? 15 It's the bridging employees.
  • 16. © 2022 Neo4j, Inc. All rights reserved. 16 Where - Horizontal What are the Graph Data Science sweet spots? Fraud Detection Disambiguation & Segmentation Personalized Recommendations Churn Prediction Search & Master Data Mgmt. Predictive Maintenance Cybersecurity
  • 17. © 2022 Neo4j, Inc. All rights reserved. Where - Finance ● Synthetic identity fraud ● Fraud rings ● Money laundering ● Recommendations ● Customer segmentation ● Churn prediction ● ...
  • 18. © 2022 Neo4j, Inc. All rights reserved. 18 Where - Healthcare ● Drug repurposing ● Patient journey ● Contact tracing ● Regulatory compliance ● ...
  • 19. © 2022 Neo4j, Inc. All rights reserved. Where - Retail ● Logistics & Routing ● Supply chain ● Recommendations ● Customer segmentation ● ...
  • 20. © 2022 Neo4j, Inc. All rights reserved. 20 Who - References • Mostly anonymous users across devices and sites with ever changing cookies • 4.4 TB: +14 Bn nodes +20Bn relationships • +160 Mn rich, unique profiles created • 612% Increase in visits per profile • Almost 70% of Credit Card fraud was missed • Synthetic Identities were biggest challenge • +1B Nodes and +1B Relationships to analyse • Graph analytics with queries & algorithms help find $10’s of millions of fraud in 1st year Meredith Marketing to the Anonymous Financial Fraud Detection & Recovery Top 10 Bank • Early intervention project with 3 yrs of visits, tests & diagnosis with 10’s of Bn of records • Finding similarities in patient journeys • Graph algorithms for identifying communities & best intervention points AstraZeneca Patient Journeys
  • 21. © 2022 Neo4j, Inc. All rights reserved. 21 Interacting moving parts
  • 22. © 2022 Neo4j, Inc. All rights reserved. Describing the problem 22 Graph theory has been around for a while. So have a lot of the graph algorithms. What you'll find is that the majority of them only works on a very specific shape of graph ...
  • 23. © 2022 Neo4j, Inc. All rights reserved. Neo4j, Inc. All rights reserved 2021 23 Multipartite • Multiple Node types • Multiple Relationship types • Most common graph • (what we’ve seen so far) Mercha nt Transaction Bank N E X T Client Phone Email NI Numb er T O PERFORMED FIRST_TX LAST_T X TO T O H A S _ P H O N E H A S _ E M A I L HA S_ NI _N UM BE R
  • 24. © 2022 Neo4j, Inc. All rights reserved. Neo4j, Inc. All rights reserved 2021 24 Bipartite • Contains nodes that can be divided into two sets ◦ Such that relationships only exist between sets but not within each set. • Node similarity relies on this type of graph Client Phone Email NI Number H A S _ P H O N E H A S _ E M A I L H A S _ N I_ N U M B E R
  • 25. © 2022 Neo4j, Inc. All rights reserved. Neo4j, Inc. All rights reserved 2021 25 Monopartite • Contains one node label and relationship • Most Graph Data Science algorithms rely on this type of graph Client T R A N S F E R _ T O
  • 26. © 2022 Neo4j, Inc. All rights reserved. 26 Why can’t I run my algorithm on a multipartite graph? What if I try to run an algorithm on this graph? • How many relationships does each person have? • How many relationships does each book have? • What is the direction of the relationships in this graph? • Can I reach a person node from another person node, following the directed relationships? 1 or 2 5 or 6 Person-[:APPEARED_IN]->Book No!
  • 27. © 2022 Neo4j, Inc. All rights reserved. 27 Why can’t I run my algorithm on a multipartite graph? What if I try to run an algorithm on this graph? • What would an algorithm that used the number of edges each node has to calculate centrality conclude? • What would an algorithm that followed directed relationships to find communities conclude? Books are more important than people There are seven communities?
  • 28. © 2022 Neo4j, Inc. All rights reserved. 28 Why can’t I run my algorithm on a multipartite graph? If you want to find out: • What person is the most important • How many communities of people are there, across all the books? You need to reshape your graph!
  • 29. © 2022 Neo4j, Inc. All rights reserved. 29 Graph Catalog Procedures (part of the GDS library) that let you reshape and subset your transactional graph so you have the right data in the right shape to run analytical algorithms. Mutable in-memory Workspace
  • 30. © 2022 Neo4j, Inc. All rights reserved. Graph Algorithms 30
  • 31. © 2022 Neo4j, Inc. All rights reserved. 31 Creating the graph projection Projection will be loaded it into memory CALL gds.graph.create('GraphProjection', 'Character',{ INTERACTS_WITH:{ type: 'INTERACTS_WITH', properties: {count: {property: 'count'}} } }) YIELD graphName,nodeCount,relationshipCount,createMillis; This is a Native Projection. Very efficient but the graph must exist with the same structure in the database!
  • 32. © 2022 Neo4j, Inc. All rights reserved. 32 Calling an Algorithm Procedure Good news! All algorithms in GDS follow the same syntax: CALL gds[.<tier>].<algorithm>.<execution-mode>[.<estimate>]( graphName: STRING, configuration: MAP )
  • 33. © 2022 Neo4j, Inc. All rights reserved. 33 Tiers of Support Product supported: Supported by product engineering, tested for stability, scale, fully optimized CALL gds.<algorithm>.<execution-mode>[.<estimate>] Beta: Candidate for product supported tier CALL gds.beta.<algorithm>.<execution-mode>[.<estimate>] Alpha: Experimental implementation, may be changed in future. CALL gds.alpha.<algorithm>.<execution-mode>[.<estimate>] CALL gds[.<tier>].<algorithm>.<execution-mode>[.<estimate>]( graphName: STRING, configuration: MAP )
  • 34. © 2022 Neo4j, Inc. All rights reserved. 34 Execution Modes Stream: Stream your results back as Cypher result rows. Generally node id(s) and scores. CALL gds[.<tier>].<algorithm>.stream[.<estimate>] Write: Write your results back to Neo4j as node or relationship properties, or new relationships. Must specify writeProperty CALL gds[.<tier>].<algorithm>.write[.<estimate>] Mutate: update the in-memory graph with the results of the algorithm CALL gds[.<tier>].<algorithm>.mutate[.<estimate>] Stats: Returns statistics about the algorithm output - percentiles, counts CALL gds[.<tier>].<algorithm>.stats[.<estimate>] CALL gds[.<tier>].<algorithm>.<execution-mode>[.<estimate>]( graphName: STRING, configuration: MAP )
  • 35. © 2022 Neo4j, Inc. All rights reserved. 35 Estimation Estimate lets you estimate the memory requirements for running your algorithm with the specified configuration -- just like .estimate with graph catalog operations. CALL gds.<algorithm>.<execution-mode>.estimate Note: Only production quality algorithms support .stats and .estimate CALL gds[.<tier>].<algorithm>.<execution-mode>[.<estimate>]( graphName: STRING, configuration: MAP )
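The naming pattern described on the last few slides can be sketched as a small Python helper. This is only an illustration of how the name is assembled; `procedure_name` is a hypothetical function, `someAlgo` is a placeholder rather than a real algorithm, and the helper does not check whether a given algorithm actually exists in a given tier or supports a given mode.

```python
def procedure_name(algorithm, mode, tier=None, estimate=False):
    """Assemble gds[.<tier>].<algorithm>.<execution-mode>[.estimate]."""
    if mode not in {"stream", "write", "mutate", "stats"}:
        raise ValueError(f"unknown execution mode: {mode}")
    if tier not in {None, "beta", "alpha"}:
        raise ValueError(f"unknown tier: {tier}")
    parts = ["gds"]
    if tier:
        parts.append(tier)          # product-supported tier has no prefix
    parts += [algorithm, mode]
    if estimate:
        parts.append("estimate")    # memory estimation variant
    return ".".join(parts)

print(procedure_name("pageRank", "stream"))
print(procedure_name("louvain", "mutate", estimate=True))
print(procedure_name("someAlgo", "write", tier="alpha"))
```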
  • 36. © 2022 Neo4j, Inc. All rights reserved. 36 Common Configuration Parameters CALL gds[.<tier>].<algorithm>.<execution-mode>[.<estimate>]( graphName: STRING, configuration: MAP ) ● concurrency: How many concurrent threads can be used when executing the algo? (default: 4) ● readConcurrency: How many concurrent threads can be used when reading data? (default: the value of concurrency) ● writeConcurrency: How many concurrent threads can be used when writing results? (default: the value of concurrency) ● relationshipWeightProperty: Property containing the weight, must be numeric (default: null) ● writeProperty: Property name to write back to (default: n/a)
  • 37. © 2022 Neo4j, Inc. All rights reserved. Graph Embeddings and Graph Native ML 37
  • 38. © 2022 Neo4j, Inc. All rights reserved. Node Embedding What are node embeddings? The representation of nodes as low-dimensional vectors that summarize their graph position, the structure of their local graph neighborhood, as well as any possible node features. How? Encoder-Decoder Framework
  • 39. © 2022 Neo4j, Inc. All rights reserved. Node Embedding
  • 40. © 2022 Neo4j, Inc. All rights reserved. Node Embedding Encode nodes such that similarity in the embedding space, i.e. cosine similarity, approximates similarity in the graph
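A minimal sketch of the similarity measure named on this slide, in plain Python. The two-dimensional vectors for nodes A, B and C are made-up values, chosen only so that A and B point in nearly the same direction while C does not; real embeddings have many more dimensions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)

# Hypothetical embeddings: A and B are assumed to sit close together
# in the graph, C somewhere else.
embeddings = {
    "A": [0.9, 0.1],
    "B": [0.8, 0.2],
    "C": [-0.1, 0.95],
}

sim_ab = cosine_similarity(embeddings["A"], embeddings["B"])
sim_ac = cosine_similarity(embeddings["A"], embeddings["C"])
print(sim_ab, sim_ac)
```

A good encoder places nodes so that `sim_ab` ends up larger than `sim_ac` whenever A is closer to B than to C in the graph.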
  • 41. © 2022 Neo4j, Inc. All rights reserved. Graph Embeddings in Neo4j ● Node2Vec: Random walk based embedding that can encode structural similarity or topological proximity. Easy to understand, interpretable parameters, plenty of examples. ● GraphSAGE: Inductive embedding that encodes properties of neighboring nodes when learning topology. Generalizes to unseen graphs, first method to incorporate properties. ● FastRP: A super fast linear algebra based approach to embeddings that can encode topology or properties. 75,000x faster than Node2Vec, extended to encode properties.
  • 42. © 2022 Neo4j, Inc. All rights reserved. 42 GraphSAGE (SAmpling and AggreGatE) [Figure: three panels around node A, labeled SAMPLE, AGGREGATE and PREDICT] ● Assumes that nodes in the same neighborhood should have similar representations ● Uses node properties in addition to relationships ● Inductive approach that learns a function to calculate an embedding
  • 43. © 2022 Neo4j, Inc. All rights reserved. 43 Some final thoughts ...
  • 44. © 2022 Neo4j, Inc. All rights reserved. Data Science is COMPLICATED 44 Dozens of libraries, hundreds of algos & no docs! How do we shape data into a graph in the first place? We’ve picked a library...good luck learning the syntax What? We have to build the entire ETL pipeline for this? Are the results right? How do we get into production? Data Modeling Which Algorithms? Learn Syntax Reshape Data What Now?
  • 45. © 2022 Neo4j, Inc. All rights reserved. 45 SIMPLIFY your experience Dozens of libraries, hundreds of algos & no docs! We’ve picked a library...good luck learning the syntax What? We have to build the entire ETL pipeline for this? Are the results right? How do we get into production? Data Modeling Which Algorithms? Learn Syntax Reshape Data What Now? We have validated algos, clear docs, & tutorials Neo4j syntax is standardized and simplified Seamlessly reshape data with 1 command Simply write results to Neo4j & move to production With Neo4j it’s already a graph
  • 46. © 2022 Neo4j, Inc. All rights reserved. 46 Eurovision Song Contest
  • 47. © 2022 Neo4j, Inc. All rights reserved. Why that dataset? ● Relatively easy to find ● The domain is generally understood ● The results of our queries and algorithms can be verified ● There are a lot of myths to debunk / confirm … almost everybody in Europe has at least one of them in their heads.
  • 48. © 2022 Neo4j, Inc. All rights reserved. 48 Model That's a monopartite that is!
  • 49. © 2022 Neo4j, Inc. All rights reserved. Couple of points ● This is an instance model rather than a classical database model. As we don't have a schema to generate, we can just as well show some sample data. ● You could argue that the year should also be a property of the relationship, rather than part of the type. However, most of the analysis we'll do today will be year-based. ● The dataset contains data from 1975 to 2018. That data was the easiest to normalize (the voting system has changed a lot over the years) and stays clear of recent controversy. Feel free to go 1956 to 2022 afterwards though, it's really fun.
  • 50. © 2022 Neo4j, Inc. All rights reserved. 50 Cypher Hands-on
  • 51. © 2022 Neo4j, Inc. All rights reserved. Follow along How this is going to work is that I am going to avoid flipping back and forth between slides and executing syntax. Instead you are going to execute syntax! In the virtual environment https://milano-summit.graphdatabase.ninja:7473/ ... execute the following guides in the Neo4j Browser: :play https://metis.graphdatabase.ninja/summit/cypher.html :play https://metis.graphdatabase.ninja/summit/gds.html You will find the syntax labeled with numbers, exactly as on the slides. So do follow along!
  • 52. © 2022 Neo4j, Inc. All rights reserved. 52 Taking it from the top In Cypher you MATCH a pattern and then RETURN a result MATCH (c:Country {name: "Finland"}) RETURN c; 001 Filtering is done with WHERE (this statement does exactly the same) MATCH (c:Country) WHERE c.name = "Finland" RETURN c; 002
  • 53. © 2022 Neo4j, Inc. All rights reserved. Using patterns to answer questions Who won in 1975? MATCH (c:Country)<-[vote:VOTE_1975_JURY|VOTE_1975_PUBLIC]-() RETURN c.name, sum(vote.weight) as score ORDER BY score DESC LIMIT 10; 003 ● The Netherlands (with Ding-a-Dong) did. You can check at https://eurovisionworld.com/eurovision/1975 that the data is correct. ● Please take a moment to note down the positions of Finland, Sweden and Ireland (7, 8, 9); this is going to be useful in a bit.
  • 54. © 2022 Neo4j, Inc. All rights reserved. 54 One more of those Who won in 2006? MATCH (c:Country)<-[vote:VOTE_2006_JURY|VOTE_2006_PUBLIC]-() RETURN c.name, sum(vote.weight) as score ORDER BY score DESC LIMIT 10; 004 Finland (Hard Rock Hallelujah) did (https://eurovisionworld.com/eurovision/2006) … just in case you wondered what the music was about.
  • 55. © 2022 Neo4j, Inc. All rights reserved. Let's up the ante Does country-X almost always give country-Y points? That clearly requires a couple of definitions: ● almost always → at least 80% of the time ● a minimum of 15 entries for country-Y (otherwise it's not really significant … sorry Australia) ● in order to keep the complexity limited the splitting and renaming of countries is not taken into account (but you could if you wanted to) ● only jury votes are considered ● …
  • 56. © 2022 Neo4j, Inc. All rights reserved. 56 Let's up the ante Does country-X almost always give country-Y points? The approach then becomes: ● First you determine how many times a country competed. ● You keep that result with an intermediate projection (WITH) and filter out based on the number of entries ● You then determine how many times the other countries voted for that country ● Use another intermediate projection to filter based on the percentage ● Project the result ordered by relevance
  • 57. © 2022 Neo4j, Inc. All rights reserved. Let's up the ante Does country-X almost always give country-Y points? MATCH (target:Country)<-[r]-() WHERE NOT type(r) IN ['SPLIT_INTO','WAS_RENAMED'] AND NOT type(r) CONTAINS 'PUBLIC' WITH target, count(DISTINCT type(r)) AS totalentries WHERE totalentries >= 15 MATCH (target)<-[r]-(source:Country) WHERE NOT type(r) IN ['SPLIT_INTO','WAS_RENAMED'] AND NOT type(r) CONTAINS 'PUBLIC' WITH target, source, count(r) as votes, totalentries WHERE votes >= totalentries * 0.80 RETURN source.name AS `country-X`, target.name as `country-Y`, votes, totalentries ORDER BY totalentries+votes DESC; 005
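The same two-step aggregation can be sketched outside the database in plain Python. This is an illustration only: `frequent_voters` is a hypothetical helper, the vote tuples are synthetic, and the thresholds follow the definitions stated earlier (a minimum of 15 entries, at least 80% of the time). It assumes at most one jury vote per year from a given source to a given target.

```python
from collections import defaultdict

def frequent_voters(votes, min_entries=15, min_share=0.80):
    """votes: iterable of (year, source, target) jury votes."""
    entries = defaultdict(set)  # target -> years in which it received votes
    given = defaultdict(int)    # (source, target) -> number of years voted
    for year, source, target in votes:
        entries[target].add(year)
        given[(source, target)] += 1

    result = []
    for (source, target), count in given.items():
        total = len(entries[target])
        # First filter on the number of entries, then on the share of votes.
        if total >= min_entries and count >= total * min_share:
            result.append((source, target, count, total))
    # Same ordering idea as the query: most entries plus votes first.
    return sorted(result, key=lambda row: row[2] + row[3], reverse=True)

# Synthetic data: Y competes 20 times, T only 5 times.
votes = []
votes += [(year, "Z", "Y") for year in range(2000, 2020)]  # 20 of 20
votes += [(year, "X", "Y") for year in range(2000, 2017)]  # 17 of 20
votes += [(year, "W", "Y") for year in range(2000, 2010)]  # 10 of 20
votes += [(year, "X", "T") for year in range(2000, 2005)]  # too few entries

pairs = frequent_voters(votes)
print(pairs)
```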
  • 58. © 2022 Neo4j, Inc. All rights reserved. 58 Let's up the ante - Conclusions Does country-X almost always give country-Y points? ● It does happen ● But it's not as common as some of the myths would have you believe.
  • 59. © 2022 Neo4j, Inc. All rights reserved. Biting off more than we can chew Are there blocs of countries (cliques/cohorts … whatever you want to call them) that keep votes amongst themselves? This is much harder to determine ● It requires reciprocity (it's not good enough that X always votes for Y, it has to go the other way too) ● It needs quite a few countries to collaborate before you see the impact. ● … It is a long-standing myth (?) that the Scandinavian countries do exactly this. Let's find out …
  • 60. © 2022 Neo4j, Inc. All rights reserved. 60 Biting off more than we can chew You can do this with Cypher. It would get pretty hairy though. If, however, you reduce the problem to its essence, what you want to find out is whether there are voting communities that persist over time … I wonder if there are GDS algorithms that can determine communities …
  • 61. © 2022 Neo4j, Inc. All rights reserved. 61 60+ Graph Data Science Techniques in Neo4j Pathfinding & Search • Shortest Path • Single-Source Shortest Path • All Pairs Shortest Path • A* Shortest Path • Yen’s K Shortest Path • Minimum Weight Spanning Tree • K-Spanning Tree (MST) • Random Walk • Breadth & Depth First Search Centrality & Importance • Degree Centrality • Closeness Centrality • Harmonic Centrality • Betweenness Centrality & Approx. • PageRank • Personalized PageRank • ArticleRank • Eigenvector Centrality • Hyperlink Induced Topic Search (HITS) • Influence Maximization (Greedy, CELF) Community Detection • Triangle Count • Local Clustering Coefficient • Connected Components (Union Find) • Strongly Connected Components • Label Propagation • Louvain Modularity • K-1 Coloring • Modularity Optimization • Speaker Listener Label Propagation Supervised Machine Learning • Node Classification • Link Prediction … and more! Heuristic Link Prediction • Adamic Adar • Common Neighbors • Preferential Attachment • Resource Allocations • Same Community • Total Neighbors Similarity • Node Similarity • K-Nearest Neighbors (KNN) • Jaccard Similarity • Cosine Similarity • Pearson Similarity • Euclidean Distance • Approximate Nearest Neighbors (ANN) Graph Embeddings • Node2Vec • FastRP • FastRPExtended • GraphSAGE • Synthetic Graph Generation • Scale Properties • Collapse Paths • One Hot Encoding • Split Relationships • Graph Export • Pregel API (write your own algos)
  • 62. © 2022 Neo4j, Inc. All rights reserved. 62 Graph Data Science Hands-on … at last …
  • 63. © 2022 Neo4j, Inc. All rights reserved. Follow along How this is going to work is that I am going to avoid flipping back and forth between slides and executing syntax. Instead you are going to execute syntax! In the virtual environment https://summit.graphdatabase.ninja:7473/ ... execute the following guide in the Neo4j Browser: :play https://metis.graphdatabase.ninja/summit/gds.html You will find the syntax labeled with numbers, exactly as on the slides. So do follow along. And oh yes ... one last thing ... There will be questions!
  • 64. © 2022 Neo4j, Inc. All rights reserved. 64 Best practice A typical run of a graph algorithm has the following steps: 1. Know your data. Run some statistics. This will help determine if the results make sense. Run some estimates. Do you have enough memory? 2. Project the necessary data into the in-memory workspace. 3. Run the algorithm in estimate mode. Run it in stats mode. See 1. for the reason. 4. Run the algorithm. Handle the results. 5. Remove the projection if it is no longer needed.
  • 65. © 2022 Neo4j, Inc. All rights reserved. Best practice In this session we will focus on 2. and 4. (to save time and reduce complexity) but please do not forget the other steps once you are doing this on your own.
  • 66. © 2022 Neo4j, Inc. All rights reserved. 66 Using algorithms to answer questions Who won in 1975? This question is asking about the importance of countries in our voting graph. That's a centrality problem, and the best-known algorithm for it is PageRank, so let's apply that!
  • 67. © 2022 Neo4j, Inc. All rights reserved. Using algorithms to answer questions Project the relevant data into the in-memory workspace CALL gds.graph.project("eurosong1975", "Country", "VOTE_1975_JURY", { relationshipProperties: "weight" } ) YIELD graphName, nodeCount, relationshipCount RETURN graphName, nodeCount, relationshipCount; 001 Something is not quite right, check https://eurovisionworld.com/eurovision/1975 again, how many countries participated?
  • 68. © 2022 Neo4j, Inc. All rights reserved. 68 Using algorithms to answer questions Show an overview of the projections CALL gds.graph.list(); 002 Clean up the projection CALL gds.graph.drop("eurosong1975"); 003
  • 69. © 2022 Neo4j, Inc. All rights reserved. Using algorithms to answer questions And try it in a different way … CALL gds.graph.project.cypher("eurosong1975", "MATCH (c:Country) WHERE EXISTS ((c)-[:VOTE_1975_JURY]-()) RETURN id(c) as id, labels(c) as labels", "MATCH (s:Country)-[r:VOTE_1975_JURY]->(t:Country) RETURN id(s) as source, id(t) as target, type(r) as type, r.weight as weight" ) YIELD graphName, nodeCount, relationshipCount RETURN graphName, nodeCount, relationshipCount; 004
  • 70. © 2022 Neo4j, Inc. All rights reserved. 70 Using algorithms to answer questions Native projection VERSUS Cypher projection ● Native projection is very efficient and scales to huge graphs ● Native projection requires that your original graph is completely tailored to the problem ● Cypher projection is less efficient ● Cypher projection gives you full flexibility (you can even project things that aren't there) For our hands-on we'll go with Cypher projections, but do keep the above in mind!
  • 71. © 2022 Neo4j, Inc. All rights reserved. Using algorithms to answer questions Streaming the results for 1975 CALL gds.pageRank.stream("eurosong1975", { maxIterations: 20, dampingFactor: 0.85, relationshipWeightProperty: "weight" }) YIELD nodeId, score RETURN gds.util.asNode(nodeId).name AS name, score ORDER BY score DESC, name ASC LIMIT 10; 005 Does anybody notice something strange about positions 7, 8 and 9?
  • 72. © 2022 Neo4j, Inc. All rights reserved. 72 A bit of a rant Why aren't Finland, Ireland and Sweden in the correct order? Is PageRank giving us information that a plain score cannot? Yes and no. The way PageRank works is that incoming votes are only part of the story. A vote gets more importance if it comes from a node (a page in the original web setting, a country here) that itself has a high score. Ireland got votes from The Netherlands. The others did not. The lesson here is that you ● Need to understand your data ● Need to understand the algorithms
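The effect described above can be reproduced with a small weighted PageRank sketch in plain Python. This is the textbook power-iteration formulation, not the GDS implementation, and the graph is a made-up toy: node names and weights are hypothetical. Nodes `x` and `y` receive exactly the same number of points, but `x` gets them from the highest-scoring node.

```python
def pagerank(edges, damping=0.85, iterations=20):
    """Weighted PageRank by power iteration.

    edges: list of (source, target, weight). Rank held by nodes without
    outgoing edges is simply dropped in this sketch.
    """
    nodes = sorted({n for s, t, _ in edges for n in (s, t)})
    out_weight = {n: 0.0 for n in nodes}
    for s, _, w in edges:
        out_weight[s] += w

    scores = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for s, t, w in edges:
            # A vote is worth more when the voter itself scores high.
            new[t] += damping * scores[s] * w / out_weight[s]
        scores = new
    return scores

# Hypothetical votes: (voter, receiver, points).
edges = [
    ("a", "leader", 12), ("b", "leader", 12),
    ("leader", "b", 12),
    ("leader", "x", 5),   # x gets its 5 points from the leader
    ("a", "y", 5),        # y gets its 5 points from a low-scoring voter
]

points = {}
for _, target, weight in edges:
    points[target] = points.get(target, 0) + weight

scores = pagerank(edges)
print(points["x"], points["y"])
print(scores["x"], scores["y"])
```

A plain sum of points cannot separate `x` and `y`, while PageRank ranks `x` higher because of who voted for it.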
  • 73. © 2022 Neo4j, Inc. All rights reserved. Let's up the ante What were the voting communities in 1975? CALL gds.louvain.stream("eurosong1975", { relationshipWeightProperty: "weight" }) YIELD nodeId, communityId RETURN collect(gds.util.asNode(nodeId).name) AS members, communityId ORDER BY communityId DESC 006 Nice, but without looking over all the years there's no way to bust the Scandinavian myth …
  • 74. © 2022 Neo4j, Inc. All rights reserved. 74 Let's up the ante Project the remaining years without televoting UNWIND range(1976,2015,1) as year CALL { WITH year CALL gds.graph.project.cypher("eurosong" + year, "MATCH (c:Country) WHERE EXISTS ((c)-[:VOTE_" + year + "_JURY]-()) RETURN id(c) as id, labels(c) as labels", "MATCH (s:Country)-[r:VOTE_" + year + "_JURY]->(t:Country) RETURN id(s) as source, id(t) as target, type(r) as type, r.weight as weight" ) YIELD graphName RETURN graphName } RETURN year, graphName; 007
  • 75. © 2022 Neo4j, Inc. All rights reserved. Let's up the ante Project the remaining years with televoting UNWIND range(2016,2018,1) as year CALL { WITH year CALL gds.graph.project.cypher("eurosong" + year, "MATCH (c:Country) WHERE EXISTS ((c)-[:VOTE_" + year + "_JURY]-()) RETURN id(c) as id, labels(c) as labels", "MATCH (s:Country)-[r:VOTE_" + year + "_JURY|VOTE_" + year + "_PUBLIC]->(t:Country) RETURN id(s) as source, id(t) as target, type(r) as type, r.weight as weight" ) YIELD graphName RETURN graphName } RETURN year, graphName; 008
  • 76. © 2022 Neo4j, Inc. All rights reserved. 76 Let's up the ante Run Louvain in bulk and mutate the in-memory projection UNWIND range(1975,2018,1) as year CALL { WITH year CALL gds.louvain.mutate("eurosong" + year, { relationshipWeightProperty: "weight", mutateProperty: "louvain" + year }) YIELD nodePropertiesWritten RETURN nodePropertiesWritten } RETURN year, nodePropertiesWritten; 009
  • 77. © 2022 Neo4j, Inc. All rights reserved. Mutatis mutandis There are three main modes (ignoring stats and estimate) to run an algorithm ● stream - streams (duh) the results and is typically used either as a test run (with visual inspection of the results) or when you want to use the results outside of Neo4j (in a machine learning pipeline for example) ● write - modifies the original graph, which can be very useful if you want to combine analytics with real time use cases ● mutate - modifies the in-memory projection, which is typically done when you have a chain of algorithms where one has to feed into the next
  • 78. © 2022 Neo4j, Inc. All rights reserved. 78 Let's talk about embeddings Before we can finally confirm or debunk the Viking plot there's an image we saw earlier that I'm betting none of you questioned … Machine Learning Pipeline How does that work? An ML pipeline eats features, not graphs. Enter embeddings …
  • 79. © 2022 Neo4j, Inc. All rights reserved. Let's talk about embeddings An embedding is a vector, a list of numbers, that represents (part of) the graph. In Neo4j there are currently only node embeddings: a node and its place in the graph are represented as a list of numbers. Which an ML pipeline can totally ingest! There are currently three algorithms that can create node embeddings ● Fast Random Projection ● GraphSAGE ● Node2Vec
  • 80. © 2022 Neo4j, Inc. All rights reserved. 80 Let's talk about embeddings So … you are going to ignore all three and create your own … In the in-memory projections (one per year) the Country nodes now have an additional property, louvainXXXX (with XXXX the year) that holds their community. I would argue that a node's community is a pretty good indication of the structure around a node. Combining all of them (for all years) into one list gives us … a pretty decent embedding (not to mention one that's human interpretable). Let's do it!
  • 81. © 2022 Neo4j, Inc. All rights reserved. Let's up the ante Create the embedding UNWIND range(1975,2018,1) as year CALL gds.graph.streamNodeProperty("eurosong" + year, "louvain" + year) YIELD nodeId, propertyValue WITH nodeId, propertyValue, year WITH nodeId, toInteger(toString(year) + toString(propertyValue)) as embeddingvalue WITH nodeId, collect(embeddingvalue) as embedding MATCH (c:Country) WHERE id(c) = nodeId SET c.embedding = embedding; 010
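What the statement above builds can be sketched in plain Python. The input dictionary is hypothetical and `build_embeddings` is an illustrative helper: for every year, the year and the community id are glued together into one integer token, and each country collects its tokens into a list. Prefixing the year matters because Louvain community ids from different yearly projections are not comparable with each other.

```python
def build_embeddings(communities_by_year):
    """communities_by_year: {year: {country: community_id}}.

    Returns {country: [token, ...]} where each token is the year
    followed by that year's community id, e.g. 1975 and 3 -> 19753.
    """
    embeddings = {}
    for year in sorted(communities_by_year):
        for country, community in communities_by_year[year].items():
            token = int(f"{year}{community}")
            embeddings.setdefault(country, []).append(token)
    return embeddings

# Hypothetical Louvain output for two years.
communities_by_year = {
    1975: {"A": 3, "B": 3, "C": 7},
    1976: {"A": 1, "B": 2},
}

embeddings = build_embeddings(communities_by_year)
print(embeddings)
```

Two countries share a token only when they were placed in the same community in the same year, which is what makes the list usable for a similarity comparison.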
  • 82. © 2022 Neo4j, Inc. All rights reserved. 82 Let's up the ante Verify the embedding MATCH (c:Country) RETURN c.name, c.embedding; 011
  • 83. © 2022 Neo4j, Inc. All rights reserved. Let's up the ante Cleanup, the in-memory projections have served their purpose UNWIND range(1975,2018,1) as year CALL { WITH year CALL gds.graph.drop("eurosong" + year) YIELD graphName RETURN graphName } RETURN "dropped " + graphName; 012 Just building up the suspense btw … I could totally have ignored this …
  • 84. © 2022 Neo4j, Inc. All rights reserved. 84 Let's up the ante Compare the embeddings and infer a SIMILAR relationship MATCH (c1:Country),(c2:Country) WHERE id(c1) > id(c2) AND c1.embedding IS NOT NULL AND c2.embedding IS NOT NULL AND gds.similarity.jaccard(c1.embedding, c2.embedding) > 0.60 AND size(c1.embedding) > 1 AND size(c2.embedding) > 1 MERGE (c1)-[:SIMILAR {score: gds.similarity.jaccard(c1.embedding, c2.embedding)}]->(c2); 013
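The comparison step can be sketched the same way. This assumes a set-based Jaccard index (size of the intersection divided by size of the union), which is how the score is read here; `similar_pairs` is a hypothetical helper and the embeddings are made-up token lists in the year-plus-community format.

```python
def jaccard(a, b):
    """Set-based Jaccard index of two lists of tokens."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)

def similar_pairs(embeddings, threshold=0.60):
    """Return (first, second, score) for every pair above the threshold."""
    names = sorted(embeddings)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:          # each unordered pair once
            e1, e2 = embeddings[first], embeddings[second]
            if len(e1) > 1 and len(e2) > 1:   # skip one-off participants
                score = jaccard(e1, e2)
                if score > threshold:
                    pairs.append((first, second, score))
    return pairs

# Hypothetical embeddings.
embeddings = {
    "A": [19753, 19761, 19772, 19784],
    "B": [19753, 19761, 19772, 19785],
    "C": [19753, 19761, 19772, 19784, 19796],
    "D": [19753],
}

pairs = similar_pairs(embeddings)
print(pairs)
```

Only pairs that shared a community in more than 60% of the years either of them appears in end up connected, so occasional agreement is not enough.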
  • 85. © 2022 Neo4j, Inc. All rights reserved. Confirmed or Debunked? Check the results MATCH p=(:Country)-[r:SIMILAR]->(:Country) RETURN p; 014 Yes, there is some collusion, but remember, you'd need quite the cluster to actually influence the results significantly. And it would seem it's not the Scandinavian countries that have that at the moment. What's going on between San Marino and Georgia though?
  • 86. © 2022 Neo4j, Inc. All rights reserved. 86 Thank you! Contact us at sales@neo4j.com