Relationships are highly predictive of behavior, yet most data science models overlook this information because it's difficult to extract network structure for use in machine learning (ML).
With graphs, relationships are embedded in the data itself, making it practical to add these predictive capabilities to your existing practices.
That’s why we’re presenting and demoing the use of graph-native ML to make breakthrough predictions. This will cover:
- Different approaches to graph feature engineering, from queries and algorithms to embeddings
- How ML techniques leverage everything from classical network science to deep learning and graph convolutional neural networks
- How to generate representations of your graph using graph embeddings, create ML models for link prediction or node classification, and apply these models to add missing information to an existing graph or to incoming data
- Why no-code visualization and prototyping are important
Relationships Matter: Using Connected Data for Better Machine Learning
1. Neo4j, Inc. All rights reserved 2021
Relationships Matter:
Using Connected Data for Better Machine Learning
Alicia Frame, PhD
Director of Product Management, Neo4j
Stuart Laurie
Senior Solutions Architect, Neo4j
2. Neo4j: The Graph Company
Neo4j is the creator of:
• The world’s leading graph database
• The first graph data science platform
• The most flexible graph data model
• The easiest-to-use graph query language
Thousands of Organizations Use Neo4j
• 20 of the top 25 financial firms
• 7 of the top 10 retailers
• 7 of the top 10 software vendors
Locations: Silicon Valley, London, Munich, Paris, Malmö
3. Wait, what’s a graph?
Node: Represents an entity in the graph
Relationship: Connects nodes to each other
Property: Describes a node or relationship, e.g. name, age, weight
[Figure: example property graph. Nodes: Andre (Name: “Andre”, Born: May 29, 1970, Twitter: “@dan”), Mica (Name: “Mica”, Born: Dec 5, 1975), and a Car (Brand: “Volvo”, Model: “V70”). Relationships: LOVES, LIVES WITH, OWNS, DRIVES; one relationship carries the property Since: Jan 10, 2011.]
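A minimal sketch of the node, relationship, and property model described on this slide, in plain Python rather than Neo4j. The class names, the "Person" and "Car" labels, the direction of each relationship, and the placement of the "since" property on OWNS are all assumptions made for illustration; the slide figure does not settle them.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An entity in the graph, with a label and free-form properties."""
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A typed, directed connection between two nodes; it can carry properties too."""
    start: Node
    type: str
    end: Node
    properties: dict = field(default_factory=dict)


# Nodes taken from the slide figure (labels are hypothetical).
andre = Node("Person", {"name": "Andre", "born": "1970-05-29"})
mica = Node("Person", {"name": "Mica", "born": "1975-12-05"})
car = Node("Car", {"brand": "Volvo", "model": "V70"})

# Relationship directions and the owner of "since" are assumptions.
relationships = [
    Relationship(andre, "LOVES", mica),
    Relationship(mica, "LOVES", andre),
    Relationship(andre, "LIVES_WITH", mica),
    Relationship(mica, "OWNS", car, {"since": "2011-01-10"}),
    Relationship(andre, "DRIVES", car),
]


def neighbors(node, rel_type):
    """Return the nodes reached from `node` over outgoing relationships of one type."""
    return [r.end for r in relationships if r.start is node and r.type == rel_type]
```

Calling `neighbors(andre, "DRIVES")` walks the relationship directly, which is the point of the model: connections are stored as data rather than reconstructed through joins.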
4. Everything is Naturally Connected
Networks of People: Employees, Customers, Suppliers, Partners, Influencers, etc. (example relationship: Knows)
Transaction Networks: Risk management, Supply chain, Orders, Payments, etc. (example relationships: Bought, Viewed, Returned)
Knowledge Networks: Enterprise content, Domain specific content, eCommerce content, etc. (example relationships: Plays, Lives_in, In_sport, Likes, Fan_of, Plays_for)
5. Relationships are the strongest predictors of behavior (James Fowler)
But You Can’t Analyse What You Can’t See
● Most data science techniques ignore relationships
● It’s painful to manually engineer connected features from tabular data
● Graphs are built on relationships, so…
● You don’t have to guess at the correlations: with graphs, relationships are built in
6. Analytics & Data Science Interest Exploding in Neo4j Community
According to Gartner, “Graphs form the foundation of modern D&A, with capabilities to enhance and improve user collaboration, ML models and explainable AI. The recent Gartner AI in Organizations Survey demonstrates that graph techniques are increasingly prevalent as AI maturity grows, going from 13% adoption when AI maturity is lowest to 48% when maturity is highest.” (Top 10 Tech Trends in Data and Analytics, 16 Feb 2021)
[Chart: AI Research Papers Featuring Graph. Source: Dimensions Knowledge System]
● 4x increase in traffic to Neo4j GDS page in 2H-2020
● +4.8m views on the graph algorithms short video
● +193k downloads
7. Graphs & Data Science
Queries: Find the patterns you know exist.
Machine Learning: Uncover trends and make predictions
Visualization: Explore, collaborate, and explain
[Diagram labels: Graph Data Science; Analytics; Feature Engineering; Data Exploration; Queries; Machine Learning; Visualization]
8. Graphs & Data Science
Knowledge Graphs: Find the patterns you’re looking for in connected data
Graph Algorithms: Use unsupervised machine learning techniques to identify associations, anomalies, and trends.
Graph Native Machine Learning: Use embeddings to learn the features in your graph that you don’t even know are important yet. Train in-graph supervised ML models to predict links, labels, and missing data.
9. Neo4j’s Graph Data Science Framework
Neo4j Graph Data Science Library: Scalable Graph Algorithms & Analytics Workspace
Neo4j Database: Native Graph Creation & Persistence
Neo4j Bloom: Visual Graph Exploration & Prototyping
10. The Neo4j GDS Library
Robust Graph Algorithms & ML methods
● Compute metrics about the topology and connectivity
● Build predictive models to enhance your graph
● Highly parallelized and scale to tens of billions of nodes
Efficient & Flexible Analytics Workspace
● Automatically reshapes transactional graphs into an in-memory analytics graph
● Optimized for global traversals and aggregation
● Create workflows and layer algorithms
● Store and manage predictive models in the model catalog
[Diagram labels: Mutable In-Memory Workspace; Computational Graph; Native Graph Store]
11. 55+ Graph Data Science Techniques in Neo4j
Pathfinding & Search
• Shortest Path
• Single-Source Shortest Path
• All Pairs Shortest Path
• A* Shortest Path
• Yen’s K Shortest Path
• Minimum Weight Spanning Tree
• K-Spanning Tree (MST)
• Random Walk
• Breadth & Depth First Search
Centrality & Importance
• Degree Centrality
• Closeness Centrality
• Harmonic Centrality
• Betweenness Centrality & Approx.
• PageRank
• Personalized PageRank
• ArticleRank
• Eigenvector Centrality
• Hyperlink Induced Topic Search (HITS)
• Influence Maximization (Greedy, CELF)
Community Detection
• Triangle Count
• Local Clustering Coefficient
• Connected Components (Union Find)
• Strongly Connected Components
• Label Propagation
• Louvain Modularity
• K-1 Coloring
• Modularity Optimization
• Speaker Listener Label Propagation
Supervised Machine Learning
• Node Classification
• Link Prediction
Heuristic Link Prediction
• Adamic Adar
• Common Neighbors
• Preferential Attachment
• Resource Allocation
• Same Community
• Total Neighbors
Similarity
• Node Similarity
• K-Nearest Neighbors (KNN)
• Jaccard Similarity
• Cosine Similarity
• Pearson Similarity
• Euclidean Distance
• Approximate Nearest Neighbors (ANN)
Graph Embeddings
• Node2Vec
• FastRP
• FastRPExtended
• GraphSAGE
… and more!
• Synthetic Graph Generation
• Scale Properties
• Collapse Paths
• One Hot Encoding
• Split Relationships
• Graph Export
• Pregel API (write your own algos)
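To make the heuristic link prediction entries above concrete, here is a small sketch of four of them using their textbook definitions on an undirected graph stored as a plain adjacency dictionary. This is not the GDS implementation; the function names and the toy graph are assumptions for illustration only.

```python
import math

# Hypothetical undirected toy graph: node -> set of neighbors.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
}


def common_neighbors(g, u, v):
    """Number of neighbors shared by u and v."""
    return len(g[u] & g[v])


def adamic_adar(g, u, v):
    """Sum of 1 / log(degree) over shared neighbors; rare connectors count more."""
    return sum(1.0 / math.log(len(g[w])) for w in g[u] & g[v] if len(g[w]) > 1)


def preferential_attachment(g, u, v):
    """Product of the two degrees; well-connected nodes attract more links."""
    return len(g[u]) * len(g[v])


def total_neighbors(g, u, v):
    """Size of the union of the two neighborhoods."""
    return len(g[u] | g[v])


def jaccard(g, u, v):
    """Shared neighbors divided by all neighbors of either node."""
    union = g[u] | g[v]
    return len(g[u] & g[v]) / len(union) if union else 0.0
```

Nodes "a" and "d" are not connected but share both of their neighbors, so every score here ranks that pair as a likely missing link.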
12. What’s New?
13. GDS 1.6: GA May 27
Compatible with Neo4j 4.x series:
• Seven algorithms graduated to the fully supported product tier
• All ML models now support model persistence
• Major improvements to our embeddings
• New capabilities like graph filtering, scalers, and more
Product tier algos
● Article Rank, Eigenvector Centrality, Degree Centrality
● Pathfinding
Subgraph Projections
● Project subgraphs based on existing properties in the in-memory graph
Machine Learning
● Up to 3 models in CE
● Model persistence for node classification, link prediction
ML maturity
● Improvements to node classification, link prediction
● Scaling & normalization
Community Contributions
● New algorithms for influence maximization thanks to @xkitsios
14. Machine Learning Improvements
Community Edition users now have up to 3 trained models 🎉
… But that’s not all:
• We’ve added gds.alpha.scaleProperties, supporting min-max, max, mean, log, standard score, L1 and L2 Norm scaling for properties
• NodeClassification and LinkPrediction now support stream and write modes, and their models can be saved, published and restored
• Node2Vec has been promoted to the beta tier: significantly faster, supports weights, seeding, and mutate mode
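The scalers named above can be sketched in a few lines of plain Python. These follow the common textbook definitions; the exact formulas used by gds.alpha.scaleProperties may differ in details (for example, how constant columns are handled), so treat the function names and edge-case behavior here as assumptions.

```python
import math


def min_max(values):
    """Rescale to [0, 1] using the observed minimum and maximum."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def max_scale(values):
    """Divide by the largest absolute value."""
    m = max(abs(v) for v in values)
    return [v / m if m else 0.0 for v in values]


def mean_scale(values):
    """Center on the mean, then divide by the range."""
    mean = sum(values) / len(values)
    span = max(values) - min(values)
    return [(v - mean) / span if span else 0.0 for v in values]


def log_scale(values):
    """Natural log of each value (values must be positive)."""
    return [math.log(v) for v in values]


def standard_score(values):
    """Center on the mean, then divide by the population standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std if std else 0.0 for v in values]


def l1_norm(values):
    """Divide by the sum of absolute values."""
    total = sum(abs(v) for v in values)
    return [v / total if total else 0.0 for v in values]


def l2_norm(values):
    """Divide by the Euclidean length of the vector."""
    length = math.sqrt(sum(v * v for v in values))
    return [v / length if length else 0.0 for v in values]
```

Scaling matters when properties with very different ranges (say, a degree of 5,000 next to a score of 0.02) feed the same model.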
15. Subgraph Projections
You can now create a new in-memory graph by filtering based on properties in your existing one with gds.beta.graph.subgraph:
• Use native projections and subset your graph, instead of using expensive Cypher projections
• Pre-process your data for faster execution, for example calculating degree centrality and removing high/low degree nodes, or running WCC and creating graphs for each component
• Chain algorithms together by filtering on properties, like running Louvain and then calculating nodeSimilarity for each community
[Figure: example filters: node.class = 1; Degree > 1; Louvain Community ID = 4]
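The filtering idea can be illustrated outside Neo4j with a short Python sketch: compute a node property (degree), then keep only the nodes that pass a filter and the relationships whose endpoints both survive. The helper names and the toy edge list are hypothetical, and this does not reproduce the GDS procedure or its filter syntax.

```python
from collections import Counter


def degrees(edges):
    """Count how many relationships touch each node (undirected degree)."""
    counts = Counter()
    for u, v in edges:
        counts[u] += 1
        counts[v] += 1
    return counts


def filter_subgraph(node_props, edges, keep):
    """Keep nodes whose properties satisfy `keep`, plus edges between kept nodes."""
    kept_nodes = {n: p for n, p in node_props.items() if keep(p)}
    kept_edges = [(u, v) for u, v in edges if u in kept_nodes and v in kept_nodes]
    return kept_nodes, kept_edges


# Hypothetical toy graph.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("e", "a")]

# Step 1: compute a property on the full graph.
node_props = {n: {"degree": d} for n, d in degrees(edges).items()}

# Step 2: project the subgraph where Degree > 1, as in the slide's example filter.
core_nodes, core_edges = filter_subgraph(node_props, edges, lambda p: p["degree"] > 1)
```

The same pattern covers the other examples on the slide: swap the predicate for a class label or a community ID computed by an earlier algorithm.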
16. Influence Maximization Algorithms
Finding the nodes in a graph that can trigger cascading changes:
• Who do I market to, to drive the most adoption?
• Which blogs should I read to get news first?
• Who should you test to get early warning of an outbreak?
… or: Given a network with n nodes and a “spreading” or propagation process on that network, choose a “seed set” s of size k < n to maximize the number of nodes in the network that are ultimately influenced
17. Influence Maximization Algorithms (continued)
This is a combinatorial optimization problem - computationally complex!
● Greedy method: polynomial time approximation
● CELF method: faster than greedy on realistic network sizes and structures
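A rough sketch of the greedy method: repeatedly add the node that gives the largest estimated spread when joined to the seeds already chosen. The slides do not name a propagation process, so this example assumes the independent cascade model with a single activation probability p, estimated by Monte Carlo simulation. Function names, parameters, and the toy graph are hypothetical, and this is not the GDS implementation.

```python
import random


def simulate_cascade(graph, seeds, p, rng):
    """Run one independent-cascade simulation and return how many nodes end up active."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, ()):
                # Each newly active node gets one chance to activate each neighbor.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def expected_spread(graph, seeds, p, runs, rng):
    """Average spread over several simulated cascades."""
    return sum(simulate_cascade(graph, seeds, p, rng) for _ in range(runs)) / runs


def greedy_seed_set(graph, k, p=0.1, runs=200, seed=42):
    """Pick k seeds, each time adding the node with the best estimated spread."""
    rng = random.Random(seed)
    chosen = []
    for _ in range(k):
        candidates = sorted(set(graph) - set(chosen))
        best = max(candidates, key=lambda n: expected_spread(graph, chosen + [n], p, runs, rng))
        chosen.append(best)
    return chosen


# Hypothetical directed toy graph: every node appears as a key.
toy = {"hub": ["l1", "l2", "l3"], "l1": [], "l2": [], "l3": [], "x": ["y"], "y": []}
seeds = greedy_seed_set(toy, k=2, p=1.0, runs=5)
```

Every greedy step re-estimates the spread for every remaining candidate, which is where the cost comes from and why a faster variant such as CELF is attractive on larger graphs.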
18. Influence Maximization Algorithms (continued)
These algorithms were contributed by community member @xkitsios 💕
19. Real World Use Cases
20. Accelerate Innovation using Neo4j Graph Data Science
From Simple to Highly Sophisticated Data Science
• R&D: Better health outcomes through machine learning on patient journeys
• Disambiguation with graph algorithms at scale
• Analytics to improve reliability by predicting problems in a supply-chain knowledge graph
[Chart axes: Analysis Repeatability (Simple, Ad Hoc to Full Production) and Analysis Complexity (up to High); regions labeled Analytics and Data Science]
21. Boston Scientific: Finding At-Fault Components
• Challenge: Difficulty finding at-fault components via ad hoc analytics on a vertically integrated supply chain
• Solution: Uses a knowledge graph to model and analyze their complex products
• Results:
○ Quickly pinpoint root causes of problems
○ Reduced query times from two minutes to seconds
○ Anti-recommendation using graph algorithms to identify and eliminate bad combinations of components
22. Meredith Corp: Identifying the Anonymous
• Challenge: It’s hard to make recommendations to anonymous users
• Solution: Connect first and third party cookies using graph algorithms to create unique profiles
• Results:
○ Converted 14B anonymous data points into 163M user profiles
○ Drove 612% increase in web traffic
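One plausible way to turn linked cookies into profiles is connected components: any identifiers joined by a chain of links fall into the same group. The slide does not say which graph algorithms were used in this case study, so the sketch below is an assumption for illustration, written as a small union-find in plain Python with made-up identifiers.

```python
def find(parent, x):
    """Follow parent pointers to the group's root, shortening the path on the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def connected_profiles(links):
    """Group identifiers that are connected, directly or indirectly, by any link."""
    parent = {}
    for a, b in links:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups
    groups = {}
    for x in parent:
        groups.setdefault(find(parent, x), set()).add(x)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical observations: a cookie seen on a device links the two identifiers.
links = [
    ("cookie_1", "device_A"),
    ("cookie_2", "device_A"),
    ("cookie_3", "device_B"),
]
profiles = connected_profiles(links)
```

Here `cookie_1` and `cookie_2` never appear together, yet they land in one profile because both connect to `device_A`.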
23. AstraZeneca: Patient Journey
“We used graph algorithms to find patients that had specific journey types and patterns and then find others that are close and similar.”
Joseph Roemer, Global Commercial IT Insight & Analytics Sr. Director, AstraZeneca
● Challenge: How to best intervene sooner for complex diseases that develop over years
● Solution: Neo4j knowledge graph of 3 years of visits, tests, and diagnoses with tens of billions of records, using graph algorithms and machine learning together
● Results:
○ Identified journey archetypes and patterns using graph feature engineering as input to ML
○ Revealed journey similarities over time with community detection
○ Found influential touch-points in the journey using graph algorithms
24. Demo
25. Graph-Native ML Workflows inside Neo4j
Graph-Native Feature Engineering: Queries, Algorithms, Embeddings
Train Predictive Model: (1) Model Type, (2) Property Selection, (3) Train & Test, (4) Model Selection
Apply Model to Existing / New Data: Use Predictions for Decisions; Use Predictions to Enhance the Graph
Publish & Share; Store Model in Database
26. Resources
Graph Resources
● Video: Advantages of Graph Technology
● Whitepaper: AI & Graph Technology: Enhancing AI with Context & Connections
● Whitepaper: Financial Fraud Detection with Graph Data Science
● Case Study: Meredith Corporation
Neo4j BookShelf
● Graph Databases For Dummies
● Graph Data Science For Dummies
● O’Reilly Graph Algorithms
27. Resources
Get Started
● Sandbox: https://neo4j.com/sandbox/
● Guides: neo4j.com/developer/graph-data-science/
● GitHub: github.com/neo4j/graph-data-science
28. Thank you!
Contact us at sales@neo4j.com