Amy Hodler, Neo4j
Improve ML Predictions using
Connected Features
#Neo4j
#GraphAnalytics
#UnifiedAnalytics #SparkAISummit
The Next 20 Minutes
• Graphs for Predictions
• Link Prediction
• Neo4j + Spark Workflow
Amy E. Hodler
Graph Analytics & AI Program Manager, Neo4j
Amy.Hodler@neo4j.com @amyhodler
neo4j.com/graph-algorithms-book
Chapter 8: Link Prediction
Spark & Neo4j
What in Common is Predictive?
Relationships Are Often the Strongest Predictors of Behavior
“Increasingly we're learning that you can make
better predictions about people by getting all the
information from their friends and their friends’
friends than you can from the information you
have about the person themselves”
Graph Data Science Use Cases
ML
How Graph Technology is Changing AI
4:30 PM Room 2002
Connected Features
Features for ML:
Feature Extraction
Feature Extraction is how we change the shape or format of
the data to be usable in a machine learning pipeline. For example,
from a graph, we extract the relevant subset of the data into a
tabular format for model building.
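A minimal sketch of that extraction step, assuming a small hypothetical co-authorship graph held as a Python edge list. The author names and the `extract_rows` helper are illustrative and do not come from the talk.

```python
from collections import defaultdict

# Hypothetical co-authorship edges (author, author); not data from the talk.
edges = [("ann", "bob"), ("ann", "cat"), ("bob", "cat"), ("cat", "dan")]

def extract_rows(edges):
    """Flatten an undirected edge list into one tabular row per node."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    # Each row is a plain dict, ready to load into a DataFrame or CSV file.
    return [
        {"node": node, "degree": len(nbrs)}
        for node, nbrs in sorted(neighbors.items())
    ]

rows = extract_rows(edges)
for row in rows:
    print(row)
```

Each row can then be joined with other per-node or per-pair columns before model building.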
Features for ML:
Feature Engineering
Feature Engineering is how we combine and process the data to
create new, more meaningful features, such as clustering or
connectivity metrics.
Influence
Connectivity
Communities
Relationships
Features for ML:
Feature Selection
Feature Selection is how we reduce the number of features used
in a model to a relevant subset. This can be done algorithmically or
based on domain expertise, but the objective is to maximize the
predictive power of your model while minimizing overfitting.
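One simple algorithmic filter, sketched under assumptions: drop columns whose values never vary, since they cannot help separate linked from unlinked pairs. The column names, the values, and the `select_features` helper are hypothetical; the talk does not say which selection method was used.

```python
from statistics import pvariance

# Hypothetical feature table: column name -> values for each candidate pair.
features = {
    "common_neighbors": [4, 7, 1, 0],
    "pref_attachment": [15, 12, 1, 2],
    "same_country": [1, 1, 1, 1],  # constant, so it carries no signal
}

def select_features(table, min_variance=0.0):
    """Keep only columns whose population variance exceeds min_variance."""
    return [
        name for name, values in table.items()
        if pvariance(values) > min_variance
    ]

selected = select_features(features)
print(selected)  # ['common_neighbors', 'pref_attachment']
```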
Stop Throwing Away Data You Already Have
Decisions
$
Better Decisions
Machine Learning Pipeline
Link Prediction
Can we infer which new interactions are likely to occur
in the future?
+ 50 years of biomedical
data integrated in a
knowledge graph
Predicting new uses for
drugs by using the graph
structure to create features
for link prediction
het.io
Link Prediction Methods
Algorithm Measures
Run targeted algorithms and score
outcomes
Set a threshold value used to
predict a link between nodes
Machine Learning
Use the measures as features to
train an ML model
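The first method (score, then threshold) can be sketched as follows. This assumes the common-neighbors count as the measure and a hypothetical threshold of 2; the graph and the `predict_links` helper are illustrative only.

```python
from itertools import combinations

# Hypothetical undirected graph as adjacency sets; not data from the talk.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c"},
    "e": set(),
}

def predict_links(graph, threshold):
    """Predict a link for each unlinked pair whose score meets the threshold."""
    predicted = []
    for u, v in combinations(sorted(graph), 2):
        if v in graph[u]:
            continue  # the pair is already linked
        score = len(graph[u] & graph[v])  # common-neighbors measure
        if score >= threshold:
            predicted.append((u, v, score))
    return predicted

predictions = predict_links(graph, threshold=2)
print(predictions)  # [('b', 'd', 2)]
```

In the second method the same scores are not thresholded by hand. They become feature columns, and the model learns how to weigh them.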
Community Detection
Link Prediction
Similarity
1st Node | 2nd Node | Common Neighbors | Preferential Attachment | label
1        | 2        | 4                | 15                      | 1
3        | 4        | 7                | 12                      | 1
5        | 6        | 1                | 1                       | 0
Example:
Predicting Collaboration
Predicting Collaboration with a
Graph Enhanced ML Model
• Citation Network Dataset - Research Dataset
– Used a subset with 52K papers, 80K authors, 140K author
relationships and 29K citation relationships
– “ArnetMiner: Extraction and Mining of Academic Social
Networks”, by J. Tang et al
• Neo4j
– Create a co-authorship graph and connected feature engineering
• Spark and MLlib
– Train and test our model using a random forest classifier
Our Link Prediction Workflow
• Import Data
• Create Co-Author Graph
• Extract Data & Store as Graph
• Explore, Clean, Modify
• Prepare for Machine Learning
• Train Models
• Evaluate Results
• Productionize
Notes on the steps:
• Identified sparse feature areas
• Feature Engineering: New graphy features
• Train / Test Split
• Resample: Downsampled for proportional representation
• Precision, Accuracy, Recall
• ROC Curve & AUC
• Model Selection: Random Forest Ensemble method
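The resampling note in the workflow can be sketched as below. This assumes that "proportional representation" means equal numbers of linked and unlinked pairs; the `downsample` helper, the row layout, and the seed are hypothetical.

```python
import random

def downsample(rows, label_key="label", seed=42):
    """Randomly drop majority-class rows until both classes are the same size."""
    positives = [r for r in rows if r[label_key] == 1]
    negatives = [r for r in rows if r[label_key] == 0]
    minority, majority = sorted([positives, negatives], key=len)
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    return minority + rng.sample(majority, len(minority))

# Hypothetical pairs: unlinked pairs (label 0) far outnumber linked pairs.
rows = [{"pair": i, "label": 1} for i in range(2)]
rows += [{"pair": i, "label": 0} for i in range(2, 10)]

balanced = downsample(rows)
print(len(balanced), sum(r["label"] for r in balanced))  # 4 2
```

In link prediction most node pairs are unlinked, so without some resampling a model can score well by always predicting "not linked".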
Graph Algorithms Used for
Feature Engineering (few examples)
Preferential Attachment multiplies the number of
connections (degree) of the two nodes in a pair
Common Neighbors measures the number of
neighbors two nodes share (triadic closure)
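Both measures are short enough to sketch directly, using their standard definitions: common neighbors is the size of the shared neighbor set, and preferential attachment is the product of the two degrees. The graph and function names below are hypothetical.

```python
# Hypothetical undirected co-authorship graph as adjacency sets.
graph = {
    "ann": {"bob", "cat", "dan"},
    "bob": {"ann", "cat"},
    "cat": {"ann", "bob", "eve"},
    "dan": {"ann"},
    "eve": {"cat"},
}

def common_neighbors(graph, u, v):
    """Number of neighbors that u and v share."""
    return len(graph[u] & graph[v])

def preferential_attachment(graph, u, v):
    """Product of the two nodes' degrees."""
    return len(graph[u]) * len(graph[v])

print(common_neighbors(graph, "bob", "eve"))         # 1 (only cat)
print(preferential_attachment(graph, "bob", "eve"))  # 2 (degree 2 times degree 1)
```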
Illustration from be.amazd.com/link-prediction/
Triangle counting and clustering coefficients
measure the density of connections around nodes
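A minimal sketch of both measures for a single node, assuming an undirected graph stored as adjacency sets. The graph and helper names are hypothetical.

```python
from itertools import combinations

# Hypothetical undirected graph as adjacency sets.
graph = {
    "ann": {"bob", "cat", "dan"},
    "bob": {"ann", "cat"},
    "cat": {"ann", "bob", "eve"},
    "dan": {"ann"},
    "eve": {"cat"},
}

def triangle_count(graph, node):
    """Number of triangles that pass through node."""
    return sum(
        1 for a, b in combinations(sorted(graph[node]), 2) if b in graph[a]
    )

def clustering_coefficient(graph, node):
    """Fraction of the node's neighbor pairs that are themselves linked."""
    k = len(graph[node])
    if k < 2:
        return 0.0  # fewer than two neighbors means no pairs to close
    return 2 * triangle_count(graph, node) / (k * (k - 1))

print(triangle_count(graph, "ann"))          # 1 (ann, bob, cat)
print(clustering_coefficient(graph, "bob"))  # 1.0
```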
Louvain Modularity identifies interacting
communities and hierarchies
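Louvain searches for the partition with the highest modularity score. The sketch below only computes that score for a partition supplied by hand; it is not the Louvain algorithm itself. The edge list and the community assignments are hypothetical.

```python
def modularity(edges, community):
    """Modularity of a partition of an undirected, unweighted graph."""
    m = len(edges)
    if m == 0:
        return 0.0
    internal = {}    # community -> number of edges inside it
    degree_sum = {}  # community -> total degree of its nodes
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    # Share of edges inside each community minus the share expected by chance.
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )

# Two triangles joined by a single edge (c, d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
single = {node: 0 for node in split}

q_split = modularity(edges, split)
q_single = modularity(edges, single)
print(round(q_split, 4), round(q_single, 4))
```

Splitting the graph into its two triangles scores higher than one big community, which is the kind of improvement Louvain looks for at each step.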
Training Our Model
This is one decision tree in
our Random Forest used as a
binary classifier to learn how
to classify a pair: predicting
either linked or not linked.
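To make the idea concrete, here is a sketch of how trained trees could classify one pair. The three trees, their thresholds, and the feature names are hypothetical and hand-written rather than learned, and the forest is reduced to a simple majority vote, which may differ from how Spark combines its trees.

```python
# Each hypothetical tree is a chain of threshold tests on a pair's features.
def tree_1(f):
    if f["common_neighbors"] >= 2:
        return 1
    return 1 if f["pref_attachment"] >= 12 else 0

def tree_2(f):
    if f["triangles"] >= 1:
        return 1 if f["common_neighbors"] >= 1 else 0
    return 0

def tree_3(f):
    return 1 if f["pref_attachment"] >= 20 else 0

def forest_predict(trees, features):
    """Return 1 (linked) if more than half of the trees vote for a link."""
    votes = [tree(features) for tree in trees]
    return 1 if sum(votes) * 2 > len(votes) else 0

trees = [tree_1, tree_2, tree_3]
close_pair = {"common_neighbors": 4, "pref_attachment": 15, "triangles": 2}
distant_pair = {"common_neighbors": 1, "pref_attachment": 1, "triangles": 0}

print(forest_predict(trees, close_pair))    # 1, two of three trees vote linked
print(forest_predict(trees, distant_pair))  # 0, no tree votes linked
```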
Did you get really high accuracy
on your first run without tuning?
OMG I’m Good!
Data Leakage!
We had to go back and use time-based splits
for train/test datasets
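A time-based split can be sketched as below, assuming each co-authorship pair carries the year of its first joint paper. The records, the cutoff year, and the `time_based_split` helper are hypothetical. The point is that graph features for training are computed only from links that existed before the test period, so later links cannot leak in.

```python
# Hypothetical records: (author_a, author_b, year the pair first co-authored).
pairs = [
    ("ann", "bob", 2003),
    ("ann", "cat", 2005),
    ("bob", "dan", 2007),
    ("cat", "eve", 2009),
]

def time_based_split(pairs, cutoff_year):
    """Train on pairs before the cutoff year, test on pairs from it onward."""
    train = [p for p in pairs if p[2] < cutoff_year]
    test = [p for p in pairs if p[2] >= cutoff_year]
    return train, test

train, test = time_based_split(pairs, cutoff_year=2006)
print(len(train), len(test))  # 2 2
```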
Results
[Figure: results for the first model and the last model]
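The workflow lists precision, accuracy, and recall as evaluation measures. A minimal sketch of how they are computed for a binary link classifier follows; the labels and predictions are made up for illustration and are not the talk's results.

```python
def evaluate(labels, predictions):
    """Accuracy, precision and recall for binary labels (1 = linked)."""
    results = list(zip(labels, predictions))
    tp = sum(1 for y, p in results if y == 1 and p == 1)
    tn = sum(1 for y, p in results if y == 0 and p == 0)
    fp = sum(1 for y, p in results if y == 0 and p == 1)
    fn = sum(1 for y, p in results if y == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(results),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Hypothetical ground truth and model output for eight candidate pairs.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
predictions = [1, 1, 0, 1, 0, 0, 0, 0]

metrics = evaluate(labels, predictions)
print(metrics)
```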
Feature Influence for Tuning
To compute feature
importance, the random forest
algorithm in Spark averages
the reduction in impurity
across all trees in the forest
Feature rankings are in
comparison to the group of
features evaluated
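A sketch of that averaging, assuming the impurity reduction per feature is already known for each tree. The numbers and the `forest_importance` helper are hypothetical; in Spark the fitted random forest model exposes the result as `featureImportances`. Because scores are normalized to sum to 1, each score only has meaning relative to the other features in the same model.

```python
def normalize(scores):
    """Scale scores so they sum to 1 (left unchanged if they sum to 0)."""
    total = sum(scores.values())
    return {k: v / total for k, v in scores.items()} if total else dict(scores)

def forest_importance(per_tree_gains):
    """Average per-tree normalized impurity reductions, then normalize again."""
    features = sorted({f for gains in per_tree_gains for f in gains})
    averaged = {f: 0.0 for f in features}
    for gains in per_tree_gains:
        tree_scores = normalize(gains)
        for f in features:
            averaged[f] += tree_scores.get(f, 0.0) / len(per_tree_gains)
    return normalize(averaged)

# Hypothetical impurity reduction credited to each feature in two trees.
per_tree_gains = [
    {"common_neighbors": 6.0, "pref_attachment": 2.0},
    {"common_neighbors": 1.0, "triangles": 3.0},
]

importance = forest_importance(per_tree_gains)
print(importance)
```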
Resources
Code/Repositories:
This example from the O’Reilly book: bit.ly/2FPgGVV (ML Folder)
Python notebook: github.com/AliciaFrame/Public-Python-Notebooks
neo4j.com/graph-algorithms-book
Chapter 8: Link Prediction
Extra for Q&A
Resources
Spark Community
• spark.apache.org/community.html
• users@spark.apache.org
Neo4j Community
• neo4j.com/developer/
• neo4j.com/developer/graph-algorithms/
• community.neo4j.com
Neo4j Invented the Labeled Property Graph Model
[Diagram: PERSON, PERSON, and CAR nodes connected by MARRIED TO, LIVES WITH, DRIVES, and OWNS relationships. Example properties shown: name: “Dan”, born: May 29, 1970, twitter: “@dan”; name: “Ann”, born: Dec 5, 1975; since: Jan 10, 2011; brand: “Volvo”, model: “V70”; Latitude: 37.5629900°, Longitude: -122.3255300°]
Nodes
• Can have Labels to classify nodes
• Labels have native indexes
Relationships
• Relate nodes by type and direction
Properties
• Attributes of Nodes & Relationships
• Stored as Name/Value pairs
• Can have indexes and composite indexes
• Visibility security by user/role
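As a rough sketch of the model, the three building blocks can be written as plain Python data classes. This is an illustration only, not Neo4j's storage format or API. The class names are hypothetical, and which relationship connects which nodes, and which property sits on which element, is assumed from the diagram labels.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    labels: set                                     # e.g. {"Person"}
    properties: dict = field(default_factory=dict)  # name/value pairs

@dataclass
class Relationship:
    start: Node                                     # direction: start -> end
    type: str
    end: Node
    properties: dict = field(default_factory=dict)

dan = Node({"Person"}, {"name": "Dan", "twitter": "@dan"})
ann = Node({"Person"}, {"name": "Ann"})
car = Node({"Car"}, {"brand": "Volvo", "model": "V70"})

# Assumed wiring; the flattened diagram does not say who drives the car.
relationships = [
    Relationship(dan, "MARRIED_TO", ann),
    Relationship(ann, "DRIVES", car, {"since": "Jan 10, 2011"}),
]

for rel in relationships:
    print(rel.start.properties["name"], rel.type, sorted(rel.end.labels))
```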
ML Model - Random Forest
neo4j.com/graph-algorithms-book
Free O’Reilly Book
Spark and Neo4j Examples
Chapter 8: Machine Learning
Visit the Neo4j Booth

VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 

Dernier (20)

Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 

Improve ML Predictions using Connected Feature Extraction

  • 1. WIFI SSID:SparkAISummit | Password: UnifiedAnalytics
  • 2. Amy Hodler, Neo4j Improve ML Predictions using Connected Features #Neo4j #GraphAnalytics #UnifiedAnalytics #SparkAISummit
  • 3. The Next 20 Minutes • Graphs for Predictions • Link Prediction • Neo4j + Spark Workflow #UnifiedAnalytics #SparkAISummit #Neo4j #GraphAnalytics Amy E. Hodler Graph Analytics & AI Program Manager, Neo4j Amy.Hodler@neo4j.com @amyhodler neo4j.com/graph-algorithms-book Chapter 8: Link Prediction Spark & Neo4j
  • 4. What in Common is Predictive?
  • 5. Relationships Are often the Strongest Predictors of Behavior “Increasingly we're learning that you can make better predictions about people by getting all the information from their friends and their friends’ friends than you can from the information you have about the person themselves”
  • 7. Graph Data Science Use Cases
  • 8. ML How Graph Technology is Changing AI 4:30 PM Room 2002
  • 10. Features for ML: Feature Extraction. Feature Extraction is how we change the shape or format of the data so it is usable in a machine learning pipeline. For example, from a graph, we extract the relevant subset of the data into a tabular format for model building.
  • 11. Features for ML: Feature Engineering Feature Engineering is how we combine and process the data to create new, more meaningful features, such as clustering or connectivity metrics. Influence Connectivity Communities Relationships
  • 12. Features for ML: Feature Selection Feature Selection is how we reduce the number of features used in a model to a relevant subset. This can be done algorithmically or based on domain expertise, but the objective is to maximize the predictive power of your model while minimizing overfitting.
  • 13. Stop Throwing Away Data You Already Have Decisions $ Better Decisions Machine Learning Pipeline Machine Learning Pipeline
  • 15. Can we infer which new interactions are likely to occur in the future?
  • 16. #UnifiedAnalytics #SparkAISummit + 50 years of biomedical data integrated in a knowledge graph Predicting new uses for drugs by using the graph structure to create features for link prediction 16 het.io
  • 18. Link Prediction Methods (Community Detection, Link Prediction, Similarity). Algorithm Measures: run targeted algorithms and score outcomes; set a threshold value used to predict a link between nodes. Machine Learning: use the measures as features to train an ML model. Example feature table (1st Node, 2nd Node, Common Neighbors, Preferential Attachment, label): (1, 2, 4, 15, 1); (3, 4, 7, 12, 1); (5, 6, 1, 1, 0)
  • 20. Predicting Collaboration with a Graph Enhanced ML Model • Citation Network Dataset - Research Dataset – Used a subset with 52K papers, 80K authors, 140K author relationships and 29K citation relationships – “ArnetMiner: Extraction and Mining of Academic Social Networks”, by J. Tang et al • Neo4j – Create a co-authorship graph and connected feature engineering • Spark and MLlib – Train and test our model using a random forest classifier
  • 21. Our Link Prediction Workflow: Import Data; Create Co-Author Graph; Extract Data & Store as Graph; Explore, Clean, Modify; Prepare for Machine Learning; Train Models; Evaluate Results; Productionize. Details: Identified sparse feature areas; Feature Engineering: New graphy features; Train / Test Split; Resample: Downsampled for proportional representation; Precision, Accuracy, Recall; ROC Curve & AUC; Model Selection: Random Forest (ensemble method)
  • 22. Graph Algorithms Used for Feature Engineering (a few examples): Common Neighbors measures the closeness of two nodes based on the number of neighbors they share (triadic closure). Preferential Attachment scores a pair of nodes by the product of their neighbor counts, so well-connected nodes are the most likely to gain new links. Illustration from be.amazd.com/link-prediction/
  • 23. Graph Algorithms Used for Feature Engineering (a few examples): Triangle counting and clustering coefficients measure the density of connections around nodes. Louvain Modularity identifies interacting communities and hierarchies.
  • 24. Training Our Model This is one decision tree in our Random Forest used as a binary classifier to learn how to classify a pair: predicting either linked or not linked.
  • 25. Did you get really high accuracy on your first run without tuning? OMG I’m Good! Data Leakage! We had to go back and use time-based splits for train/test datasets.
  • 28. Feature Influence for Tuning: To compute feature importance, the random forest algorithm in Spark averages the reduction in impurity across all trees in the forest. Feature rankings are in comparison to the group of features evaluated.
  • 29. Resources #UnifiedAnalytics #SparkAISummit #Neo4j #GraphAnalytics Code/Repositories: This example from O’Reilly book: bit.ly/2FPgGVV (ML Folder). Python notebook: github.com/AliciaFrame/Public-Python-Notebooks. neo4j.com/graph-algorithms-book Chapter 8: Link Prediction
  • 30. DON’T FORGET TO RATE AND REVIEW THE SESSIONS SEARCH SPARK + AI SUMMIT Amy.Hodler@neo4j.com
  • 31. Extra for Q&A #UnifiedAnalytics #SparkAISummit #Neo4j #GraphAnalytics
  • 32. Resources #UnifiedAnalytics #SparkAISummit #Neo4j #GraphAnalytics Spark Community: spark.apache.org/community.html, users@spark.apache.org. Neo4j Community: neo4j.com/developer/, neo4j.com/developer/graph-algorithms/, community.neo4j.com. Code/Repositories: This example from O’Reilly book: bit.ly/2FPgGVV (ML Folder). Python notebook: github.com/AliciaFrame/Public-Python-Notebooks
  • 33. Neo4j Invented the Labeled Property Graph Model. Nodes: can have Labels to classify nodes; Labels have native indexes. Relationships: relate nodes by type and direction. Properties: attributes of Nodes & Relationships; stored as Name/Value pairs; can have indexes and composite indexes; visibility security by user/role. Example graph: node labels PERSON, PERSON, CAR; relationship types MARRIED TO, LIVES WITH, OWNS, DRIVES; properties name: “Dan”, born: May 29, 1970, twitter: “@dan”; name: “Ann”, born: Dec 5, 1975; since: Jan 10, 2011; brand: “Volvo”, model: “V70”; Latitude: 37.5629900°, Longitude: -122.3255300°
  • 34. ML Model - Random Forest
  • 36. neo4j.com/graph-algorithms-book Free O’Reilly Book Spark and Neo4j Examples Chapter 8: Machine Learning Visit the Neo4j Booth
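
The two pair-level measures named on slide 22, and the "set a threshold value" approach from slide 18, can be sketched in plain Python. This is only an illustration under assumptions: the toy graph, the function names, and the `min_common` threshold are hypothetical, and the talk itself computed these features in Neo4j rather than with code like this.

```python
from itertools import combinations

# Hypothetical toy co-authorship graph: author -> set of co-authors (undirected).
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c", "e"},
    "e": {"d"},
}

def common_neighbors(g, u, v):
    """Number of neighbors that u and v share."""
    return len(g[u] & g[v])

def preferential_attachment(g, u, v):
    """Product of the neighbor counts of u and v."""
    return len(g[u]) * len(g[v])

def pair_features(g):
    """One row (u, v, common neighbors, preferential attachment) per unlinked pair."""
    rows = []
    for u, v in combinations(sorted(g), 2):
        if v in g[u]:
            continue  # already linked, nothing to predict
        rows.append((u, v, common_neighbors(g, u, v), preferential_attachment(g, u, v)))
    return rows

def predict_links(rows, min_common=2):
    """Threshold approach: predict a link when enough neighbors are shared."""
    return [(u, v) for u, v, cn, _ in rows if cn >= min_common]

features = pair_features(graph)
predicted = predict_links(features)
```

The same rows could instead be passed to a classifier as features, which is the "Machine Learning" path on slide 18.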
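
Triangle counting and the local clustering coefficient from slide 23 can be illustrated the same way. A minimal sketch, assuming an undirected graph stored as a dictionary of neighbor sets; the graph and function names are hypothetical, not taken from the talk's code.

```python
from itertools import combinations

# Hypothetical toy co-authorship graph: author -> set of co-authors (undirected).
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c", "e"},
    "e": {"d"},
}

def triangle_count(g, node):
    """Count pairs of the node's neighbors that are themselves linked."""
    return sum(1 for u, v in combinations(g[node], 2) if v in g[u])

def clustering_coefficient(g, node):
    """Fraction of possible neighbor pairs that are actually linked."""
    k = len(g[node])
    if k < 2:
        return 0.0  # fewer than two neighbors: no pair can form a triangle
    return 2 * triangle_count(g, node) / (k * (k - 1))

node_features = {n: (triangle_count(graph, n), clustering_coefficient(graph, n)) for n in graph}
```

For a candidate pair, per-node values like these are typically combined (for example as the minimum and maximum over the two nodes) before being used as model features; that combination step is an assumption here, not something the slides specify.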
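
Slide 25 attributes the suspiciously high first-run accuracy to data leakage and says the fix was a time-based train/test split. A minimal sketch of that idea follows; the records, the `cutoff_year` value, and the helper names are hypothetical and are not the speaker's actual data or code.

```python
# Hypothetical co-authorship records: (author_1, author_2, year of first joint paper).
coauthorships = [
    ("a", "b", 2001), ("a", "c", 2003), ("b", "c", 2005),
    ("c", "d", 2006), ("a", "d", 2008), ("d", "e", 2010),
]

def time_based_split(edges, cutoff_year):
    """Edges before the cutoff go to training, edges from the cutoff on go to test."""
    train = [e for e in edges if e[2] < cutoff_year]
    test = [e for e in edges if e[2] >= cutoff_year]
    return train, test

def build_graph(edges):
    """Build an undirected adjacency map from (u, v, year) records."""
    g = {}
    for u, v, _ in edges:
        g.setdefault(u, set()).add(v)
        g.setdefault(v, set()).add(u)
    return g

train_edges, test_edges = time_based_split(coauthorships, cutoff_year=2006)

# Features for training pairs are computed only on the early graph, so no
# relationship from the test period can leak into the training features.
early_graph = build_graph(train_edges)
```

A random split would let later collaborations influence the graph features of training pairs, which is the kind of leakage the slide warns about.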
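
The "Resample: Downsampled" step on slide 21 addresses class imbalance: in a sparse graph, unlinked pairs far outnumber linked ones. A minimal sketch of downsampling the majority class, assuming each example is a tuple whose last element is a 0/1 label; the data, the seed, and the function name are hypothetical.

```python
import random

def downsample(examples, seed=42):
    """Keep all minority-class examples and an equal-sized random sample of the majority."""
    positives = [e for e in examples if e[-1] == 1]
    negatives = [e for e in examples if e[-1] == 0]
    minority, majority = sorted((positives, negatives), key=len)
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return minority + rng.sample(majority, len(minority))

# Hypothetical labelled pairs: (node_1, node_2, label), with 2 linked and 8 unlinked pairs.
examples = [(1, 2, 1), (3, 4, 1)] + [(i, i + 100, 0) for i in range(8)]
balanced = downsample(examples)
```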