Thomas Cook
Sales Director, AnzoGraph DB
e: thomas.cook@cambridgesemantics.com
w: www.anzograph.com
Knowledge Graphs for Machine Learning
and Data Science
#DCAF 2020
Feb 6, 2020
Data Continues to Grow
AI and ML Adoption Grows for Better & Faster Insights
Need for:
– Automated Data Preparation & Better Understanding
– Explainable AI & ML with Provenance
– Improved Algorithms & Analytics
– Cost Efficient Operations
Context
Knowledge
Graphs
&
Graph
Analytics
Knowledge Graphs to Automate
Data Preparation & Improve Common Understanding
©2019 Cambridge Semantics Inc. All rights reserved.
The Data Preparation Problem
Data Access
● Manual ETL coding
● Practicalities limit the # of
sources and types of data
Data Processing
● Laborious discovery, profiling
and selection
● Use of rules and coding for
harmonization & cleansing
Feature Engineering
● Manual coding to transform
data
● Manual feature engineering
& selection
70-80% of time spent in Data Preparation & Feature Engineering
Viewed as the “least enjoyable” part of work by 76% of data scientists [1]
[1] Cleaning Big Data, Forbes Magazine
Anzo - The Modern Data Discovery and Integration Layer for the Enterprise Data Fabric
Enterprise Data Sources
• Structured Data
ON-BOARD: Ingest & Map
• Automated ETL
• Collaborative Mapping
• Metadata Capture
MODEL: Graph Data Model
• Lift Data into Data Fabric
• Design Ontologies
• Connect Data Models
BLEND: GraphMarts
• Combine and Align Related Data Sets
• In-memory MPP OLAP Query Engine
• Data Layers
ACCESS: Hi-Res Analytics
• Analyze All Data Together
• Fast, Iterative Queries (Ad Hoc, What if)
• Code Free or API
Machine Learning and AI
Enterprise Search
“Last Mile” Analytics Tools
Graphical Application Interface
Metadata Catalog: Semantic-based Metadata Management, Governance and Lineage
Data Storage Layer: Cloud or On-Prem Data Storage Infrastructure
Storage and Compute Integration
Automated Deployment and Operations
Automated Data Ingestion & Cataloging
Unstructured Data: Notes, Docs, Emails, Articles
Structured Data: Relational, CSV, HDFS, External Data Feeds
Ingest: NLP, Text Analytics, Sentiment Analysis
Catalog: Data Catalog, Semantic Layer
Data Harmonization – Structured or Unstructured
• Harmonize Many Data Sources
• Automated Unstructured Data Extraction &
Categorization
Data Wrangling Capability
• Profile, Rules… Manage & Clean incoming data
• Setup Re-usable Data Wrangling Jobs
• Provenance to Manage Data
Data Catalog
• Explore, Secure & Manage Dataset Assets
Result: Cleaner, quality data, faster & from many
more sources
A big web of data
understandable at
the data level
Allow for Easy Understanding & Handling of Data
Raw Data
Rules to Link & Conform Data
Create Data Layers
Build Graph Marts
Business Ready Datasets
Expedite & Optimize Feature Engineering
Use visual interface with no coding
for feature selection
• Query Knowledge Graph and generate
features
• Conduct data transformation using a library
of functions
• Compute new derived features
• De-normalize data
• Aggregate ranges
• Convert numeric values to alpha values
• Pivot values
and much more!
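A few of the transformations listed above (deriving a feature, converting numeric values to labels, pivoting) can be sketched in plain Python. This is only an illustration of the idea, not Anzo's visual interface or function library; the rows, the bucket thresholds and the names `delay_bucket` and `pivot_counts` are all hypothetical.

```python
def delay_bucket(minutes):
    """Convert a numeric delay into a categorical (alpha) label."""
    if minutes <= 0:
        return "on_time"
    if minutes <= 15:
        return "minor"
    return "major"


def pivot_counts(rows, index, column):
    """Pivot rows into {index_value: {column_value: count}}."""
    table = {}
    for row in rows:
        cell = table.setdefault(row[index], {})
        cell[row[column]] = cell.get(row[column], 0) + 1
    return table


# Hypothetical rows, as they might come back from a knowledge graph query.
flights = [
    {"airline": "AA", "departure_delay": -3},
    {"airline": "AA", "departure_delay": 42},
    {"airline": "B6", "departure_delay": 7},
]
for flight in flights:
    # Derived feature: a label computed from the raw numeric value.
    flight["delay_bucket"] = delay_bucket(flight["departure_delay"])

features = pivot_counts(flights, index="airline", column="delay_bucket")
print(features)
```

Each airline ends up with one row of counts per delay bucket, which is the kind of de-normalized shape most ML libraries expect.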
Operationalizing of Machine Learning Models
Explainable
Insights
Manage Data Sets with
Provenance & Data Lineage
• Anzo retains end-to-end
data lineage
• Track transformations
Easy to Export Data to
ML & Data Science Tools
• Use OData/REST APIs,
SQL ODBC/JDBC
• Export to R, Python,
downstream systems, …
Deploy & Continuously
Improve Model Performance
• Set up deployment pipelines with
learnings to help in feature selection
• Horizontally scale runtime environment
• Can be auto-deployed behind the
firewall or on the Cloud
Using Knowledge Graphs with
Graph Analytics Database as
Scalable Infrastructure for ML & Data Science
“Graph analytics will grow in the next few years
due to the need to ask complex questions across
complex data, which is not always practical or
even possible at scale using SQL queries”
…Gartner – Top 10 Data and Analytics Technology Trends for 2019
What it is:
● Fast, Scalable Graph Database
○ In-Memory Massively Parallel Processing
(MPP) ACID-Compliant Graph Database
○ Supports RDF & Labelled Property Graphs
What it does:
○ Fast Data Loading
○ Fast Query
○ Rich Analytics
■ Graph Algorithms
■ BI/DW Analytics
■ Inferencing
■ Data Science/Feature Engineering
Algorithms
■ Define-Your-Own Analytics
○ Linear Database Scaling
○ Persist data on cheap storage
Based on Open Standards
• Built on RDF & SPARQL 1.1 standards
• LPG with RDF*/SPARQL*
• LPG with Cypher (in 2020)
Deploy on-prem or cloud
• Kubernetes/Helm on-demand cloud
deployment
• AWS, Google and Azure
AnzoGraph™ DB
Awards
Select Customers
Benchmarks
217X: AnzoGraph DB compared to Neo4j on the industry-standard TPC-H & Graph 500 benchmarks
113X: AnzoGraph’s LUBM benchmark performance over previous fastest result
30X: AnzoGraph’s performance on graph algorithms over Spark SQL and Spark with GraphFrames
Graph OLAP Built for Analytics at Scale and Speed
SQL OLAP vs Graph OLAP
SQL OLAP
On-line Analytics at Massively Parallel Processing (MPP) Scale with SQL Database
Example: Netezza, Amazon Redshift
Analytics
• Warehouse-Style BI Analytics
Graph OLAP
On-line Analytics at Massively Parallel Processing (MPP) Scale with Native Graph Database
Example: AnzoGraph DB
Analytics
• Warehouse-Style BI Analytics
• Graph Algorithms
• Inferencing
• Data Science Functions
Graph OLAP Built for Analytics at Scale and Speed
Graph OLTP vs Graph OLAP
Graph OLTP
Transactional databases
• Built for transactional applications & individual transactions
• Scales vertically
Example: Neo4j, AWS Neptune
Graph OLAP
Analytical databases
• Built for analytics and to deal with scale & performance
• Deep Link analysis
• Analytics on the population
• Scales horizontally
• Can complement Graph OLTP systems
Example: AnzoGraph DB
Labelled Property Graphs facilitate Analytics
Person: Jack
• isA: <Man>
• birthday: 09/17/1975
Person: Jill
• isA: <Woman>
• birthday: 4/23/1979
Place: The Hill
• isA: <Place>
• has: Water
• has: Trees
• partOf: <TheMountain>
friendOf (between Jack and Jill)
• metAt=<TheHill>
• metDate=07/04/2018
WentUp (Jack, The Hill)
• Date=07/04/2018
WentUp (Jill, The Hill)
• Date=07/04/2018
Today with RDF* and SPARQL*
• Relationships can be described as
clearly as any LPG database
RDF*/SPARQL* extensions to the
standard make W3C open standards
databases even more capable
Algorithms and Analytical Capabilities
SPARQL 1.1 Standards
• Graph Patterns, Negation, Property Paths, BIND, Aggregates, Basic Federated Query, ORDER BY and offsets
• Functions on Strings, Functions on Numerics, Functions on Dates and Times, Hash Functions
• Basic Graph Patterns, Count/Avg, Min/Max, GroupConcat, Sample
Graph Algorithms and Inferencing
• Page Rank, Shortest Path, All Path, Label Propagation, Weakly Connected Components, K neighborhood, Counting Triangles
• Inferences (RDFS+)
• Labeled Property Graphs (RDF*)
AnzoGraph® DB Extras
• Window Aggregates, Advanced Grouping Sets, Named Views, Named Queries, Conditional Expressions, User-Defined Extensions
Data Science Extensions (UDX)
Distributions
● Bernoulli
● Binomial
● Chi-squared
● Exponential
● Hypergeometric
● Laplace
● Log Normal
● Logarithmic Series
● Negative Binomial
● Normal
Correlations
● Pearson
Entropy
● Cross Entropy
● Differential Entropy
User-defined Extensions (UDXs):
Allows users to extend AnzoGraph DB functionality for custom usage
User-Defined Functions (UDF): Create and register custom analytic functions, such as functions that concatenate values or convert integers to alternate currencies.
User-Defined Aggregates (UDA): Create and register aggregate functions, such as functions that compute the arithmetic mean or calculate the average number from a list of maximum and minimum values.
User-Defined Services (UDS): Create and register services that create local SPARQL endpoints.
User-Defined Tables (UDT): Create and register a function that is repeatedly invoked within a query to generate the rows of a table on-the-fly.
Data Science Functions
User-defined Functions (UDX)
Functions you can build in Java or C++
Execute Supervised & Unsupervised ML with Graph Algorithms
Graph Algorithm
• PageRank
• Shortest Path
• K-neighbors
• All Paths
• Counting Triangles
• Weakly Connected
Components
• Label Propagation
• Triangle Enumeration
• Triangle Counting
• Clustering Coefficient
and more!
Who is the most influential person in your
customer list?
What’s the most important item relating to a
search of your knowledge graph?
What is the shortest path to your destination
across a route?
What’s the optimal path for packets to travel
across your network?
source: Wikipedia
Try functions via SPARQL in Zeppelin or python Jupyter Notebooks
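To make the "most influential person" question concrete, here is a minimal PageRank sketch in plain Python using power iteration. It is not AnzoGraph's implementation; the `follows` edge list, the damping factor of 0.85 and the iteration count are assumptions chosen for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Return {node: score} for a directed edge list, via power iteration."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the "random jump" share.
        new_rank = {node: (1.0 - damping) / len(nodes) for node in nodes}
        for src in nodes:
            # Nodes without outgoing edges spread their rank over all nodes.
            targets = out_links[src] or nodes
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


# Hypothetical "follows" relationships in a customer list.
follows = [("ann", "cat"), ("bob", "cat"), ("cat", "dan"), ("dan", "cat"), ("eve", "cat")]
scores = pagerank(follows)
most_influential = max(scores, key=scores.get)
print(most_influential, round(scores[most_influential], 3))
```

The scores always sum to 1, so they can be read as the share of attention each node receives.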
Graph Algorithms produce additional Features to train ML Models
Graph Algorithms
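As a rough illustration of how graph algorithm output becomes model input, the sketch below computes degree, triangle count and local clustering coefficient per node and returns them as feature dictionaries. The edge list and the function name `graph_features` are hypothetical, and the graph is treated as undirected with no self-loops.

```python
from itertools import combinations


def graph_features(edges):
    """Return {node: {"degree", "triangles", "clustering"}} for an undirected graph."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    features = {}
    for node, nbrs in neighbors.items():
        degree = len(nbrs)
        # A triangle exists for every pair of neighbors that are themselves connected.
        triangles = sum(1 for u, v in combinations(sorted(nbrs), 2) if v in neighbors[u])
        possible = degree * (degree - 1) // 2
        clustering = triangles / possible if possible else 0.0
        features[node] = {"degree": degree, "triangles": triangles, "clustering": clustering}
    return features


# Hypothetical undirected edges.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
features = graph_features(edges)
for node in sorted(features):
    print(node, features[node])
```

Each inner dictionary can be joined to the node's other attributes as extra columns in a training set.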
Execute Inferencing using RDFS+ and OWL 2 RL
Person:
Jack
Person:
Jill
Is Married
Inference
Is Married
Person:
Jack
Person:
Sam
Knows
Inference
Knows
AnzoGraph allows you to insert inferred triples into
the specified target graph
If Jack is married to Jill, then you can
definitely infer that Jill is married to Jack
Jack knows Sam, but Sam may not know Jack.
Here, the inference is less clear
Both cases are supported.
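The difference between the two cases can be sketched in Python: only predicates declared symmetric produce a reverse triple. This is an illustration of the idea rather than AnzoGraph's RDFS+/OWL 2 RL engine; the predicate names and the `infer_symmetric` helper are hypothetical.

```python
def infer_symmetric(triples, symmetric_predicates):
    """Return the new triples implied by treating some predicates as symmetric."""
    inferred = set()
    for subject, predicate, obj in triples:
        reverse = (obj, predicate, subject)
        if predicate in symmetric_predicates and reverse not in triples:
            inferred.add(reverse)
    return inferred


source_graph = {
    ("Jack", "isMarriedTo", "Jill"),
    ("Jack", "knows", "Sam"),
}
# Inferred triples are kept in a separate target graph, as on the slide.
target_graph = infer_symmetric(source_graph, symmetric_predicates={"isMarriedTo"})
print(target_graph)
```

Because `knows` is not declared symmetric here, nothing is inferred about Sam knowing Jack.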
ELT for Data Engineering
•Wrangling, Blending, Munging, Transformations, Enrichment, Views
•Use statistical functions, transformations or enrichment to get the data
into the form needed for the downstream ML pipeline
INSERT {
  GRAPH <myNewGraph> {
    ?s a <Person> ;
       <fullname> ?fullname .
  }
}
USING <myOldGraph>
WHERE {
  ?s a <Person> ;
     <firstname> ?fname ;
     <lastname> ?lname .
  # Join first and last name with a single space
  BIND(CONCAT(?fname, " ", ?lname) AS ?fullname)
}
ELT for Data Engineering
Materialized Views – good for heavy calculations - perform once - use many times
CREATE MATERIALIZED VIEW <ages> AS
CONSTRUCT { ?person <age> ?age . }
WHERE {
  GRAPH <tickit> {
    {
      SELECT ?person ((YEAR(?date)) - (YEAR(xsd:dateTime(?birthdate))) AS ?age)
      WHERE {
        ?person <birthday> ?birthdate .
        BIND(xsd:dateTime(NOW()) AS ?date)
      }
    }
  }
}
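One thing worth knowing about the calculation in this view: subtracting calendar years counts a birthday that has not happened yet in the current year. The Python sketch below contrasts the two definitions of age. The function names are hypothetical, and the sample dates are taken from elsewhere in this deck (a birthday of 09/17/1975 and the talk date of Feb 6, 2020).

```python
from datetime import date


def age_by_year_difference(birthdate, today):
    """Same arithmetic as YEAR(now) - YEAR(birthdate)."""
    return today.year - birthdate.year


def age_in_completed_years(birthdate, today):
    """Subtract one year if this year's birthday has not happened yet."""
    had_birthday = (today.month, today.day) >= (birthdate.month, birthdate.day)
    return today.year - birthdate.year - (0 if had_birthday else 1)


birthdate = date(1975, 9, 17)
today = date(2020, 2, 6)
print(age_by_year_difference(birthdate, today))
print(age_in_completed_years(birthdate, today))
```

Which definition is appropriate depends on how the downstream model uses the feature; the year difference is cheaper and may be precise enough.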
ELT for Data Engineering
•Enrichment
Add new features from federated call using SERVICE call to
Linked Open Data Cloud or other internal SPARQL endpoints
Example:
Look up address and geocodes for company, census population data,
crime rate, demographics, etc. All of these can be new features to feed
into the ML pipeline
ML Step #1: Data Prep: Data Discovery and Feature Engineering
2.1 Bernoulli Distribution: Determines the probability of Success or Failure (or Yes or No) in a single trial.
2.2 Binomial Distribution: Determines the probability of a given number of successes in a fixed number of trials.
2.3 Chi-squared Distribution: Used to test the relationship between two categorical variables.
2.4 Exponential Distribution: Models the time between events that occur at a constant average rate.
2.5 Hypergeometric Distribution: Determines the probability of a given number of successes when sampling without replacement.
2.6 Laplace Distribution: Determines the probability of intervals.
2.7 Log Normal Distribution: Models certain instances, such as the change in price distribution of a stock or commodity positions.
2.8 Logarithmic Series Distribution: Determines the probability of occurrence of events like claim frequencies in insurance companies.
2.9 Negative Binomial Distribution: Determines the probability of a given number of failures before a set number of successes.
2.10 Normal Distribution: Models and determines probabilities for many kinds of natural and social data.
2.11 Poisson Distribution: Determines the probability that a certain number of events will occur in a specific time period.
2.12 Skellam Distribution: Determines the probability of the difference between two independent Poisson-distributed variables.
2.13 Beta-binomial Distribution: Models the number of successes in n binomial trials when the probability of success p is a Beta random variable.
2.14 Continuous Uniform Distribution: Assigns equal probability to all values between its minimum and maximum.
2.15 Discrete Uniform Distribution: Determines the probability of a finite number of equally likely outcomes.
2.16 Student’s t-Distribution: Used to estimate probabilities when the sample size is small.
2.17 Weibull Distribution: Used to assess product reliability, analyse life data and model failure times.
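To show what two of these functions compute, here is a small Python sketch of the binomial and Poisson probability mass functions using only the standard library. It is not the AnzoGraph function library, and the delay probability and event rate used in the example are made-up numbers.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, rate):
    """Probability of exactly k events when `rate` events are expected."""
    return rate**k * exp(-rate) / factorial(k)


# Hypothetical question: if 20% of flights are delayed, how likely is it
# that exactly 2 of the next 10 flights are delayed?
p_two_delays = binomial_pmf(2, 10, 0.2)

# Hypothetical question: with 3 cancellations expected per day,
# how likely is a day with none?
p_no_cancellations = poisson_pmf(0, 3.0)

print(round(p_two_delays, 4), round(p_no_cancellations, 4))
```

A Bernoulli trial is the special case `binomial_pmf(1, 1, p)`, which simply returns `p`.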
ML Step #1: Data Discovery and Feature Engineering
Correlations
3.1 Pearson Correlation Coefficient: Determines the positive, negative or no linear relationship between two variables.
3.2 Matthews Correlation Coefficient: Determines the positive, negative or no relationship between two binary variables (0 & 1).
3.3 Spearman’s Rank Correlation Coefficient: Measures the strength of a monotonic relationship between paired data.
Feature Exploration
5.1 Principal Component Analysis: Reduces the dimensionality of large data sets and supports building predictive models.
Profiling Metrics
6.1 Geometric Mean: Determines the average growth rates.
6.2 Skewness Metric: Calculates Pearson’s coefficient of skewness on numeric values.
6.3 T-Digest Metric: Estimates percentile and quantile values accurately.
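The Pearson and Spearman coefficients can disagree on the same data, which matters when screening features. Pearson measures linear association, while Spearman is Pearson applied to ranks, so it reaches 1 for any strictly increasing relationship. The sketch below is plain Python, not the AnzoGraph functions, and the sample values are invented.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rank correlation: Pearson computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Invented data: y grows with x, but not in a straight line.
x = [1, 2, 3, 4, 5]
y = [value**3 for value in x]
print(round(pearson(x, y), 3), round(spearman(x, y), 3))
```

Here Spearman reports a perfect monotonic relationship while Pearson stays below 1 because the relationship is curved.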
RDF*/SPARQL*
Real World Example: Airline Delay Data Analysis
https://www.transtats.bts.gov/ot_delay/OT_DelayCause1.asp?pn=1
Public Flight Delay Data Analysis
Input CSV – 32 Columns - 5,819,080 records
YEAR
MONTH
DAY
DAY_OF_WEEK
AIRLINE
FLIGHT_NUMBER
TAIL_NUMBER
ORIGIN_AIRPORT
DESTINATION_AIRPORT
SCHEDULED_DEPARTURE
DEPARTURE_TIME
DEPARTURE_DELAY
TAXI_OUT
WHEELS_OFF
SCHEDULED_TIME
ELAPSED_TIME
AIR_TIME
DISTANCE
WHEELS_ON
TAXI_IN
SCHEDULED_ARRIVAL
ARRIVAL_TIME
ARRIVAL_DELAY
DIVERTED
CANCELLED
CANCELLATION_REASON
AIR_SYSTEM_DELAY
SECURITY_DELAY
AIRLINE_DELAY
LATE_AIRCRAFT_DELAY
WEATHER_DELAY
Conversion from CSV to Graph – Defining Triples
[Diagram labels: Flight, Airport, Airport, FlightDeparture, FlightArrival, DESTINATION]
Conversion from CSV to Graph
[Diagram labels: Flight, Airport, Airport, FlightDeparture, FlightArrival, DESTINATION]
Nodes have types and properties
Flight
YEAR
MONTH
DAY
DAY_OF_WEEK
AIRLINE
FLIGHT_NUMBER
TAIL_NUMBER
ORIGIN_AIRPORT
DESTINATION_AIRPORT
….
Node Type: Flight
Node Properties:
Airline,
Flight Number,
Tail Number,
etc
*Note: Types can also be called Labels, as in Labeled Property
Graphs or LPG
With RDF* edges can also have properties
Airport (AIRPORT_CODE = 'BOS') and Airport (AIRPORT_CODE = 'JFK'), joined by a DESTINATION edge
Edge Property: DISTANCE = 187
...
TABLE <s3://csi-notebook-datasets/Flight_Dataset/flights10k.csv>
('ContentType'='text/CSV','Schema'=',H,YEAR:int,MONTH:int,DAY:int,DAY_OF_WEEK:int,AIRLINE:char,FLIGHT_NUMBER:char,TAIL_NUMBER:char,ORIGIN_AIRPORT:char,DESTINATION_AIRPORT:char,SCHEDULED_DEPARTURE:char,DEPARTURE_TIME:char,DEPARTURE_DELAY:int,TAXI_OUT:int,WHEELS_OFF:char,SCHEDULED_TIME:int,ELAPSED_TIME:int,AIR_TIME:int,DISTANCE:int,WHEELS_ON:char,TAXI_IN:int,SCHEDULED_ARRIVAL:char,ARRIVAL_TIME:char,ARRIVAL_DELAY:int,DIVERTED:int,CANCELLED:int,CANCELLATION_REASON:char,AIR_SYSTEM_DELAY:int,SECURITY_DELAY:int,AIRLINE_DELAY:int,LATE_AIRCRAFT_DELAY:int,WEATHER_DELAY:int')
Loading CSV with TABLE expression
Specify CSV file
name and location
CSV Column names
and Data Types
INSERT { GRAPH <airline_flight_network> {
?OriginIRI a <Airport> ;
<AIRPORT_CODE> ?ORIGIN_AIRPORT .
<< ?OriginIRI <DESTINATION> ?DestinationIRI >> <DISTANCE> ?DISTANCE .
?DestinationIRI a <Airport> ;
<AIRPORT_CODE> ?DESTINATION_AIRPORT .
<< ?DestinationIRI <DESTINATION> ?OriginIRI >> <DISTANCE> ?DISTANCE .
?FlightIRI a <Flight> ;
<YEAR> ?YEAR ;
<MONTH> ?MONTH ;
<DAY> ?DAY ;
<DAY_OF_WEEK> ?DAY_OF_WEEK ;
<AIRLINE> ?AIRLINE;
<FLIGHT_NUMBER> ?FLIGHT_NUMBER;
<TAIL_NUMBER> ?TAIL_NUMBER;
<ORIGIN_AIRPORT> ?ORIGIN_AIRPORT;
<DESTINATION_AIRPORT> ?DESTINATION_AIRPORT;
Conversion from CSV to RDF* Triples via SPARQL
Node Type: Airport
Node Type: Airport
Node Type: Flight
Node Properties:
Flight Number,
Tail Number,
etc
Edge Property:
Distance
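The mapping from one CSV row to nodes, edges and an edge property can be mimicked in Python to see the resulting shape. This is a sketch only: the one-row sample, the `airport/` and `flight/` identifier scheme and the `row_to_graph` helper are hypothetical, and an edge property is modelled as a triple whose subject is itself a triple, in the spirit of RDF*.

```python
import csv
import io

# Hypothetical one-row sample using column names from the flight delay CSV.
SAMPLE_CSV = """AIRLINE,FLIGHT_NUMBER,ORIGIN_AIRPORT,DESTINATION_AIRPORT,DISTANCE
B6,1001,BOS,JFK,187
"""


def row_to_graph(row):
    """Return (triples, edge_properties) for one CSV row."""
    origin = f"airport/{row['ORIGIN_AIRPORT']}"
    dest = f"airport/{row['DESTINATION_AIRPORT']}"
    flight = f"flight/{row['AIRLINE']}{row['FLIGHT_NUMBER']}"
    triples = {
        (origin, "a", "Airport"),
        (origin, "AIRPORT_CODE", row["ORIGIN_AIRPORT"]),
        (dest, "a", "Airport"),
        (dest, "AIRPORT_CODE", row["DESTINATION_AIRPORT"]),
        (origin, "DESTINATION", dest),
        (dest, "DESTINATION", origin),
        (flight, "a", "Flight"),
        (flight, "AIRLINE", row["AIRLINE"]),
        (flight, "FLIGHT_NUMBER", row["FLIGHT_NUMBER"]),
    }
    distance = int(row["DISTANCE"])
    # RDF*-style annotation: the edge triple is the subject of a property.
    edge_properties = {
        ((origin, "DESTINATION", dest), "DISTANCE", distance),
        ((dest, "DESTINATION", origin), "DISTANCE", distance),
    }
    return triples, edge_properties


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
triples, edge_properties = row_to_graph(rows[0])
print(len(triples), len(edge_properties))
```

Using a set means repeated rows for the same airport pair add no duplicate airport or edge triples, which is how many flights collapse into one route edge.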
Flight Delay Data
Airport Info
Census Data
FAA Aircraft Registrations
Integration and ELT
Combining additional data sets
[Diagram labels: Flight, Airport, FlightDeparture, FlightArrival, DESTINATION, CityState, Country, Aircraft, Airline; data sources: FAA, Airline, Census Data, Flight Delay]
Now we are ready to ask questions like:
BI-Style Analytics
#1 Longest flight segments by distance from Boston (BOS)
#2 Airports less than 400 mi from Boston (BOS) - Network Viewer output
#3 Longest distances between two airports
#4 Longest flights by elapsed time
#5 Airlines with the longest average delays
#6 Airlines with the most flights
#7 Longest 2 segments reachable from Boston and the distances of each segment
#8 Which segments have the longest average departure delays
Graph Algorithms
#9 Page Rank - Graph Algorithm - Show most well-connected airports based on page rank algorithm
#10 Shortest Path Graph Algorithm - show shortest paths and # of segments (hops) from AUS
select * from <airline_flight_network> where
{ SERVICE <csi:shortest_path> {
[]
<csi:binding-source-vertex> ?source_vertex_variable_name ;
<csi:binding-vertex> ?node ;
<csi:binding-predecessor> ?predecessor_variable_name ;
<csi:binding-distance> ?distance ;
<csi:graph> <airline_flight_network> ;
<csi:source-vertex> <BOS> ;
<csi:destination-vertex> <HNL> ;
<csi:edge-label> <DESTINATION> ;
<csi:weighted> true .
}
}
Shortest Path graph algorithm leverages RDF*/SPARQL*
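For readers who want to see what a weighted shortest-path search does with the DISTANCE edge property, here is a minimal Dijkstra sketch in Python. It is not the `csi:shortest_path` service. Only the BOS to JFK distance of 187 comes from the slides; every other distance in `routes` is an illustrative placeholder, not real data.

```python
import heapq


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns (total_distance, path) or (inf, [])."""
    best = {source: 0}
    predecessor = {}
    queue = [(0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            break
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, weight in graph.get(node, {}).items():
            candidate = dist + weight
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                predecessor[neighbor] = node
                heapq.heappush(queue, (candidate, neighbor))
    if target not in best:
        return float("inf"), []
    # Walk the predecessor links back from the target.
    path = [target]
    while path[-1] != source:
        path.append(predecessor[path[-1]])
    return best[target], path[::-1]


# DESTINATION edges weighted by DISTANCE (placeholder values except BOS-JFK).
routes = {
    "BOS": {"JFK": 187, "LAX": 2611},
    "JFK": {"LAX": 2475},
    "LAX": {"HNL": 2556},
}
distance, path = shortest_path(routes, "BOS", "HNL")
print(distance, path)
```

The predecessor map plays the same role as the predecessor binding in the query above: it is what lets the route be reconstructed hop by hop.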
AIRLINE DEMO
Scalability
Graph OLAP – Horizontally Scalable
Have more data? Need better performance? Add more servers.
Deploy on VMs or bare metal with a TAR file that
is compatible with CentOS
Automated deployment in the Cloud.
Available in the AWS Marketplace & others soon
60 Day Full Feature Free Trial. Download or Cloud Deployment. Visit booth or AnzoGraph.com
Download AnzoGraph DB Free Edition Today! http://AnzoGraph.com
THANK YOU

Contenu connexe

Tendances

The Data Platform for Today’s Intelligent Applications
The Data Platform for Today’s Intelligent ApplicationsThe Data Platform for Today’s Intelligent Applications
The Data Platform for Today’s Intelligent Applications
Neo4j
 

Tendances (20)

How Graph Data Science can turbocharge your Knowledge Graph
How Graph Data Science can turbocharge your Knowledge GraphHow Graph Data Science can turbocharge your Knowledge Graph
How Graph Data Science can turbocharge your Knowledge Graph
 
Getting Started with Knowledge Graphs
Getting Started with Knowledge GraphsGetting Started with Knowledge Graphs
Getting Started with Knowledge Graphs
 
Graph Databases for Master Data Management
Graph Databases for Master Data ManagementGraph Databases for Master Data Management
Graph Databases for Master Data Management
 
Demystifying Graph Neural Networks
Demystifying Graph Neural NetworksDemystifying Graph Neural Networks
Demystifying Graph Neural Networks
 
Introduction to Knowledge Graphs
Introduction to Knowledge GraphsIntroduction to Knowledge Graphs
Introduction to Knowledge Graphs
 
The Graph Database Universe: Neo4j Overview
The Graph Database Universe: Neo4j OverviewThe Graph Database Universe: Neo4j Overview
The Graph Database Universe: Neo4j Overview
 
Knowledge Graph Introduction
Knowledge Graph IntroductionKnowledge Graph Introduction
Knowledge Graph Introduction
 
Introduction to Knowledge Graphs: Data Summit 2020
Introduction to Knowledge Graphs: Data Summit 2020Introduction to Knowledge Graphs: Data Summit 2020
Introduction to Knowledge Graphs: Data Summit 2020
 
Get Started with the Most Advanced Edition Yet of Neo4j Graph Data Science
Get Started with the Most Advanced Edition Yet of Neo4j Graph Data ScienceGet Started with the Most Advanced Edition Yet of Neo4j Graph Data Science
Get Started with the Most Advanced Edition Yet of Neo4j Graph Data Science
 
Building a Knowledge Graph using NLP and Ontologies
Building a Knowledge Graph using NLP and OntologiesBuilding a Knowledge Graph using NLP and Ontologies
Building a Knowledge Graph using NLP and Ontologies
 
A Universe of Knowledge Graphs
A Universe of Knowledge GraphsA Universe of Knowledge Graphs
A Universe of Knowledge Graphs
 
Neo4j GraphSummit London March 2023 Emil Eifrem Keynote.pptx
Neo4j GraphSummit London March 2023 Emil Eifrem Keynote.pptxNeo4j GraphSummit London March 2023 Emil Eifrem Keynote.pptx
Neo4j GraphSummit London March 2023 Emil Eifrem Keynote.pptx
 
Neo4j & AWS Bedrock workshop at GraphSummit London 14 Nov 2023.pptx
Neo4j & AWS Bedrock workshop at GraphSummit London 14 Nov 2023.pptxNeo4j & AWS Bedrock workshop at GraphSummit London 14 Nov 2023.pptx
Neo4j & AWS Bedrock workshop at GraphSummit London 14 Nov 2023.pptx
 
Introduction to Graph Databases
Introduction to Graph DatabasesIntroduction to Graph Databases
Introduction to Graph Databases
 
Building a modern data stack to maintain an efficient and safe electrical grid
Building a modern data stack to maintain an efficient and safe electrical gridBuilding a modern data stack to maintain an efficient and safe electrical grid
Building a modern data stack to maintain an efficient and safe electrical grid
 
World Health Organisation - Knowledge Representation and Reasoning for Public...
World Health Organisation - Knowledge Representation and Reasoning for Public...World Health Organisation - Knowledge Representation and Reasoning for Public...
World Health Organisation - Knowledge Representation and Reasoning for Public...
 
The Data Platform for Today’s Intelligent Applications
The Data Platform for Today’s Intelligent ApplicationsThe Data Platform for Today’s Intelligent Applications
The Data Platform for Today’s Intelligent Applications
 
Cloud-native Semantic Layer on Data Lake
Cloud-native Semantic Layer on Data LakeCloud-native Semantic Layer on Data Lake
Cloud-native Semantic Layer on Data Lake
 
Optimizing Your Supply Chain with the Neo4j Graph
Optimizing Your Supply Chain with the Neo4j GraphOptimizing Your Supply Chain with the Neo4j Graph
Optimizing Your Supply Chain with the Neo4j Graph
 
AnzoGraph DB: Driving AI and Machine Insights with Knowledge Graphs in a Conn...
AnzoGraph DB: Driving AI and Machine Insights with Knowledge Graphs in a Conn...AnzoGraph DB: Driving AI and Machine Insights with Knowledge Graphs in a Conn...
AnzoGraph DB: Driving AI and Machine Insights with Knowledge Graphs in a Conn...
 

Similaire à Knowledge Graph for Machine Learning and Data Science

Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Perficient, Inc.
 

Similaire à Knowledge Graph for Machine Learning and Data Science (20)

Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
 
Neo4j GraphDay Seattle- Sept19- in the enterprise
Neo4j GraphDay Seattle- Sept19-  in the enterpriseNeo4j GraphDay Seattle- Sept19-  in the enterprise
Neo4j GraphDay Seattle- Sept19- in the enterprise
 
Nodes2020 | Graph of enterprise_metadata | NEO4J Conference
Nodes2020 | Graph of enterprise_metadata | NEO4J ConferenceNodes2020 | Graph of enterprise_metadata | NEO4J Conference
Nodes2020 | Graph of enterprise_metadata | NEO4J Conference
 
Using Cloud Automation Technologies to Deliver an Enterprise Data Fabric
Using Cloud Automation Technologies to Deliver an Enterprise Data FabricUsing Cloud Automation Technologies to Deliver an Enterprise Data Fabric
Using Cloud Automation Technologies to Deliver an Enterprise Data Fabric
 
Transform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big DataTransform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big Data
 
Roadmap for Enterprise Graph Strategy
Roadmap for Enterprise Graph StrategyRoadmap for Enterprise Graph Strategy
Roadmap for Enterprise Graph Strategy
 
Preparing Your Data for Cloud Analytics & AI/ML
Preparing Your Data for Cloud Analytics & AI/MLPreparing Your Data for Cloud Analytics & AI/ML
Preparing Your Data for Cloud Analytics & AI/ML
 
Predictions for the Future of Graph Database
Predictions for the Future of Graph DatabasePredictions for the Future of Graph Database
Predictions for the Future of Graph Database
 
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
 
Apache CarbonData+Spark to realize data convergence and Unified high performa...
Apache CarbonData+Spark to realize data convergence and Unified high performa...Apache CarbonData+Spark to realize data convergence and Unified high performa...
Apache CarbonData+Spark to realize data convergence and Unified high performa...
 
Initiate Edinburgh 2019 - Big Data Meets AI
Initiate Edinburgh 2019 - Big Data Meets AIInitiate Edinburgh 2019 - Big Data Meets AI
Initiate Edinburgh 2019 - Big Data Meets AI
 
Your Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph StrategyYour Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph Strategy
 
Bitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FSBitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FS
 
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
 
Your Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph StrategyYour Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph Strategy
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
Your Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph Strategy Your Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph Strategy
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineering
 
Data & Analytics - Session 1 - Big Data Analytics
Data & Analytics - Session 1 -  Big Data AnalyticsData & Analytics - Session 1 -  Big Data Analytics
Data & Analytics - Session 1 - Big Data Analytics
 
Graph Analytics on Data from Meetup.com
Graph Analytics on Data from Meetup.comGraph Analytics on Data from Meetup.com
Graph Analytics on Data from Meetup.com
 

Plus de Cambridge Semantics

Plus de Cambridge Semantics (20)

Risk Analytics Using Knowledge Graphs / FIBO with Deep Learning
Risk Analytics Using Knowledge Graphs / FIBO with Deep LearningRisk Analytics Using Knowledge Graphs / FIBO with Deep Learning
Risk Analytics Using Knowledge Graphs / FIBO with Deep Learning
 
Using Machine Teaching in Text Analysis: Case Study on Using Machine Teaching...
Using Machine Teaching in Text Analysis: Case Study on Using Machine Teaching...Using Machine Teaching in Text Analysis: Case Study on Using Machine Teaching...
Using Machine Teaching in Text Analysis: Case Study on Using Machine Teaching...
 
Knowledge Graph Discussion: Foundational Capability for Data Fabric, Data Int...
Knowledge Graph Discussion: Foundational Capability for Data Fabric, Data Int...Knowledge Graph Discussion: Foundational Capability for Data Fabric, Data Int...
Knowledge Graph Discussion: Foundational Capability for Data Fabric, Data Int...
 
Graph-driven Data Integration: Accelerating and Automating Data Delivery for ...
Graph-driven Data Integration: Accelerating and Automating Data Delivery for ...Graph-driven Data Integration: Accelerating and Automating Data Delivery for ...
Graph-driven Data Integration: Accelerating and Automating Data Delivery for ...
 
Fireside Chat with Bloor Research: State of the Graph Database Market 2020
Knowledge Graph for Machine Learning and Data Science

  • 1. Thomas Cook Sales Director, AnzoGraph DB e: thomas.cook@cambridgesemantics.com w: www.anzograph.com Knowledge Graphs for Machine Learning and Data Science #DCAF 2020 Feb 6, 2020
  • 2. Data Continues to Grow AI and ML Adoption Grows for Better & Faster Insights Need for: – Automated Data Preparation & Better Understanding – Explainable AI & ML with Provenance – Improved Algorithms & Analytics – Cost Efficient Operations Context Knowledge Graphs & Graph Analytics
  • 3. Knowledge Graphs to Automate Data Preparation & Improve Common Understanding
  • 4. ©2019 Cambridge Semantics Inc. All rights reserved. The Data Preparation Problem. 70-80% of time is spent in Data Preparation & Feature Engineering. Viewed as the “least enjoyable” part of work by 76% of data scientists (source: Cleaning Big Data, Forbes Magazine). Data Access ● Manual ETL coding ● Practicalities limit the # of sources and types of data. Data Processing ● Laborious discovery, profiling and selection ● Use of rules and coding for harmonization & cleansing. Feature Engineering ● Manual coding to transform data ● Manual feature engineering & selection
  • 5. Automated Deployment and Operations Storage and Compute Integration MODEL Graph Data Model • Lift Data into Data Fabric • Design Ontologies • Connect Data Models ON-BOARD Ingest & Map • Automated ETL • Collaborative Mapping • Metadata Capture Enterprise Data Sources Machine Learning and AI Enterprise Search “Last Mile” Analytics Tools Metadata Catalog Semantic-based Metadata Management, Governance and Lineage Cloud or On-Prem Data Storage Infrastructure Data Storage Layer Ingest BLEND GraphMarts • Combine and Align Related Data Sets • In-memory MPP OLAP Query Engine • Data Layers ACCESS Hi-Res Analytics • Analyze All Data Together • Fast, Iterative Queries Ad Hoc, What if • Code Free or API Graphical Application Interface Anzo - The Modern Data Discovery and Integration Layer for the Enterprise Data Fabric
  • 6. Automated Data Ingestion & Cataloging. Sources: Unstructured Data (Notes, Docs, Emails, Articles) and Structured Data (Relational, CSV, HDFS, External Data Feeds). Stages: Ingest, Catalog. Components: NLP, Text Analytics, Sentiment Analysis; Data Catalog; Semantic Layer. Data Harmonization – Structured or Unstructured • Harmonize Many Data Sources • Automated Unstructured Data Extraction & Categorization. Data Wrangling Capability • Profile, Rules… Manage & Clean incoming data • Set up Re-usable Data Wrangling Jobs • Provenance to Manage Data. Data Catalog • Explore, Secure & Manage Dataset Assets. Result: Cleaner, quality data, faster & from many more sources
  • 7. ©2018 Cambridge Semantics Inc. All rights reserved. A big web of data understandable at the data level
  • 8. ©2019 Cambridge Semantics Inc. All rights reserved. Allow for Easy Understanding & Handling of Data Rules to Link & Conform Data Raw Data Business Ready Datasets Create Data Layers Build Graph Marts
  • 9. ©2019 Cambridge Semantics Inc. All rights reserved. Expedite & Optimize Feature Engineering Use visual interface with no coding for feature selection • Query Knowledge Graph and generate features • Conduct data transformation using a library of functions • Compute new derived features • De-normalize data • Aggregate ranges • Convert numeric values to alpha values • Pivot values and much more!
  • 10. Operationalizing Machine Learning Models: Explainable Insights. Manage Data Sets with Provenance & Data Lineage • Anzo retains end-to-end data lineage • Track transformations. Easy to Export Data to ML & Data Science Tools • Use OData/REST APIs, SQL ODBC/JDBC • Export to R, Python, downstream systems, … Deploy & Continuously Improve Model Performance • Set up deployment pipelines with learnings to help in feature selection • Horizontally scale runtime environment • Can be auto-deployed behind the firewall or on the Cloud
  • 11. Using Knowledge Graphs with Graph Analytics Database as Scalable Infrastructure for ML & Data Science
  • 12. “Graph analytics will grow in the next few years due to the need to ask complex questions across complex data, which is not always practical or even possible at scale using SQL queries” …Gartner – Top 10 Data and Analytics Technology Trends for 2019
  • 13. AnzoGraph™ DB. What it is: ● Fast, Scalable Graph Database ○ In-Memory Massively Parallel Processing (MPP) ACID-Compliant Graph Database ○ Supports RDF & Labelled Property Graphs. What it does: ○ Fast Data Loading ○ Fast Query ○ Rich Analytics ■ Graph Algorithms ■ BI/DW Analytics ■ Inferencing ■ Data Science/Feature Engineering Algorithms ■ Define-Your-Own Analytics ○ Linear Database Scaling ○ Persist data on cheap storage. Based on Open Standards • Built on RDF & SPARQL 1.1 standards • LPG with RDF*/SPARQL* • LPG with Cypher (in 2020). Deploy on-prem or cloud • Kubernetes/Helm on-demand cloud deployment • AWS, Google and Azure. [Logos: Awards, Select Customers]
  • 14. Benchmarks. 217x: AnzoGraph DB compared to Neo4j on industry-standard TPC-H & Graph 500 benchmarks. 113x: AnzoGraph’s LUBM benchmark performance over the previous fastest result. 30x: AnzoGraph’s performance on graph algorithms over Spark SQL and Spark with GraphFrames.
  • 15. Graph OLAP Built for Analytics at Scale and Speed: SQL OLAP vs Graph OLAP. SQL OLAP: On-line Analytics at Massive Parallel Processing (MPP) Scale with a SQL Database. Examples: Netezza, Amazon Redshift. Analytics: • Warehouse-Style BI Analytics. Graph OLAP: On-line Analytics at Massive Parallel Processing (MPP) Scale with a Native Graph Database. Example: AnzoGraph DB. Analytics: • Warehouse-Style BI Analytics • Graph Algorithms • Inferencing • Data Science Functions
  • 16. Graph OLAP Built for Analytics at Scale and Speed: Graph OLTP vs Graph OLAP. Graph OLTP: Transactional databases • Built for transactional applications & individual transactions • Scales vertically. Examples: Neo4j, AWS Neptune. Graph OLAP: Analytical databases • Built for analytics and to deal with scale & performance • Deep Link analysis • Analytics on the population • Scales horizontally • Can complement Graph OLTP systems. Example: AnzoGraph DB
  • 17. Labelled Property Graphs Facilitate Analytics. [Diagram: Person: Jack (isA: <Man>, birthday: 09/17/1975) and Person: Jill (isA: <Woman>, birthday: 4/23/1979) are linked by friendOf (metAt=<TheHill>, metDate=07/04/2018); each is linked by WentUp (Date=07/04/2018) to Place: The Hill (isA: <Place>, has: Water, has: Trees, partOf: <TheMountain>).] Today with RDF* and SPARQL* • Relationships can be described as clearly as in any LPG database. RDF*/SPARQL* extensions to the standard make W3C open standards databases even more capable
  • 18. Algorithms and Analytical Capabilities. Capabilities listed: Graph Patterns, Negation, Property Paths, BIND, Aggregates, Basic Federated Query, ORDER BY and offsets, Functions on Strings, Functions on Numerics, Functions on Dates and Times, Hash Functions, Basic Graph Patterns, Count/Avg, Min/Max, GroupConcat, Sample, Page Rank, Shortest Path, All Path, Label Propagation, Weakly Connected Components, K neighborhood, Counting Triangles, Inferences (RDFS+), Labeled Property Graphs (RDF*), Window Aggregates, Advanced Grouping Sets, Named Views, Named Queries, Conditional Expressions, User-Defined Extensions. Categories: SPARQL 1.1 Standards; AnzoGraph® DB Extras; Graph Algorithms and Inferencing; Data Science Extensions (UDX). Distributions ● Bernoulli ● Binomial ● Chi-squared ● Exponential ● Hypergeometric ● Laplace ● Log Normal ● Logarithmic Series ● Negative Binomial ● Normal. Correlations ● Pearson. Entropy ● Cross Entropy ● Differential Entropy
  • 19. User-Defined Extensions (UDXs) allow users to extend AnzoGraph DB functionality for custom usage. User-Defined Functions (UDF): Create and register custom analytic functions, such as functions that concatenate values or convert integers to alternate currencies. User-Defined Aggregates (UDA): Create and register aggregate functions, such as functions that compute the arithmetic mean or calculate the average number from a list of maximum and minimum values. User-Defined Services (UDS): Create and register services that create local SPARQL endpoints. User-Defined Tables (UDT): Create and register a function that is repeatedly invoked within a query to generate the rows of a table on-the-fly. Data Science Functions; User-Defined Extensions (UDX): functions you can build in Java or C++
  • 20. Execute Supervised & Unsupervised ML with Graph Algorithms. Graph Algorithms • PageRank • Shortest Path • K-neighbors • All Paths • Counting Triangles • Weakly Connected Components • Label Propagation • Triangle Enumeration • Triangle Counting • Clustering Coefficient and more! Example questions: Who is the most influential person in your customer list? What’s the most important item relating to a search of your knowledge graph? What is the shortest path to your destination across a route? What’s the optimal path for packets to travel across your network? (Image source: Wikipedia.) Try functions via SPARQL in Zeppelin or Python Jupyter Notebooks
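To make the "most influential person" question concrete, here is a minimal PageRank sketch in plain Python. It is not AnzoGraph's implementation or its SPARQL interface; the function name, the damping factor of 0.85, the iteration count, and the tiny "follows" graph are all assumptions chosen for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over directed (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping) / len(nodes) + damping * dangling / len(nodes)
        new_rank = {node: base for node in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Hypothetical customer graph: an edge (a, b) means "a follows b".
follows = [("ann", "cat"), ("bob", "cat"), ("dan", "cat"), ("cat", "ann")]
scores = pagerank(follows)
most_influential = max(scores, key=scores.get)
```

The resulting score per node can be attached back to each customer as a numeric feature for a downstream model.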
  • 21. Graph Algorithms Produce Additional Features to Train ML Models. (Image source: Wikipedia.) Try functions via SPARQL in Zeppelin or Python Jupyter Notebooks
  • 22. Execute Inferencing using RDFS+ and OWL 2 RL. AnzoGraph allows you to insert inferred triples into the specified target graph. Symmetric case: Person: Jack Is Married Person: Jill. If Jack is married to Jill, then you can definitely infer that Jill is married to Jack. Non-symmetric case: Person: Jack Knows Person: Sam. Jack knows Sam, but Sam may not know Jack; here, the inference is less clear. Both cases are supported. Try functions via SPARQL in Zeppelin or Python Jupyter Notebooks
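The Jack/Jill and Jack/Sam cases can be sketched with a few lines of Python over in-memory triples. This only illustrates the idea of symmetric-property inference; it is not how AnzoGraph evaluates RDFS+ or OWL 2 RL rules, and the predicate names `isMarriedTo` and `knows` are hypothetical.

```python
def infer_symmetric(triples, symmetric_predicates):
    """Return the new triples implied by predicates declared symmetric."""
    existing = set(triples)
    inferred = set()
    for subject, predicate, obj in existing:
        reverse = (obj, predicate, subject)
        if predicate in symmetric_predicates and reverse not in existing:
            inferred.add(reverse)
    return inferred


asserted = {
    ("Jack", "isMarriedTo", "Jill"),
    ("Jack", "knows", "Sam"),
}
# Only isMarriedTo is declared symmetric; knows is left as asserted.
inferred = infer_symmetric(asserted, symmetric_predicates={"isMarriedTo"})
# Inferred triples are written to a separate target graph alongside the input.
target_graph = asserted | inferred
```

Because `knows` is not declared symmetric, no triple from Sam to Jack is produced, which mirrors the slide's "less clear" case.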
  • 23. ELT for Data Engineering • Wrangling, Blending, Munging, Transformations, Enrichment, Views • Use statistical functions, transformations or enrichment to get the data into the form needed for the downstream ML pipeline. Example: INSERT { GRAPH <myNewGraph> { ?s a <Person> ; <fullname> ?fullname } } USING <myOldGraph> WHERE { ?s a <Person> ; <firstname> ?fname ; <lastname> ?lname . BIND(CONCAT(?fname, " ", ?lname) AS ?fullname) }
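For readers more at home in Python than SPARQL, the same transformation can be sketched over a set of (subject, predicate, object) tuples. This is an illustrative equivalent, not something generated by AnzoGraph; the tuple representation and the sample data are assumptions.

```python
def add_fullnames(old_graph):
    """Build a new graph holding each Person and a derived fullname."""
    people = {s for s, p, o in old_graph if p == "a" and o == "Person"}
    first = {s: o for s, p, o in old_graph if p == "firstname"}
    last = {s: o for s, p, o in old_graph if p == "lastname"}

    new_graph = set()
    for person in people:
        # Like the WHERE clause, skip people missing either name part.
        if person in first and person in last:
            new_graph.add((person, "a", "Person"))
            new_graph.add((person, "fullname", f"{first[person]} {last[person]}"))
    return new_graph


# Hypothetical contents of <myOldGraph>.
my_old_graph = {
    ("p1", "a", "Person"),
    ("p1", "firstname", "Jill"),
    ("p1", "lastname", "Hill"),
    ("p2", "a", "Person"),
    ("p2", "firstname", "Jack"),
}
my_new_graph = add_fullnames(my_old_graph)
```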
  • 24. ©2019 Cambridge Semantics Inc. All rights reserved. ELT for Data Engineering Materialized Views – good for heavy calculations - perform once - use many times CREATE MATERIALIZED VIEW <ages> AS CONSTRUCT { ?person <age> ?age . } WHERE { GRAPH <tickit> { { SELECT ?person ((YEAR(?date))-(YEAR(xsd:dateTime(?birthdate))) AS ?age) WHERE { ?person <birthday> ?birthdate . BIND(xsd:dateTime(NOW()) AS ?date) } } } }
  • 25. ELT for Data Engineering • Enrichment: Add new features from federated calls, using SERVICE calls to the Linked Open Data Cloud or other internal SPARQL endpoints. Example: Look up address and geocodes for a company, census population data, crime rate, demographics, etc. All of these can become new features to feed into the ML pipeline
  • 26. ML Step #1: Data Prep: Data Discovery and Feature Engineering. Distributions: 2.1 Bernoulli Distribution: Determines the probability of Success or Failure (or Yes or No) in a single trial. 2.2 Binomial Distribution: Determines the probability of a given number of successes in a fixed number of independent trials. 2.3 Chi-squared Distribution: Used to test the relationship between two categorical variables. 2.4 Exponential Distribution: Determines the probability of event occurrence in a time interval when the past event number is unknown. 2.5 Hypergeometric Distribution: Determines the probability of a given number of successes when sampling without replacement. 2.6 Laplace Distribution: Determines the probability of intervals. 2.7 Log Normal Distribution: Models certain instances, such as the change in price distribution of a stock or commodity positions. 2.8 Logarithmic Series Distribution: Determines the probability of occurrence of events like claim frequencies in insurance companies. 2.9 Negative Binomial Distribution: Determines the probability of a given number of failures before a set number of successes. 2.10 Normal Distribution: Models and determines probabilities for many kinds of natural and social data. 2.11 Poisson Distribution: Determines the probability that a certain number of events will occur in a specific time period. 2.12 Skellam Distribution: Models the difference between two independent Poisson-distributed variables. 2.13 Beta-binomial Distribution: Models the number of successes in n binomial trials when the probability of success p is a Beta random variable. 2.14 Continuous Uniform Distribution: Assigns equal probability to all values between its minimum and maximum. 2.15 Discrete Uniform Distribution: Determines the probability of a finite number of outcomes equally likely to happen. 2.16 Student’s t-Distribution: Determines the probability when the sample size is small. 2.17 Weibull Distribution: Used to assess product reliability, analyse life data and model failure times.
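As a rough illustration of what three of these distribution functions compute, the sketch below implements the Bernoulli, binomial and Poisson probability mass functions with the Python standard library. The function names and the flight-delay numbers are hypothetical; the slides do not show the signatures of AnzoGraph's own data science functions, so nothing here should be read as that API.

```python
import math


def bernoulli_pmf(k, p):
    """P(X = k) for one trial, where k is 1 (success) or 0 (failure)."""
    if k not in (0, 1):
        raise ValueError("k must be 0 or 1")
    return p if k == 1 else 1.0 - p


def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success rate p)."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def poisson_pmf(k, rate):
    """P(exactly k events in a period with the given average event rate)."""
    return rate**k * math.exp(-rate) / math.factorial(k)


# Hypothetical feature: chance that exactly 2 of 4 flights are delayed
# when each flight is independently delayed with probability 0.5.
p_two_of_four = binomial_pmf(2, 4, 0.5)
```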
  • 27. ML Step #1: Data Discovery and Feature Engineering. Correlations: 3.1 Pearson Correlation Coefficient: Determines whether two variables have a positive, negative or no linear relationship. 3.2 Matthews Correlation Coefficient: Determines whether two binary variables (0 & 1) have a positive, negative or no relationship. 3.3 Spearman’s Rank Correlation Coefficient: Measures the strength of a monotonic relationship between paired data. Feature Exploration: 5.1 Principal Component Analysis: Reduces the dimensionality of large data sets, which helps when building predictive models. Profiling Metrics: 6.1 Geometric Mean: Determines average growth rates. 6.2 Skewness Metric: Calculates Pearson’s coefficient of skewness on numeric values. 6.3 T-Digest Metric: Estimates percentile and quantile values accurately.
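The difference between the Pearson and Spearman coefficients is easiest to see on a small example. The sketch below is a plain-Python illustration with made-up data, assuming the textbook definitions (Spearman computed as Pearson on average ranks); it does not reflect how AnzoGraph implements these functions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson applied to the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Made-up data: y grows with x, but not in a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
linear_strength = pearson(x, y)
monotonic_strength = spearman(x, y)
```

Here the Spearman value reaches 1 because y always increases with x, while the Pearson value stays below 1 because the relationship is curved.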
  • 28. RDF*/SPARQL* Real World Example: Airline Delay Data Analysis
  • 29. Labelled Property Graphs Facilitate Analytics. [Diagram: Person: Jack (isA: <Man>, birthday: 09/17/1975) and Person: Jill (isA: <Woman>, birthday: 4/23/1979) are linked by friendOf (metAt=<TheHill>, metDate=07/04/2018); each is linked by WentUp (Date=07/04/2018) to Place: The Hill (isA: <Place>, has: Water, has: Trees, partOf: <TheMountain>).] Today with RDF* and SPARQL* • Relationships can be described as clearly as in any LPG database. RDF*/SPARQL* extensions to the standard make W3C open standards databases even more capable
  • 31. YEAR MONTH DAY DAY_OF_WEEK AIRLINE FLIGHT_NUMBER TAIL_NUMBER ORIGIN_AIRPORT DESTINATION_AIRPORT SCHEDULED_DEPARTURE DEPARTURE_TIME DEPARTURE_DELAY TAXI_OUT WHEELS_OFF SCHEDULED_TIME ELAPSED_TIME Input CSV – 32 Columns - 5,819,080 records ELAPSED_TIME AIR_TIME DISTANCE WHEELS_ON TAXI_IN SCHEDULED_ARRIVAL ARRIVAL_TIME ARRIVAL_DELAY DIVERTED CANCELLED CANCELLATION_REASON AIR_SYSTEM_DELAY SECURITY_DELAY AIRLINE_DELAY LATE_AIRCRAFT_DELAY WEATHER_DELAY
  • 32. Conversion from CSV to Graph – Defining Triples. [Diagram: Flight and Airport nodes connected by FlightDeparture, FlightArrival and DESTINATION edges]
  • 33. Conversion from CSV to Graph. [Diagram: a Flight node and two Airport nodes connected by FlightDeparture, FlightArrival and DESTINATION edges]
  • 34. Nodes have types and properties Flight YEAR MONTH DAY DAY_OF_WEEK AIRLINE FLIGHT_NUMBER TAIL_NUMBER ORIGIN_AIRPORT DESTINATION_AIRPORT …. Node Type: Flight Node Properties: Airline, Flight Number, Tail Number, etc *Note: Types can also be called Labels, as in Labeled Property Graphs or LPG
  • 35. With RDF*, edges can also have properties. [Diagram: Airport (AIRPORT_CODE = 'BOS') and Airport (AIRPORT_CODE = 'JFK') joined by a DESTINATION edge with edge property DISTANCE = 187]
  • 37. Conversion from CSV to RDF* Triples via SPARQL. Annotations: Node Type: Airport (origin and destination); Edge Property: Distance; Node Type: Flight; Node Properties: Flight Number, Tail Number, etc. Query (as captured from the slide, which ends before the statement is closed): INSERT { GRAPH <airline_flight_network> { ?OriginIRI a <Airport> ; <AIRPORT_CODE> ?ORIGIN_AIRPORT . << ?OriginIRI <DESTINATION> ?DestinationIRI >> <DISTANCE> ?DISTANCE . ?DestinationIRI a <Airport> ; <AIRPORT_CODE> ?DESTINATION_AIRPORT . << ?DestinationIRI <DESTINATION> ?OriginIRI >> <DISTANCE> ?DISTANCE . ?FlightIRI a <Flight> ; <YEAR> ?YEAR ; <MONTH> ?MONTH ; <DAY> ?DAY ; <DAY_OF_WEEK> ?DAY_OF_WEEK ; <AIRLINE> ?AIRLINE ; <FLIGHT_NUMBER> ?FLIGHT_NUMBER ; <TAIL_NUMBER> ?TAIL_NUMBER ; <ORIGIN_AIRPORT> ?ORIGIN_AIRPORT ; <DESTINATION_AIRPORT> ?DESTINATION_AIRPORT ;
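The shape of this conversion can be sketched in Python: each CSV row yields typed airport nodes, a flight node with properties, and DESTINATION edges that carry a DISTANCE property. An edge property is modelled here as a triple whose subject is itself a triple, loosely mirroring the RDF* `<< s p o >>` form. The IRI scheme, the column subset and the sample row (apart from the BOS to JFK distance of 187 shown on slide 35) are assumptions made for illustration.

```python
import csv
import io

# One made-up row using a subset of the 32 input columns.
SAMPLE_CSV = """ORIGIN_AIRPORT,DESTINATION_AIRPORT,DISTANCE,AIRLINE,FLIGHT_NUMBER
BOS,JFK,187,XX,100
"""


def row_to_triples(row, row_id):
    """Turn one CSV row into node, property and edge-property triples."""
    origin = f"airport/{row['ORIGIN_AIRPORT']}"        # hypothetical IRI scheme
    destination = f"airport/{row['DESTINATION_AIRPORT']}"
    flight = f"flight/{row_id}"
    distance = int(row["DISTANCE"])
    return [
        (origin, "a", "Airport"),
        (origin, "AIRPORT_CODE", row["ORIGIN_AIRPORT"]),
        (destination, "a", "Airport"),
        (destination, "AIRPORT_CODE", row["DESTINATION_AIRPORT"]),
        # Edge properties: the subject is the edge triple itself.
        ((origin, "DESTINATION", destination), "DISTANCE", distance),
        ((destination, "DESTINATION", origin), "DISTANCE", distance),
        (flight, "a", "Flight"),
        (flight, "AIRLINE", row["AIRLINE"]),
        (flight, "FLIGHT_NUMBER", row["FLIGHT_NUMBER"]),
    ]


triples = []
for index, row in enumerate(csv.DictReader(io.StringIO(SAMPLE_CSV))):
    triples.extend(row_to_triples(row, index))
```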
  • 38. Integration and ELT. Data sets: Flight Delay Data, Airport Info, Census Data, FAA Aircraft Registrations
  • 39. Combining additional data sets. [Diagram: the Flight and Airport graph (FlightDeparture, FlightArrival, DESTINATION) extended with CityState, Country, Aircraft and Airline nodes; data set labels: FAA, Airline, Census Data, Flight Delay]
  • 40. Now we are ready to ask questions like: BI-Style Analytics #1 Longest flight segments by distance from Boston (BOS) #2 Airports less than 400 mi from Boston (BOS) - Network Viewer output #3 Longest distances between two airports #4 Longest flights by elapsed time #5 Airlines with the longest average delays #6 Airlines with the most flights #7 Longest 2 segments reachable from Boston and the distances of each segment #8 Which segments have the longest average departure delays. Graph Algorithms #9 Page Rank - Show the most well-connected airports based on the page rank algorithm #10 Shortest Path - Show shortest paths and # of segments (hops) from AUS
  • 41. Shortest Path graph algorithm leverages RDF*/SPARQL*: SELECT * FROM <airline_flight_network> WHERE { SERVICE <csi:shortest_path> { [] <csi:binding-source-vertex> ?source_vertex_variable_name ; <csi:binding-vertex> ?node ; <csi:binding-predecessor> ?predecessor_variable_name ; <csi:binding-distance> ?distance ; <csi:graph> <airline_flight_network> ; <csi:source-vertex> <BOS> ; <csi:destination-vertex> <HNL> ; <csi:edge-label> <DESTINATION> ; <csi:weighted> true . } }
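What a weighted shortest-path query of this kind computes can be sketched with Dijkstra's algorithm in Python. The slides do not say which algorithm the `csi:shortest_path` service uses internally, so Dijkstra is an assumption here, and all mileages other than BOS to JFK (187, from slide 35) are invented for the example.

```python
import heapq


def shortest_path(edges, source, destination):
    """Dijkstra over undirected weighted edges; returns (distance, path) or None."""
    graph = {}
    for a, b, weight in edges:
        graph.setdefault(a, []).append((b, weight))
        graph.setdefault(b, []).append((a, weight))

    distance = {source: 0}
    predecessor = {}
    queue = [(0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if dist > distance.get(node, float("inf")):
            continue  # stale queue entry
        if node == destination:
            break
        for neighbor, weight in graph.get(node, []):
            candidate = dist + weight
            if candidate < distance.get(neighbor, float("inf")):
                distance[neighbor] = candidate
                predecessor[neighbor] = node
                heapq.heappush(queue, (candidate, neighbor))

    if destination not in distance:
        return None
    path = [destination]
    while path[-1] != source:
        path.append(predecessor[path[-1]])
    path.reverse()
    return distance[destination], path


# DESTINATION edges with a DISTANCE weight; mileages are illustrative only.
segments = [
    ("BOS", "JFK", 187),
    ("JFK", "LAX", 2475),
    ("LAX", "HNL", 2556),
    ("BOS", "SFO", 2704),
    ("SFO", "HNL", 2398),
]
result = shortest_path(segments, "BOS", "HNL")
hops = len(result[1]) - 1
```

The predecessor map plays the same role as the `csi:binding-predecessor` variable in the query: it lets the full path be rebuilt from the destination back to the source.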
  • 43. ©2019 Cambridge Semantics Inc. All rights reserved. Scalability Graph OLAP – Horizontally Scalable Have more data. Need better performance. Add more servers Deploy on VMs or bare metal with a TAR file that is compatible with CentOS Automated deployment in the Cloud. Available in the AWS Marketplace & others soon 60 Day Full Feature Free Trial. Download or Cloud Deployment. Visit booth or AnzoGraph.com
  • 44. Download AnzoGraph DB Free Edition Today! http://AnzoGraph.com