Digital Enterprise Research Institute www.deri.ie
Enabling networked knowledge
© Copyright 2011 Digital Enterprise Research Institute. All rights reserved.
RDF Analytics… SPARQL and Beyond…
Fadi Maali
fadi.maali@deri.org
@fsheer
Why analytics (1/2)
Why analytics (2/2)
Appetite Whetting (1/3)
Google accurately detects flu trends ahead of the U.S. Centers for Disease Control and Prevention (CDC).
http://www.google.org/flutrends/about/how.html
Appetite Whetting (2/3)
http://www.dailymail.co.uk/sciencetech/article-2120416/Twitter-predicts-stock-prices-accurately-investment-tactic-say-scientists.html
Appetite Whetting (3/3)
Flavor pyramids for North American and East Asian cuisines
http://www.nature.com/srep/2011/111215/srep00196/full/srep00196.html
Data Science and RDF
- Can we do “data science” using RDF data?
  - Do we have the data?
  - Do we have the tools?
- Why should we use RDF?
RDF Characteristics
- Graph data model
- Clearly defined semantics
- Supports Web-scale distributed publication
Available RDF Data
- Freebase has 1.2 billion triples (Google)
- The LOD Cloud has more than 31 billion triples
- Embedded RDF data: schema.org, Drupal…
http://lod-cloud.net/
Available RDF Tools
In this presentation we focus on the SPARQL standard:
- W3C Recommendation
- Supports querying, transforming and updating RDF data
- Large number of available implementations
- Defines a communication protocol
- 427 public SPARQL endpoints registered on the DataHub*
* http://sw.deri.org/~aidanh/docs/epmonitorISWC.pdf
RDF Data… a graph
SPARQL… Simple queries
SELECT ?name
WHERE {
  ?p :name ?name .
}
ORDER BY ?name
SPARQL… BI queries
SELECT ?gender (COUNT(*) AS ?count)
WHERE {
  ?p :gender ?gender
}
GROUP BY ?gender
SPARQL… BI queries
SELECT ?name (COUNT(?n) AS ?neighbours)
WHERE {
  ?p :knows ?n .
  ?p :name ?name .
}
GROUP BY ?p ?name
ORDER BY DESC(?neighbours)
SPARQL… BI queries
Neighbour counts (degree centrality) indicate:
- How influential a person is within a social network
- How important a road is within an urban network
- How central an employee is in an enterprise
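For readers who want to check the neighbour-count idea outside a SPARQL engine, here is a minimal Python sketch. It is not from the slides: the sample triples and the function name `neighbour_counts` are made up for illustration, and prefixes are dropped from the predicate names.

```python
from collections import Counter

# Hypothetical sample data: (subject, predicate, object) triples.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("alice", "knows", "carol"),
    ("bob", "knows", "carol"),
    ("alice", "name", "Alice"),
    ("bob", "name", "Bob"),
    ("carol", "name", "Carol"),
]


def neighbour_counts(triples):
    """Return (name, count) rows ordered by descending number of 'knows' objects."""
    names = {s: o for s, p, o in triples if p == "name"}
    degree = Counter(s for s, p, o in triples if p == "knows")
    # As with the join in the query, a person without any 'knows' triple
    # (carol in the sample data) does not appear in the result.
    rows = [(names[s], n) for s, n in degree.items() if s in names]
    return sorted(rows, key=lambda row: (-row[1], row[0]))
```

The sketch counts only outgoing `knows` edges, which is what the grouped query does as well; an undirected notion of degree would need the reverse edges too.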
SPARQL… Graph measure
Can we use SPARQL to compute shortest paths in
the graph?
Short answer: NO!
Long answer: Let’s try!
SPARQL… graph measure
SELECT ?v1 ?v2 (MIN(?l) AS ?shortestPath)
WHERE {
  { ?v1 :knows ?v2 BIND (1 AS ?l) }
  UNION
  { ?v1 :knows/:knows ?v2 BIND (2 AS ?l) }
  UNION
  { ?v1 :knows/:knows/:knows ?v2 BIND (3 AS ?l) }
  FILTER (?v1 != ?v2)
}
GROUP BY ?v1 ?v2
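The query only covers paths of one, two or three hops, since each length needs its own UNION branch. For comparison, a breadth-first search has no such bound. The following Python sketch is an illustration rather than part of the slides; the edge list and the name `shortest_path_lengths` are assumptions made for the example.

```python
from collections import deque

# Hypothetical directed 'knows' edges as (v1, v2) pairs.
EDGES = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f"), ("a", "c")]


def shortest_path_lengths(edges, source):
    """Breadth-first search: hop counts from source to every reachable node."""
    adjacency = {}
    for v1, v2 in edges:
        adjacency.setdefault(v1, []).append(v2)
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, []):
            if neighbour not in distance:  # first visit is the shortest
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    del distance[source]  # mirrors FILTER (?v1 != ?v2)
    return distance
```

With the sample edges, node `f` is four hops from `a`, so the search reports it while a query limited to three hops would not.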
SPARQL… graph measure
- finding directions between physical locations
- finding the most direct way to contact a person
- finding the min-delay communication path
SPARQL… clustering
Can we do clustering using SPARQL? YES!
Peer-pressure algorithm implemented using (almost only) SPARQL*
* http://yarcdata.com/blog/?p=318
SPARQL… clustering
DROP GRAPH <urn:ga/g/xjz1> ;
CREATE GRAPH <urn:ga/g/xjz1> ;
INSERT { GRAPH <urn:ga/g/xjz1> { ?s :cluster ?clus3 } } WHERE {
  SELECT ?s (SAMPLE(?clus) AS ?clus3) {
    { SELECT ?s (MAX(?clusCt) AS ?maxClusCt)
      { SELECT ?s ?clus (COUNT(?clus) AS ?clusCt) WHERE {
          ?s :knows ?o .
          GRAPH <urn:ga/g/xjz0> { ?o :cluster ?clus }
        } GROUP BY ?s ?clus
      } GROUP BY ?s
    }
    { SELECT ?s ?clus (COUNT(?clus) AS ?clusCt) WHERE {
        ?s :knows ?o .
        GRAPH <urn:ga/g/xjz0> { ?o :cluster ?clus }
      } GROUP BY ?s ?clus
    } FILTER (?clusCt = ?maxClusCt)
  } GROUP BY ?s
}
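One way to read the update is as a single voting round: every node looks at the clusters of the nodes it knows (taken from the previous graph) and adopts the most frequent one. The Python sketch below restates that round under stated assumptions. The names `peer_pressure_step`, `EDGES` and `CLUSTERS` are hypothetical, and ties are broken with `min()` only to keep the sketch deterministic, whereas `SAMPLE` may return any of the tied clusters.

```python
from collections import Counter, defaultdict

# Hypothetical input: directed 'knows' edges and the previous round's clusters.
EDGES = [("a", "b"), ("a", "c"), ("a", "d"), ("e", "d")]
CLUSTERS = {"b": "x", "c": "x", "d": "y"}


def peer_pressure_step(edges, clusters):
    """One round: each node adopts the most common cluster among the nodes it knows."""
    votes = defaultdict(Counter)
    for s, o in edges:
        if o in clusters:  # only neighbours that already have a cluster vote
            votes[s][clusters[o]] += 1
    new_clusters = {}
    for s, counter in votes.items():
        top = max(counter.values())
        # Deterministic tie-break (assumption of this sketch, see lead-in).
        new_clusters[s] = min(c for c, n in counter.items() if n == top)
    return new_clusters
```

As in the update, a node that knows nobody with a cluster gets no assignment in the new round. Repeating the step until the assignments stop changing is left to a driver loop outside this sketch.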
SPARQL Expressivity
- BI-like operations (rollup and drilldown)
- Graph measures
- Iterative algorithms (clustering)
SPARQL Scalability…
One approach is to use a scale-out architecture… think MapReduce or Hadoop
- Translate SPARQL into MapReduce
- Process RDF data directly in MapReduce
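To make the second option concrete, the count-by-gender aggregation can be phrased as a map step and a reduce step over raw triples. This is a single-process Python sketch of the programming model only, not Hadoop code and not taken from the slides; all function names and the sample triples are assumptions.

```python
from collections import defaultdict

# Hypothetical sample data: (subject, predicate, object) triples.
TRIPLES = [
    ("alice", "gender", "female"),
    ("bob", "gender", "male"),
    ("carol", "gender", "female"),
    ("alice", "name", "Alice"),
]


def map_phase(triple):
    """Emit (gender, 1) for every gender triple, nothing for other triples."""
    s, p, o = triple
    if p == "gender":
        yield o, 1


def shuffle(pairs):
    """Group values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts collected for one key."""
    return key, sum(values)


def count_by_gender(triples):
    """Run map, shuffle and reduce in sequence on an in-memory list."""
    pairs = (pair for t in triples for pair in map_phase(t))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

In a real cluster the map and reduce functions would run on different machines over partitions of the data; here they run in sequence just to show the data flow.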
Conclusion
- Can we do “data science” using RDF data?
  - Do we have the data? YES
  - Do we have the tools? Almost
    - Is SPARQL expressive enough? Almost
    - Does it scale? Yes… in principle, No in practice
    - Is it usable/easy? Not really
All examples used in this presentation, and Pig Latin equivalents of some of them, are available at:
https://github.com/fadmaa/rdf-analytics

 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 

RDF Analytics... SPARQL and Beyond

  • 1. RDF Analytics… SPARQL and Beyond…
       Fadi Maali (@fsheer, fadi.maali@deri.org)
       Digital Enterprise Research Institute, www.deri.ie. Enabling networked knowledge.
       © Copyright 2011 Digital Enterprise Research Institute. All rights reserved.
  • 2. Why analytics (1/2)
  • 3. Why analytics (2/2)
  • 4. Appetite Whetting (1/3)
       Google accurately detects flu trends ahead of the U.S. Centers for Disease Control and Prevention.
       http://www.google.org/flutrends/about/how.html
  • 5. Appetite Whetting (2/3)
       http://www.dailymail.co.uk/sciencetech/article-2120416/Twitter-predicts-stock-prices-accurately-investment-tactic-say-scientists.html
  • 6. Appetite Whetting (3/3)
       Flavor pyramids for North American and East Asian cuisines
       http://www.nature.com/srep/2011/111215/srep00196/full/srep00196.html
  • 7. Data Science and RDF
       • Can we do “data science” using RDF data?
           • Do we have the data?
           • Do we have the tools?
       • Why should we use RDF?
  • 8. RDF Characteristics
       • Graph data model
       • Clearly defined semantics
       • Supports Web-scale distributed publication
  • 9. Available RDF Data
       • Freebase has 1.2 billion triples (Google)
       • The LOD Cloud has more than 31 billion triples
       • Embedded RDF data: schema.org, Drupal…
       http://lod-cloud.net/
  • 10. Available RDF Tools
       In this presentation we focus on the standard, SPARQL:
       • W3C Recommendation
       • Supports querying, transforming and updating RDF data
       • Large number of available implementations
       • Defines a communication protocol
       • 427 public SPARQL endpoints registered on the DataHub*
       * http://sw.deri.org/~aidanh/docs/epmonitorISWC.pdf
  • 11. RDF Data… a graph
  • 12. SPARQL… Simple queries
       SELECT ?name
       WHERE {
         ?p :name ?name .
       }
       ORDER BY ?name
  • 13. SPARQL… BI queries
       SELECT ?gender (COUNT(*) AS ?count)
       WHERE {
         ?p :gender ?gender
       }
       GROUP BY ?gender
  • 15. SPARQL… BI queries
       SELECT ?name (COUNT(?n) AS ?neighbours)
       WHERE {
         ?p :knows ?n .
         ?p :name ?name .
       }
       GROUP BY ?p ?name
       ORDER BY DESC(?neighbours)
  • 17. SPARQL… BI queries
       • How influential a person is within a social network
       • How central a road is within an urban network
       • How central an employee is in an enterprise
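
The neighbour-count query above is effectively a degree measure: count the outgoing :knows edges per person, then sort in descending order. As a rough illustration, here is a minimal Python sketch of the same computation over an in-memory list of triples. The sample triples and the function name `neighbour_counts` are hypothetical (not from the slides), and the sketch assumes each person has exactly one name.

```python
from collections import Counter

# Hypothetical sample data: (subject, predicate, object) triples.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("alice", "knows", "carol"),
    ("bob", "knows", "carol"),
    ("alice", "name", "Alice"),
    ("bob", "name", "Bob"),
    ("carol", "name", "Carol"),
]


def neighbour_counts(triples):
    """Count outgoing 'knows' edges per person, most connected first."""
    # Assumption: one name per person (a dict keeps only the last one seen).
    names = {s: o for s, p, o in triples if p == "name"}
    counts = Counter(s for s, p, o in triples if p == "knows")
    # As in the SPARQL join, only people with a name AND at least one
    # outgoing 'knows' edge produce a row.
    rows = [(names[s], n) for s, n in counts.items() if s in names]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, neighbours in neighbour_counts(TRIPLES):
        print(name, neighbours)
```

Carol has no outgoing edges in this sample, so she does not appear in the result, just as she would be absent from the SPARQL result set.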
  • 18. SPARQL… Graph measure
       Can we use SPARQL to compute shortest paths in the graph?
       Short answer: NO!
       Long answer: Let’s try!
  • 19. SPARQL… graph measure
       SELECT ?v1 ?v2 (MIN(?l) AS ?shortestPath)
       WHERE {
         { ?v1 :knows ?v2 . BIND (1 AS ?l) }
         UNION
         { ?v1 :knows/:knows ?v2 . BIND (2 AS ?l) }
         UNION
         { ?v1 :knows/:knows/:knows ?v2 . BIND (3 AS ?l) }
         FILTER (?v1 != ?v2)
       }
       GROUP BY ?v1 ?v2
  • 20. SPARQL… graph measure
  • 21. SPARQL… graph measure
       • finding directions between physical locations
       • finding the most direct way to contact a person
       • finding the min-delay communication path
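
The shortest-path query on the earlier slide enumerates path lengths 1 to 3 explicitly, so any pair more than three hops apart is simply missing from its result. For contrast, here is a minimal Python sketch of an unbounded breadth-first search over directed :knows edges. The edge list and the function name `shortest_path_lengths` are hypothetical and only illustrate the idea.

```python
from collections import deque

# Hypothetical 'knows' edges: a chain a -> b -> c -> d -> e -> f
# plus a shortcut a -> c.
EDGES = [
    ("a", "b"), ("b", "c"), ("c", "d"),
    ("d", "e"), ("e", "f"), ("a", "c"),
]


def shortest_path_lengths(edges, source):
    """Breadth-first search; returns hop counts from source to each reachable node."""
    adjacency = {}
    for s, o in edges:
        adjacency.setdefault(s, []).append(o)

    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in dist:  # first visit is the shortest in an unweighted graph
                dist[nxt] = dist[node] + 1
                queue.append(nxt)

    del dist[source]  # mirrors FILTER (?v1 != ?v2)
    return dist


if __name__ == "__main__":
    print(shortest_path_lengths(EDGES, "a"))
```

In this sample, node f is four hops from a, which the three-branch UNION query would not report.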
  • 22. SPARQL… clustering
       Can we do clustering using SPARQL? YES!
       Peer-pressure algorithm implemented using (almost only) SPARQL*
       * http://yarcdata.com/blog/?p=318
  • 23. SPARQL… clustering
       DROP GRAPH <urn:ga/g/xjz1> ;
       CREATE GRAPH <urn:ga/g/xjz1> ;
       INSERT { GRAPH <urn:ga/g/xjz1> { ?s :cluster ?clus3 } }
       WHERE {
         SELECT ?s (SAMPLE(?clus) AS ?clus3) {
           { SELECT ?s (MAX(?clusCt) AS ?maxClusCt) {
               SELECT ?s ?clus (COUNT(?clus) AS ?clusCt) WHERE {
                 ?s :knows ?o .
                 GRAPH <urn:ga/g/xjz0> { ?o :cluster ?clus }
               } GROUP BY ?s ?clus
             } GROUP BY ?s
           }
           { SELECT ?s ?clus (COUNT(?clus) AS ?clusCt) WHERE {
               ?s :knows ?o .
               GRAPH <urn:ga/g/xjz0> { ?o :cluster ?clus }
             } GROUP BY ?s ?clus
           }
           FILTER (?clusCt = ?maxClusCt)
         } GROUP BY ?s
       }
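
The clustering update reads each node's neighbours' cluster labels from one named graph and writes the most frequent label per node into the next. A minimal Python sketch of one such iteration may make the logic easier to follow. The function name `peer_pressure_step` and the sample data are hypothetical, and ties are broken with `min()` here for determinism, whereas the SPARQL uses SAMPLE, which may pick any of the tied labels.

```python
from collections import Counter

# Hypothetical 'knows' edges and initial cluster labels.
EDGES = [
    ("a", "b"), ("b", "a"),
    ("c", "a"), ("c", "b"), ("c", "d"),
    ("d", "c"),
]
CLUSTERS = {"a": 1, "b": 1, "c": 2, "d": 2}


def peer_pressure_step(edges, clusters):
    """One iteration: each node adopts the most common cluster among the nodes it knows."""
    votes = {}
    for s, o in edges:
        if o in clusters:  # only neighbours that already have a label can vote
            votes.setdefault(s, Counter())[clusters[o]] += 1

    new_clusters = {}
    for s, counter in votes.items():
        top = max(counter.values())  # corresponds to MAX(?clusCt)
        # Assumption: deterministic tie-break; SPARQL's SAMPLE is arbitrary.
        new_clusters[s] = min(c for c, n in counter.items() if n == top)
    return new_clusters


if __name__ == "__main__":
    print(peer_pressure_step(EDGES, CLUSTERS))
```

A driver loop would presumably repeat this step, feeding each result in as the next input, until the labels stop changing. As in the SPARQL, a node with no outgoing edges receives no label in the new assignment.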
  • 25. SPARQL Expressivity
       • BI-like operations (rollup and drilldown)
       • Graph measures
       • Iterative algorithms (clustering)
  • 26. SPARQL Scalability…
       One approach is to use a scale-out architecture… think MapReduce or Hadoop
       • Translate SPARQL into MapReduce
       • Process RDF data directly in MapReduce
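
To make the MapReduce idea concrete, the gender-count query from the BI slides can be phrased as a map step that emits a (gender, 1) pair per matching triple and a reduce step that sums the pairs per key. The Python sketch below simulates those phases in a single process. It is not Hadoop's API, and all function names and sample triples are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data: (subject, predicate, object) triples.
TRIPLES = [
    ("alice", "gender", "female"),
    ("bob", "gender", "male"),
    ("carol", "gender", "female"),
    ("alice", "name", "Alice"),
]


def map_phase(triple):
    """Emit (gender, 1) for each triple matching ?p :gender ?gender."""
    s, p, o = triple
    if p == "gender":
        yield (o, 1)


def shuffle(pairs):
    """Group emitted values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts for one key (the COUNT(*) ... GROUP BY ?gender part)."""
    return key, sum(values)


def run_job(triples):
    pairs = (pair for t in triples for pair in map_phase(t))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


if __name__ == "__main__":
    print(run_job(TRIPLES))
```

On a real cluster the map and reduce functions would run on different machines over partitions of the triples; only the per-key sums need to be brought together.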
  • 27. Conclusion
       • Can we do “data science” using RDF data?
           • Do we have the data? YES
           • Do we have the tools? Almost
               • Is SPARQL expressive enough? Almost
               • Does it scale? Yes in principle, no in practice
               • Is it usable/easy? Not really
       All examples used in this presentation, and Pig Latin equivalents of some of them, are available at: https://github.com/fadmaa/rdf-analytics