Gagg: A Graph Aggregation Operator
June 2nd 2015
Fadi Maali*, Stephane Campinas, Stefan Decker
ESWC2015
* Funded by the Irish Research Council
The Famous LOD Cloud
1/20
http://lod-cloud.net/
The Famous LOD Cloud - COLOURED
2/20
http://lod-cloud.net/
The Famous LOD Cloud
(from a different angle)
3/20 to 6/20
Graph Aggregation
Condenses a large graph into a structurally similar but smaller graph by collapsing vertices and edges.
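The following sketch illustrates "collapsing vertices and edges" in its simplest form. It groups vertices using a caller-supplied mapping and merges parallel edges between groups, counting what was merged. The `group_of` dictionary and the function name are hypothetical inputs chosen for illustration. This shows the general idea of graph aggregation, not Gagg's semantics.

```python
from collections import Counter

def aggregate_graph(edges, group_of):
    """Collapse vertices into groups and merge parallel edges between groups.

    edges:    iterable of (source, label, target) triples
    group_of: hypothetical mapping from each vertex to its group key
    Returns (vertex_counts, edge_counts): how many distinct vertices fell into
    each group, and how many edges now run between each pair of groups.
    """
    vertex_counts = Counter()
    seen = set()
    edge_counts = Counter()
    for source, label, target in edges:
        for vertex in (source, target):
            if vertex not in seen:  # count each vertex once
                seen.add(vertex)
                vertex_counts[group_of[vertex]] += 1
        # parallel edges between the same two groups collapse into one key
        edge_counts[(group_of[source], label, group_of[target])] += 1
    return vertex_counts, edge_counts


edges = [("a", "knows", "b"), ("a", "knows", "c"), ("b", "knows", "c")]
groups = {"a": "Person", "b": "Person", "c": "Robot"}
vertex_counts, edge_counts = aggregate_graph(edges, groups)
```

In this example, a and b collapse into Person and c into Robot. The three original edges become two aggregated ones: Person to Person (1) and Person to Robot (2).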
Graph Aggregation - Schema Discovery
8/20
Introducing RDF Graph Summary with application to Assisted SPARQL Formulation
Graph Aggregation - Requirements
9/20
[Figure: example dataset graph. :linkset (23k triples) has subjectsTarget :dbpedia and objectsTarget :bbc-music; :dbpedia (1.2b triples) has subject :crossdomain and :bbc-music (20m triples) has subject :media; license edges lead to :open-license, :cc-by-sa, :closed-license and :bbc-terms.]
Graph Aggregation Methods
1. Custom Code
error-prone, time-consuming, inefficient…
2. SPARQL
error-prone, time-consuming, inefficient…
3. Graph Databases
expressivity, optimisation…
4. Gagg, a first-class operator
Operational Semantics
In-memory evaluation algorithm
Experimental evaluation
Gagg: Two-Step Aggregation
11/20
● Relation & measure
● Subject dimension(s) & measure
● Object dimension(s) & measure
Uses aggregation functions and a template similar to SPARQL CONSTRUCT queries
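The following sketch shows the two steps on a VoID-like description. It assumes each dataset carries one subject (the dimension) and a triple count (the measure), and each linkset carries a triple count (the relation measure). The function name and input layout are hypothetical. Summing is only one possible choice of aggregation function.

```python
from collections import defaultdict

def two_step_summary(datasets, linksets):
    """Hypothetical two-step aggregation over a VoID-like description.

    datasets: dict mapping dataset -> (subject, triple_count)
    linksets: list of (subjects_target, objects_target, triple_count)
    """
    # Step 1: collapse datasets sharing a subject (dimension), summing triples (measure).
    vertices = defaultdict(int)
    for subject, triples in datasets.values():
        vertices[subject] += triples
    # Step 2: collapse linksets whose endpoints land in the same pair of groups,
    # summing their triple counts (relation measure).
    edges = defaultdict(int)
    for source, target, triples in linksets:
        edges[(datasets[source][0], datasets[target][0])] += triples
    return dict(vertices), dict(edges)


datasets = {"ds1": ("crossdomain", 1000), "ds2": ("media", 200), "ds3": ("media", 50)}
linksets = [("ds1", "ds2", 10), ("ds1", "ds3", 5)]
vertices, edges = two_step_summary(datasets, linksets)
```

The media group ends up with 250 triples, and the two linksets from ds1 merge into one crossdomain-to-media edge with 15 triples.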
Graph Aggregation - Requirements
12/20
[Figure: the example graph from slide 9, with the linkset marked as the relation and its triple count (23k) as the measure.]
?l a void:Linkset ;
   void:subjectsTarget ?s ;
   void:objectsTarget ?o ;
   void:triples ?m .
Graph Aggregation - Requirements
13/20
[Figure: the example graph with the subject side highlighted: the dataset reached through subjectsTarget, its subject dimension and its triple-count measure.]
?s dct:subject ?sd ;
   void:triples ?sm .
Graph Aggregation - Requirements
14/20
[Figure: the example graph with the object side highlighted: the dataset reached through objectsTarget, its object dimension and its triple-count measure.]
?o dct:subject ?od ;
   void:triples ?om .
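Evaluated together, the relation pattern and the two dimension patterns produce one row per combination of relation, subject-side and object-side values. The following sketch assumes each pattern has already been evaluated into Python dictionaries, which is a hypothetical layout. It joins the results on the shared subject and object variables. It uses the definition slide's variable names (?x, ?y in place of ?s, ?o), and "linksTo" is an illustrative predicate.

```python
def build_binding_table(relations, subject_rows, object_rows):
    """Inner-join relation bindings with subject- and object-side bindings.

    relations:    list of dicts with keys "x", "p", "y", "m" (relation pattern)
    subject_rows: dict mapping vertex -> list of (sd, sm) pairs (subject side)
    object_rows:  dict mapping vertex -> list of (od, om) pairs (object side)
    """
    table = []
    for rel in relations:
        # Hash lookups on ?x and ?y; relations with no match on either side are dropped.
        for sd, sm in subject_rows.get(rel["x"], []):
            for od, om in object_rows.get(rel["y"], []):
                table.append({**rel, "sd": sd, "sm": sm, "od": od, "om": om})
    return table


relations = [
    {"x": "ds1", "p": "linksTo", "y": "ds2", "m": 10},
    {"x": "ds9", "p": "linksTo", "y": "ds2", "m": 1},  # ds9 has no subject row
]
subject_rows = {"ds1": [("crossdomain", 1000)]}
object_rows = {"ds2": [("media", 200)]}
table = build_binding_table(relations, subject_rows, object_rows)
```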
Graph Aggregation - Definition
15/20
Q = (D, M, E, N, R, f)
D: subject dimensions, binding ?x to ?sd
M: subject measure, binding ?x to ?sm
E: object dimensions, binding ?y to ?od
N: object measure, binding ?y to ?om
R: relation query, binding ?x ?p ?y and the relation measure ?m
f: reduce function
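One way to picture the six components is as a plain record. The following sketch is hypothetical in several respects: the class name, the choice to keep patterns as SPARQL text, and the renaming of ?s and ?o to ?x and ?y in the linkset example are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass(frozen=True)
class GaggQuery:
    """Hypothetical container for Q = (D, M, E, N, R, f)."""
    D: str  # subject dimensions: binds ?x to ?sd
    M: str  # subject measure: binds ?x to ?sm
    E: str  # object dimensions: binds ?y to ?od
    N: str  # object measure: binds ?y to ?om
    R: str  # relation query: links ?x to ?y, with relation measure ?m
    f: Callable[[Sequence[float]], float]  # reduce function


# The linkset example from the requirements slides, with ?s/?o renamed to ?x/?y.
linkset_summary = GaggQuery(
    D="?x dct:subject ?sd .",
    M="?x void:triples ?sm .",
    E="?y dct:subject ?od .",
    N="?y void:triples ?om .",
    R="?l a void:Linkset ; void:subjectsTarget ?x ; void:objectsTarget ?y ; void:triples ?m .",
    f=sum,
)
```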
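Graph Aggregation - Grouped Graph
16/20
[Figure: grouped graph with a crossdomain group and a media group connected by a linksTo edge; the groups and the edge carry the collected measure values (10k, 33k, 3k, 1.2M, ...) before reduction.]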
Graph Aggregation - Evaluation
17/20
● Build a binding table with columns ?x ?m ?p ?y ?sd ?sm ?od ?om
● Build the grouped graph: an O(|B|) algorithm, where |B| is the size of the binding table B
● Apply the reduce function f
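A single pass over the binding table is enough to build the grouped graph. This is where the O(|B|) bound comes from, assuming constant-time dictionary operations. The following sketch is an illustration, not the authors' Jena implementation. It assumes rows shaped like the binding table above, keeps subject-side and object-side groups apart, and records each vertex's measure once per group. It then applies a caller-supplied reduce function f.

```python
from collections import defaultdict

def gagg(rows, f):
    """Build the grouped graph in one pass over the binding table, then reduce.

    rows: dicts with keys x, p, y, m, sd, sm, od, om
    f:    reduce function applied to each list of collected measures
    """
    vertex_measures = defaultdict(dict)  # (side, dimension value) -> {vertex: measure}
    edge_measures = defaultdict(list)    # (sd, p, od) -> [relation measures]
    for row in rows:
        # A vertex may appear in many rows; keying by vertex keeps its measure once.
        vertex_measures[("subject", row["sd"])][row["x"]] = row["sm"]
        vertex_measures[("object", row["od"])][row["y"]] = row["om"]
        # Every row adds its relation measure to the edge between its two groups.
        edge_measures[(row["sd"], row["p"], row["od"])].append(row["m"])
    groups = {g: f(list(ms.values())) for g, ms in vertex_measures.items()}
    edges = {e: f(ms) for e, ms in edge_measures.items()}
    return groups, edges


rows = [
    {"x": "ds1", "p": "linksTo", "y": "ds2", "m": 10,
     "sd": "crossdomain", "sm": 1000, "od": "media", "om": 200},
    {"x": "ds1", "p": "linksTo", "y": "ds3", "m": 5,
     "sd": "crossdomain", "sm": 1000, "od": "media", "om": 50},
]
groups, edges = gagg(rows, sum)
```

ds1 appears twice but contributes its measure to the crossdomain group only once. The two relation rows merge into a single crossdomain-to-media edge. Any other reduce function, such as `max`, can be passed in place of `sum`.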
Graph Aggregation - Experiment Setup
18/20
● Extended in-memory Apache Jena, using SSE (SPARQL S-Expressions)
● Benchmark data: BSBM and SP2B
● Use cases: Type Summary and Bibliometrics
Graph Aggregation - Experiment Results
19/20
Type Summary on BSBM data

Data size (#triples)  fullSPARQL  3SPARQLs  reduced  Gagg
5k                          0.08      0.06     0.01  0.03
190k                        9.84      1.25     0.42  0.55
370k                       31.88      2.82     1.00  1.13
1.8M                      454.07     13.48     4.37  5.61
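To read the table as relative speed, the following sketch divides each baseline's reported time by Gagg's. The numbers are copied from the table. The slide does not state units, so only the ratios are meaningful. At 1.8M triples this gives roughly 81x over fullSPARQL and about 2.4x over 3SPARQLs.

```python
# Timings copied from the "Type Summary on BSBM data" table (units not stated).
timings = {
    "5k":   {"fullSPARQL": 0.08,   "3SPARQLs": 0.06,  "reduced": 0.01, "Gagg": 0.03},
    "190k": {"fullSPARQL": 9.84,   "3SPARQLs": 1.25,  "reduced": 0.42, "Gagg": 0.55},
    "370k": {"fullSPARQL": 31.88,  "3SPARQLs": 2.82,  "reduced": 1.00, "Gagg": 1.13},
    "1.8M": {"fullSPARQL": 454.07, "3SPARQLs": 13.48, "reduced": 4.37, "Gagg": 5.61},
}

def speedup(size, baseline, method="Gagg"):
    """How many times faster `method` is than `baseline` at a given data size."""
    row = timings[size]
    return row[baseline] / row[method]


for size in timings:
    print(size,
          round(speedup(size, "fullSPARQL"), 1),
          round(speedup(size, "3SPARQLs"), 1))
```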
Conclusion
20/20
Graph aggregation as a first-class operator:
- Easier for users to express
- Easier for engines to support and optimise
- Easier for further research and study
Further questions:
- Syntax and effect on SPARQL
- Distributed implementation
PREFIX : <http://example.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
CONSTRUCT {
  _:b0 a ?t1 ; :count COUNT(?sub.s) .
  _:b1 a ?t2 ; :count COUNT(?obj.o) .
  _:b2 a rdf:Statement ; rdf:predicate ?p ; rdf:subject _:b0 ;
       rdf:object _:b1 ; :count ?prop_count
} WHERE {
  GRAPH_AGGREGATION {
    ?s ?p ?o
    {?s a ?t1} GROUP BY ?t1 AS ?sub
    {?o a ?t2} GROUP BY ?t2 AS ?obj
  }
}
SELECT ?t1 ?count_s ?subId ?t2 ?count_o ?objId ?p (COUNT(*) AS ?rel_count) {
  {
    SELECT ?t1 ?count_s ?subId ?t2 ?count_o ?objId ?p {
      ?s a ?t1 . ?s ?p ?o . ?o a ?t2 .
      # Per subject type: number of distinct typed subjects linked to a typed object
      {
        SELECT ?t1 ?subId (COUNT(DISTINCT ?s) AS ?count_s) {
          ?s a ?t1 . ?s ?p ?o . ?o a ?t2 .
          BIND (iri(CONCAT(str(?t1), "_s")) AS ?subId)
        } GROUP BY ?t1 ?subId
      }
      # Per object type: number of distinct typed objects linked from a typed subject
      {
        SELECT ?t2 ?objId (COUNT(DISTINCT ?o) AS ?count_o) {
          ?s a ?t1 . ?s ?p ?o . ?o a ?t2 .
          BIND (iri(CONCAT(str(?t2), "_o")) AS ?objId)
        } GROUP BY ?t2 ?objId
      }
    }
  }
} GROUP BY ?t1 ?count_s ?subId ?t2 ?count_o ?objId ?p

Gagg: A Graph Aggregation Operator

  • 1. Gagg: A Graph Aggregation Operator June 2nd 2015 Fadi Maali*, Stephane Campinas, Stefan Decker ESWC2015 * Funded by the Irish Research Council
  • 2. The Famous LOD Cloud http://lod-cloud.net/
  • 3. The Famous LOD Cloud - COLOURED http://lod-cloud.net/
  • 4–7. The Famous LOD Cloud (from a different angle)
  • 8. Graph Aggregation
    Condenses a large graph into a structurally similar but smaller graph by collapsing vertices and edges.
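As a minimal sketch of "collapsing vertices and edges" (assumed semantics, not Gagg's definition): map every vertex to a group key, then merge all edges that connect the same pair of groups with the same label, counting how many original edges and vertices each collapsed element stands for. `collapse_graph` and the first-letter grouping are hypothetical choices made for illustration.

```python
from collections import Counter
from typing import Callable, Hashable, Iterable, Tuple

Edge = Tuple[Hashable, Hashable, Hashable]  # (source, label, target)


def collapse_graph(
    edges: Iterable[Edge], group_of: Callable[[Hashable], Hashable]
) -> Tuple[Counter, Counter]:
    """Collapse vertices that share a group key (hypothetical helper).

    Returns (vertex_counts, edge_counts): the number of distinct original
    vertices in each group, and the number of original edges behind each
    collapsed (group, label, group) edge.
    """
    members = {}
    edge_counts: Counter = Counter()
    for s, label, o in edges:
        gs, go = group_of(s), group_of(o)
        members.setdefault(gs, set()).add(s)
        members.setdefault(go, set()).add(o)
        edge_counts[(gs, label, go)] += 1
    vertex_counts = Counter({g: len(vs) for g, vs in members.items()})
    return vertex_counts, edge_counts


# Toy graph (invented): people grouped by the first letter of their name.
toy_edges = [("alice", "knows", "bob"), ("anna", "knows", "bob"),
             ("bob", "knows", "carol")]
vertices, collapsed = collapse_graph(toy_edges, lambda v: v[0])
print(dict(vertices))   # {'a': 2, 'b': 1, 'c': 1}
print(dict(collapsed))  # {('a', 'knows', 'b'): 2, ('b', 'knows', 'c'): 1}
```

Five vertices and three edges shrink to three group vertices and two edges, while the counts keep track of how much of the original graph each group represents.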
  • 9. Graph Aggregation - Schema Discovery
    See "Introducing RDF Graph Summary with application to Assisted SPARQL Formulation"
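  • 10. Graph Aggregation - Requirements
    [Figure: example graph in which a :linkset node has subjectsTarget and objectsTarget edges to the datasets :dbpedia and :bbc-music; nodes carry triple counts (23k, 1.2b, 20m), subject edges lead to :crossdomain and :media, license edges lead to :cc-by-sa and :bbc-terms, and the figure also shows :open-license and :closed-license nodes]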
  • 11–12. Graph Aggregation Methods
    1. Custom code: error-prone, time, efficiency…
    2. SPARQL: error-prone, time, efficiency…
    3. Graph databases: expressivity, optimisation…
    4. Gagg, a first-class operator:
       - Operational semantics
       - In-memory evaluation algorithm
       - Experimental evaluation
  • 14. Gagg: Two-Step Aggregation
    ● Relation & measure
    ● Subject dimension(s) & measure
    ● Object dimension(s) & measure
    Uses aggregation functions and a template similar to SPARQL CONSTRUCT queries.
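One reading of the two steps is: first group the graph, then fill a CONSTRUCT-like template once per group with aggregated values. The Python sketch below covers only the template step and rests on assumptions: `instantiate`, the `?var` and `_:label` string conventions, and the toy groups are hypothetical, and `len()` stands in for COUNT. Template blank nodes get a fresh label per solution, which mirrors how SPARQL CONSTRUCT scopes blank nodes, so two groups never collapse into one node.

```python
import itertools
from typing import Dict, Iterable, Iterator, List, Tuple

Triple = Tuple[str, str, str]


def _term(t: str, solution: Dict[str, object], bnodes: Dict[str, str],
          fresh: Iterator[int]) -> str:
    """Resolve one template term against one solution."""
    if t.startswith("?"):        # variable: take the solution's value
        return str(solution[t])
    if t.startswith("_:"):       # blank node: fresh label per solution
        if t not in bnodes:
            bnodes[t] = f"_:g{next(fresh)}"
        return bnodes[t]
    return t                     # constant (IRI / prefixed name): copy as-is


def instantiate(template: List[Triple],
                solutions: Iterable[Dict[str, object]]) -> List[Triple]:
    """Instantiate a CONSTRUCT-style template once per solution (hypothetical helper)."""
    fresh = itertools.count()
    out: List[Triple] = []
    for solution in solutions:
        bnodes: Dict[str, str] = {}  # template label -> fresh label, this solution only
        for s, p, o in template:
            out.append((_term(s, solution, bnodes, fresh),
                        _term(p, solution, bnodes, fresh),
                        _term(o, solution, bnodes, fresh)))
    return out


# Step 1 (assumed already done): members of each subject-type group (toy data).
groups = {":Person": ["alice", "bob"], ":Place": ["dublin"]}
# Step 2: one solution per group, with len() playing the role of COUNT.
solutions = [{"?t1": t, "?n": len(members)} for t, members in groups.items()]
template = [("_:b0", "a", "?t1"), ("_:b0", ":count", "?n")]
for triple in instantiate(template, solutions):
    print(triple)
```

Each group yields its own summary node (`_:g0`, `_:g1`) typed with the group key and carrying the aggregated count.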
  • 15–16. Graph Aggregation - Requirements: relation and measure
    [Figure: the same example graph, annotated with the labels "relation" and "measure"]
    ?l a void:Linkset ; void:subjectsTarget ?s ; void:objectsTarget ?o ; void:triples ?m .
  • 17–18. Graph Aggregation - Requirements: subject dimension and measure
    [Figure: the same example graph]
    ?s dct:subject ?sd ; void:triples ?sm .
  • 19–20. Graph Aggregation - Requirements: object dimension and measure
    [Figure: the same example graph]
    ?o dct:subject ?od ; void:triples ?om .
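To see what the relation and dimension patterns do together, here is a small in-memory Python sketch over the linkset example. Everything in it is an assumption for illustration: triples are plain tuples with prefixed names as strings, the 23k count is attached to `:linkset` as one reading of the figure, `:linkset2` and `:geo-data` are invented so there is something to merge, and the dataset-level measures ?sm and ?om are left out for brevity. Relation bindings (?s, ?o, ?m) are rolled up to (?sd, ?od) by summing ?m.

```python
from collections import defaultdict

# Hypothetical toy data loosely based on the example figure.
# ":linkset2" and ":geo-data" are invented for the illustration.
TRIPLES = [
    (":linkset", "a", "void:Linkset"),
    (":linkset", "void:subjectsTarget", ":dbpedia"),
    (":linkset", "void:objectsTarget", ":bbc-music"),
    (":linkset", "void:triples", 23_000),
    (":linkset2", "a", "void:Linkset"),
    (":linkset2", "void:subjectsTarget", ":geo-data"),
    (":linkset2", "void:objectsTarget", ":bbc-music"),
    (":linkset2", "void:triples", 1_000),
    (":dbpedia", "dct:subject", ":crossdomain"),
    (":geo-data", "dct:subject", ":crossdomain"),
    (":bbc-music", "dct:subject", ":media"),
]


def objects(subject, predicate):
    """Objects o of all triples (subject, predicate, o)."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def relation_bindings():
    """Relation pattern: ?l a void:Linkset ; void:subjectsTarget ?s ;
    void:objectsTarget ?o ; void:triples ?m ."""
    for ls, p, cls in TRIPLES:
        if p == "a" and cls == "void:Linkset":
            for s in objects(ls, "void:subjectsTarget"):
                for o in objects(ls, "void:objectsTarget"):
                    for m in objects(ls, "void:triples"):
                        yield s, o, m


def roll_up():
    """Sum ?m per (?sd, ?od), with ?sd / ?od taken from dct:subject of ?s / ?o."""
    totals = defaultdict(int)
    for s, o, m in relation_bindings():
        for sd in objects(s, "dct:subject"):      # subject dimension ?sd
            for od in objects(o, "dct:subject"):  # object dimension ?od
                totals[(sd, od)] += m
    return dict(totals)


print(roll_up())  # {(':crossdomain', ':media'): 24000}
```

Two linksets between different datasets collapse into a single :crossdomain to :media edge whose measure is the sum of their link counts.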
  • 21. Graph Aggregation - Definition
    Q = (D, M, E, N, R, f), where
      D: subject dimensions (binds ?x, ?sd)
      M: subject measure (binds ?x, ?sm)
      E: object dimensions (binds ?y, ?od)
      N: object measure (binds ?y, ?om)
      R: relation query (binds ?x, ?p, ?y, ?m)
      f: reduce function
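One way to read the definition is that R, D, M, E and N are each evaluated to solutions, which are then joined on ?x and ?y into a single binding table. This Python sketch is hypothetical throughout: `GraphAggQuery`, the tuple layouts, and inner-join semantics (a relation binding with no dimension or measure is dropped) are assumptions, not the paper's operational semantics.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, List, Tuple

Row = Dict[str, Hashable]


def _index(pairs: List[Tuple]) -> Dict[Hashable, List[Hashable]]:
    """Map each node to all values bound with it, e.g. ?x -> [?sd, ...]."""
    out: Dict[Hashable, List[Hashable]] = {}
    for node, value in pairs:
        out.setdefault(node, []).append(value)
    return out


@dataclass
class GraphAggQuery:
    """Hypothetical encoding of Q = (D, M, E, N, R, f) by each part's solutions."""
    D: List[Tuple]  # (?x, ?sd)        subject dimensions
    M: List[Tuple]  # (?x, ?sm)        subject measure
    E: List[Tuple]  # (?y, ?od)        object dimensions
    N: List[Tuple]  # (?y, ?om)        object measure
    R: List[Tuple]  # (?x, ?p, ?y, ?m) relation query
    f: Callable     # reduce function, applied after grouping

    def binding_table(self) -> List[Row]:
        """Hash-join R with D, M (on ?x) and E, N (on ?y), inner-join style."""
        d, m, e, n = (_index(part) for part in (self.D, self.M, self.E, self.N))
        rows: List[Row] = []
        for x, p, y, rm in self.R:
            for sd in d.get(x, []):
                for sm in m.get(x, []):
                    for od in e.get(y, []):
                        for om in n.get(y, []):
                            rows.append({"?x": x, "?m": rm, "?p": p, "?y": y,
                                         "?sd": sd, "?sm": sm, "?od": od, "?om": om})
        return rows


# Toy instance: "ds1", "ds2" and all numbers are invented.
q = GraphAggQuery(
    D=[("ds1", "crossdomain"), ("ds2", "media")],
    M=[("ds1", 100), ("ds2", 20)],
    E=[("ds2", "media")],
    N=[("ds2", 20)],
    R=[("ds1", "linksTo", "ds2", 5)],
    f=sum,
)
print(q.binding_table())
```

Each row carries exactly the variables that the later grouping and reduction steps need.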
  • 22. Graph Aggregation - Grouped Graph
    [Figure: grouped graph with group nodes crossdomain and media connected by linksTo edges, annotated with the values 10k, 33k, 3k and 1.2M]
  • 23. Graph Aggregation - Evaluation
    ● Build a binding table over the variables ?x ?m ?p ?y ?sd ?sm ?od ?om
    ● Build the grouped graph: an O(|B|) algorithm, where |B| is the size of the binding table B
    ● Apply the reduction function
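A single pass over the binding table is enough to build the grouped graph, consistent with the O(|B|) bound. This Python sketch is an illustration under stated assumptions, not the paper's algorithm: `grouped_graph` and `reduce_graph` are hypothetical names, rows are dicts keyed by variable name, a node's measure (?sm or ?om) is stored once per group because the same node can appear in many rows, and `f` receives a plain list.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List

Row = Dict[str, Hashable]


def grouped_graph(rows: List[Row]):
    """One pass over the binding table B: O(|B|) dictionary updates.

    Subject groups are keyed by ?sd, object groups by ?od, and collapsed
    edges by (?sd, ?p, ?od).
    """
    sub: Dict[Hashable, Dict[Hashable, Hashable]] = defaultdict(dict)  # ?sd -> {?x: ?sm}
    obj: Dict[Hashable, Dict[Hashable, Hashable]] = defaultdict(dict)  # ?od -> {?y: ?om}
    edges: Dict[tuple, List[Hashable]] = defaultdict(list)            # (?sd, ?p, ?od) -> [?m]
    for r in rows:
        sub[r["?sd"]][r["?x"]] = r["?sm"]  # dedupe: one measure per node per group
        obj[r["?od"]][r["?y"]] = r["?om"]
        edges[(r["?sd"], r["?p"], r["?od"])].append(r["?m"])
    return sub, obj, edges


def reduce_graph(sub, obj, edges, f: Callable):
    """Apply the reduce function f to the measures collected in each group."""
    return ({g: f(list(ms.values())) for g, ms in sub.items()},
            {g: f(list(ms.values())) for g, ms in obj.items()},
            {e: f(ms) for e, ms in edges.items()})


# Toy binding table (invented values): two subjects in group A, one object in B.
rows = [
    {"?x": "s1", "?m": 3, "?p": "linksTo", "?y": "o1",
     "?sd": "A", "?sm": 10, "?od": "B", "?om": 7},
    {"?x": "s2", "?m": 4, "?p": "linksTo", "?y": "o1",
     "?sd": "A", "?sm": 5, "?od": "B", "?om": 7},
]
subs, objs, es = reduce_graph(*grouped_graph(rows), f=sum)
print(subs, objs, es)  # {'A': 15} {'B': 7} {('A', 'linksTo', 'B'): 7}
```

Note that object o1 occurs in both rows but contributes its measure 7 only once to group B, whereas both relation measures (3 and 4) are summed on the collapsed edge.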
  • 24. Graph Aggregation - Experiment Setup
    ● Extended in-memory Apache Jena, using SSE (SPARQL S-Expressions)
    ● BSBM and SP2B benchmarks
    ● Type Summary and Bibliometrics
  • 25. Graph Aggregation - Experiment Results
    Type Summary on BSBM data
    Data size (#triples) | fullSPARQL | 3SPARQLs | reduced | Gagg
    5k                   |   0.08     |   0.06   |  0.01   | 0.03
    190k                 |   9.84     |   1.25   |  0.42   | 0.55
    370k                 |  31.88     |   2.82   |  1.00   | 1.13
    1.8M                 | 454.07     |  13.48   |  4.37   | 5.61
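Because the slide gives raw numbers only, a small calculation makes the comparison explicit. This Python snippet restates the table and divides each baseline column by the Gagg column. The slide does not state the unit, so only unit-free ratios are computed; `ratio_to_gagg` is a hypothetical helper name, and the "reduced" column is left uninterpreted.

```python
# Values copied from the "Type Summary on BSBM data" table (unit not stated).
RESULTS = {
    "5k":   {"fullSPARQL": 0.08,   "3SPARQLs": 0.06,  "reduced": 0.01, "Gagg": 0.03},
    "190k": {"fullSPARQL": 9.84,   "3SPARQLs": 1.25,  "reduced": 0.42, "Gagg": 0.55},
    "370k": {"fullSPARQL": 31.88,  "3SPARQLs": 2.82,  "reduced": 1.00, "Gagg": 1.13},
    "1.8M": {"fullSPARQL": 454.07, "3SPARQLs": 13.48, "reduced": 4.37, "Gagg": 5.61},
}


def ratio_to_gagg(baseline: str) -> dict:
    """Baseline value divided by the Gagg value, per data size, to 1 decimal."""
    return {size: round(row[baseline] / row["Gagg"], 1)
            for size, row in RESULTS.items()}


print(ratio_to_gagg("fullSPARQL"))  # {'5k': 2.7, '190k': 17.9, '370k': 28.2, '1.8M': 80.9}
print(ratio_to_gagg("3SPARQLs"))    # {'5k': 2.0, '190k': 2.3, '370k': 2.5, '1.8M': 2.4}
```

Read this way, the gap to the single full SPARQL query widens with data size (about 81x at 1.8M triples), while the ratio to the three-query variant stays roughly constant at 2 to 2.5x.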
  • 26. Conclusion
    Graph aggregation as a first-class operator:
      - Easier for users to express
      - Easier for engines to support and optimise
      - Easier for further research and study
    Further questions:
      - Syntax and effect on SPARQL
      - Distributed implementation
  • 27.
```sparql
PREFIX : <http://example.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
CONSTRUCT {
  _:b0 a ?t1 ; :count COUNT(?sub.s) .
  _:b1 a ?t2 ; :count COUNT(?obj.o) .
  _:b2 a rdf:Statement ;
       rdf:predicate ?p ;
       rdf:subject _:b0 ;
       rdf:object _:b1 ;
       :count ?prop_count
}
WHERE {
  GRAPH_AGGREGATION {
    ?s ?p ?o
    { ?s a ?t1 } GROUP BY ?t1 AS ?sub
    { ?o a ?t2 } GROUP BY ?t2 AS ?obj
  }
}
```
  • 28.
```sparql
SELECT ?t1 ?count_s ?subId ?t2 ?count_o ?objId ?p (COUNT(*) AS ?rel_count) {
  {
    SELECT ?t1 ?count_s ?subId ?t2 ?count_o ?objId ?p {
      ?s a ?t1 . ?s ?p ?o . ?o a ?t2 .
      {
        # subject groups: distinct subjects per subject type
        SELECT ?t1 ?subId (COUNT(DISTINCT ?s) AS ?count_s) {
          ?s a ?t1 . ?s ?p ?o . ?o a ?t2 .
          BIND (IRI(CONCAT(STR(?t1), "_s")) AS ?subId)
        } GROUP BY ?t1 ?subId
      }
      {
        # object groups: distinct objects per object type
        SELECT ?t2 ?objId (COUNT(DISTINCT ?o) AS ?count_o) {
          ?s a ?t1 . ?s ?p ?o . ?o a ?t2 .
          BIND (IRI(CONCAT(STR(?t2), "_o")) AS ?objId)
        } GROUP BY ?t2 ?objId
      }
    }
  }
}
GROUP BY ?t1 ?count_s ?subId ?t2 ?count_o ?objId ?p
```
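For comparison with these queries, here is a hedged in-memory Python sketch of the kind of type summary they express: for triples whose subject and object are both typed, it counts distinct subjects per subject type, distinct objects per object type, and matching triples per (subject type, predicate, object type). `type_summary`, the use of the string "a" for rdf:type, and the toy triples are assumptions for illustration; this is not the Jena-based implementation used in the experiments, and the generated ?subId / ?objId IRIs are omitted.

```python
from collections import defaultdict

RDF_TYPE = "a"  # assumption: rdf:type written as "a" in the toy data


def type_summary(triples):
    """Return (subjects per ?t1, objects per ?t2, triples per (?t1, ?p, ?t2))."""
    types = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)
    sub_nodes = defaultdict(set)  # ?t1 -> distinct ?s
    obj_nodes = defaultdict(set)  # ?t2 -> distinct ?o
    rel_count = defaultdict(int)  # (?t1, ?p, ?t2) -> COUNT(*)
    for s, p, o in triples:
        for t1 in types.get(s, ()):      # ?s a ?t1
            for t2 in types.get(o, ()):  # ?o a ?t2
                sub_nodes[t1].add(s)
                obj_nodes[t2].add(o)
                rel_count[(t1, p, t2)] += 1
    return ({t: len(v) for t, v in sub_nodes.items()},
            {t: len(v) for t, v in obj_nodes.items()},
            dict(rel_count))


# Toy triples (invented).
TRIPLES = [
    ("alice", "a", ":Person"), ("bob", "a", ":Person"), ("dublin", "a", ":City"),
    ("alice", ":livesIn", "dublin"), ("bob", ":livesIn", "dublin"),
    ("alice", ":knows", "bob"),
]
print(type_summary(TRIPLES))
```

Here the two :livesIn triples collapse into one :Person to :City edge with count 2, and :Person has two distinct subjects.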