SlideShare a Scribd company logo
1 of 20
Interest-Based RDF Update
Propagation
Kemele M. Endris, Sidra Faisal, Fabrizio Orlandi,
Sӧren Auer, Simon Scerri
EIS - Enterprise Information Systems
University of Bonn & Fraunhofer IAIS
ISWC 2015
Motivation: Problems accessing remote SPARQL endpoints
• Challenges:
• Availability is not guaranteed
• Low performance due to high load and traffic
• Restriction on the query forms and number
of results
• Possible solution:
• Setup a local replica of
source datasets
Unavailable!
"Interest-Based RDF Update Propagation" - KM. Endris, S. Faisal, F. Orlandi, S. Auer, S. Scerri – ISWC2015 2
Motivation: Syncing using a naïve approach does not scale
• Setting up local replica:
• Requires manual infrastructure setup and
maintenance
• Requires full data loading
• Data Freshness not guaranteed
• Approach to sync with source (Naïve):
1. Fully-loading releases of dataset dumps in
intervals
• Replace old versions of a dataset with new ones
2. Continuous synchronization
• Propagate all changes, including irrelevant ones
Source ReplicaMirror Tool
+200K triples
+100K triples
+120K triples
+400K triples
E
v
o
l
u
t
i
o
n
Changesets
"Interest-Based RDF Update Propagation" - KM. Endris, S. Faisal, F. Orlandi, S. Auer, S. Scerri – ISWC2015 3
Definition: Changeset - ∆
• Delta of a dataset V within two time points: t1 and t0 (t1> t0)
∆(Vt1
) = <Dt1-t0
, At1-t0
>
• Two components:
• Set of removed triples
Dt1-t0
= Vt0
 Vt1
• Set of added triples
At1-t0
= Vt1
 Vt0
"Interest-Based RDF Update Propagation" - KM. Endris, S. Faisal, F. Orlandi, S. Auer, S. Scerri – ISWC2015 4
Dataset Mirror Tool: Naïve approach
• A typical dataset mirror tool applies a changeset on a replica
• It removes the set of deleted triples and insert the set of added triples
• e.g., DBpedia Live Mirror tool
Vt1
= (Vt0
 Dt1-t0
) UAt1-t0Removed Triples:
dbr:Marcel dbp:goals 1 .
dbr:Marcel dbo:team dbr:FNFT .
dbr:Tim%02 foaf:name "Tim Berners-Lee" .
dbr:Cristiano_Ronaldo dbo:goals 205 .
Added Triples:
dbr:Cristiano_Ronaldo dbo:goals 230 .
dbr:Barack_Obama foaf:name "Barack Obama" .
dbr:Barack_Obama foaf:homepage "http://www.barackobama.com" .
dbr:Rio_Ferdinand a foaf:Person .
dbr:Rio_Ferdinand a dbo:Athlete .
dbr:Rio_Ferdinand dbp:goals 7 .
dbr:Arvid_Smit a dbo:Athlete .
replica
Replica at time t0
"Interest-Based RDF Update Propagation" - KM. Endris, S. Faisal, F. Orlandi, S. Auer, S. Scerri – ISWC2015 5
Our Approach: Interest-based update propagation
• Setup a slice (subset) of a dataset
• Interest-based update propagation
• Interest expressions using SPARQL basic
graph patterns
• Interest expressions are evaluated on
deltas (changesets) of the source
dataset
• Only triples that match interest
expression shipped to the replica
dataset
Source ReplicaiRap
+200K triples
+100K triples
+120K triples
+400K triples +2K triples
+12K triples
+2K triples
+5K triplesE
v
o
l
u
t
i
o
n
Changesets
Interesting
Changesets
iRap: Interest-based RDF update propagation framework
"Interest-Based RDF Update Propagation" - KM. Endris, S. Faisal, F. Orlandi, S. Auer, S. Scerri – ISWC2015 6
Interest-based update propagation
"Interest-Based RDF Update Propagation" - KM. Endris, S. Faisal, F. Orlandi, S. Auer, S. Scerri – ISWC2015 7
Source
Vt1
Interest evaluationChangeset ∆(Vt1
) Interesting
Changeset Δ(τt1
)
Replica
τt0
Interest
(ig)
𝐷𝑡1−𝑡0
𝐴 𝑡1−𝑡0
𝑟𝑡1−𝑡0
𝑈 𝑟𝑡1−𝑡0
′
𝑎 𝑡1−𝑡0
Candidate
Generation
Candidate
Assertion
Candidate
Generation
Candidate
Assertion
Potentially
Interesting dataset Potentially interesting changeset ∆(ρt1
)
Interest expression
• Based on SPARQL graph pattern
ig = <τ, b, op>
• Composed of:
• b - BGP – basic graph pattern
• op - OGP – optional graph pattern
• g - Source dataset URI – where changesets are downloaded from
• τ - Replica (target) dataset URI – where interesting changes are propagated to
CONSTRUCT WHERE {
?athlete a dbo:Athlete .
?athlete dbp:goals ?goals .
OPTIONAL {
?athlete foaf:homepage ?page .
}
}
BGP
OGP
"Interest-Based RDF Update Propagation" - KM. Endris, S. Faisal, F. Orlandi, S. Auer, S. Scerri – ISWC2015 8
Interest evaluation (1)
• Two steps:
I. Interest candidate generation
• Performs matching between the interest expression, ig, and a changeset, ∆(Vt1
)
• Generates a set of candidate triples
• e.g.,
II. Interest candidate assertion
• Evaluates candidate triples combined with the interest query on the target dataset
• e.g.,
• Performs matching between the assertion query with the target dataset
SELECT * WHERE {
?a a dbo:Athlete .
?a dbp:goals ?goals.
} VALUES (?a ?goals){ (dbr:Cristiano_Ronaldo 205) }
{ dbr:Cristiano_Ronaldo dbp:goals 205. }
"Interest-Based RDF Update Propagation" - KM. Endris, S. Faisal, F. Orlandi, S. Auer, S. Scerri – ISWC2015 9
Interest evaluation (2)
• Execution of the two steps on a changeset results in a set of:
• Interesting removed (and added) triples
• Potentially removed (and added) interesting triples
• Uninteresting triples
"Interest-Based RDF Update Propagation" - KM. Endris, S. Faisal, F. Orlandi, S. Auer, S. Scerri – ISWC2015 10
Remember?!
• A typical dataset mirror tool
Vt1
= (Vt0
 Dt1-t0
) UAt1-t0
Removed Triples:
dbr:Marcel dbp:goals 1 .
dbr:Marcel dbo:team dbr:FNFT .
dbr:Tim%02 foaf:name "Tim Berners-Lee" .
dbr:Cristiano_Ronaldo dbo:goals 205 .
Added Triples:
dbr:Cristiano_Ronaldo dbo:goals 230 .
dbr:Barack_Obama foaf:name "Barack Obama" .
dbr:Barack_Obama foaf:homepage "http://www.barackobama.com/" .
dbr:Rio_Ferdinand a foaf:Person .
dbr:Rio_Ferdinand a dbo:Athlete .
dbr:Rio_Ferdinand dbp:goals 7 .
dbr:Arvid_Smit a dbo:Athlete .
Replica
Replica at time t0
CONSTRUCT WHERE {
?athlete a dbo:Athlete .
?athlete dbp:goals ?goals .
OPTIONAL {
?athlete foaf:homepage ?page .
}
}
Interest query:
"Interest-Based RDF Update Propagation" - KM. Endris, S. Faisal, F. Orlandi, S. Auer, S. Scerri – ISWC2015 11
Interest-based update propagation
• Interesting Changeset Δ(τt1
):
• Interesting removed triples
• Interesting added triples
• Potentially Interesting Changeset ∆(ρt1
):
• Potentially interesting removed triples
• Potentially interesting added triples
Interesting Removed Triples:
dbr:Marcel dbp:goals 1 .
dbr:Cristiano_Ronaldo dbo:goals 205 .
Interesting Added Triples:
dbr:Cristiano_Ronaldo dbo:goals 230 .
dbr:Rio_Ferdinand a dbo:Athlete .
dbr:Rio_Ferdinand dbp:goals 7 .
replica
pi
Potentially Interesting Triples:
dbr:Barack_Obama foaf:homepage "http://www.barackobama.com/" .
dbr:Arvid_Smit a dbo:Athlete .
dbr:Marcel a dbo:Athlete .
INSERT DATA{…}
Potentially interesting
dataset
Athlete Dataset τt0
:
dbr:Marcel a dbo:Athlete .
dbr:Marcel dbp:goals 1 .
dbr:Cristiano_Ronaldo a dbo:Athlete .
dbr:Cristiano_Ronaldo dbo:goals 205 .
dbr:Cristiano_Ronaldo foaf:homepage
"http://cristianoronaldo.com" .
"Interest-Based RDF Update Propagation" - KM. Endris, S. Faisal, F. Orlandi, S. Auer, S. Scerri – ISWC2015 12
Athlete Dataset τt1
:
dbr:Cristiano_Ronaldo a dbo:Athlete .
dbr:Cristiano_Ronaldo dbo:goals 230.
dbr:Cristiano_Ronaldo foaf:homepage
"http://cristianoronaldo.com" .
dbr:Rio_Ferdinand a dbo:Athlete .
dbr:Rio_Ferdinand dbp:goals 7 .
iRap Framework
• Reference implementation of our approach
• Implemented using Java and Jena
• Open-source
• http://eis.iai.uni-bonn.de/Projects/iRap
"Interest-Based RDF Update Propagation" - KM. Endris, S. Faisal, F. Orlandi, S. Auer, S. Scerri – ISWC2015 13
Experimental setting
//Football dataset interest
CONSTRUCT WHERE {
?footballer a dbo:SoccerPlayer .
?footballer foaf:name ?name.
?footballer dbo:team ?team .
?team rdfs:label ?teamName.
}
//Location dataset interest
CONSTRUCT WHERE {
?location a ?type .
?location wgs:long ?long .
?location wgs:lat ?lat .
?location rdfs:label ?label .
?location dbo:abstract ?abstract .
OPTIONAL { ?location dcterms:subject ?subject }
}
• DBpedia 2014 – as a source dataset
• 12,057 Changesets
from Oct 01 – 15, 2014
• Two target datasets
1) Football dataset
• Slice of DBpedia: 265K triples
2) Location dataset
• Complete DBpedia 2014 dataset
• Two interest expressions:
"Interest-Based RDF Update Propagation" - KM. Endris, S. Faisal, F. Orlandi, S. Auer, S. Scerri – ISWC2015 14
1)
2)
Results: Football dataset
"Interest-Based RDF Update Propagation" - KM. Endris, S. Faisal, F. Orlandi, S. Auer, S. Scerri – ISWC2015 15
• Interesting triples:
• 0.38% of the total
removed triples
• 0.34% of the total
added triples
Results: Football dataset growth
"Interest-Based RDF Update Propagation" - KM. Endris, S. Faisal, F. Orlandi, S. Auer, S. Scerri – ISWC2015 16
Results: Location dataset
"Interest-Based RDF Update Propagation" - KM. Endris, S. Faisal, F. Orlandi, S. Auer, S. Scerri – ISWC2015 17
• Interesting triples:
• 4.38% of the total
removed triples
• 1.81% of the total
added triples
Results: Location dataset growth
"Interest-Based RDF Update Propagation" - KM. Endris, S. Faisal, F. Orlandi, S. Auer, S. Scerri – ISWC2015 18
Conclusion
• A novel approach for interest-based RDF update propagation and
detailed formalizations
• Our evaluation shows that our method can significantly cut down the
size of the data updates
• Summary of results from 12,057 changesets of DBpedia 2014
"Interest-Based RDF Update Propagation" - KM. Endris, S. Faisal, F. Orlandi, S. Auer, S. Scerri – ISWC2015 19
Dataset Interesting removed triples Interesting added triples
Football dataset 0.38% 0.34%
Location dataset 4.38% 1.81%
Thank you for your attention!
Questions?
• More on our Website:
http://eis.iai.uni-bonn.de/Projects/iRap
• Contact:
Google group: irap-ld@googlegroups.com
Twitter: @KemeleM - @BadmotorF
"Interest-Based RDF Update Propagation" - KM. Endris, S. Faisal, F. Orlandi, S. Auer, S. Scerri – ISWC2015 20

More Related Content

What's hot

SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLab
SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLabSF Big Analytics: Introduction to Succinct by UC Berkeley AmpLab
SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLab
Chester Chen
 
Paris Spark Meetup Oct 26, 2015 - Spark After Dark v1.5 - Best of Advanced Ap...
Paris Spark Meetup Oct 26, 2015 - Spark After Dark v1.5 - Best of Advanced Ap...Paris Spark Meetup Oct 26, 2015 - Spark After Dark v1.5 - Best of Advanced Ap...
Paris Spark Meetup Oct 26, 2015 - Spark After Dark v1.5 - Best of Advanced Ap...
Chris Fregly
 
Hdf Augmentation: Interoperability in the Last Mile
Hdf Augmentation: Interoperability in the Last MileHdf Augmentation: Interoperability in the Last Mile
Hdf Augmentation: Interoperability in the Last Mile
Ted Habermann
 
IPython Notebook as a Unified Data Science Interface for Hadoop
IPython Notebook as a Unified Data Science Interface for HadoopIPython Notebook as a Unified Data Science Interface for Hadoop
IPython Notebook as a Unified Data Science Interface for Hadoop
DataWorks Summit
 

What's hot (20)

HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation
HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint FederationHiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation
HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation
 
Shortening the feedback loop
Shortening the feedback loopShortening the feedback loop
Shortening the feedback loop
 
SAFE: Policy Aware SPARQL Query Federation Over RDF Data Cubes
SAFE: Policy Aware SPARQL Query Federation Over RDF Data CubesSAFE: Policy Aware SPARQL Query Federation Over RDF Data Cubes
SAFE: Policy Aware SPARQL Query Federation Over RDF Data Cubes
 
Cypher and apache spark multiple graphs and more in open cypher
Cypher and apache spark  multiple graphs and more in  open cypherCypher and apache spark  multiple graphs and more in  open cypher
Cypher and apache spark multiple graphs and more in open cypher
 
FedX - Optimization Techniques for Federated Query Processing on Linked Data
FedX - Optimization Techniques for Federated Query Processing on Linked DataFedX - Optimization Techniques for Federated Query Processing on Linked Data
FedX - Optimization Techniques for Federated Query Processing on Linked Data
 
SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLab
SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLabSF Big Analytics: Introduction to Succinct by UC Berkeley AmpLab
SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLab
 
Federated Query Formulation and Processing Through BioFed
Federated Query Formulation and Processing Through BioFedFederated Query Formulation and Processing Through BioFed
Federated Query Formulation and Processing Through BioFed
 
Python in big data world
Python in big data worldPython in big data world
Python in big data world
 
Data at Spotify
Data at SpotifyData at Spotify
Data at Spotify
 
Dancing in R
Dancing in RDancing in R
Dancing in R
 
Semantic Web Technologies in Health Care Analytics
Semantic Web Technologies in Health Care AnalyticsSemantic Web Technologies in Health Care Analytics
Semantic Web Technologies in Health Care Analytics
 
RDF Stream Processing Models (RSP2014)
RDF Stream Processing Models (RSP2014)RDF Stream Processing Models (RSP2014)
RDF Stream Processing Models (RSP2014)
 
Albert Merono-Penuela: Understanding Change in Versioned Web-Knowledge Organi...
Albert Merono-Penuela: Understanding Change in Versioned Web-Knowledge Organi...Albert Merono-Penuela: Understanding Change in Versioned Web-Knowledge Organi...
Albert Merono-Penuela: Understanding Change in Versioned Web-Knowledge Organi...
 
Paris Spark Meetup Oct 26, 2015 - Spark After Dark v1.5 - Best of Advanced Ap...
Paris Spark Meetup Oct 26, 2015 - Spark After Dark v1.5 - Best of Advanced Ap...Paris Spark Meetup Oct 26, 2015 - Spark After Dark v1.5 - Best of Advanced Ap...
Paris Spark Meetup Oct 26, 2015 - Spark After Dark v1.5 - Best of Advanced Ap...
 
Semantic web for ontology chapter4 bynk
Semantic web for ontology chapter4 bynkSemantic web for ontology chapter4 bynk
Semantic web for ontology chapter4 bynk
 
Hdf Augmentation: Interoperability in the Last Mile
Hdf Augmentation: Interoperability in the Last MileHdf Augmentation: Interoperability in the Last Mile
Hdf Augmentation: Interoperability in the Last Mile
 
How to Build a Real Time Analytics Enterprise with Open Source
How to Build a Real Time Analytics Enterprise with Open Source How to Build a Real Time Analytics Enterprise with Open Source
How to Build a Real Time Analytics Enterprise with Open Source
 
Scala Data Pipelines @ Spotify
Scala Data Pipelines @ SpotifyScala Data Pipelines @ Spotify
Scala Data Pipelines @ Spotify
 
IPython Notebook as a Unified Data Science Interface for Hadoop
IPython Notebook as a Unified Data Science Interface for HadoopIPython Notebook as a Unified Data Science Interface for Hadoop
IPython Notebook as a Unified Data Science Interface for Hadoop
 
Advanced Apache Spark Meetup Data Sources API Cassandra Spark Connector Spark...
Advanced Apache Spark Meetup Data Sources API Cassandra Spark Connector Spark...Advanced Apache Spark Meetup Data Sources API Cassandra Spark Connector Spark...
Advanced Apache Spark Meetup Data Sources API Cassandra Spark Connector Spark...
 

Viewers also liked (6)

Web Intelligence 2013 - Characterizing concepts of interest leveraging Linked...
Web Intelligence 2013 - Characterizing concepts of interest leveraging Linked...Web Intelligence 2013 - Characterizing concepts of interest leveraging Linked...
Web Intelligence 2013 - Characterizing concepts of interest leveraging Linked...
 
Semantic Representation of Provenance in Wikipedia
Semantic Representation of Provenance in WikipediaSemantic Representation of Provenance in Wikipedia
Semantic Representation of Provenance in Wikipedia
 
Aggregated, Interoperable and Multi-Domain User Profiles for the Social Web
Aggregated, Interoperable and Multi-Domain User Profiles for the Social WebAggregated, Interoperable and Multi-Domain User Profiles for the Social Web
Aggregated, Interoperable and Multi-Domain User Profiles for the Social Web
 
Prov-O-Viz: Interactive Provenance Visualization
Prov-O-Viz: Interactive Provenance VisualizationProv-O-Viz: Interactive Provenance Visualization
Prov-O-Viz: Interactive Provenance Visualization
 
Profiling User Interests on the Social Semantic Web
Profiling User Interests on the Social Semantic WebProfiling User Interests on the Social Semantic Web
Profiling User Interests on the Social Semantic Web
 
Provenance and Reuse of Open Data (PILOD 2.0 June 2014)
Provenance and Reuse of Open Data (PILOD 2.0 June 2014)Provenance and Reuse of Open Data (PILOD 2.0 June 2014)
Provenance and Reuse of Open Data (PILOD 2.0 June 2014)
 

Similar to iRap - Interest based RDF update propagation

Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
ALTER WAY
 

Similar to iRap - Interest based RDF update propagation (20)

The Kasabi Information Marketplace
The Kasabi Information MarketplaceThe Kasabi Information Marketplace
The Kasabi Information Marketplace
 
Accelerating Delivery of Data Products - The EBSCO Way
Accelerating Delivery of Data Products - The EBSCO WayAccelerating Delivery of Data Products - The EBSCO Way
Accelerating Delivery of Data Products - The EBSCO Way
 
Searching The Enterprise Data Lake With Solr - Watch Us Do It!: Presented by...
Searching The Enterprise Data Lake With Solr  - Watch Us Do It!: Presented by...Searching The Enterprise Data Lake With Solr  - Watch Us Do It!: Presented by...
Searching The Enterprise Data Lake With Solr - Watch Us Do It!: Presented by...
 
Graphs & Big Data - Philip Rathle and Andreas Kollegger @ Big Data Science Me...
Graphs & Big Data - Philip Rathle and Andreas Kollegger @ Big Data Science Me...Graphs & Big Data - Philip Rathle and Andreas Kollegger @ Big Data Science Me...
Graphs & Big Data - Philip Rathle and Andreas Kollegger @ Big Data Science Me...
 
RDB2RDF Tutorial (R2RML and Direct Mapping) at ISWC 2013
RDB2RDF Tutorial (R2RML and Direct Mapping) at ISWC 2013RDB2RDF Tutorial (R2RML and Direct Mapping) at ISWC 2013
RDB2RDF Tutorial (R2RML and Direct Mapping) at ISWC 2013
 
The Power of Semantic Technologies to Explore Linked Open Data
The Power of Semantic Technologies to Explore Linked Open DataThe Power of Semantic Technologies to Explore Linked Open Data
The Power of Semantic Technologies to Explore Linked Open Data
 
Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and R
Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and RSpark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and R
Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and R
 
The UniProt SPARQL endpoint: 20 billion quads in production
The UniProt SPARQL endpoint: 20 billion quads in productionThe UniProt SPARQL endpoint: 20 billion quads in production
The UniProt SPARQL endpoint: 20 billion quads in production
 
20 billion triples in production
20 billion triples in production20 billion triples in production
20 billion triples in production
 
20 billion triples in production
20 billion triples in production20 billion triples in production
20 billion triples in production
 
Social Media World News Impact on Stock Index Values - Investment Fund Analyt...
Social Media World News Impact on Stock Index Values - Investment Fund Analyt...Social Media World News Impact on Stock Index Values - Investment Fund Analyt...
Social Media World News Impact on Stock Index Values - Investment Fund Analyt...
 
Second Thoughts about Metadata Standards for Data
Second Thoughts about Metadata Standards for DataSecond Thoughts about Metadata Standards for Data
Second Thoughts about Metadata Standards for Data
 
Weka tutorial
Weka tutorialWeka tutorial
Weka tutorial
 
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
 
Furore devdays 2017- rdf1(solbrig)
Furore devdays 2017- rdf1(solbrig)Furore devdays 2017- rdf1(solbrig)
Furore devdays 2017- rdf1(solbrig)
 
Statistical data in RDF
Statistical data in RDFStatistical data in RDF
Statistical data in RDF
 
Enterprise Search in the Big Data Era: Recent Developments and Open Challenges
Enterprise Search in the Big Data Era: Recent Developments and Open ChallengesEnterprise Search in the Big Data Era: Recent Developments and Open Challenges
Enterprise Search in the Big Data Era: Recent Developments and Open Challenges
 
Andrzej bialecki lr-2013-dublin
Andrzej bialecki lr-2013-dublinAndrzej bialecki lr-2013-dublin
Andrzej bialecki lr-2013-dublin
 
ICESat-2 H5-ES Product Development Strategy
ICESat-2 H5-ES Product Development StrategyICESat-2 H5-ES Product Development Strategy
ICESat-2 H5-ES Product Development Strategy
 
Oak meeting 18/09/2014
Oak meeting 18/09/2014Oak meeting 18/09/2014
Oak meeting 18/09/2014
 

More from Fabrizio Orlandi

Semantic Search on Heterogeneous Wiki Systems - poster
Semantic Search on Heterogeneous Wiki Systems - posterSemantic Search on Heterogeneous Wiki Systems - poster
Semantic Search on Heterogeneous Wiki Systems - poster
Fabrizio Orlandi
 

More from Fabrizio Orlandi (10)

Beyond 2022 project presentation 2021
Beyond 2022 project presentation 2021Beyond 2022 project presentation 2021
Beyond 2022 project presentation 2021
 
Benchmarking RDF Metadata Representations: Reification, Singleton Property an...
Benchmarking RDF Metadata Representations: Reification, Singleton Property an...Benchmarking RDF Metadata Representations: Reification, Singleton Property an...
Benchmarking RDF Metadata Representations: Reification, Singleton Property an...
 
Modelling context and statement-level metadata in knowledge graphs
Modelling context and statement-level metadata in knowledge graphsModelling context and statement-level metadata in knowledge graphs
Modelling context and statement-level metadata in knowledge graphs
 
Multi-Source Provenance-Aware User Interest Profiling on the Social Semantic Web
Multi-Source Provenance-Aware User Interest Profiling on the Social Semantic WebMulti-Source Provenance-Aware User Interest Profiling on the Social Semantic Web
Multi-Source Provenance-Aware User Interest Profiling on the Social Semantic Web
 
Semantic user profiling and Personalised filtering of the Twitter stream
Semantic user profiling and Personalised filtering of the Twitter streamSemantic user profiling and Personalised filtering of the Twitter stream
Semantic user profiling and Personalised filtering of the Twitter stream
 
Semantic search on heterogeneous wiki systems - Wikimania 2010
Semantic search on heterogeneous wiki systems - Wikimania 2010Semantic search on heterogeneous wiki systems - Wikimania 2010
Semantic search on heterogeneous wiki systems - Wikimania 2010
 
Semantic Search on Heterogeneous Wiki Systems - wikisym2010
Semantic Search on Heterogeneous Wiki Systems - wikisym2010Semantic Search on Heterogeneous Wiki Systems - wikisym2010
Semantic Search on Heterogeneous Wiki Systems - wikisym2010
 
Semantic Search on Heterogeneous Wiki Systems - poster
Semantic Search on Heterogeneous Wiki Systems - posterSemantic Search on Heterogeneous Wiki Systems - poster
Semantic Search on Heterogeneous Wiki Systems - poster
 
Semantic Search on Heterogeneous Wiki Systems - Short
Semantic Search on Heterogeneous Wiki Systems - ShortSemantic Search on Heterogeneous Wiki Systems - Short
Semantic Search on Heterogeneous Wiki Systems - Short
 
Enabling cross-wikis integration by extending the SIOC ontology
Enabling cross-wikis integration by extending the SIOC ontologyEnabling cross-wikis integration by extending the SIOC ontology
Enabling cross-wikis integration by extending the SIOC ontology
 

Recently uploaded

1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
AldoGarca30
 
DeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesDeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakes
MayuraD1
 

Recently uploaded (20)

NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
Online food ordering system project report.pdf
Online food ordering system project report.pdfOnline food ordering system project report.pdf
Online food ordering system project report.pdf
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKARHAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leap
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
 
Moment Distribution Method For Btech Civil
Moment Distribution Method For Btech CivilMoment Distribution Method For Btech Civil
Moment Distribution Method For Btech Civil
 
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
 
DeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesDeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakes
 
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced LoadsFEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
 
Wadi Rum luxhotel lodge Analysis case study.pptx
Wadi Rum luxhotel lodge Analysis case study.pptxWadi Rum luxhotel lodge Analysis case study.pptx
Wadi Rum luxhotel lodge Analysis case study.pptx
 
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxHOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS Lambda
 
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
 
Block diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.pptBlock diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.ppt
 
Online electricity billing project report..pdf
Online electricity billing project report..pdfOnline electricity billing project report..pdf
Online electricity billing project report..pdf
 
PE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiesPE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and properties
 
Computer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to ComputersComputer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to Computers
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 

iRap - Interest based RDF update propagation

  • 1. Interest-Based RDF Update Propagation Kemele M. Endris, Sidra Faisal, Fabrizio Orlandi, Sӧren Auer, Simon Scerri EIS - Enterprise Information Systems University of Bonn & Fraunhofer IAIS ISWC 2015
  • 2. Motivation: Problems accessing remote SPARQL endpoints • Challenges: • Availability is not guaranteed • Low performance due to high load and traffic • Restriction on the query forms and number of results • Possible solution: • Setup a local replica of source datasets Unavailable! "Interest-Based RDF Update Propagation" - KM. Endris, S. Faisal, F. Orlandi, S. Auer, S. Scerri – ISWC2015 2
  • 3. Motivation: Syncing using a naïve approach does not scale • Setting up local replica: • Requires manual infrastructure setup and maintenance • Requires full data loading • Data Freshness not guaranteed • Approach to sync with source (Naïve): 1. Fully-loading releases of dataset dumps in intervals • Replace old versions of a dataset with new ones 2. Continuous synchronization • Propagate all changes, including irrelevant ones Source ReplicaMirror Tool +200K triples +100K triples +120K triples +400K triples E v o l u t i o n Changesets "Interest-Based RDF Update Propagation" - KM. Endris, S. Faisal, F. Orlandi, S. Auer, S. Scerri – ISWC2015 3
  • 4. Definition: Changeset - ∆ • Delta of a dataset V within two time points: t1 and t0 (t1> t0) ∆(Vt1 ) = <Dt1-t0 , At1-t0 > • Two components: • Set of removed triples Dt1-t0 = Vt0 Vt1 • Set of added triples At1-t0 = Vt1 Vt0 "Interest-Based RDF Update Propagation" - KM. Endris, S. Faisal, F. Orlandi, S. Auer, S. Scerri – ISWC2015 4
  • 5. Dataset Mirror Tool: Naïve approach • A typical dataset mirror tool applies a changeset on a replica • It removes the set of deleted triples and insert the set of added triples • e.g., DBpedia Live Mirror tool Vt1 = (Vt0 Dt1-t0 ) UAt1-t0Removed Triples: dbr:Marcel dbp:goals 1 . dbr:Marcel dbo:team dbr:FNFT . dbr:Tim%02 foaf:name "Tim Berners-Lee" . dbr:Cristiano_Ronaldo dbo:goals 205 . Added Triples: dbr:Cristiano_Ronaldo dbo:goals 230 . dbr:Barack_Obama foaf:name "Barack Obama" . dbr:Barack_Obama foaf:homepage "http://www.barackobama.com" . dbr:Rio_Ferdinand a foaf:Person . dbr:Rio_Ferdinand a dbo:Athlete . dbr:Rio_Ferdinand dbp:goals 7 . dbr:Arvid_Smit a dbo:Athlete . replica Replica at time t0 "Interest-Based RDF Update Propagation" - KM. Endris, S. Faisal, F. Orlandi, S. Auer, S. Scerri – ISWC2015 5
  • 6. Our Approach: Interest-based update propagation • Setup a slice (subset) of a dataset • Interest-based update propagation • Interest expressions using SPARQL basic graph patterns • Interest expressions are evaluated on deltas (changesets) of the source dataset • Only triples that match interest expression shipped to the replica dataset Source ReplicaiRap +200K triples +100K triples +120K triples +400K triples +2K triples +12K triples +2K triples +5K triplesE v o l u t i o n Changesets Interesting Changesets iRap: Interest-based RDF update propagation framework "Interest-Based RDF Update Propagation" - KM. Endris, S. Faisal, F. Orlandi, S. Auer, S. Scerri – ISWC2015 6
  • 7. Interest-based update propagation "Interest-Based RDF Update Propagation" - KM. Endris, S. Faisal, F. Orlandi, S. Auer, S. Scerri – ISWC2015 7 Source Vt1 Interest evaluationChangeset ∆(Vt1 ) Interesting Changeset Δ(τt1 ) Replica τt0 Interest (ig) 𝐷𝑡1−𝑡0 𝐴 𝑡1−𝑡0 𝑟𝑡1−𝑡0 𝑈 𝑟𝑡1−𝑡0 ′ 𝑎 𝑡1−𝑡0 Candidate Generation Candidate Assertion Candidate Generation Candidate Assertion Potentially Interesting dataset Potentially interesting changeset ∆(ρt1 )
  • 8. Interest expression • Based on SPARQL graph pattern ig = <τ, b, op> • Composed of: • b - BGP – basic graph pattern • op - OGP – optional graph pattern • g - Source dataset URI – where changesets are downloaded from • τ - Replica (target) dataset URI – where interesting changes are propagated to CONSTRUCT WHERE { ?athlete a dbo:Athlete . ?athlete dbp:goals ?goals . OPTIONAL { ?athlete foaf:homepage ?page . } } BGP OGP "Interest-Based RDF Update Propagation" - KM. Endris, S. Faisal, F. Orlandi, S. Auer, S. Scerri – ISWC2015 8
  • 9. Interest evaluation (1) • Two steps: I. Interest candidate generation • Performs matching between the interest expression, ig, and a changeset, ∆(Vt1 ) • Generates a set of candidate triples • e.g., II. Interest candidate assertion • Evaluates candidate triples combined with the interest query on the target dataset • e.g., • Performs matching between the assertion query with the target dataset SELECT * WHERE { ?a a dbo:Athlete . ?a dbp:goals ?goals. } VALUES (?a ?goals){ (dbr:Cristiano_Ronaldo 205) } { dbr:Cristiano_Ronaldo dbp:goals 205. } "Interest-Based RDF Update Propagation" - KM. Endris, S. Faisal, F. Orlandi, S. Auer, S. Scerri – ISWC2015 9
  • 10. Interest evaluation (2) • Execution of the two steps on a changeset results in a set of: • Interesting removed (and added) triples • Potentially removed (and added) interesting triples • Uninteresting triples "Interest-Based RDF Update Propagation" - KM. Endris, S. Faisal, F. Orlandi, S. Auer, S. Scerri – ISWC2015 10
  • 11. Remember?! • A typical dataset mirror tool Vt1 = (Vt0 Dt1-t0 ) UAt1-t0 Removed Triples: dbr:Marcel dbp:goals 1 . dbr:Marcel dbo:team dbr:FNFT . dbr:Tim%02 foaf:name "Tim Berners-Lee" . dbr:Cristiano_Ronaldo dbo:goals 205 . Added Triples: dbr:Cristiano_Ronaldo dbo:goals 230 . dbr:Barack_Obama foaf:name "Barack Obama" . dbr:Barack_Obama foaf:homepage "http://www.barackobama.com/" . dbr:Rio_Ferdinand a foaf:Person . dbr:Rio_Ferdinand a dbo:Athlete . dbr:Rio_Ferdinand dbp:goals 7 . dbr:Arvid_Smit a dbo:Athlete . Replica Replica at time t0 CONSTRUCT WHERE { ?athlete a dbo:Athlete . ?athlete dbp:goals ?goals . OPTIONAL { ?athlete foaf:homepage ?page . } } Interest query: "Interest-Based RDF Update Propagation" - KM. Endris, S. Faisal, F. Orlandi, S. Auer, S. Scerri – ISWC2015 11
  • 12. Interest-based update propagation • Interesting Changeset Δ(τt1 ): • Interesting removed triples • Interesting added triples • Potentially Interesting Changeset ∆(ρt1 ): • Potentially interesting removed triples • Potentially interesting added triples Interesting Removed Triples: dbr:Marcel dbp:goals 1 . dbr:Cristiano_Ronaldo dbo:goals 205 . Interesting Added Triples: dbr:Cristiano_Ronaldo dbo:goals 230 . dbr:Rio_Ferdinand a dbo:Athlete . dbr:Rio_Ferdinand dbp:goals 7 . replica pi Potentially Interesting Triples: dbr:Barack_Obama foaf:homepage "http://www.barackobama.com/" . dbr:Arvid_Smit a dbo:Athlete . dbr:Marcel a dbo:Athlete . INSERT DATA{…} Potentially interesting dataset Athlete Dataset τt0 : dbr:Marcel a dbo:Athlete . dbr:Marcel dbp:goals 1 . dbr:Cristiano_Ronaldo a dbo:Athlete . dbr:Cristiano_Ronaldo dbo:goals 205 . dbr:Cristiano_Ronaldo foaf:homepage "http://cristianoronaldo.com" . "Interest-Based RDF Update Propagation" - KM. Endris, S. Faisal, F. Orlandi, S. Auer, S. Scerri – ISWC2015 12 Athlete Dataset τt1 : dbr:Cristiano_Ronaldo a dbo:Athlete . dbr:Cristiano_Ronaldo dbo:goals 230. dbr:Cristiano_Ronaldo foaf:homepage "http://cristianoronaldo.com" . dbr:Rio_Ferdinand a dbo:Athlete . dbr:Rio_Ferdinand dbp:goals 7 .
  • 13. iRap Framework • Reference implementation of our approach • Implemented using Java and Jena • Open-source • http://eis.iai.uni-bonn.de/Projects/iRap "Interest-Based RDF Update Propagation" - KM. Endris, S. Faisal, F. Orlandi, S. Auer, S. Scerri – ISWC2015 13
  • 14. Experimental setting //Football dataset interest CONSTRUCT WHERE { ?footballer a dbo:SoccerPlayer . ?footballer foaf:name ?name. ?footballer dbo:team ?team . ?team rdfs:label ?teamName. } //Location dataset interest CONSTRUCT WHERE { ?location a ?type . ?location wgs:long ?long . ?location wgs:lat ?lat . ?location rdfs:label ?label . ?location dbo:abstract ?abstract . OPTIONAL { ?location dcterms:subject ?subject } } • DBpedia 2014 – as a source dataset • 12,057 Changesets from Oct 01 – 15, 2014 • Two target datasets 1) Football dataset • Slice of DBpedia: 265K triples 2) Location dataset • Complete DBpedia 2014 dataset • Two interest expressions: "Interest-Based RDF Update Propagation" - KM. Endris, S. Faisal, F. Orlandi, S. Auer, S. Scerri – ISWC2015 14 1) 2)
  • 15. Results: Football dataset "Interest-Based RDF Update Propagation" - KM. Endris, S. Faisal, F. Orlandi, S. Auer, S. Scerri – ISWC2015 15 • Interesting triples: • 0.38% of the total removed triples • 0.34% of the total added triples
  • 16. Results: Football dataset growth "Interest-Based RDF Update Propagation" - KM. Endris, S. Faisal, F. Orlandi, S. Auer, S. Scerri – ISWC2015 16
  • 17. Results: Location dataset "Interest-Based RDF Update Propagation" - KM. Endris, S. Faisal, F. Orlandi, S. Auer, S. Scerri – ISWC2015 17 • Interesting triples: • 4.38% of the total removed triples • 1.81% of the total added triples
  • 18. Results: Location dataset growth "Interest-Based RDF Update Propagation" - KM. Endris, S. Faisal, F. Orlandi, S. Auer, S. Scerri – ISWC2015 18
  • 19. Conclusion • A novel approach for interest-based RDF update propagation and detailed formalizations • Our evaluation shows that our method can significantly cut down the size of the data updates • Summary of results from 12,057 changesets of DBpedia 2014 "Interest-Based RDF Update Propagation" - KM. Endris, S. Faisal, F. Orlandi, S. Auer, S. Scerri – ISWC2015 19 Dataset Interesting removed triples Interesting added triples Football dataset 0.38% 0.34% Location dataset 4.38% 1.81%
  • 20. Thank you for your attention! Questions? • More on our Website: http://eis.iai.uni-bonn.de/Projects/iRap • Contact: Google group: irap-ld@googlegroups.com Twitter: @KemeleM - @BadmotorF "Interest-Based RDF Update Propagation" - KM. Endris, S. Faisal, F. Orlandi, S. Auer, S. Scerri – ISWC2015 20

Editor's Notes

  1. The basic means to access and navigate the graph is to dereference HTTP URIs into RDF descriptions and to traverse RDF links discovered within the retrieved data. In addition, parts of the graph may also be accessed via SPARQL endpoints or downloaded in the form of RDF data set dumps.