SlideShare une entreprise Scribd logo
1  sur  53
Télécharger pour lire hors ligne
DistLODStats: Distributed Computation of
RDF Dataset Statistics
Gezim Sejdiu, Ivan Ermilov, Jens Lehmann and Mohamed Nadjib Mami
2 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn
❖ Introduction
❖ Approach
❖ Evaluation
❖ Use Cases
❖ Conclusion and Future work
Outline
3 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn
Introduction
❖ Over the last years, the size of the Semantic Web has increased and
several large-scale datasets were published
4 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn
Introduction
Source: LOD-Cloud (http://lod-cloud.net/ )
❖ Over the last years, the size of the Semantic Web has increased and
several large-scale datasets were published
➢ As of August 2018
5 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn
Introduction
Source: LOD-Cloud (http://lod-cloud.net/ )
❖ Over the last years, the size of the Semantic Web has increased and
several large-scale datasets were published
➢ Based on LOD Stats (http://lodstats.aksw.org/ )
~10, 000 datasets
Openly available online
using Semantic Web
standards
6 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn
Introduction
Source: LOD-Cloud (http://lod-cloud.net/ )
❖ Over the last years, the size of the Semantic Web has increased and
several large-scale datasets were published
➢ Based on LOD Stats (http://lodstats.aksw.org/ )
~10, 000 datasets
Openly available online
using Semantic Web
standards
many datasets
RDFized and kept private
(e.g. Supply chain,
manufacture, ethereum
dataset, etc.)
7 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn
Introduction
Source: LOD-Cloud (http://lod-cloud.net/ )
❖ Dealing with such amount of data makes many tasks hard to be
solved on single machines
8 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn
Introduction
Source: LOD-Cloud (http://lod-cloud.net/ )
❖ Dealing with such amount of data makes many tasks hard to be
solved on single machines
Vocabulary
Reuse
Find a suitable
vocabulary for
your dataset
9 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn
Introduction
Source: LOD-Cloud (http://lod-cloud.net/ )
❖ Dealing with such amount of data makes many tasks hard to be
solved on single machines
Vocabulary
Reuse
Find a suitable
vocabulary for
your dataset
Coverage
Analysis
Does dataset
contain
necessary
information?
10 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn
Introduction
Source: LOD-Cloud (http://lod-cloud.net/ )
❖ Dealing with such amount of data makes many tasks hard to be
solved on single machines
Vocabulary
Reuse
Find a suitable
vocabulary for
your dataset
Coverage
Analysis
Does dataset
contain
necessary
information?
Privacy
Analysis
Does dataset
contain
sensitive
information?
11 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn
Introduction
Source: LOD-Cloud (http://lod-cloud.net/ )
❖ Dealing with such amount of data makes many tasks hard to be
solved on single machines
Vocabulary
Reuse
Find a suitable
vocabulary for
your dataset
Coverage
Analysis
Does dataset
contain
necessary
information?
Privacy
Analysis
Does dataset
contain
sensitive
information?
Entity
Linking
Which datasets
are good
candidates for
interlinking?
12 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn
Introduction
Source: LOD-Cloud (http://lod-cloud.net/ )
❖ Dealing with such amount of data makes many tasks hard to be
solved on single machines
We need statistics!Vocabulary
Reuse
Find a suitable
vocabulary for
your dataset
Coverage
Analysis
Does dataset
contain
necessary
information?
Privacy
Analysis
Does dataset
contain
sensitive
information?
Entity
Linking
Which datasets
are good
candidates for
interlinking?
13 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn
Introduction
Source: LOD-Cloud (http://lod-cloud.net/ )
❖ Dealing with such amount of data makes many tasks hard to be
solved on single machines
but
(using an efficient approach)
Vocabulary
Reuse
Find a suitable
vocabulary for
your dataset
Coverage
Analysis
Does dataset
contain
necessary
information?
Privacy
Analysis
Does dataset
contain
sensitive
information?
Entity
Linking
Which datasets
are good
candidates for
interlinking?
We need statistics!
14 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn
SANSA Overview
❖ SANSA [1] is a processing data flow engine that provides data
distribution, and fault tolerance for distributed computations over
RDF large-scale datasets
❖ SANSA includes several libraries:
➢ Read / Write RDF / OWL library
➢ Querying library
➢ Inference library
➢ ML- Machine Learning core library
BigDataEurope
Inference
Knowledge Distribution &
Representation
DeployCoreAPIs&Libraries
Local Cluster
Standalone Resource manager
Querying
Machine Learning
15 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn
❖ A statistical criterion C (adopted definition from [2]) is a triple C = (F,
D, P), where:
➢ F is a SPARQL filter condition
➢ D is a derived dataset from the main dataset (RDD of triples) after applying
F
➢ P is a post-processing filter operating on the data structure D
❖ RDDs are in-memory collections of records that can be operated in
parallel on large clusters
➢ We use RDDs to represent RDF triples
Approach
16 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn
❖ Overview on DistLODStats architecture
Approach
17 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn
Approach (DistLODStats workflow)
RDF Data
18 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn
Approach (DistLODStats workflow)
RDF Data
Definitions
● Define rdf statistics in the format of [(Rule→ Filter) → Action → Postprocessing]
RDFStatistics
SANSA Engine
Statistics
ClassDistribution
PropertyDistribution
Indegree
Outdegree
Distinctentities
Literals
SameAs
Namespaces
---
Class Distribution
Filter rule Action Postprocessing
?p=rdf:type && isIRI(?o) M[?o]++ top(M,100)
19 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn
Approach (DistLODStats workflow)
RDF Data
Definitions
● Define rdf statistics in the format of [(Rule→ Filter) → Action → Postprocessing]
RDFStatistics
SANSA Engine
Statistics
ClassDistribution
PropertyDistribution
Indegree
Outdegree
Distinctentities
Literals
SameAs
Namespaces
---
❖ We use RDDs to represent
RDF triples
val triples = spark.rdf(lang)(input)
val stats =
triples.statsClassUsageCount()
Class Distribution
Filter rule Action Postprocessing
?p=rdf:type && isIRI(?o) M[?o]++ top(M,100)
20 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn
Approach (DistLODStats workflow)
RDF Data
Definitions
● Define rdf statistics in the format of [(Rule→ Filter) → Action → Postprocessing]
RDFStatistics
SANSA Engine
Statistics
ClassDistribution
PropertyDistribution
Indegree
Outdegree
Distinctentities
Literals
SameAs
Namespaces
---
filter_rule = triples.filter(f =>
f.predicateMatches(RDF.`type`.asNode
()) && f.getObject.isURI)
.map(_.getObject)
Worker Worker Worker
Class Distribution
Filter rule Action Postprocessing
?p=rdf:type && isIRI(?o) M[?o]++ top(M,100)
21 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn
Approach (DistLODStats workflow)
RDF Data
Definitions
● Define rdf statistics in the format of [(Rule→ Filter) → Action → Postprocessing]
RDFStatistics
SANSA Engine
Statistics
ClassDistribution
PropertyDistribution
Indegree
Outdegree
Distinctentities
Literals
SameAs
Namespaces
---
action = filter_rule
.map(f => (f, 1))
.reduceByKey(_ + _)
Worker Worker Worker
5
:Person
2
:Place
1
:Work
6
:Org.
4
:Species
1
:Person
3
:Person
1
:Place
Class Distribution
Filter rule Action Postprocessing
?p=rdf:type && isIRI(?o) M[?o]++ top(M,100)
22 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn
Approach (DistLODStats workflow)
RDF Data
Definitions
● Define rdf statistics in the format of [(Rule→ Filter) → Action → Postprocessing]
RDFStatistics
SANSA Engine
Statistics
ClassDistribution
PropertyDistribution
Indegree
Outdegree
Distinctentities
Literals
SameAs
Namespaces
---
result = action.sortBy(_._2, false)
.take(100)
Worker Worker Worker
5
:Person
2
:Place
1
:Work
6
:Org.
4
:Species
1
:Person
3
:Person
1
:Place
Master
9
:Person
6
:Org.
3
:Place
1
:Work
4
:Species
Class Distribution
Filter rule Action Postprocessing
?p=rdf:type && isIRI(?o) M[?o]++ top(M,100)
23 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn
Approach (DistLODStats workflow)
RDF Data
Definitions
● Define rdf statistics in the format of [(Rule→ Filter) → Action → Postprocessing]
RDFStatistics
SANSA Engine
Statistics
ClassDistribution
PropertyDistribution
Indegree
Outdegree
Distinctentities
Literals
SameAs
Namespaces
---
result.voidify(output)
Worker Worker Worker
5
:Person
2
:Place
1
:Work
6
:Org.
4
:Species
1
:Person
3
:Person
1
:Place
Master
9
:Person
6
:Org.
3
:Place
1
:Work
4
:Species
Analyse
SANSA-Notebooks
Vocabulary of Interlinked Datasets
(VoID)
Class Distribution
Filter rule Action Postprocessing
?p=rdf:type && isIRI(?o) M[?o]++ top(M,100)
24 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn
❖ Q1: How does the runtime of the algorithm change when more
nodes in the cluster are added?
❖ Q2: How does the algorithm scale to larger datasets?
❖ Q3: How does the algorithm scale to a larger number of datasets?
Evaluation
25 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn
❖ Experimental Setup
➢ Cluster configuration
■ 6 machines (1 master, 5 workers): Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz (32 Cores), 128 GB RAM, 12
TB SATA RAID-5
■ Spark-2.2.0, Hadoop 2.8.0, Scala 2.11.11 and Java 8
➢ Datasets (all in nt format)
■ LinkedGeoData: (1,292,933,812 triples; 191.17 GB)
■ DBpedia v3.9: (en: 812,545,486 friples; 114.4 GB), (de: 336,714,883 triples; 48.6 GB), (fr: 340,849,556
triples; 49.77 GB)
■ BSBM: (BSBM_2GB: 8,289,484 triples; 2 GB), (BSBM_20GB: 81,980,472 triples; 20 GB), (BSBM_200GB:
817,774,057 triples; 200 GB)
Evaluation
26 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn
❖ Distributed Processing on Large-Scale Datasets
Evaluation
Runtime (h) (mean/std)
LODStats DistLODStats
a) files b) bigfile c) local d) cluster e) speedup ratio
LinkedGeoData n/a n/a 36.65/0.13 4.37/0.15 7.4x
DBpedia_en 24.63/0.57 fail 25.34/0.11 2.97/0.08 7.6x
DBpedia_de n/a n/a 10.34/0.06 1.2/0.0 7.3x
DBpedia_fr n/a n/a 10.49/0.09 1.27/0.04 7.3x
27 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn
❖ Distributed Processing on Large-Scale Datasets
Evaluation
Runtime (h) (mean/std)
LODStats DistLODStats
a) files b) bigfile c) local d) cluster e) speedup ratio
LinkedGeoData n/a n/a 36.65/0.13 4.37/0.15 7.4x
DBpedia_en 24.63/0.57 fail 25.34/0.11 2.97/0.08 7.6x
DBpedia_de n/a n/a 10.34/0.06 1.2/0.0 7.3x
DBpedia_fr n/a n/a 10.49/0.09 1.27/0.04 7.3x
28 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn
❖ Distributed Processing on Large-Scale Datasets
Evaluation
Runtime (h) (mean/std)
LODStats DistLODStats
a) files b) bigfile c) local d) cluster e) speedup ratio
LinkedGeoData n/a n/a 36.65/0.13 4.37/0.15 7.4x
DBpedia_en 24.63/0.57 fail 25.34/0.11 2.97/0.08 7.6x
DBpedia_de n/a n/a 10.34/0.06 1.2/0.0 7.3x
DBpedia_fr n/a n/a 10.49/0.09 1.27/0.04 7.3x
29 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn
❖ Distributed Processing on Large-Scale Datasets
Evaluation
Runtime (h) (mean/std)
LODStats DistLODStats
a) files b) bigfile c) local d) cluster e) speedup ratio
LinkedGeoData n/a n/a 36.65/0.13 4.37/0.15 7.4x
DBpedia_en 24.63/0.57 fail 25.34/0.11 2.97/0.08 7.6x
DBpedia_de n/a n/a 10.34/0.06 1.2/0.0 7.3x
DBpedia_fr n/a n/a 10.49/0.09 1.27/0.04 7.3x
* e) = d) / c) - 1
30 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn
❖ Speedup performance evaluation of DistLODStats
Evaluation
31 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn
❖ Speedup performance evaluation of DistLODStats
Evaluation
DistLODStats shows consistent
improvements for each
dataset when running on a
cluster with an geometric
mean of the speedup of 7.4x,
which answer Q1.
32 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn
❖ Sizeup performance evaluation of DistLODStats
Evaluation
33 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn
❖ Sizeup performance evaluation of DistLODStats
Evaluation
The execution time cost grows
linearly when the size of the
dataset increases and is
near-constant as long as the
data fits in memory.
DistLODStats scales well in
context of sizeup, which
answers Q2.
34 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn
❖ Scalability performance evaluation on DistLODStats
Evaluation
35 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn
❖ Scalability performance evaluation on DistLODStats
Evaluation
We can see that as the
number of workers increases,
the execution time cost is
super-linear on BSBM_50GB
dataset.
36 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn
❖ Speedup Ratio and Efficiency of DistLODStats
Evaluation
37 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn
❖ Speedup Ratio and Efficiency of DistLODStats
Evaluation
The speedup performance
trend is consistent as the
number of workers increases.
Efficiency increased only up to
the 4th worker for
BSBM_50GB dataset.
The results imply that
DistLODStats can achieve near
linear or even super linear
scalability in performance,
which answers Q3.
* speedup (S) =
* efficiency (E) =
- execution time of the algorithm
on local mode
- time required to complete the
task on N workers
38 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn
Powered by
<http://lodstats.aksw.org/>
Comprehensive
statistics – LODStats
DistLODStats is used as
underlying engine overcoming
the previous limitations and
generating statistical
descriptions, including e.g. VoID,
for large parts of the Linked Open
Data Cloud
39 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn
Powered by
<https://aleth.io/>
Blockchain – Alethio
Use Case
Alethio is using SANSA in general
and DistLODStats specifically in
order to perform large-scale
batch analytics, e.g. computing
the asset turnover for sets of
accounts, computing attack
pattern frequencies and Opcode
usage statistics. DistLODStats
was run on a 100 node cluster
with 400 cores
<http://lodstats.aksw.org/>
Comprehensive
statistics – LODStats
DistLODStats is used as
underlying engine overcoming
the previous limitations and
generating statistical
descriptions, including e.g. VoID,
for large parts of the Linked Open
Data Cloud
40 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn
Powered by
<https://aleth.io/>
Blockchain – Alethio
Use Case
Alethio is using SANSA in general
and DistLODStats specifically in
order to perform large-scale
batch analytics, e.g. computing
the asset turnover for sets of
accounts, computing attack
pattern frequencies and Opcode
usage statistics. DistLODStats
was run on a 100 node cluster
with 400 cores
<http://lodstats.aksw.org/>
Comprehensive
statistics – LODStats
DistLODStats is used as
underlying engine overcoming
the previous limitations and
generating statistical
descriptions, including e.g. VoID,
for large parts of the Linked Open
Data Cloud
<https://www.big-data-europe.eu/>
Big Data Platform –
BDE
DistLODStats is used for
computing statistics over those
logs within the BDE platform. BDE
uses the Mu Swarm Logger
service for detecting docker
events and convert their
representation to RDF. In order to
generate visualisations of log
statistics, BDE then calls
DistLODStats from
SANSA-Notebooks
41 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn
Powered by
<https://aleth.io/>
Blockchain – Alethio
Use Case
Alethio is using SANSA in general
and DistLODStats specifically in
order to perform large-scale
batch analytics, e.g. computing
the asset turnover for sets of
accounts, computing attack
pattern frequencies and Opcode
usage statistics. DistLODStats
was run on a 100 node cluster
with 400 cores
<http://lodstats.aksw.org/>
Comprehensive
statistics – LODStats
DistLODStats is used as
underlying engine overcoming
the previous limitations and
generating statistical
descriptions, including e.g. VoID,
for large parts of the Linked Open
Data Cloud
<https://www.big-data-europe.eu/>
Big Data Platform –
BDE
DistLODStats is used for
computing statistics over those
logs within the BDE platform. BDE
uses the Mu Swarm Logger
service for detecting docker
events and convert their
representation to RDF. In order to
generate visualisations of log
statistics, BDE then calls
DistLODStats from
SANSA-Notebooks
<http://abstat.disco.unimib.it/>
LOD Summaries –
ABSTAT
DistLODStats is used for data set
summarisation of large-scale
RDF datasets in this context
42 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn
❖ Obtaining an overview over the Web of Data:
➢ data-intensive and computing-intensive
➢ challenge to develop fast and efficient algorithms that can handle large
scale RDF datasets
❖ DistLODStats, a novel software component (integrated into the
larger SANSA framework) for distributed in-memory computation of
RDF Datasets statistics implemented using the Spark framework
❖ As a future work we plan to further improve time efficiency and
perform load balancing
Conclusion
43 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn
STATisfy: A REST Interface for DistLODStats
CollaborativeAnalyticsServices
Marketplace
REST
Server
BigDataEurope
Local Cluster
Standalone Resource manager
Master
Worker 1 Worker 2 Worker n
SANSA DistLODStats
44 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn
STATisfy: A REST Interface for DistLODStats
CollaborativeAnalyticsServices
Marketplace
REST
Server
BigDataEurope
Local Cluster
Standalone Resource manager
Master
Worker 1 Worker 2 Worker n
SANSA DistLODStats
Visit our poster [P29] at 7pm!
THANK YOU!
SANSA Semantic Analytics Stack
https://github.com/SANSA-Stack
● SANSA-RDF
● SANSA-OWL
● SANSA-Query
● SANSA-Inference
● SANSA-ML
● SANSA-Examples
● SANSA-Notebooks
Gezim Sejdiu
PhD Student
Smart Data Analytics / University of Bonn
http://sda.tech
@Gezim_Sejdiu
https://gezimsejdiu.github.io/
<http://sansa-stack.net/distlodstats/>
46 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn
[1]. J. Lehmann, G. Sejdiu, L. Buhmann, P. Westphal, C. Stadler, I. Ermilov, S. Bin, ¨ N.
Chakraborty, M. Saleem, A.-C. N. Ngonga, and H. Jabeen. Distributed Semantic Analytics
using the SANSA Stack. In Proceedings of 16th International Semantic Web Conference,
2017
[2]. J. Demter, S. Auer, M. Martin, and J. Lehmann. Lodstats—an extensible framework for
high-performance dataset analytics. In Proceedings of the EKAW 2012, Lecture Notes in
Computer Science (LNCS) 7603. Springer, 2012
References
47 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn
Backup Slides
48 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn
Introduction
❖ Over the last years, the size of the Semantic Web has increased and
several large-scale datasets were published
➢ As of May 2007
Source: LOD-Cloud (http://lod-cloud.net/ )
49 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn
Introduction
❖ Over the last years, the size of the Semantic Web has increased and
several large-scale datasets were published
➢ As of September 2008
Source: LOD-Cloud (http://lod-cloud.net/ )
50 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn
Introduction
❖ Over the last years, the size of the Semantic Web has increased and
several large-scale datasets were published
➢ As of September 2011
Source: LOD-Cloud (http://lod-cloud.net/ )
51 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn
❖ Overall Breakdown by Criterion Analysis (log scale)
Evaluation
52 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn
❖ Overall Breakdown by Criterion Analysis (log scale)
Evaluation
The execution time is longer
when there is data movement
in the cluster compared to
when data is processed
without movement.
53 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn
❖ Overall Breakdown by Criterion Analysis (log scale)
Evaluation
There are some criteria that
are quite efficient to compute
even with data movement e.g.
22, 23. This is because data is
largely filtered before the
movement.

Contenu connexe

Tendances

The web of interlinked data and knowledge stripped
The web of interlinked data and knowledge strippedThe web of interlinked data and knowledge stripped
The web of interlinked data and knowledge strippedSören Auer
 
Another RDF Encoding Form
Another RDF Encoding FormAnother RDF Encoding Form
Another RDF Encoding FormJakob .
 
Web Data Management with RDF
Web Data Management with RDFWeb Data Management with RDF
Web Data Management with RDFM. Tamer Özsu
 
Wi2015 - Clustering of Linked Open Data - the LODeX tool
Wi2015 - Clustering of Linked Open Data - the LODeX toolWi2015 - Clustering of Linked Open Data - the LODeX tool
Wi2015 - Clustering of Linked Open Data - the LODeX toolLaura Po
 
Managing RDF data with graph databases
Managing RDF data with graph databasesManaging RDF data with graph databases
Managing RDF data with graph databasesGraph-TA
 
Deriving an Emergent Relational Schema from RDF Data
Deriving an Emergent Relational Schema from RDF DataDeriving an Emergent Relational Schema from RDF Data
Deriving an Emergent Relational Schema from RDF DataGraph-TA
 
Linked Open Data Visualization
Linked Open Data VisualizationLinked Open Data Visualization
Linked Open Data VisualizationLaura Po
 
Applications of Word Vectors in Text Retrieval and Classification
Applications of Word Vectors in Text Retrieval and ClassificationApplications of Word Vectors in Text Retrieval and Classification
Applications of Word Vectors in Text Retrieval and Classificationshakimov
 
Jarrar: SPARQL - RDF Query Language
Jarrar: SPARQL - RDF Query LanguageJarrar: SPARQL - RDF Query Language
Jarrar: SPARQL - RDF Query LanguageMustafa Jarrar
 
List.MID: A MIDI-Based Benchmark for RDF Lists
List.MID: A MIDI-Based Benchmark for RDF ListsList.MID: A MIDI-Based Benchmark for RDF Lists
List.MID: A MIDI-Based Benchmark for RDF ListsAlbert Meroño-Peñuela
 
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...Jeff Z. Pan
 
Semantic Pipes and Semantic Mashups
Semantic Pipes and Semantic MashupsSemantic Pipes and Semantic Mashups
Semantic Pipes and Semantic Mashupsgiurca
 
Introduction to data analysis using R
Introduction to data analysis using RIntroduction to data analysis using R
Introduction to data analysis using RVictoria López
 
Chapter 10 Data Mining Techniques
 Chapter 10 Data Mining Techniques Chapter 10 Data Mining Techniques
Chapter 10 Data Mining TechniquesHouw Liong The
 

Tendances (20)

Linked (Open) Data
Linked (Open) DataLinked (Open) Data
Linked (Open) Data
 
The web of interlinked data and knowledge stripped
The web of interlinked data and knowledge strippedThe web of interlinked data and knowledge stripped
The web of interlinked data and knowledge stripped
 
SWT Lecture Session 8 - Rules
SWT Lecture Session 8 - RulesSWT Lecture Session 8 - Rules
SWT Lecture Session 8 - Rules
 
Another RDF Encoding Form
Another RDF Encoding FormAnother RDF Encoding Form
Another RDF Encoding Form
 
Web Data Management with RDF
Web Data Management with RDFWeb Data Management with RDF
Web Data Management with RDF
 
NISO/DCMI Webinar: International Bibliographic Standards, Linked Data, and th...
NISO/DCMI Webinar: International Bibliographic Standards, Linked Data, and th...NISO/DCMI Webinar: International Bibliographic Standards, Linked Data, and th...
NISO/DCMI Webinar: International Bibliographic Standards, Linked Data, and th...
 
Wi2015 - Clustering of Linked Open Data - the LODeX tool
Wi2015 - Clustering of Linked Open Data - the LODeX toolWi2015 - Clustering of Linked Open Data - the LODeX tool
Wi2015 - Clustering of Linked Open Data - the LODeX tool
 
Managing RDF data with graph databases
Managing RDF data with graph databasesManaging RDF data with graph databases
Managing RDF data with graph databases
 
Deriving an Emergent Relational Schema from RDF Data
Deriving an Emergent Relational Schema from RDF DataDeriving an Emergent Relational Schema from RDF Data
Deriving an Emergent Relational Schema from RDF Data
 
Linked Open Data Visualization
Linked Open Data VisualizationLinked Open Data Visualization
Linked Open Data Visualization
 
Applications of Word Vectors in Text Retrieval and Classification
Applications of Word Vectors in Text Retrieval and ClassificationApplications of Word Vectors in Text Retrieval and Classification
Applications of Word Vectors in Text Retrieval and Classification
 
Jarrar: SPARQL - RDF Query Language
Jarrar: SPARQL - RDF Query LanguageJarrar: SPARQL - RDF Query Language
Jarrar: SPARQL - RDF Query Language
 
List.MID: A MIDI-Based Benchmark for RDF Lists
List.MID: A MIDI-Based Benchmark for RDF ListsList.MID: A MIDI-Based Benchmark for RDF Lists
List.MID: A MIDI-Based Benchmark for RDF Lists
 
LD4KD 2015 - Demos and tools
LD4KD 2015 - Demos and toolsLD4KD 2015 - Demos and tools
LD4KD 2015 - Demos and tools
 
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...
 
Triple Stores
Triple StoresTriple Stores
Triple Stores
 
Introduction to RDF Data Model
Introduction to RDF Data ModelIntroduction to RDF Data Model
Introduction to RDF Data Model
 
Semantic Pipes and Semantic Mashups
Semantic Pipes and Semantic MashupsSemantic Pipes and Semantic Mashups
Semantic Pipes and Semantic Mashups
 
Introduction to data analysis using R
Introduction to data analysis using RIntroduction to data analysis using R
Introduction to data analysis using R
 
Chapter 10 Data Mining Techniques
 Chapter 10 Data Mining Techniques Chapter 10 Data Mining Techniques
Chapter 10 Data Mining Techniques
 

Similaire à DistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talk

Release webinar: Sansa and Ontario
Release webinar: Sansa and OntarioRelease webinar: Sansa and Ontario
Release webinar: Sansa and OntarioBigData_Europe
 
Wed roman tut_open_datapub
Wed roman tut_open_datapubWed roman tut_open_datapub
Wed roman tut_open_datapubeswcsummerschool
 
ESWC 2019 - A Software Framework and Datasets for the Analysis of Graphs Meas...
ESWC 2019 - A Software Framework and Datasets for the Analysis of Graphs Meas...ESWC 2019 - A Software Framework and Datasets for the Analysis of Graphs Meas...
ESWC 2019 - A Software Framework and Datasets for the Analysis of Graphs Meas...Matthäus Zloch
 
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...Gezim Sejdiu
 
Standardizing for Open Data
Standardizing for Open DataStandardizing for Open Data
Standardizing for Open DataIvan Herman
 
Towards A Scalable Semantic-based Distributed Approach for SPARQL query evalu...
Towards A Scalable Semantic-based Distributed Approach for SPARQL query evalu...Towards A Scalable Semantic-based Distributed Approach for SPARQL query evalu...
Towards A Scalable Semantic-based Distributed Approach for SPARQL query evalu...Gezim Sejdiu
 
Data Integration And Visualization
Data Integration And VisualizationData Integration And Visualization
Data Integration And VisualizationIvan Ermilov
 
“Publishing and Consuming Linked Data. (Lessons learnt when using LOD in an a...
“Publishing and Consuming Linked Data. (Lessons learnt when using LOD in an a...“Publishing and Consuming Linked Data. (Lessons learnt when using LOD in an a...
“Publishing and Consuming Linked Data. (Lessons learnt when using LOD in an a...Marta Villegas
 
Linked Data for Architecture, Engineering and Construction (AEC)
Linked Data for Architecture, Engineering and Construction (AEC)Linked Data for Architecture, Engineering and Construction (AEC)
Linked Data for Architecture, Engineering and Construction (AEC)Stefan Dietze
 
Linking Open Government Data at Scale
Linking Open Government Data at Scale Linking Open Government Data at Scale
Linking Open Government Data at Scale Bernadette Hyland-Wood
 
Knowledge Graph Introduction
Knowledge Graph IntroductionKnowledge Graph Introduction
Knowledge Graph IntroductionSören Auer
 
‘Facilitating User Engagement by Enriching Library Data using Semantic Techno...
‘Facilitating User Engagement by Enriching Library Data using Semantic Techno...‘Facilitating User Engagement by Enriching Library Data using Semantic Techno...
‘Facilitating User Engagement by Enriching Library Data using Semantic Techno...CONUL Conference
 
RDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival dataRDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival dataGiorgos Santipantakis
 
Linked Data Driven Data Virtualization for Web-scale Integration
Linked Data Driven Data Virtualization for Web-scale IntegrationLinked Data Driven Data Virtualization for Web-scale Integration
Linked Data Driven Data Virtualization for Web-scale Integrationrumito
 
Exploring the Semantic Web
Exploring the Semantic WebExploring the Semantic Web
Exploring the Semantic WebRoberto García
 
Linked Open Data: A simple how-to
Linked Open Data: A simple how-toLinked Open Data: A simple how-to
Linked Open Data: A simple how-tonvitucci
 
UnifiedViews: Towards ETL Tool for Simple yet Powerful RDF Data Management.
UnifiedViews: Towards ETL Tool for Simple yet Powerful RDF Data Management.UnifiedViews: Towards ETL Tool for Simple yet Powerful RDF Data Management.
UnifiedViews: Towards ETL Tool for Simple yet Powerful RDF Data Management.tomasknap
 
Going for GOLD - Adventures in Open Linked Geospatial Metadata
Going for GOLD - Adventures in Open Linked Geospatial MetadataGoing for GOLD - Adventures in Open Linked Geospatial Metadata
Going for GOLD - Adventures in Open Linked Geospatial MetadataEDINA, University of Edinburgh
 
FIWARE Global Summit - IDS Implementation with FIWARE Software Components
FIWARE Global Summit - IDS Implementation with FIWARE Software ComponentsFIWARE Global Summit - IDS Implementation with FIWARE Software Components
FIWARE Global Summit - IDS Implementation with FIWARE Software ComponentsFIWARE
 
Do it on your own - From 3 to 5 Star Linked Open Data with RMLio
Do it on your own - From 3 to 5 Star Linked Open Data with RMLioDo it on your own - From 3 to 5 Star Linked Open Data with RMLio
Do it on your own - From 3 to 5 Star Linked Open Data with RMLioOpen Knowledge Belgium
 

Similaire à DistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talk (20)

Release webinar: Sansa and Ontario
Release webinar: Sansa and OntarioRelease webinar: Sansa and Ontario
Release webinar: Sansa and Ontario
 
Wed roman tut_open_datapub
Wed roman tut_open_datapubWed roman tut_open_datapub
Wed roman tut_open_datapub
 
ESWC 2019 - A Software Framework and Datasets for the Analysis of Graphs Meas...
ESWC 2019 - A Software Framework and Datasets for the Analysis of Graphs Meas...ESWC 2019 - A Software Framework and Datasets for the Analysis of Graphs Meas...
ESWC 2019 - A Software Framework and Datasets for the Analysis of Graphs Meas...
 
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
 
Standardizing for Open Data
Standardizing for Open DataStandardizing for Open Data
Standardizing for Open Data
 
Towards A Scalable Semantic-based Distributed Approach for SPARQL query evalu...
Towards A Scalable Semantic-based Distributed Approach for SPARQL query evalu...Towards A Scalable Semantic-based Distributed Approach for SPARQL query evalu...
Towards A Scalable Semantic-based Distributed Approach for SPARQL query evalu...
 
Data Integration And Visualization
Data Integration And VisualizationData Integration And Visualization
Data Integration And Visualization
 
“Publishing and Consuming Linked Data. (Lessons learnt when using LOD in an a...
“Publishing and Consuming Linked Data. (Lessons learnt when using LOD in an a...“Publishing and Consuming Linked Data. (Lessons learnt when using LOD in an a...
“Publishing and Consuming Linked Data. (Lessons learnt when using LOD in an a...
 
Linked Data for Architecture, Engineering and Construction (AEC)
Linked Data for Architecture, Engineering and Construction (AEC)Linked Data for Architecture, Engineering and Construction (AEC)
Linked Data for Architecture, Engineering and Construction (AEC)
 
Linking Open Government Data at Scale
Linking Open Government Data at Scale Linking Open Government Data at Scale
Linking Open Government Data at Scale
 
Knowledge Graph Introduction
Knowledge Graph IntroductionKnowledge Graph Introduction
Knowledge Graph Introduction
 
‘Facilitating User Engagement by Enriching Library Data using Semantic Techno...
‘Facilitating User Engagement by Enriching Library Data using Semantic Techno...‘Facilitating User Engagement by Enriching Library Data using Semantic Techno...
‘Facilitating User Engagement by Enriching Library Data using Semantic Techno...
 
RDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival dataRDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival data
 
Linked Data Driven Data Virtualization for Web-scale Integration
Linked Data Driven Data Virtualization for Web-scale IntegrationLinked Data Driven Data Virtualization for Web-scale Integration
Linked Data Driven Data Virtualization for Web-scale Integration
 
Exploring the Semantic Web
Exploring the Semantic WebExploring the Semantic Web
Exploring the Semantic Web
 
Linked Open Data: A simple how-to
Linked Open Data: A simple how-toLinked Open Data: A simple how-to
Linked Open Data: A simple how-to
 
UnifiedViews: Towards ETL Tool for Simple yet Powerful RDF Data Management.
UnifiedViews: Towards ETL Tool for Simple yet Powerful RDF Data Management.UnifiedViews: Towards ETL Tool for Simple yet Powerful RDF Data Management.
UnifiedViews: Towards ETL Tool for Simple yet Powerful RDF Data Management.
 
Going for GOLD - Adventures in Open Linked Geospatial Metadata
Going for GOLD - Adventures in Open Linked Geospatial MetadataGoing for GOLD - Adventures in Open Linked Geospatial Metadata
Going for GOLD - Adventures in Open Linked Geospatial Metadata
 
FIWARE Global Summit - IDS Implementation with FIWARE Software Components
FIWARE Global Summit - IDS Implementation with FIWARE Software ComponentsFIWARE Global Summit - IDS Implementation with FIWARE Software Components
FIWARE Global Summit - IDS Implementation with FIWARE Software Components
 
Do it on your own - From 3 to 5 Star Linked Open Data with RMLio
Do it on your own - From 3 to 5 Star Linked Open Data with RMLioDo it on your own - From 3 to 5 Star Linked Open Data with RMLio
Do it on your own - From 3 to 5 Star Linked Open Data with RMLio
 

Dernier

Use of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptxUse of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptxRenuJangid3
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxMohamedFarag457087
 
Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.Silpa
 
FAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceAlex Henderson
 
Genome sequencing,shotgun sequencing.pptx
Genome sequencing,shotgun sequencing.pptxGenome sequencing,shotgun sequencing.pptx
Genome sequencing,shotgun sequencing.pptxSilpa
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learninglevieagacer
 
Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.Silpa
 
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....muralinath2
 
Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.Silpa
 
Genetics and epigenetics of ADHD and comorbid conditions
Genetics and epigenetics of ADHD and comorbid conditionsGenetics and epigenetics of ADHD and comorbid conditions
Genetics and epigenetics of ADHD and comorbid conditionsbassianu17
 
Atp synthase , Atp synthase complex 1 to 4.
Atp synthase , Atp synthase complex 1 to 4.Atp synthase , Atp synthase complex 1 to 4.
Atp synthase , Atp synthase complex 1 to 4.Silpa
 
Grade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsGrade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsOrtegaSyrineMay
 
300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptxryanrooker
 
GBSN - Microbiology (Unit 3)Defense Mechanism of the body
GBSN - Microbiology (Unit 3)Defense Mechanism of the body GBSN - Microbiology (Unit 3)Defense Mechanism of the body
GBSN - Microbiology (Unit 3)Defense Mechanism of the body Areesha Ahmad
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Silpa
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsSérgio Sacani
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusNazaninKarimi6
 

Dernier (20)

Use of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptxUse of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptx
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptx
 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 
Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.
 
FAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical Science
 
Genome sequencing,shotgun sequencing.pptx
Genome sequencing,shotgun sequencing.pptxGenome sequencing,shotgun sequencing.pptx
Genome sequencing,shotgun sequencing.pptx
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.
 
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
 
Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 
Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.
 
Genetics and epigenetics of ADHD and comorbid conditions
Genetics and epigenetics of ADHD and comorbid conditionsGenetics and epigenetics of ADHD and comorbid conditions
Genetics and epigenetics of ADHD and comorbid conditions
 
Atp synthase , Atp synthase complex 1 to 4.
Atp synthase , Atp synthase complex 1 to 4.Atp synthase , Atp synthase complex 1 to 4.
Atp synthase , Atp synthase complex 1 to 4.
 
Grade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsGrade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its Functions
 
300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx
 
GBSN - Microbiology (Unit 3)Defense Mechanism of the body
GBSN - Microbiology (Unit 3)Defense Mechanism of the body GBSN - Microbiology (Unit 3)Defense Mechanism of the body
GBSN - Microbiology (Unit 3)Defense Mechanism of the body
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virus
 

DistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talk

  • 1. DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu, Ivan Ermilov, Jens Lehmann and Mohamed Nadjib Mami
  • 2. 2 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn ❖ Introduction ❖ Approach ❖ Evaluation ❖ Use Cases ❖ Conclusion and Future work Outline
  • 3. 3 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn Introduction ❖ Over the last years, the size of the Semantic Web has increased and several large-scale datasets were published
  • 4. 4 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn Introduction Source: LOD-Cloud (http://lod-cloud.net/ ) ❖ Over the last years, the size of the Semantic Web has increased and several large-scale datasets were published ➢ As of August 2018
  • 5. 5 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn Introduction Source: LOD-Cloud (http://lod-cloud.net/ ) ❖ Over the last years, the size of the Semantic Web has increased and several large-scale datasets were published ➢ Based on LOD Stats (http://lodstats.aksw.org/ ) ~10, 000 datasets Openly available online using Semantic Web standards
  • 6. 6 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn Introduction Source: LOD-Cloud (http://lod-cloud.net/ ) ❖ Over the last years, the size of the Semantic Web has increased and several large-scale datasets were published ➢ Based on LOD Stats (http://lodstats.aksw.org/ ) ~10, 000 datasets Openly available online using Semantic Web standards many datasets RDFized and kept private (e.g. Supply chain, manufacture, ethereum dataset, etc.)
  • 7. 7 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn Introduction Source: LOD-Cloud (http://lod-cloud.net/ ) ❖ Dealing with such amount of data makes many tasks hard to be solved on single machines
  • 8. 8 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn Introduction Source: LOD-Cloud (http://lod-cloud.net/ ) ❖ Dealing with such amount of data makes many tasks hard to be solved on single machines Vocabulary Reuse Find a suitable vocabulary for your dataset
  • 9. 9 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn Introduction Source: LOD-Cloud (http://lod-cloud.net/ ) ❖ Dealing with such amount of data makes many tasks hard to be solved on single machines Vocabulary Reuse Find a suitable vocabulary for your dataset Coverage Analysis Does dataset contain necessary information?
  • 10. 10 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn Introduction Source: LOD-Cloud (http://lod-cloud.net/ ) ❖ Dealing with such amount of data makes many tasks hard to be solved on single machines Vocabulary Reuse Find a suitable vocabulary for your dataset Coverage Analysis Does dataset contain necessary information? Privacy Analysis Does dataset contain sensitive information?
  • 11. 11 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn Introduction Source: LOD-Cloud (http://lod-cloud.net/ ) ❖ Dealing with such amount of data makes many tasks hard to be solved on single machines Vocabulary Reuse Find a suitable vocabulary for your dataset Coverage Analysis Does dataset contain necessary information? Privacy Analysis Does dataset contain sensitive information? Entity Linking Which datasets are good candidates for interlinking?
  • 12. 12 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn Introduction Source: LOD-Cloud (http://lod-cloud.net/ ) ❖ Dealing with such amount of data makes many tasks hard to be solved on single machines We need statistics!Vocabulary Reuse Find a suitable vocabulary for your dataset Coverage Analysis Does dataset contain necessary information? Privacy Analysis Does dataset contain sensitive information? Entity Linking Which datasets are good candidates for interlinking?
  • 13. 13 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn Introduction Source: LOD-Cloud (http://lod-cloud.net/ ) ❖ Dealing with such amount of data makes many tasks hard to be solved on single machines but (using an efficient approach) Vocabulary Reuse Find a suitable vocabulary for your dataset Coverage Analysis Does dataset contain necessary information? Privacy Analysis Does dataset contain sensitive information? Entity Linking Which datasets are good candidates for interlinking? We need statistics!
  • 14. 14 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn SANSA Overview ❖ SANSA [1] is a processing data flow engine that provides data distribution, and fault tolerance for distributed computations over RDF large-scale datasets ❖ SANSA includes several libraries: ➢ Read / Write RDF / OWL library ➢ Querying library ➢ Inference library ➢ ML- Machine Learning core library BigDataEurope Inference Knowledge Distribution & Representation DeployCoreAPIs&Libraries Local Cluster Standalone Resource manager Querying Machine Learning
  • 15. 15 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn ❖ A statistical criterion C (adopted definition from [2]) is a triple C = (F, D, P), where: ➢ F is a SPARQL filter condition ➢ D is a derived dataset from the main dataset (RDD of triples) after applying F ➢ P is a post-processing filter operating on the data structure D ❖ RDDs are in-memory collections of records that can be operated in parallel on large clusters ➢ We use RDDs to represent RDF triples Approach
  • 16. 16 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn ❖ Overview on DistLODStats architecture Approach
  • 17. 17 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn Approach (DistLODStats workflow) RDF Data
  • 18. 18 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn Approach (DistLODStats workflow) RDF Data Definitions ● Define rdf statistics in the format of [(Rule→ Filter) → Action → Postprocessing] RDFStatistics SANSA Engine Statistics ClassDistribution PropertyDistribution Indegree Outdegree Distinctentities Literals SameAs Namespaces --- Class Distribution Filter rule Action Postprocessing ?p=rdf:type && isIRI(?o) M[?o]++ top(M,100)
  • 19. 19 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn Approach (DistLODStats workflow) RDF Data Definitions ● Define rdf statistics in the format of [(Rule→ Filter) → Action → Postprocessing] RDFStatistics SANSA Engine Statistics ClassDistribution PropertyDistribution Indegree Outdegree Distinctentities Literals SameAs Namespaces --- ❖ We use RDDs to represent RDF triples val triples = spark.rdf(lang)(input) val stats = triples.statsClassUsageCount() Class Distribution Filter rule Action Postprocessing ?p=rdf:type && isIRI(?o) M[?o]++ top(M,100)
  • 20. 20 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn Approach (DistLODStats workflow) RDF Data Definitions ● Define rdf statistics in the format of [(Rule→ Filter) → Action → Postprocessing] RDFStatistics SANSA Engine Statistics ClassDistribution PropertyDistribution Indegree Outdegree Distinctentities Literals SameAs Namespaces --- filter_rule = triples.filter(f => f.predicateMatches(RDF.`type`.asNode ()) && f.getObject.isURI) .map(_.getObject) Worker Worker Worker Class Distribution Filter rule Action Postprocessing ?p=rdf:type && isIRI(?o) M[?o]++ top(M,100)
  • 21. 21 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn Approach (DistLODStats workflow) RDF Data Definitions ● Define rdf statistics in the format of [(Rule→ Filter) → Action → Postprocessing] RDFStatistics SANSA Engine Statistics ClassDistribution PropertyDistribution Indegree Outdegree Distinctentities Literals SameAs Namespaces --- action = filter_rule .map(f => (f, 1)) .reduceByKey(_ + _) Worker Worker Worker 5 :Person 2 :Place 1 :Work 6 :Org. 4 :Species 1 :Person 3 :Person 1 :Place Class Distribution Filter rule Action Postprocessing ?p=rdf:type && isIRI(?o) M[?o]++ top(M,100)
  • 22. 22 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn Approach (DistLODStats workflow) RDF Data Definitions ● Define rdf statistics in the format of [(Rule→ Filter) → Action → Postprocessing] RDFStatistics SANSA Engine Statistics ClassDistribution PropertyDistribution Indegree Outdegree Distinctentities Literals SameAs Namespaces --- result = action.sortBy(_._2, false) .take(100) Worker Worker Worker 5 :Person 2 :Place 1 :Work 6 :Org. 4 :Species 1 :Person 3 :Person 1 :Place Master 9 :Person 6 :Org. 3 :Place 1 :Work 4 :Species Class Distribution Filter rule Action Postprocessing ?p=rdf:type && isIRI(?o) M[?o]++ top(M,100)
  • 23. 23 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn Approach (DistLODStats workflow) RDF Data Definitions ● Define rdf statistics in the format of [(Rule→ Filter) → Action → Postprocessing] RDFStatistics SANSA Engine Statistics ClassDistribution PropertyDistribution Indegree Outdegree Distinctentities Literals SameAs Namespaces --- result.voidify(output) Worker Worker Worker 5 :Person 2 :Place 1 :Work 6 :Org. 4 :Species 1 :Person 3 :Person 1 :Place Master 9 :Person 6 :Org. 3 :Place 1 :Work 4 :Species Analyse SANSA-Notebooks Vocabulary of Interlinked Datasets (VoID) Class Distribution Filter rule Action Postprocessing ?p=rdf:type && isIRI(?o) M[?o]++ top(M,100)
  • 24. 24 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn ❖ Q1: How does the runtime of the algorithm change when more nodes in the cluster are added? ❖ Q2: How does the algorithm scale to larger datasets? ❖ Q3: How does the algorithm scale to a larger number of datasets? Evaluation
  • 25. 25 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn ❖ Experimental Setup ➢ Cluster configuration ■ 6 machines (1 master, 5 workers): Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz (32 Cores), 128 GB RAM, 12 TB SATA RAID-5 ■ Spark-2.2.0, Hadoop 2.8.0, Scala 2.11.11 and Java 8 ➢ Datasets (all in nt format) ■ LinkedGeoData: (1,292,933,812 triples; 191.17 GB) ■ DBpedia v3.9: (en: 812,545,486 friples; 114.4 GB), (de: 336,714,883 triples; 48.6 GB), (fr: 340,849,556 triples; 49.77 GB) ■ BSBM: (BSBM_2GB: 8,289,484 triples; 2 GB), (BSBM_20GB: 81,980,472 triples; 20 GB), (BSBM_200GB: 817,774,057 triples; 200 GB) Evaluation
  • 26. 26 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn ❖ Distributed Processing on Large-Scale Datasets Evaluation Runtime (h) (mean/std) LODStats DistLODStats a) files b) bigfile c) local d) cluster e) speedup ratio LinkedGeoData n/a n/a 36.65/0.13 4.37/0.15 7.4x DBpedia_en 24.63/0.57 fail 25.34/0.11 2.97/0.08 7.6x DBpedia_de n/a n/a 10.34/0.06 1.2/0.0 7.3x DBpedia_fr n/a n/a 10.49/0.09 1.27/0.04 7.3x
  • 27. 27 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn ❖ Distributed Processing on Large-Scale Datasets Evaluation Runtime (h) (mean/std) LODStats DistLODStats a) files b) bigfile c) local d) cluster e) speedup ratio LinkedGeoData n/a n/a 36.65/0.13 4.37/0.15 7.4x DBpedia_en 24.63/0.57 fail 25.34/0.11 2.97/0.08 7.6x DBpedia_de n/a n/a 10.34/0.06 1.2/0.0 7.3x DBpedia_fr n/a n/a 10.49/0.09 1.27/0.04 7.3x
  • 28. 28 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn ❖ Distributed Processing on Large-Scale Datasets Evaluation Runtime (h) (mean/std) LODStats DistLODStats a) files b) bigfile c) local d) cluster e) speedup ratio LinkedGeoData n/a n/a 36.65/0.13 4.37/0.15 7.4x DBpedia_en 24.63/0.57 fail 25.34/0.11 2.97/0.08 7.6x DBpedia_de n/a n/a 10.34/0.06 1.2/0.0 7.3x DBpedia_fr n/a n/a 10.49/0.09 1.27/0.04 7.3x
  • 29. 29 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn ❖ Distributed Processing on Large-Scale Datasets Evaluation Runtime (h) (mean/std) LODStats DistLODStats a) files b) bigfile c) local d) cluster e) speedup ratio LinkedGeoData n/a n/a 36.65/0.13 4.37/0.15 7.4x DBpedia_en 24.63/0.57 fail 25.34/0.11 2.97/0.08 7.6x DBpedia_de n/a n/a 10.34/0.06 1.2/0.0 7.3x DBpedia_fr n/a n/a 10.49/0.09 1.27/0.04 7.3x * e) = d) / c) - 1
  • 30. 30 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn ❖ Speedup performance evaluation of DistLODStats Evaluation
  • 31. 31 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn ❖ Speedup performance evaluation of DistLODStats Evaluation DistLODStats shows consistent improvements for each dataset when running on a cluster with an geometric mean of the speedup of 7.4x, which answer Q1.
  • 32. 32 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn ❖ Sizeup performance evaluation of DistLODStats Evaluation
  • 33. 33 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn ❖ Sizeup performance evaluation of DistLODStats Evaluation The execution time cost grows linearly when the size of the dataset increases and is near-constant as long as the data fits in memory. DistLODStats scales well in context of sizeup, which answers Q2.
  • 34. 34 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn ❖ Scalability performance evaluation on DistLODStats Evaluation
  • 35. 35 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn ❖ Scalability performance evaluation on DistLODStats Evaluation We can see that as the number of workers increases, the execution time cost is super-linear on BSBM_50GB dataset.
  • 36. 36 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn ❖ Speedup Ratio and Efficiency of DistLODStats Evaluation
  • 37. 37 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn ❖ Speedup Ratio and Efficiency of DistLODStats Evaluation The speedup performance trend is consistent as the number of workers increases. Efficiency increased only up to the 4th worker for BSBM_50GB dataset. The results imply that DistLODStats can achieve near linear or even super linear scalability in performance, which answers Q3. * speedup (S) = * efficiency (E) = - execution time of the algorithm on local mode - time required to complete the task on N workers
  • 38. 38 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn Powered by <http://lodstats.aksw.org/> Comprehensive statistics – LODStats DistLODStats is used as underlying engine overcoming the previous limitations and generating statistical descriptions, including e.g. VoID, for large parts of the Linked Open Data Cloud
  • 39. 39 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn Powered by <https://aleth.io/> Blockchain – Alethio Use Case Alethio is using SANSA in general and DistLODStats specifically in order to perform large-scale batch analytics, e.g. computing the asset turnover for sets of accounts, computing attack pattern frequencies and Opcode usage statistics. DistLODStats was run on a 100 node cluster with 400 cores <http://lodstats.aksw.org/> Comprehensive statistics – LODStats DistLODStats is used as underlying engine overcoming the previous limitations and generating statistical descriptions, including e.g. VoID, for large parts of the Linked Open Data Cloud
  • 40. 40 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn Powered by <https://aleth.io/> Blockchain – Alethio Use Case Alethio is using SANSA in general and DistLODStats specifically in order to perform large-scale batch analytics, e.g. computing the asset turnover for sets of accounts, computing attack pattern frequencies and Opcode usage statistics. DistLODStats was run on a 100 node cluster with 400 cores <http://lodstats.aksw.org/> Comprehensive statistics – LODStats DistLODStats is used as underlying engine overcoming the previous limitations and generating statistical descriptions, including e.g. VoID, for large parts of the Linked Open Data Cloud <https://www.big-data-europe.eu/> Big Data Platform – BDE DistLODStats is used for computing statistics over those logs within the BDE platform. BDE uses the Mu Swarm Logger service for detecting docker events and convert their representation to RDF. In order to generate visualisations of log statistics, BDE then calls DistLODStats from SANSA-Notebooks
  • 41. 41 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn Powered by <https://aleth.io/> Blockchain – Alethio Use Case Alethio is using SANSA in general and DistLODStats specifically in order to perform large-scale batch analytics, e.g. computing the asset turnover for sets of accounts, computing attack pattern frequencies and Opcode usage statistics. DistLODStats was run on a 100 node cluster with 400 cores <http://lodstats.aksw.org/> Comprehensive statistics – LODStats DistLODStats is used as underlying engine overcoming the previous limitations and generating statistical descriptions, including e.g. VoID, for large parts of the Linked Open Data Cloud <https://www.big-data-europe.eu/> Big Data Platform – BDE DistLODStats is used for computing statistics over those logs within the BDE platform. BDE uses the Mu Swarm Logger service for detecting docker events and convert their representation to RDF. In order to generate visualisations of log statistics, BDE then calls DistLODStats from SANSA-Notebooks <http://abstat.disco.unimib.it/> LOD Summaries – ABSTAT DistLODStats is used for data set summarisation of large-scale RDF datasets in this context
  • 42. 42 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn ❖ Obtaining an overview over the Web of Data: ➢ data-intensive and computing-intensive ➢ challenge to develop fast and efficient algorithms that can handle large scale RDF datasets ❖ DistLODStats, a novel software component (integrated into the larger SANSA framework) for distributed in-memory computation of RDF Datasets statistics implemented using the Spark framework ❖ As a future work we plan to further improve time efficiency and perform load balancing Conclusion
  • 43. 43 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn STATisfy: A REST Interface for DistLODStats CollaborativeAnalyticsServices Marketplace REST Server BigDataEurope Local Cluster Standalone Resource manager Master Worker 1 Worker 2 Worker n SANSA DistLODStats
  • 44. 44 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn STATisfy: A REST Interface for DistLODStats CollaborativeAnalyticsServices Marketplace REST Server BigDataEurope Local Cluster Standalone Resource manager Master Worker 1 Worker 2 Worker n SANSA DistLODStats Visit our poster [P29] at 7pm!
  • 45. THANK YOU! SANSA Semantic Analytics Stack https://github.com/SANSA-Stack ● SANSA-RDF ● SANSA-OWL ● SANSA-Query ● SANSA-Inference ● SANSA-ML ● SANSA-Examples ● SANSA-Notebooks Gezim Sejdiu PhD Student Smart Data Analytics / University of Bonn http://sda.tech @Gezim_Sejdiu https://gezimsejdiu.github.io/ <http://sansa-stack.net/distlodstats/>
  • 46. 46 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn [1]. J. Lehmann, G. Sejdiu, L. Buhmann, P. Westphal, C. Stadler, I. Ermilov, S. Bin, ¨ N. Chakraborty, M. Saleem, A.-C. N. Ngonga, and H. Jabeen. Distributed Semantic Analytics using the SANSA Stack. In Proceedings of 16th International Semantic Web Conference, 2017 [2]. J. Demter, S. Auer, M. Martin, and J. Lehmann. Lodstats—an extensible framework for high-performance dataset analytics. In Proceedings of the EKAW 2012, Lecture Notes in Computer Science (LNCS) 7603. Springer, 2012 References
  • 47. 47 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn Backup Slides
  • 48. 48 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn Introduction ❖ Over the last years, the size of the Semantic Web has increased and several large-scale datasets were published ➢ As of May 2007 Source: LOD-Cloud (http://lod-cloud.net/ )
  • 49. 49 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn Introduction ❖ Over the last years, the size of the Semantic Web has increased and several large-scale datasets were published ➢ As of September 2008 Source: LOD-Cloud (http://lod-cloud.net/ )
  • 50. 50 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn Introduction ❖ Over the last years, the size of the Semantic Web has increased and several large-scale datasets were published ➢ As of September 2011 Source: LOD-Cloud (http://lod-cloud.net/ )
  • 51. 51 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn ❖ Overall Breakdown by Criterion Analysis (log scale) Evaluation
  • 52. 52 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn ❖ Overall Breakdown by Criterion Analysis (log scale) Evaluation The execution time is longer when there is data movement in the cluster compared to when data is processed without movement.
  • 53. 53 DistLODStats: Distributed Computation of RDF Dataset Statistics Gezim Sejdiu - University of Bonn ❖ Overall Breakdown by Criterion Analysis (log scale) Evaluation There are some criteria that are quite efficient to compute even with data movement e.g. 22, 23. This is because data is largely filtered before the movement.