Slide 1: Prof. Ansgar Scherp – asc@informatik.uni-kiel.de
Ansgar Scherp
Mining and Managing
Large-scale Linked Open Data
GVDB, Nörten-Hardenberg, May 25, 2016
Thanks to: Chifumi Nishioka, Renata Dividino, Thomas Gottron,
and many more …
Slide 2
Team Knowledge Discovery
Ansgar Scherp
Ahmed Saleh
Chifumi Nishioka
Falk Böschen
Mohammad Abdel-Qader
Till Blume
Anke Koslowski (Secretariat)
Henrik Schmidt (Engineer)
Lukas Galke
Florian Mai
Slide 3
Linked Open Data (LOD) Cloud
• Publishing and interlinking data on the web
• Different quality, purpose, and sources
• Using the Resource Description Framework (RDF)
World Wide Web        LOD Cloud
Documents             Data
Hyperlinks via <a>    Typed Links
HTML                  RDF
Addresses (URIs)      Addresses (URIs)
Slide 4
Relevance of Linked Data?
Slide 5
1000+ Datasets, 50+ Billion Triples
[Figure: LOD cloud diagram, Linked Data: May '07 → August '14. Domains: Media, Geographic, Publications, Web 2.0, eGovernment, Cross-Domain, Life Sciences, Social Networking. Source: http://lod-cloud.net]
Slide 6
LOD on One Slide: Example Graph
RDF triple: biglynx:matt-briggs (subject) rdf:type (predicate) foaf:Person (object)
Fully qualified URI using vocabulary prefixes:
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix biglynx: <http://biglynx.co.uk/people/> .
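The prefix mechanism can be illustrated with a small sketch. It expands the prefixed names of the example triple into fully qualified URIs. The helper name `expand` is hypothetical and not part of any RDF library; only the three prefixes from the slide are assumed.

```python
# Prefix table taken from the slide.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "biglynx": "http://biglynx.co.uk/people/",
}


def expand(name: str, prefixes: dict = PREFIXES) -> str:
    """Expand a prefixed name such as 'foaf:Person' to a full URI (hypothetical helper)."""
    prefix, _, local = name.partition(":")
    return prefixes[prefix] + local


# The example triple: (subject, predicate, object).
triple = ("biglynx:matt-briggs", "rdf:type", "foaf:Person")
full_triple = tuple(expand(term) for term in triple)
```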
Slide 7
LOD on One Slide: Example Graph
[Figure: example graph with nodes biglynx:matt-briggs, foaf:Person, and biglynx:Director, connected by rdf:type edges; further edges elided (…)]
Fully qualified URI using vocabulary prefixes:
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix biglynx: <http://biglynx.co.uk/people/> .
Slide 8
LOD on One Slide: Example Graph
[Figure: example graph. Nodes: biglynx:matt-briggs, biglynx:dave-smith, foaf:Person, biglynx:Director, dp:London, _1:point, “-0.118”, “51.509”. Edges: rdf:type (twice), foaf:knows, foaf:based_near, ex:loc, wgs84:lat, wgs84:long; further edges elided (…). Annotations: Entity, Types, Properties]
Slide 9
Motivation for the SchemEX Index
• Single entry point to query the LOD cloud
• Search for data sources containing entities like
– ‘Persons, who are Politicians and Actors’
– ‘Research data sets’
– ‘Scientific publications’
Query
SELECT ?x
FROM …
WHERE {
?x rdf:type ex:Actor .
?x rdf:type ex:Politician . }
[Figure: query, index, and data sources, with steps numbered 1 and 2]
Slide 10
Input Data for SchemEX
• Quads: <subject> <predicate> <object> <context>
• Example:
<http://biglynx.co.uk/people/matt-briggs>
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://xmlns.com/foaf/0.1/Person>
<http://biglynx.co.uk/people/matt-briggs.rdf>
[Figure: the triple biglynx:matt-briggs rdf:type foaf:Person inside the context <http://biglynx.co.uk/people/matt-briggs.rdf>, shown as part of a dataset X in the LOD cloud]
Slide 11
SchemEX Idea
• Schema-level index SchemEX
– Assign RDF entities to graph patterns
– Map graph patterns to data sources (context)
– Defined over entities, but store the context
• Construction of schema-level index
– Stream-based for scalability
– Stratified bi-simulation for detecting patterns
– Little loss of accuracy
[KGS+12]
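The idea of mapping graph patterns to data sources can be sketched in a strongly simplified form. The sketch below uses only the set of rdf:type values of an entity as its pattern and ignores properties and bi-simulation, so it is not the SchemEX algorithm itself. The function names, the `ex:` terms, and the example.org contexts are assumptions made for illustration.

```python
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def build_type_index(quads):
    """Map each type set to the contexts (data sources) of entities having it."""
    types = defaultdict(set)     # subject -> rdf:type values
    contexts = defaultdict(set)  # subject -> contexts the subject was seen in
    for s, p, o, c in quads:
        contexts[s].add(c)
        if p == RDF_TYPE:
            types[s].add(o)
    index = defaultdict(set)
    for s, type_set in types.items():
        index[frozenset(type_set)].update(contexts[s])
    return index


def lookup(index, required_types):
    """Return all data sources with entities that carry every required type."""
    required = set(required_types)
    sources = set()
    for type_set, ctxs in index.items():
        if required <= type_set:
            sources |= ctxs
    return sources


# Hypothetical input quads (subject, predicate, object, context).
quads = [
    ("ex:arnold", RDF_TYPE, "ex:Actor", "http://example.org/a.rdf"),
    ("ex:arnold", RDF_TYPE, "ex:Politician", "http://example.org/a.rdf"),
    ("ex:meryl", RDF_TYPE, "ex:Actor", "http://example.org/b.rdf"),
]
index = build_type_index(quads)
actor_politician_sources = lookup(index, ["ex:Actor", "ex:Politician"])
```

The lookup answers the question from the motivation slide: which data sources contain entities that are both actors and politicians.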
Slide 12
Building the Index from a Stream
• Stream of quads coming from an LD crawler
[Figure: stream of quads … Q16, Q15, …, Q2, Q1 entering a FIFO cache of numbered entities, which are linked to contexts C1, C2, C3]
+ Reasonable accuracy at cache size of 50k
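A minimal sketch of stream-based construction with a bounded FIFO cache, under simplifying assumptions: the pattern of an entity is only its type set, and an entity is written to the index when it is evicted from the cache. The function name `stream_index` and the toy identifiers are hypothetical. The example shows where accuracy can be lost: if quads of an entity arrive again after its eviction, the entity is indexed with incomplete patterns.

```python
from collections import OrderedDict, defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def stream_index(quads, cache_size):
    """Build a type-set index from a quad stream using a FIFO cache of subjects."""
    cache = OrderedDict()     # subject -> (types, contexts), oldest first
    index = defaultdict(set)  # type set -> contexts

    def flush(types, contexts):
        index[frozenset(types)].update(contexts)

    for s, p, o, c in quads:
        if s not in cache:
            if len(cache) >= cache_size:
                # Evict the oldest subject and write its pattern to the index.
                _, (old_types, old_contexts) = cache.popitem(last=False)
                flush(old_types, old_contexts)
            cache[s] = (set(), set())
        types, contexts = cache[s]
        contexts.add(c)
        if p == RDF_TYPE:
            types.add(o)

    # Write whatever is still cached at the end of the stream.
    for types, contexts in cache.values():
        flush(types, contexts)
    return index


# Toy stream: subject "a" re-appears after subject "b".
stream = [
    ("a", RDF_TYPE, "T1", "c1"),
    ("b", RDF_TYPE, "T2", "c2"),
    ("a", RDF_TYPE, "T3", "c1"),
]
exact = stream_index(stream, cache_size=2)   # "a" stays cached, pattern is complete
lossy = stream_index(stream, cache_size=1)   # "a" is evicted early, pattern is split
```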
Slide 13
Full BTC 2011 Data Set: 2.17 bn Triples
Cache size: 50 k
Winner BTC'11
+ Linear runtime with respect to number of triples
+ Memory consumption scales with window size
Slide 14
[Screenshot: search interface inspired by Google, offering generalization and specialization of the query and a result list with examples] [GSK+13]
Slide 15
LODatio Under the Hood
[Figure: architecture diagram with the labels SPARQL, Query translation, Retrieve, Data Sources, Rank, Snippets, Count, Generalize, Specialize, Select]
• Hybrid database with off-the-shelf components
Slide 16
LOD on One Slide: Recap
[Figure: example graph from slide 8, with the types annotated as Type Set (TS) and the properties as Property Set (PS)]
Information theoretic analyses of LOD
• How much information is encoded in TS and PS?
• How much information remains once TS or PS is known?
• To which degree are TS and PS redundant?
• Example: 20% of pay-level domains (PLDs) do not need TS (6% for PS)
[GKS15]
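The three questions correspond to entropy, conditional entropy, and mutual information of the random variables TS and PS. The following sketch computes these quantities from a list of entities, each given as a pair (type set, property set). It is an illustration under that assumption, not the evaluation code of [GKS15]; the function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of the distribution given by a Counter."""
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def analyse(entities):
    """entities: list of (type_set, property_set) pairs, one pair per entity."""
    h_ts = entropy(Counter(ts for ts, _ in entities))
    h_ps = entropy(Counter(ps for _, ps in entities))
    h_joint = entropy(Counter(entities))
    return {
        "H(TS)": h_ts,                      # information encoded in TS
        "H(PS)": h_ps,                      # information encoded in PS
        "H(PS|TS)": h_joint - h_ts,         # what PS adds once TS is known
        "H(TS|PS)": h_joint - h_ps,         # what TS adds once PS is known
        "I(TS;PS)": h_ts + h_ps - h_joint,  # redundancy of TS and PS
    }


# Toy data: four entities with two type sets and three property sets.
A, B = frozenset({"ex:A"}), frozenset({"ex:B"})
p, q, r = frozenset({"ex:p"}), frozenset({"ex:q"}), frozenset({"ex:r"})
result = analyse([(A, p), (A, p), (B, q), (B, r)])
```

In the toy data, the property set fully determines the type set, so H(TS|PS) is zero: the types would not be needed once the properties are known.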
Slide 17
Käfer et al.'s Temporal Analysis of LOD
• 29 weekly LOD snapshots of ~100 million triples
• Still running since May 2012 (now 200+ weeks)
• Data on the cloud changes a lot
• But vocabularies defining RDF types and properties are highly static, e.g., RDF, FOAF
[Figure: LOD cloud ~2012 vs. LOD cloud ~2014. Changes?]
[Käfer et al., 2013] T. Käfer, A. Abdelrahman, J. Umbrich, P. O'Byrne, A. Hogan: Observing Linked Data Dynamics. ESWC 2013: 213-227
Slide 18
But: Do Changes Occur in PS and TS?
• Analysis: expected conditional entropy over time
• $H(PS \mid TS = ts)$: entropy of $PS$ given that $TS$ is known
• Observation: types become less important
• Changes in the use of TS and PS?!
[Plots: $H(PS \mid TS = ts)$ and $H(TS \mid PS = ps)$ over time]
Slide 19
Changes over Time
• Extended characteristic sets [Neumann and Moerkotte, 2011]: ECS = PS ∪ TS
• Avg.: 83.898 ECS per week
• Avg. 73% of ECS re-occur next week (orange)
• Avg. 35% of ECS remain unchanged (blue)
• Avg. 20% of entity sets of ECS change / week
[Plots: # of ECS over time] [DSG+13]
[Neumann and Moerkotte, 2011] Thomas Neumann, Guido Moerkotte: Characteristic sets: Accurate cardinality estimation for RDF queries with multiple joins. ICDE 2011: 984-994
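A small sketch of how extended characteristic sets and their week-to-week re-occurrence could be computed. It assumes that rdf:type itself is not counted as a property in PS and that a snapshot is given as a list of (subject, predicate, object) triples; both are assumptions for illustration, as are the function names and the toy data.

```python
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def extended_characteristic_sets(triples):
    """Map each subject to its ECS = PS ∪ TS."""
    ecs = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            ecs[s].add(o)  # the type contributes to TS
        else:
            ecs[s].add(p)  # the property contributes to PS
    return {s: frozenset(terms) for s, terms in ecs.items()}


def reoccurring_fraction(week_a, week_b):
    """Fraction of distinct ECS of week_a that occur again in week_b."""
    ecs_a = set(extended_characteristic_sets(week_a).values())
    ecs_b = set(extended_characteristic_sets(week_b).values())
    return len(ecs_a & ecs_b) / len(ecs_a)


# Toy snapshots of two consecutive weeks.
week_1 = [
    ("ex:matt", RDF_TYPE, "foaf:Person"),
    ("ex:matt", "foaf:knows", "ex:dave"),
    ("ex:dave", RDF_TYPE, "ex:Director"),
]
week_2 = [
    ("ex:anna", RDF_TYPE, "foaf:Person"),
    ("ex:anna", "foaf:knows", "ex:matt"),
]
fraction = reoccurring_fraction(week_1, week_2)
```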
Slide 20
Temporal Dynamics of the Entities?
• Notion of entity motivated by ECS: an entity is a set of triples X sharing the same subject URI s
• Example: 1 entity, 4 triples (w.l.o.g.)
• Useful to keep LOD caches up-to-date?
• Can we predict when LOD sources will change?
Slide 21
Dynamics Function Θ
• Definition of Θ over change rate function $c(X_t)$
$$\Theta_{t_i}^{t_j}(X) = \Theta(X_{t_j}) - \Theta(X_{t_i}) = \int_{t_i}^{t_j} c(X_t)\,\mathrm{d}t$$
• Approximation as step function over changes
$$\Theta_{t_i}^{t_j}(X) \approx \sum_{k=i+1}^{j} \delta(X_{t_{k-1}}, X_{t_k})$$
[Figure: c and Θ of X over time between $t_i$ and $t_j$; annotation: monotone, non-negative]
[DGS+14]
Slide 22
Update Strategies for LOD Sources
• Apply strategies for keeping caches of WWW documents up-to-date to the maintenance of LOD caches
• Assumptions
– LOD is fetched from various sources
– Sources are scored and prioritized based on the strategy
– Data of a source is fetched only when the operation can be entirely executed
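The assumptions above can be sketched as a simple scheduling loop. Sources are visited in order of descending score, and a source is fetched only if it fits completely into the remaining budget. Whether a source that does not fit is skipped or ends the update step is not stated on the slide; the sketch assumes it is skipped. The function name and the tuple layout are hypothetical.

```python
def schedule_updates(sources, budget):
    """Select the sources to fetch in one update step.

    sources: list of (name, score, size) tuples, size in number of triples
    budget:  number of triples that may be fetched in this step
    """
    fetched = []
    for name, score, size in sorted(sources, key=lambda s: s[1], reverse=True):
        if size <= budget:  # fetch only if the source can be fetched entirely
            fetched.append(name)
            budget -= size
    return fetched


# Toy example: three sources, budget of 100 triples.
selected = schedule_updates(
    [("a.example.org", 0.9, 50), ("b.example.org", 0.8, 70), ("c.example.org", 0.1, 40)],
    budget=100,
)
```

Here `b.example.org` has a higher score than `c.example.org` but no longer fits into the remaining budget after `a.example.org` has been fetched.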
Slide 23
Scheduling Update Strategies
a) HTTP Header [Dividino et al., 2014a]
b) Age or Last Visited [Dasdan et al., 2009, Cho and Garcia-Molina, 2000]
c) PageRank [Page et al., 1999, Boldi et al., 2004, Baeza-Yates et al., 2005]
d) LOD Sources Size
e) Change Ratio [Douglis et al., 1997, Cho et al., 2002, Tan et al., 2007]
f) Change Rate [Olston et al., 2002, Ntoulas et al., 2004, Dividino et al., 2013]
g) History Information: Dynamics [Dividino et al., 2014b]
We borrow strategies developed for the WWW and
metrics for data change analysis in the LOD cloud.
Slide 24
e) Change Ratio
• Captures the change frequency of the data (freshness)
• Percentage of data items in the cache that are up-to-date
[Figure: ranking of sources over time between $t_i$ and $t_j$, from sources which changed (most) down to sources that did not change or changed less]
Slide 25
f) Change Rate
• Data from sources which are less similar to their previous update (snapshot) should be updated first
• Comparison of two RDF data sets
– $X$: set of triple statements
– $\delta$: numeric expression (distance), $\delta \in [0, \infty)$
Example:
$$\delta_{\mathrm{Jaccard}}(X_1, X_2) = 1 - \frac{|X_1 \cap X_2|}{|X_1 \cup X_2|}$$
[Figure: δ over time between $t_i$ and $t_j$]
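The Jaccard distance between two snapshots of a data source can be computed directly on sets of triples. The sketch below assumes triples are hashable tuples and treats two empty snapshots as identical; the latter is a convention chosen here, not taken from the slides.

```python
def jaccard_distance(x1, x2):
    """Jaccard distance between two sets of triples: 1 - |X1 ∩ X2| / |X1 ∪ X2|."""
    x1, x2 = set(x1), set(x2)
    union = x1 | x2
    if not union:
        return 0.0  # assumption: two empty snapshots count as unchanged
    return 1.0 - len(x1 & x2) / len(union)


# Toy snapshots: one triple removed, one triple added.
old = {("s", "p", "o1"), ("s", "p", "o2"), ("s", "p", "o3")}
new = {("s", "p", "o2"), ("s", "p", "o3"), ("s", "p", "o4")}
distance = jaccard_distance(old, new)  # 2 shared triples out of 4 distinct ones
```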
Slide 26
g) History Information: Dynamics
• Data from sources which most evolve in a given period of time should be updated first
• Uses both history information and change rate
$$\Theta(X_{t_j}) - \Theta(X_{t_i}) = \int_{t_i}^{t_j} c(X_t)\,\mathrm{d}t \approx \sum_{k=i+1}^{j} \delta(X_{t_{k-1}}, X_{t_k})$$
[Figure: c and Θ of X over time between $t_i$ and $t_j$]
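The step-function approximation sums the distances between consecutive snapshots, and sources can then be ranked by the resulting value. In the sketch below, the distance δ is passed in as a function; the size of the symmetric difference is used as one possible δ with values in [0, ∞). This choice, the function names, and the toy history are assumptions for illustration.

```python
def dynamics(snapshots, delta):
    """Approximate the dynamics as the sum of delta over consecutive snapshots."""
    return sum(delta(prev, curr) for prev, curr in zip(snapshots, snapshots[1:]))


def rank_by_dynamics(history, delta):
    """Order source names by descending dynamics; history maps name -> snapshots."""
    return sorted(history, key=lambda name: dynamics(history[name], delta), reverse=True)


def changed_triples(x1, x2):
    """One possible delta: number of triples added or removed."""
    return len(set(x1) ^ set(x2))


# Toy history with three snapshots per source (triples abbreviated as integers).
history = {
    "static.example.org": [{1, 2}, {1, 2}, {1, 2}],
    "busy.example.org": [{1, 2}, {2, 3}, {3, 4}],
}
ranking = rank_by_dynamics(history, changed_triples)
```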
Slide 27
Evaluation
• Idea: simulation of limitations of available computational resources (network bandwidth, computation time)
[Figure: timeline from $t_i$ to $t_{i+1}$, labelled 100%]
Slide 28
Evaluation: Single Step Update
[Figure: timeline from $t_i$ to $t_{i+1}$ with the percentage labels 100%, 5%, 15%, 40%, 60%, 75%, 95%]
Slide 29
Evaluation: Iterative Updates
[Figure: timeline over $t_i$, $t_{i+1}$, $t_{i+2}$, … with the percentage labels 100%, 5%, 15%, 40%, 60%, 75%, 95% repeated at each step]
Slide 30
Dataset
• Dynamic Linked Data Observatory
• Weekly snapshots, 14 M triples
– 154 snapshots (approx. 3 years)
– 590 data sources (PLD)
Top 10 largest data sources      Average size
dbpedia.org                      3,406,364.5
edgarwrap.ontologycentral.com    982,631.0
dbtune.org                       864,107.6
dbtropes.org                     787,299.9
data.linkedct.org                498,986.3
aims.fao.org                     416,708.9
www.legislation.gov.uk           399,601.6
kent.zpr.fer.hr                  387,034.8
identi.ca                        278,316.2
webenemasuno.linkeddata.es       250,557.9
Slide 31
Metrics: Precision & Recall
• Precision: portion of cached data that are actually up-to-date
• Recall: portion of data in the LOD cloud that is identical to the cached data
[Figure: overlap of the cached data and the actual data on the LOD cloud (w.r.t. the 590 sources considered)]
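Both metrics follow from the overlap of two sets of triples. The sketch below assumes that cached and actual data are given as sets and returns 0.0 when the respective denominator is empty; that convention and the function name are choices made here.

```python
def precision_recall(cached, actual):
    """Precision and recall of a cache with respect to the actual data."""
    cached, actual = set(cached), set(actual)
    overlap = len(cached & actual)
    precision = overlap / len(cached) if cached else 0.0  # up-to-date share of the cache
    recall = overlap / len(actual) if actual else 0.0     # share of actual data in the cache
    return precision, recall


# Toy example (triples abbreviated as integers): 2 of 4 cached triples are still valid,
# and they cover 2 of the 6 triples currently published.
precision, recall = precision_recall({1, 2, 3, 4}, {3, 4, 5, 6, 7, 8})
```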
Slide 32
Results: Single Step Update
[Figure: result plots for the single step update from $t_i$ to $t_j$ with the percentage labels 100%, 5%, 15%, 40%, 60%, 75%, 95%; plot data not recoverable]
Slide 33
Results: Iterative Updates
[Figure: result plots for iterative updates over $t_i$, $t_j$, … with the percentage labels 100%, 5%, 15%, 40%, 60%, 75%, 95%; plot data not recoverable]
Slide 34
Results: Iterative Updates
[Figure: further result plots for iterative updates; plot data not recoverable]
Slide 35
Results: Iterative Updates
[Figure: further result plots for iterative updates; plot data not recoverable]
Slide 36
Results: Summary
• Best strategies: those which capture the change behaviour over time
• Especially for low relative bandwidth
Slide 37
Dynamics Function Θ: Revisited
[Figure: change rate c of X over time between $t_i$ and $t_j$]
• Can we predict when LOD sources will change?
• Notion of dynamics to compute periodicities!
• Dynamics as vector of changes:
$$\langle \delta(X_{t_1}, X_{t_2}), \ldots, \delta(X_{t_{N-1}}, X_{t_N}) \rangle$$
Slide 38
Temporal Clustering of Entities
• Dynamics as vector: $\langle \delta(X_{t_1}, X_{t_2}), \ldots, \delta(X_{t_{N-1}}, X_{t_N}) \rangle$
• Clustering with k-means++ to find patterns
• 165 snapshots
• 65,044 entities
• 7 patterns (after optimizing $k$)
[Plot: change (log scale) over time] [NS15]
Slide 39
Periodicity of Entity Dynamics
• Examples: < 0, 3, 2, 0, 3, 2, 0 >, < 1, 2, 1, 2, 1, 2 >
Cluster   # of entities   Most likely periodicity
C1        12,982          66
C2        168             23
C3        35              1
C4        12              1
C5        1               1
C6        1,541           56
C7        30              37
CS        50,725
• Convolution-based algorithm [Elfeky et al., 2005]
• Entities of legislation.gov.uk found in several clusters (C1, C3, C4, C5, C6)
• No changes (CS): 77.29%
• CS: entities from w3.org and ontologydesignpatterns.org
[Elfeky et al., 2005] Mohamed G. Elfeky, Walid G. Aref, Ahmed K. Elmagarmid: Periodicity Detection in Time Series Databases. IEEE Trans. Knowl. Data Eng. 17(7): 875-887 (2005)
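To make the notion of periodicity concrete, the sketch below finds the most likely period of a change vector by brute force: for every candidate period p it counts how often a value equals the value p steps later. This is a naive stand-in for illustration, not the convolution-based algorithm of Elfeky et al.; the function name and the tie-breaking rule (smallest period wins) are assumptions.

```python
def most_likely_period(series):
    """Return the period p with the highest share of positions where x[i] == x[i + p]."""
    best_period, best_score = None, -1.0
    for p in range(1, len(series) // 2 + 1):
        pairs = list(zip(series, series[p:]))
        score = sum(a == b for a, b in pairs) / len(pairs)
        if score > best_score:  # strict comparison keeps the smallest period on ties
            best_period, best_score = p, score
    return best_period


# The two example vectors from the slide.
period_a = most_likely_period([0, 3, 2, 0, 3, 2, 0])
period_b = most_likely_period([1, 2, 1, 2, 1, 2])
```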
Slide 40
Application Areas: More than One!
• Searching for LOD sources [GSK+13, KGS+12]
• Strategies for updating data caches [DGS15]
• Programming queries against LOD [SSS12]
• Recommending LOD vocabularies [SGS16]
→ Foundation for Future Data-driven Applications
Slide 41
Summary: KDD in Social Media & DL
How to deal with the vast amount of content related to
research and innovation?
• H2020 INSO-4 project, duration: 04/2016-03/2019
• Data mining & visualization tools enabling information professionals to deal with large corpora
• Website: http://www.moving-project.eu/
Slide 42
Got Interested?
Knowledge Discovery at ZBW
Contact me!
Prof. Dr. Ansgar Scherp
• Email: a.scherp@zbw.eu
• Twitter: https://twitter.com/ansgarscherp
• Slideshare: http://de.slideshare.net/ascherp
• KD-Website:
http://www.zbw.eu/en/research/knowledge-discovery/
http://www.kd.informatik.uni-kiel.de/en/
Slide 43
References
[DGS15] R. Dividino, T. Gottron, A. Scherp: Strategies for Efficiently Keeping Local Linked Open Data Caches Up-To-Date. International Semantic Web Conference (2) 2015: 356-373
[DGS+14] R. Dividino, T. Gottron, A. Scherp, G. Gröner: From Changes to Dynamics: Dynamics Analysis of Linked Open Data Sources. PROFILES@ESWC 2014
[GKS15] T. Gottron, M. Knauf, A. Scherp: Analysis of schema structures in the Linked Open Data graph based on unique subject URIs, pay-level domains, and vocabulary usage. Distributed and Parallel Databases 33(4): 515-553 (2015)
[DSG+13] R. Dividino, A. Scherp, G. Gröner, T. Gottron: Change-a-LOD: Does the Schema on the Linked Data Cloud Change or Not? COLD 2013
[GSK+13] T. Gottron, A. Scherp, B. Krayer, A. Peters: LODatio: using a schema-level index to support users in finding relevant sources of linked data. K-CAP 2013: 105-108
[KGS+12] M. Konrath, T. Gottron, S. Staab, A. Scherp: SchemEX - Efficient construction of a data catalogue by stream-based indexing of linked data. J. Web Sem. 16: 52-58 (2012)
[NS15] C. Nishioka, A. Scherp: Temporal Patterns and Periodicity of Entity Dynamics in the Linked Open Data Cloud. K-CAP 2015
[SGS16] J. Schaible, T. Gottron, A. Scherp: TermPicker: Enabling the Reuse of Vocabulary Terms by Exploiting Data from the Linked Open Data Cloud. ESWC, Springer, 2016
[SSS12] S. Scheglmann, A. Scherp, S. Staab: Declarative Representation of Programming Access to Ontologies. ESWC 2012: 659-673
Slide 44
a) HTTP Header
• Data from sources which have been changed since the last update should be updated first
HTTP Response
HEADER
…
Last-Modified: Tue, 15 Nov 1994 12:45:26 GMT
CONTENT
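A minimal sketch of the check behind this strategy: parse the Last-Modified header value and compare it with the time of the last cache update. It uses `email.utils.parsedate_to_datetime` from the Python standard library; the function name `changed_since` is hypothetical, and the last update time is assumed to be a timezone-aware datetime.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def changed_since(last_modified_header, last_update):
    """True if the source reports a modification after our last update."""
    modified = parsedate_to_datetime(last_modified_header)
    return modified > last_update


header = "Tue, 15 Nov 1994 12:45:26 GMT"
needs_update = changed_since(header, datetime(1994, 11, 1, tzinfo=timezone.utc))
is_fresh = not changed_since(header, datetime(1995, 1, 1, tzinfo=timezone.utc))
```

The approach depends on the source sending a Last-Modified header that is actually maintained.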
Slide 45
b) Age or Last Visited
• Time elapsed since the last update (the difference between query time and last update time)
• Guarantees that every source is updated after a period
[Figure: ranking from sources whose last update is longest ago down to sources that have been updated recently]
Slide 46
c) PageRank and d) Source Size
• PageRank captures popularity/importance of the LOD source
• Data from sources with highest PageRank are updated first
• LOD source size: data from the biggest/smallest LOD sources should be updated first
[Figure: ranking from sources with higher PR down to sources with lower PR]
Slide 47
Results: Single Step Update
[Figure: result plots for the single step update from $t_i$ to $t_j$ with the percentage labels 100%, 5%, 15%, 40%, 60%, 75%, 95%; plot data not recoverable]

More Related Content

What's hot

Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...
Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...
Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...Databricks
Β 
LesFurets.com: From 0 to Cassandra on AWS in 30 days - Tsunami Alerting Syste...
LesFurets.com: From 0 to Cassandra on AWS in 30 days - Tsunami Alerting Syste...LesFurets.com: From 0 to Cassandra on AWS in 30 days - Tsunami Alerting Syste...
LesFurets.com: From 0 to Cassandra on AWS in 30 days - Tsunami Alerting Syste...DataStax Academy
Β 
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...MLconf
Β 
Deep Learning on Aerial Imagery: What does it look like on a map?
Deep Learning on Aerial Imagery: What does it look like on a map?Deep Learning on Aerial Imagery: What does it look like on a map?
Deep Learning on Aerial Imagery: What does it look like on a map?Rob Emanuele
Β 
Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...
Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...
Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...Spark Summit
Β 
A Fast and Efficient Time Series Storage Based on Apache Solr
A Fast and Efficient Time Series Storage Based on Apache SolrA Fast and Efficient Time Series Storage Based on Apache Solr
A Fast and Efficient Time Series Storage Based on Apache SolrQAware GmbH
Β 
Parikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATL
Parikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATLParikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATL
Parikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATLMLconf
Β 
Efficient and Fast Time Series Storage - The missing link in dynamic software...
Efficient and Fast Time Series Storage - The missing link in dynamic software...Efficient and Fast Time Series Storage - The missing link in dynamic software...
Efficient and Fast Time Series Storage - The missing link in dynamic software...Florian Lautenschlager
Β 
MOA for the IoT at ACML 2016
MOA for the IoT at ACML 2016 MOA for the IoT at ACML 2016
MOA for the IoT at ACML 2016 Albert Bifet
Β 
Artificial intelligence and data stream mining
Artificial intelligence and data stream miningArtificial intelligence and data stream mining
Artificial intelligence and data stream miningAlbert Bifet
Β 
Chronix Poster for the Poster Session FAST 2017
Chronix Poster for the Poster Session FAST 2017Chronix Poster for the Poster Session FAST 2017
Chronix Poster for the Poster Session FAST 2017Florian Lautenschlager
Β 
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-timeChris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-timeFlink Forward
Β 
Exploring Titan and Spark GraphX for Analyzing Time-Varying Electrical Networks
Exploring Titan and Spark GraphX for Analyzing Time-Varying Electrical Networks Exploring Titan and Spark GraphX for Analyzing Time-Varying Electrical Networks
Exploring Titan and Spark GraphX for Analyzing Time-Varying Electrical Networks DataWorks Summit/Hadoop Summit
Β 
The new time series kid on the block
The new time series kid on the blockThe new time series kid on the block
The new time series kid on the blockFlorian Lautenschlager
Β 
Geo exploration simplified with Elastic Maps
Geo exploration simplified with Elastic MapsGeo exploration simplified with Elastic Maps
Geo exploration simplified with Elastic MapsElasticsearch
Β 
Introduction to Generalised Low-Rank Model and Missing Values
Introduction to Generalised Low-Rank Model and Missing ValuesIntroduction to Generalised Low-Rank Model and Missing Values
Introduction to Generalised Low-Rank Model and Missing ValuesJo-fai Chow
Β 
Applying Machine Learning using H2O
Applying Machine Learning using H2OApplying Machine Learning using H2O
Applying Machine Learning using H2OIan Gomez
Β 
Realtime Data Analysis Patterns
Realtime Data Analysis PatternsRealtime Data Analysis Patterns
Realtime Data Analysis PatternsMikio L. Braun
Β 

What's hot (20)

Working with HDF and netCDF Data in ArcGIS: Tools and Case Studies
Working with HDF and netCDF Data in ArcGIS: Tools and Case StudiesWorking with HDF and netCDF Data in ArcGIS: Tools and Case Studies
Working with HDF and netCDF Data in ArcGIS: Tools and Case Studies
Β 
Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...
Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...
Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...
Β 
LesFurets.com: From 0 to Cassandra on AWS in 30 days - Tsunami Alerting Syste...
LesFurets.com: From 0 to Cassandra on AWS in 30 days - Tsunami Alerting Syste...LesFurets.com: From 0 to Cassandra on AWS in 30 days - Tsunami Alerting Syste...
LesFurets.com: From 0 to Cassandra on AWS in 30 days - Tsunami Alerting Syste...
Β 
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
Β 
Deep Learning on Aerial Imagery: What does it look like on a map?
Deep Learning on Aerial Imagery: What does it look like on a map?Deep Learning on Aerial Imagery: What does it look like on a map?
Deep Learning on Aerial Imagery: What does it look like on a map?
Β 
Python at Warp Speed
Python at Warp SpeedPython at Warp Speed
Python at Warp Speed
Β 
Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...
Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...
Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...
Β 
A Fast and Efficient Time Series Storage Based on Apache Solr
A Fast and Efficient Time Series Storage Based on Apache SolrA Fast and Efficient Time Series Storage Based on Apache Solr
A Fast and Efficient Time Series Storage Based on Apache Solr
Β 
Parikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATL
Parikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATLParikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATL
Parikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATL
Β 
Efficient and Fast Time Series Storage - The missing link in dynamic software...
Efficient and Fast Time Series Storage - The missing link in dynamic software...Efficient and Fast Time Series Storage - The missing link in dynamic software...
Efficient and Fast Time Series Storage - The missing link in dynamic software...
Β 
MOA for the IoT at ACML 2016
MOA for the IoT at ACML 2016 MOA for the IoT at ACML 2016
MOA for the IoT at ACML 2016
Β 
Artificial intelligence and data stream mining
Artificial intelligence and data stream miningArtificial intelligence and data stream mining
Artificial intelligence and data stream mining
Β 
Chronix Poster for the Poster Session FAST 2017
Chronix Poster for the Poster Session FAST 2017Chronix Poster for the Poster Session FAST 2017
Chronix Poster for the Poster Session FAST 2017
Β 
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-timeChris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
Β 
Exploring Titan and Spark GraphX for Analyzing Time-Varying Electrical Networks
Exploring Titan and Spark GraphX for Analyzing Time-Varying Electrical Networks Exploring Titan and Spark GraphX for Analyzing Time-Varying Electrical Networks
Exploring Titan and Spark GraphX for Analyzing Time-Varying Electrical Networks
Β 
The new time series kid on the block
The new time series kid on the blockThe new time series kid on the block
The new time series kid on the block
Β 
Geo exploration simplified with Elastic Maps
Geo exploration simplified with Elastic MapsGeo exploration simplified with Elastic Maps
Geo exploration simplified with Elastic Maps
Β 
Introduction to Generalised Low-Rank Model and Missing Values
Introduction to Generalised Low-Rank Model and Missing ValuesIntroduction to Generalised Low-Rank Model and Missing Values
Introduction to Generalised Low-Rank Model and Missing Values
Β 
Applying Machine Learning using H2O
Applying Machine Learning using H2OApplying Machine Learning using H2O
Applying Machine Learning using H2O
Β 
Realtime Data Analysis Patterns
Realtime Data Analysis PatternsRealtime Data Analysis Patterns
Realtime Data Analysis Patterns
Β 

Viewers also liked

TRECVID 2016 POSTER CERTH-ITI
TRECVID 2016 POSTER CERTH-ITITRECVID 2016 POSTER CERTH-ITI
TRECVID 2016 POSTER CERTH-ITIMOVING Project
Β 
TRECVID 2016 Ad-hoc Video Search task, CERTH-ITI
TRECVID 2016 Ad-hoc Video Search task, CERTH-ITITRECVID 2016 Ad-hoc Video Search task, CERTH-ITI
TRECVID 2016 Ad-hoc Video Search task, CERTH-ITIMOVING Project
Β 
Including financial criteria in the strategic planning of knowledge repositor...
Including financial criteria in the strategic planning of knowledge repositor...Including financial criteria in the strategic planning of knowledge repositor...
Including financial criteria in the strategic planning of knowledge repositor...MOVING Project
Β 
Information theoritic analysis of entity dynamics on the linked open data cloud
Information theoritic analysis of entity dynamics on the linked open data cloudInformation theoritic analysis of entity dynamics on the linked open data cloud
Information theoritic analysis of entity dynamics on the linked open data cloudMOVING Project
Β 
VIDEO AESTHETIC QUALITY ASSESSMENT USING KERNEL SUPPORT VECTOR MACHINE WITH I...
VIDEO AESTHETIC QUALITY ASSESSMENT USING KERNEL SUPPORT VECTOR MACHINE WITH I...VIDEO AESTHETIC QUALITY ASSESSMENT USING KERNEL SUPPORT VECTOR MACHINE WITH I...
VIDEO AESTHETIC QUALITY ASSESSMENT USING KERNEL SUPPORT VECTOR MACHINE WITH I...MOVING Project
Β 
Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recomm...
Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recomm...Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recomm...
Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recomm...MOVING Project
Β 

Viewers also liked (6)

TRECVID 2016 POSTER CERTH-ITI
TRECVID 2016 POSTER CERTH-ITITRECVID 2016 POSTER CERTH-ITI
TRECVID 2016 POSTER CERTH-ITI
Β 
TRECVID 2016 Ad-hoc Video Search task, CERTH-ITI
TRECVID 2016 Ad-hoc Video Search task, CERTH-ITITRECVID 2016 Ad-hoc Video Search task, CERTH-ITI
TRECVID 2016 Ad-hoc Video Search task, CERTH-ITI
Β 
Including financial criteria in the strategic planning of knowledge repositor...
Including financial criteria in the strategic planning of knowledge repositor...Including financial criteria in the strategic planning of knowledge repositor...
Including financial criteria in the strategic planning of knowledge repositor...
Β 
Information theoritic analysis of entity dynamics on the linked open data cloud
Information theoritic analysis of entity dynamics on the linked open data cloudInformation theoritic analysis of entity dynamics on the linked open data cloud
Information theoritic analysis of entity dynamics on the linked open data cloud
Β 
VIDEO AESTHETIC QUALITY ASSESSMENT USING KERNEL SUPPORT VECTOR MACHINE WITH I...
VIDEO AESTHETIC QUALITY ASSESSMENT USING KERNEL SUPPORT VECTOR MACHINE WITH I...VIDEO AESTHETIC QUALITY ASSESSMENT USING KERNEL SUPPORT VECTOR MACHINE WITH I...
VIDEO AESTHETIC QUALITY ASSESSMENT USING KERNEL SUPPORT VECTOR MACHINE WITH I...
Β 
Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recomm...
Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recomm...Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recomm...
Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recomm...
Β 

Similar to Mining and Managing Large-scale Linked Open Data

On the need for a W3C community group on RDF Stream Processing
On the need for a W3C community group on RDF Stream ProcessingOn the need for a W3C community group on RDF Stream Processing
On the need for a W3C community group on RDF Stream ProcessingPlanetData Network of Excellence
Β 
OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Pr...
OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Pr...OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Pr...
OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Pr...Oscar Corcho
Β 
Chronix: A fast and efficient time series storage based on Apache Solr
Chronix: A fast and efficient time series storage based on Apache SolrChronix: A fast and efficient time series storage based on Apache Solr
Chronix: A fast and efficient time series storage based on Apache SolrFlorian Lautenschlager
Β 
Real-Time Analytics with Apache Cassandra and Apache Spark,
Real-Time Analytics with Apache Cassandra and Apache Spark,Real-Time Analytics with Apache Cassandra and Apache Spark,
Real-Time Analytics with Apache Cassandra and Apache Spark,Swiss Data Forum Swiss Data Forum
Β 
Real-Time Analytics with Apache Cassandra and Apache Spark
Mining and Managing Large-scale Linked Open Data

  • 1. Slide 1. Prof. Ansgar Scherp – asc@informatik.uni-kiel.de. Ansgar Scherp: Mining and Managing Large-scale Linked Open Data. GVDB, Nörten-Hardenberg, May 25, 2016. Thanks to: Chifumi Nishioka, Renata Dividino, Thomas Gottron, and many more …
  • 2. Team Knowledge Discovery: Ansgar Scherp, Ahmed Saleh, Chifumi Nishioka, Falk Böschen, Mohammad Abdel-Qader, Till Blume, Anke Koslowski (Secretariat), Henrik Schmidt (Engineer), Lukas Galke, Florian Mai
  • 3. Linked Open Data (LOD) Cloud • Publishing and interlinking data on the web • Different quality, purpose, and sources • Using the Resource Description Framework (RDF). Comparison, World Wide Web vs. LOD Cloud: Documents vs. Data; Hyperlinks via <a> vs. Typed Links; HTML vs. RDF; Addresses (URIs) vs. Addresses (URIs)
  • 4. Relevance of Linked Data?
  • 5. 1000+ Datasets, 50+ Billion Triples. Linked Data: May '07 → August '14. Domains: Media, Geographic, Publications, Web 2.0, eGovernment, Cross-Domain, Life Sciences, Social Networking. Source: http://lod-cloud.net
  • 6. LOD on One Slide: Example Graph. RDF triple (Subject, Predicate, Object): biglynx:matt-briggs rdf:type foaf:Person. Fully qualified URIs using vocabulary prefixes: @prefix foaf: <http://xmlns.com/foaf/0.1/> . @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> . @prefix biglynx: <http://biglynx.co.uk/people/> .
  • 7. LOD on One Slide: Example Graph. Nodes: biglynx:matt-briggs, foaf:Person, biglynx:Director, … Edge labels: rdf:type, rdf:type, … Fully qualified URIs using vocabulary prefixes: @prefix foaf: <http://xmlns.com/foaf/0.1/> . @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> . @prefix biglynx: <http://biglynx.co.uk/people/> .
  • 8. LOD on One Slide: Example Graph. Nodes: biglynx:matt-briggs, foaf:Person, biglynx:dave-smith, biglynx:Director, _1:point, dp:London, "-0.118", "51.509", … Edge labels: rdf:type, foaf:knows, rdf:type, wgs84:lat, wgs84:long, foaf:based_near, ex:loc, … Legend: Types, Properties, Entity
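As a rough illustration of the example graph on the slides above, the following Python sketch stores triples as (subject, predicate, object) tuples and expands prefixed names into full URIs. The helper names `expand`, `types_of`, and `properties_of` are hypothetical, and the edge directions of the sample triples are an assumption read off the slide layout; only the prefix URIs come from the slides (with the `rdf` namespace in its standard `www.w3.org` form).

```python
# Prefix table taken from the slides.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "biglynx": "http://biglynx.co.uk/people/",
}


def expand(qname: str) -> str:
    """Expand a prefixed name such as 'foaf:Person' into a full URI."""
    prefix, _, local = qname.partition(":")
    if prefix not in PREFIXES:
        raise KeyError(f"unknown prefix: {prefix}")
    return PREFIXES[prefix] + local


# Illustrative subset of the example graph; edge directions are assumed.
triples = [
    ("biglynx:matt-briggs", "rdf:type", "foaf:Person"),
    ("biglynx:matt-briggs", "rdf:type", "biglynx:Director"),
    ("biglynx:matt-briggs", "foaf:knows", "biglynx:dave-smith"),
]


def types_of(entity, triples):
    """All objects reached from the entity via rdf:type."""
    return frozenset(o for s, p, o in triples if s == entity and p == "rdf:type")


def properties_of(entity, triples):
    """All predicates used by the entity, excluding rdf:type (an assumption)."""
    return frozenset(p for s, p, _ in triples if s == entity and p != "rdf:type")


full_uri = expand("foaf:Person")
matt_types = types_of("biglynx:matt-briggs", triples)
matt_props = properties_of("biglynx:matt-briggs", triples)
```

Separating the types from the remaining properties mirrors the "Types" and "Properties" legend on the slide.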
  • 9. Motivation for the SchemEX Index • Single entry point to query the LOD cloud • Search for data sources containing entities like – 'Persons, who are Politicians and Actors' – 'Research data sets' – 'Scientific publications'. Query: SELECT ?x FROM … WHERE { ?x rdf:type ex:Actor . ?x rdf:type ex:Politician . } (Figure: query issued against the index.)
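The lookup motivated above can be sketched in Python as follows. This is a simplified stand-in, not the SchemEX implementation: it assumes a hypothetical in-memory index that maps sets of RDF types to the data sources containing such entities, and the data source URLs are invented placeholders.

```python
def sources_for_types(index, query_types):
    """Return all data sources holding entities that carry every queried type."""
    wanted = frozenset(query_types)
    sources = set()
    for type_set, data_sources in index.items():
        # An indexed type set matches if it contains all queried types.
        if wanted <= type_set:
            sources |= data_sources
    return sources


# Hypothetical index contents for illustration only.
index = {
    frozenset({"ex:Actor", "ex:Politician"}): {"http://example.org/ds1"},
    frozenset({"ex:Actor"}): {"http://example.org/ds2"},
    frozenset({"ex:Actor", "ex:Politician", "foaf:Person"}): {"http://example.org/ds3"},
}

result = sources_for_types(index, {"ex:Actor", "ex:Politician"})
```

The query for entities that are both `ex:Actor` and `ex:Politician` is answered from the schema-level index alone, without touching the instance data.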
  • 10. Input Data for SchemEX • Quads: <subject> <predicate> <object> <context> • Example: <http://biglynx.co.uk/people/matt-briggs> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> <http://biglynx.co.uk/people/matt-briggs.rdf> (Figure: the triple biglynx:matt-briggs rdf:type foaf:Person inside the context <http://biglynx.co.uk/people/matt-briggs.rdf>, within LOD Cloud dataset X.)
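A minimal Python sketch of reading such a quad is given below. It assumes all four terms are IRIs in angle brackets, as in the example; literals and blank nodes, which real N-Quads data may contain, are not handled. The names `Quad` and `parse_quad` are hypothetical.

```python
import re
from typing import NamedTuple


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    context: str


# Matches one IRI term in angle brackets and captures the IRI itself.
IRI = re.compile(r"<([^<>\s]+)>")


def parse_quad(line: str) -> Quad:
    """Parse a line of four IRI terms into a Quad (IRI-only simplification)."""
    terms = IRI.findall(line)
    if len(terms) != 4:
        raise ValueError(f"expected 4 IRI terms, got {len(terms)}")
    return Quad(*terms)


line = (
    "<http://biglynx.co.uk/people/matt-briggs> "
    "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
    "<http://xmlns.com/foaf/0.1/Person> "
    "<http://biglynx.co.uk/people/matt-briggs.rdf>"
)
quad = parse_quad(line)
```

The fourth term, the context, records the data source from which the triple was obtained.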
  • 11. SchemEX Idea • Schema-level index SchemEX • Assign RDF entities to graph patterns • Map graph patterns to data sources (context) • Defined over entities, but store the context • Construction of schema-level index • Stream-based for scalability • Stratified bi-simulation for detecting patterns • Little loss of accuracy [KGS+12]
  • 12. Slide 12: Building the Index from a Stream • Stream of quads coming from a LD crawler: … Q16, Q15, Q14, Q13, Q12, Q11, Q10, Q9, Q8, Q7, Q6, Q5, Q4, Q3, Q2, Q1 [Figure: FiFo cache over the quad stream; labels C1, C2, C3] + Reasonable accuracy at cache size of 50k
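The stream-based construction can be sketched as follows. This is a simplified illustration, not the SchemEX implementation: entities are summarized only by their type set and property set instead of stratified bi-simulation, and `build_index`, `RDF_TYPE` and `cache_size` are hypothetical names. When the FIFO cache is full, the oldest entity is evicted and written to the index with whatever has been seen of it so far, which is where the small loss of accuracy comes from.

```python
from collections import OrderedDict, defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def build_index(quads, cache_size):
    """Map (type set, property set) patterns to the contexts they occur in."""
    cache = OrderedDict()     # subject -> [types, properties, contexts]
    index = defaultdict(set)  # (types, properties) -> contexts

    def flush(entry):
        types, properties, contexts = entry
        index[(frozenset(types), frozenset(properties))].update(contexts)

    for subject, predicate, obj, context in quads:
        if subject not in cache:
            if len(cache) >= cache_size:
                # FIFO eviction: the entity that entered first leaves first.
                _, oldest = cache.popitem(last=False)
                flush(oldest)
            cache[subject] = [set(), set(), set()]
        types, properties, contexts = cache[subject]
        if predicate == RDF_TYPE:
            types.add(obj)
        else:
            properties.add(predicate)
        contexts.add(context)

    # Write the entities still in the cache at the end of the stream.
    for entry in cache.values():
        flush(entry)
    return dict(index)
```

With a cache that is too small, an entity whose quads arrive far apart is split into several partial patterns; a larger cache reduces this effect at the cost of memory.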
  • 13. Slide 13: Full BTC 2011 Data Set: 2.17 Bn Triples • Cache size: 50 k • Winner BTC’11 + Linear runtime with respect to number of triples + Memory consumption scales with window size
  • 14. Slide 14: [Screenshot [GSK+13] with callouts: Generalization; Specialization; Result list with examples; Inspired by Google]
  • 15. Slide 15: LODatio Under the Hood [Figure: pipeline with the labels SPARQL, Snippets, Generalize, Retrieve Data Sources, Query translation, Rank, Specialize, Count, Select, Select] • Hybrid database with off-the-shelf components
  • 16. Slide 16: LOD on One Slide: Recap [Figure: example graph around biglynx:matt-briggs and biglynx:dave-smith with the Type Set (TS) and the Property Set (PS) marked] Information theoretic analyses of LOD • How much information is encoded in TS and PS? • … information encoded, once TS or PS is known? • … to which degree are TS and PS redundant? • Example: 20% of PLDs do not need TS (6% for PS) [GKS15]
  • 17. Slide 17: Käfer et al.’s Temporal Analysis of LOD • 29 weekly LOD snapshots of ~100 million triples • Still running since May 2012 (now 200+ weeks) • Data on the cloud changes a lot • But vocabularies defining RDF types and properties are highly static, e.g., RDF, FOAF [Figure: LOD cloud ~2012 and LOD cloud ~2014; Changes?] [Käfer et al., 2013] T. Käfer, A. Abdelrahman, J. Umbrich, P. O'Byrne, A. Hogan: Observing Linked Data Dynamics. ESWC 2013: 213-227
  • 18. Slide 18: But: Do Changes Occur in PS and TS? • Analysis: expected conditional entropy over time • $H(PS \mid TS = ts)$: entropy of PS given that TS is known • Observation: types become less important • Changes in the use of TS and PS? [Figure: $H(PS \mid TS = ts)$ and $H(TS \mid PS = ps)$ plotted over time]
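The expected conditional entropy can be computed from observed (type set, property set) pairs, one pair per entity. The sketch below assumes the standard definition, $H(PS \mid TS) = \sum_{ts} p(ts)\, H(PS \mid TS = ts)$, with probabilities estimated by relative frequencies; the function name `expected_conditional_entropy` is hypothetical.

```python
import math
from collections import Counter, defaultdict


def expected_conditional_entropy(pairs):
    """H(PS | TS) in bits for a list of (type_set, property_set) observations."""
    by_ts = defaultdict(Counter)
    for ts, ps in pairs:
        by_ts[ts][ps] += 1

    total = len(pairs)
    entropy = 0.0
    for ps_counts in by_ts.values():
        n_ts = sum(ps_counts.values())
        # Entropy of PS among the entities that share this type set.
        h_given_ts = -sum(
            (count / n_ts) * math.log2(count / n_ts)
            for count in ps_counts.values()
        )
        # Weight by the relative frequency of the type set.
        entropy += (n_ts / total) * h_given_ts
    return entropy
```

A value of 0 means the type set fully determines the property set; swapping the two elements of each pair gives $H(TS \mid PS)$.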
  • 19. Slide 19: Changes over Time • Extended characteristic sets [Neumann and Moerkotte, 2011]: $ECS = PS \cup TS$ • Avg.: 83.898 ECS per week • Avg. 73% of ECS re-occur next week (orange) • Avg. 35% of ECS remain unchanged (blue) • Avg. 20% of entity sets of ECS change / week [DSG+13] [Figure: # of ECS per week] [Neumann and Moerkotte, 2011] Thomas Neumann, Guido Moerkotte: Characteristic sets: Accurate cardinality estimation for RDF queries with multiple joins. ICDE 2011: 984-994
  • 20. Slide 20: Temporal Dynamics of the Entities? • Notion of entity motivated by ECS: an entity is a set of triples $X$ sharing the same subject URI $s$ • Example: – 1 entity – 4 triples (w.l.o.g.) • Useful to keep LOD caches up-to-date? • Can we predict when LOD sources will change?
  • 21. Slide 21: Dynamics Function Θ • Definition of Θ over change rate function $c(X_t)$: $\Theta_{t_i}^{t_j}(X) = \Theta(X_{t_j}) - \Theta(X_{t_i}) = \int_{t_i}^{t_j} c(X_t)\,\mathrm{d}t$ [DGS+14] • Approximation as step function over changes: $\Theta(X_{t_j}) - \Theta(X_{t_i}) \approx \sum_{k=i+1}^{j} \delta(X_{t_{k-1}}, X_{t_k})$ [Figure: Θ and c for X over time between $t_i$ and $t_j$; annotation: monotone, non-negative]
  • 22. Slide 22: Update Strategies for LOD Sources • Apply strategies for keeping caches of WWW documents up-to-date to maintain LOD caches • Assumptions – LOD is fetched from various sources – Sources are scored and prioritized based on strategy – Data of a source is fetched only when the operation can be entirely executed
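The three assumptions can be sketched as a greedy scheduler: sources are visited in descending score order, and a source is fetched only if its whole data set still fits into the remaining budget. Skipping a source that does not fit and continuing with smaller ones is an assumption of this sketch, as are the names `schedule_updates`, `scores`, `sizes` and `budget`.

```python
def schedule_updates(scores, sizes, budget):
    """Pick sources by descending score; fetch only what fits entirely."""
    selected = []
    remaining = budget
    for source in sorted(scores, key=scores.get, reverse=True):
        # A source is never fetched partially.
        if sizes[source] <= remaining:
            selected.append(source)
            remaining -= sizes[source]
    return selected
```

Here `scores` would be produced by one of the update strategies, `sizes` is the number of triples per source, and `budget` is the number of triples that can be fetched in one update step.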
  • 23. Slide 23: Scheduling Update Strategies a) HTTP Header [Dividino et al., 2014a] b) Age or Last Visited [Dasdan et al., 2009, Cho and Garcia-Molina, 2000] c) PageRank [Page et al., 1999, Boldi et al., 2004, Baeza-Yates et al., 2005] d) LOD Sources Size e) Change Ratio [Douglis et al., 1997, Cho et al., 2002, Tan et al., 2007] f) Change Rate [Olston et al., 2002, Ntoulas et al., 2004, Dividino et al., 2013] g) History Information: Dynamics [Dividino et al., 2014b] We borrow strategies developed for the WWW and metrics for data change analysis in the LOD cloud.
  • 24. Slide 24: e) Change Ratio • Captures the change frequency of the data (freshness) • Percentage of data items in the cache that are up-to-date [Figure: ranking over time from $t_i$ to $t_j$, from sources which changed (most) to sources that did not change or changed less]
  • 25. Slide 25: f) Change Rate • Data from sources which are less similar to their previous update (snapshot) should be updated first • Comparison of two RDF data sets – $X$: set of triple statements – $\delta$: numeric expression (distance), $\delta \in [0, \infty)$ • Example: $\delta_{Jaccard}(X_1, X_2) = 1 - \frac{|X_1 \cap X_2|}{|X_1 \cup X_2|}$ [Figure: $\delta$ over time between $t_i$ and $t_j$]
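The Jaccard distance between two snapshots can be written directly with Python sets. In this sketch a triple is represented as a tuple of three strings, and two empty snapshots are treated as identical (distance 0); both choices are assumptions for illustration.

```python
def jaccard_distance(x1, x2):
    """Jaccard distance between two sets of triples."""
    union = x1 | x2
    if not union:
        # Assumption: two empty snapshots count as identical.
        return 0.0
    return 1.0 - len(x1 & x2) / len(union)


# Two snapshots of the same source, sharing two of four distinct triples.
old = {("s", "rdf:type", "foaf:Person"), ("s", "foaf:name", "A"), ("s", "foaf:age", "30")}
new = {("s", "rdf:type", "foaf:Person"), ("s", "foaf:name", "A"), ("s", "foaf:age", "31")}
distance = jaccard_distance(old, new)
```

Sources with the largest distance to their previous snapshot would be ranked first under the change rate strategy.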
  • 26. Slide 26: g) History Information: Dynamics • Data from sources which evolve most in a given period of time should be updated first • Uses both history information and change rate: $\Theta(X_{t_j}) - \Theta(X_{t_i}) = \int_{t_i}^{t_j} c(X_t)\,\mathrm{d}t \approx \sum_{k=i+1}^{j} \delta(X_{t_{k-1}}, X_{t_k})$ [Figure: Θ and c for X over time between $t_i$ and $t_j$]
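The step-function approximation is a sum of distances between consecutive snapshots. The sketch below takes the distance function as a parameter; the example uses the size of the symmetric difference as a simple stand-in for $\delta$, which is an assumption for illustration, and the names `dynamics` and `symmetric_difference_size` are hypothetical.

```python
def dynamics(snapshots, delta):
    """Approximate the dynamics as the sum of delta over consecutive snapshots."""
    return sum(
        delta(snapshots[k - 1], snapshots[k])
        for k in range(1, len(snapshots))
    )


def symmetric_difference_size(x1, x2):
    """Number of items that were added or removed between two snapshots."""
    return len(x1 ^ x2)


# Three snapshots of one source: one item added, then one item removed.
history = [{"a"}, {"a", "b"}, {"b"}]
total_change = dynamics(history, symmetric_difference_size)
```

Unlike a comparison of only the first and the last snapshot, the sum also counts changes that were later reverted.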
  • 27. Slide 27: Evaluation • Idea: simulation of limitations of available computational resources (network bandwidth, computation time) [Figure: timeline from $t_i$ to $t_{i+1}$; 100%]
  • 28. Slide 28: Evaluation: Single Step Update [Figure: timeline from $t_i$ to $t_{i+1}$; levels 5%, 15%, 40%, 60%, 75%, 95% and 100%]
  • 29. Slide 29: Evaluation: Iterative Updates [Figure: timeline from $t_i$ over $t_{i+1}$ to $t_{i+2}$, …; levels 5%, 15%, 40%, 60%, 75%, 95% at each step, and 100%]
  • 30. Slide 30: Dataset • Dynamic Linked Data Observatory • Weekly snapshots, 14 M triples • 154 snapshots (approx. 3 years) • 590 data sources (PLD) • Top 10 largest data sources:
      Data source                      Average size
      dbpedia.org                      3,406,364.5
      edgarwrap.ontologycentral.com      982,631.0
      dbtune.org                         864,107.6
      dbtropes.org                       787,299.9
      data.linkedct.org                  498,986.3
      aims.fao.org                       416,708.9
      www.legislation.gov.uk             399,601.6
      kent.zpr.fer.hr                    387,034.8
      identi.ca                          278,316.2
      webenemasuno.linkeddata.es         250,557.9
  • 31. Slide 31: Metrics: Precision & Recall • Precision: portion of cached data that is actually up-to-date • Recall: portion of data in the LOD cloud that is identical to the cached data [Figure: overlap of the cached data and the actual data on the LOD cloud (w.r.t. the 590 sources considered)]
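Both metrics follow from the overlap of the cached data and the actual data. The sketch below assumes that data items are triples compared by equality and that an empty set yields a value of 0; the function name `precision_recall` is hypothetical.

```python
def precision_recall(cached, actual):
    """Precision and recall of a cache against the actual data."""
    common = cached & actual
    # Precision: share of the cache that matches the actual data.
    precision = len(common) / len(cached) if cached else 0.0
    # Recall: share of the actual data that is present in the cache.
    recall = len(common) / len(actual) if actual else 0.0
    return precision, recall


cached = {"t1", "t2", "t3", "t4"}  # triples held in the local cache
actual = {"t1", "t2", "t5"}        # triples currently published by the sources
precision, recall = precision_recall(cached, actual)
```

Outdated triples that remain in the cache lower precision, while triples that were newly published but not yet fetched lower recall.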
  • 32. Slide 32: Results: Single Step Update [Figure: results over time from $t_i$ to $t_j$; levels 5%, 15%, 40%, 60%, 75%, 95% and 100%]
  • 33. Slide 33: Results: Iterative Updates [Figure: results over time from $t_i$ to $t_j$, …; levels 5%, 15%, 40%, 60%, 75%, 95% at each step, and 100%]
  • 34. Slide 34: Results: Iterative Updates [Figure: results over time from $t_i$ to $t_j$, …; levels 5%, 15%, 40%, 60%, 75%, 95% at each step, and 100%]
  • 35. Slide 35: Results: Iterative Updates [Figure: results over time from $t_i$ to $t_j$, …; levels 5%, 15%, 40%, 60%, 75%, 95% at each step, and 100%]
  • 36. Slide 36: Results: Summary • Best strategies: those which capture the change behaviour over time • Especially for low relative bandwidth
  • 37. Slide 37: Dynamics Function Θ: Revisited • Can we predict when LOD sources will change? • Notion of dynamics to compute periodicities! • Dynamics as vector of changes: $\langle \delta(X_{t_1}, X_{t_2}), \ldots, \delta(X_{t_{N-1}}, X_{t_N}) \rangle$ [Figure: c for X over time between $t_i$ and $t_j$]
  • 38. Slide 38: Temporal Clustering of Entities • Dynamics as vector: $\langle \delta(X_{t_1}, X_{t_2}), \ldots, \delta(X_{t_{N-1}}, X_{t_N}) \rangle$ • Clustering with k-means++ to find patterns • 165 snapshots • 65,044 entities • 7 patterns (after optimizing $k$) [NS15] [Figure: change (log scale) over time]
  • 39. Slide 39: Periodicity of Entity Dynamics • Examples: < 0, 3, 2, 0, 3, 2, 0 >, < 1, 2, 1, 2, 1, 2 > • Convolution-based algorithm [Elfeky et al., 2005] • Entities of legislation.gov.uk found in several clusters (C1, C3, C4, C5, C6) • No changes (CS): 77.29% • CS: entities from w3.org and ontologydesignpatterns.org
      Cluster   # of entities   Most likely periodicity
      C1        12,982          66
      C2        168             23
      C3        35              1
      C4        12              1
      C5        1               1
      C6        1,541           56
      C7        30              37
      CS        50,725
    [Elfeky et al., 2005] Mohamed G. Elfeky, Walid G. Aref, Ahmed K. Elmagarmid: Periodicity Detection in Time Series Databases. IEEE Trans. Knowl. Data Eng. 17(7): 875-887 (2005)
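What a periodicity means for the two example vectors can be shown with a naive check. This is not the convolution-based algorithm of Elfeky et al.; it simply tries every candidate period and reports the one under which the largest share of values repeats. The name `most_likely_period` is hypothetical.

```python
def most_likely_period(series):
    """Return (period, match ratio) for the best-matching candidate period."""
    best_period, best_ratio = None, -1.0
    for period in range(1, len(series) // 2 + 1):
        pairs = len(series) - period
        # Count positions whose value repeats exactly one period later.
        matches = sum(series[i] == series[i + period] for i in range(pairs))
        ratio = matches / pairs
        # Strict comparison keeps the smallest period on ties.
        if ratio > best_ratio:
            best_period, best_ratio = period, ratio
    return best_period, best_ratio


first = most_likely_period([0, 3, 2, 0, 3, 2, 0])
second = most_likely_period([1, 2, 1, 2, 1, 2])
```

For the two examples from the slide, the check finds a period of 3 and a period of 2 respectively, each with a match ratio of 1.0.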
  • 40. Slide 40: Application Areas: More than One! • Searching for LOD sources [GSK+13, KGS+12] • Strategies for updating data caches [DGS15] • Programming queries against LOD [SSS12] • Recommending LOD vocabularies [SGS16] → Foundation for Future Data-driven Applications
  • 41. Slide 41: Summary: KDD in Social Media & DL • How to deal with the vast amount of content related to research and innovation? • H2020 INSO-4 project, duration: 04/2016-03/2019 • Data mining & visualization tools enabling information professionals to deal with large corpora • Website: http://www.moving-project.eu/ [New]
  • 42. Slide 42: Got Interested? Knowledge Discovery at ZBW. Contact me! Prof. Dr. Ansgar Scherp • Email: a.scherp@zbw.eu • Twitter: https://twitter.com/ansgarscherp • Slideshare: http://de.slideshare.net/ascherp • KD-Website: http://www.zbw.eu/en/research/knowledge-discovery/ and http://www.kd.informatik.uni-kiel.de/en/
  • 43. Slide 43: References
    [DGS15] R. Dividino, T. Gottron, A. Scherp: Strategies for Efficiently Keeping Local Linked Open Data Caches Up-To-Date. International Semantic Web Conference (2) 2015: 356-373
    [DGS+14] R. Dividino, T. Gottron, A. Scherp, G. Gröner: From Changes to Dynamics: Dynamics Analysis of Linked Open Data Sources. PROFILES@ESWC 2014
    [GKS15] T. Gottron, M. Knauf, A. Scherp: Analysis of schema structures in the Linked Open Data graph based on unique subject URIs, pay-level domains, and vocabulary usage. Distributed and Parallel Databases 33(4): 515-553 (2015)
    [DSG+13] R. Dividino, A. Scherp, G. Gröner, T. Gottron: Change-a-LOD: Does the Schema on the Linked Data Cloud Change or Not? COLD 2013
    [GSK+13] T. Gottron, A. Scherp, B. Krayer, A. Peters: LODatio: using a schema-level index to support users in finding relevant sources of linked data. K-CAP 2013: 105-108
    [KGS+12] M. Konrath, T. Gottron, S. Staab, A. Scherp: SchemEX - Efficient construction of a data catalogue by stream-based indexing of linked data. J. Web Sem. 16: 52-58 (2012)
    [NS15] C. Nishioka, A. Scherp: Temporal Patterns and Periodicity of Entity Dynamics in the Linked Open Data Cloud. K-CAP 2015
    [SGS16] J. Schaible, T. Gottron, A. Scherp: TermPicker: Enabling the Reuse of Vocabulary Terms by Exploiting Data from the Linked Open Data Cloud. ESWC, Springer, 2016
    [SSS12] S. Scheglmann, A. Scherp, S. Staab: Declarative Representation of Programming Access to Ontologies. ESWC 2012: 659-673
  • 44. Slide 44: a) HTTP Header • Data from sources which have changed since the last update should be updated first [Figure: HTTP response consisting of HEADER (… Last-Modified: Tue, 15 Nov 1994 12:45:26 GMT) and CONTENT]
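The header-based check can be sketched with the Python standard library: `email.utils.parsedate_to_datetime` parses the HTTP date, which is then compared with the time of the last cache update. The function name `changed_since` is hypothetical, and the sketch assumes that the source reports a truthful `Last-Modified` value.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def changed_since(last_modified_header, last_update):
    """True if the source reports a modification after the last cache update."""
    modified = parsedate_to_datetime(last_modified_header)
    return modified > last_update


# The header value from the slide.
header = "Tue, 15 Nov 1994 12:45:26 GMT"
stale = changed_since(header, datetime(1994, 11, 1, tzinfo=timezone.utc))
fresh = changed_since(header, datetime(1995, 1, 1, tzinfo=timezone.utc))
```

The last update time has to be timezone-aware, since a header ending in GMT is parsed into a timezone-aware datetime.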
  • 45. Slide 45: b) Age or Last Visited • Time elapsed since the last update (the difference between query time and last update time) • Guarantees that every source is updated after a period [Figure: ranking from sources that were updated a longer time ago to sources that have been recently updated]
  • 46. Slide 46: c) PageRank and d) Source Size • PageRank captures popularity/importance of the LOD source • Data from sources with highest PageRank are updated first • LOD source size: data from the biggest/smallest LOD sources should be updated first [Figure: ranking from sources with higher PR to sources with lower PR]
  • 47. Slide 47: Results: Single Step Update [Figure: results over time from $t_i$ to $t_j$; levels 5%, 15%, 40%, 60%, 75%, 95% and 100%]