What’s all the data about?
Profiling and interlinking Web datasets
Stefan Dietze
L3S Research Center
27/03/14
Recent work on Linked Data exploration/discovery/search
 Entity interlinking & dataset interlinking recommendation
 Dataset profiling
 Data consistency & conflicts
Research areas
 Web science, Information Retrieval, Semantic Web & Linked
Data, data & knowledge integration (mapping, classification,
interlinking)
 Application domains: education/TEL, Web archiving, …
Some projects
Introduction
http://www.l3s.de/
 See also: http://purl.org/dietze
…why are there so few datasets actually used?
 Data reuse and in-links focus on trusted “reference
graphs” such as DBpedia, Freebase etc.
 Long tail of LD datasets which are neither reused nor linked
to (LOD Cloud alone 300+ datasets, 50 bn triples)
 Explanations?
Linked Data is awesome, but...
 “HTTP-accessibility”
(SPARQL, URI dereferencing)
 “Structure” & “Semantics”
(=> shared/linked vocabularies)
 “Interlinked”
 “Persistent”
Hm,
really?
Linked data is more diverse than we think
SPARQL Web-Querying Infrastructure: Ready for Action?,
Carlos Buil-Aranda, Aidan Hogan, Jürgen Umbrich, Pierre-Yves
Vandenbussche, International Semantic Web Conference 2013
(ISWC2013).
SPARQL endpoint availability over time [Buil-Aranda et al 2013]
Accessibility of datasets?
 Less than 50% of all SPARQL endpoints are actually responsive
at a given point in time
 “THE” SPARQL protocol? No, but many variants & subsets
 …
Shared vocabularies & schemas, but:
 …still very heterogeneous [d’Aquin, WebSci13]
 …data partially messy and not conformant
(RDFS, schemas) [HoganJWS2012]
 …even widely used reference datasets such as
DBpedia noisy [Paulheim2013]
Co-occurrence graph of data
types in 146 datasets: 144
vocabularies, 588 highly
overlapping types, 719
properties
Assessing the Educational Linked Data Landscape, D’Aquin, M.,
Adamou, A., Dietze, S., ACM Web Science 2013 (WebSci2013), Paris,
France, May 2013.
Type Inference on Noisy RDF Data, Paulheim, H., Bizer, C., Semantic
Web – ISWC 2013, Lecture Notes in Computer Science Volume 8218,
2013, pp 510-525
An empirical survey of Linked Data conformance. Hogan, A., Umbrich,
J., Harth, A., Cyganiak, R., Polleres, A., Decker, S., In the Journal of Web
Semantics 14: pp. 14–44, 2012
What about data consistency?
Inconsistency and Incompleteness of Linked Datasets – a
Case Study, Yuan, W., Demidova, E., Dietze, S., Zhu, X., Web
Science 2014, WebSci14, under review.
Too many/diverse datasets, too little information
 Which datasets are useful & trustworthy for case
XY (e.g. “learning about the solar system”)? Which
topics are covered?
 Types: which datasets describe statistics, videos,
slides, publications etc.?
 Currentness, dynamics, accessibility/reliability,
data quantity & quality?
Data curation and dataset profiling
Dataset
Catalog/Registry
 Catalog of data: classification of
datasets according to resource
types, disciplines/topics, data
quality, accessibility, etc.
 Infrastructure for
distributed/federated querying
describes
db:Astro. Objects
Dataset profiling: what’s all the data about
Dataset
Metadata
BIBO
AAISO
FOAF
contains
Entity disambiguation &
linking [ESWC13]
Topic profile extraction
[WWW13, ESWC14]
db:Astronomy
db:Astro. Objects
Dataset
Catalog/Registry
yov:Video
po:Programme
BBC Programme
<po:Programme …>
<po:Series>Wonders of the Solar System</.>
<po:Actor>Brian Cox</…>
</po:Programme…>
<yo:Video …>
<dc:title>Pluto & the
Dwarf Planets</dc:title>
…
</yo:Video…>
Yovisto Video
bibo:Film
Schema mappings
[WebSci13]
Schemas/vocabularies on the Web: XKCD 927
https://xkcd.com/927/
Schema assessment and mapping
Co-occurrence of
data types
(in 146 datasets:
144 vocabularies,
588 highly
overlapping types,
719 properties)
Assessing the Educational Linked Data Landscape,
D’Aquin, M., Adamou, A., Dietze, S., ACM Web Science
2013 (WebSci2013), Paris, France, May 2013.
<po:Programme …>
<po:title>Secret Universe –
The Life of the Cell</po:title>
…
</po:Programme…>
BBC Programme
<sioc:Item …>
<title>Viral diseases &
bacteria</title>
…
</sioc:Item ….>
SlideShare Set
po:Programme
sioc:Item
?
http://datahub.io/group/linked-education
Co-occurrence after
mapping into most
frequent schemas
(201 frequent types
mapped into 79
classes)
bibo:Slideshow
bibo:Film
bibo:Document
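The mapping step shown on this slide can be pictured as a lookup from dataset-specific types to a small set of shared classes. The sketch below is a minimal illustration only: the concrete pairs (for example po:Programme to bibo:Film) and the function name `map_types` are assumptions made for this example, not the mappings published in the LinkedUp catalog.

```python
from collections import Counter

# Hypothetical type mappings, chosen for illustration only.
TYPE_MAPPINGS = {
    "po:Programme": "bibo:Film",
    "yo:Video": "bibo:Film",
    "sioc:Item": "bibo:Slideshow",
}

def map_types(type_counts, mappings=TYPE_MAPPINGS):
    """Aggregate per-type instance counts under their mapped target class.

    Types without a mapping are kept under their original name.
    """
    mapped = Counter()
    for source_type, count in type_counts.items():
        mapped[mappings.get(source_type, source_type)] += count
    return dict(mapped)

# Instance counts per type, as they might be observed across datasets (made up).
example = map_types({"po:Programme": 3, "yo:Video": 2, "sioc:Item": 5, "foaf:Person": 1})
```

After mapping, resources of formerly distinct types fall under one class, which is what allows a single query over a shared type to reach several datasets.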
LinkedUp Data Catalog
in a nutshell http://datahub.io/group/linked-education
http://data.linkededucation.org/linkedup/catalog/
 RDF (VoID) dataset catalog: browse &
query distributed datasets
 Live information about endpoint
accessibility
 Federated queries using type mappings
http://datahub.io/group/linked-education
<yo:Video 8748720>
<dc:title>Pluto & the
Dwarf Planets</dc:title>
…
</yo:Video 8748720>
Video
<sioc:Item 2139393292>
<title>Planetary motion
& gravity</title>
…
</sioc:Item 2139393292>
Slideset
Topics/categories addressed?
Relatedness of resources/entities?
(types, semantics)
<po:Programme519215>
<po:Series>Wonders of the Solar
System</po:Series>
<po:Episode>Emp. of the Sun</po:Episode>
<po:Actor>Brian Cox</po:Actor>
</po:Programme519215 >
Programme
Combining a co-occurrence-based and a semantic measure
for entity linking, B. P. Nunes, S. Dietze, M.A. Casanova, R.
Kawase, B. Fetahu, and W. Nejdl., ESWC 2013 - 10th Extended
Semantic Web Conference, (May 2013).
A Scalable Approach for Efficiently Generating
Structured Dataset Topic Profiles, Fetahu, B., Dietze, S.,
Nunes, B. P., Casanova, M. A., Nejdl, W., 11th Extended
Semantic Web Conference (ESWC2014), Crete, Greece, (2014).
Challenge: semantics of resources/datasets?
<yo:Video 8748720>
<dc:title>Pluto & the
Dwarf Planets</dc:title>
…
</yo:Video 8748720>
Video
<po:Programme519215>
<po:Series>Wonders of the Solar
System</po:Series>
<po:Episode>Emp. of the Sun</po:Episode>
<po:Actor>Brian Cox</po:Actor>
</po:Programme519215 >
Programme
Data disambiguation (for linking & profiling)
Brian Cox?
Sun?
Pluto?
db:Pluto (Dwarf Planet)
db:Astronomical Objects
db:Sun
Data disambiguation using background knowledge
“Semantic relatedness” of resources?
db:Astronomy
<po:Programme519215>
<po:Series>Wonders of the Solar
System</po:Series>
<po:Episode>Emp. of the Sun</po:Episode>
<po:Actor>Brian Cox</po:Actor>
</po:Programme519215 >
Programme
<sioc:Item 2139393292>
<title>Planetary motion
& gravity</title>
…
</sioc:Item 2139393292>
Slideset
<yo:Video 8748720>
<dc:title>Pluto & the
Dwarf Planets</dc:title>
…
</yo:Video 8748720>
Video
db:Pluto (Dwarf Planet)
db:Astronomical Objects
<yov:Lecture8748720>
<title>Pluto & the Dwarf
Planets</title>
…
</yov:Lecture8748720>
Online Lecture
db:Astronomy
 Computation of connectivity scores
between resources/entities
 Method: combination of
 (i) a semantic (graph-based) connectivity
score (SCS) with
 (ii) a Web co-occurrence-based measure
(CBM), similar to the Normalized Google Distance (NGD)
 For (i): adaptation of the Katz index from social
network analysis (SNA) to (linked) data graphs
(considering the number and lengths of paths
over transversal properties)
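To make the idea behind a Katz-style connectivity score concrete, here is a minimal sketch under stated assumptions. It is not the SCS implementation from the cited paper: it counts walks via powers of the adjacency matrix (so nodes may repeat, unlike simple paths), and the damping factor `beta`, the cut-off `max_length` and the toy graph are all chosen for illustration.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def katz_connectivity(adjacency, source, target, beta=0.5, max_length=4):
    """Sum of beta**l * (number of walks of length l) for l = 1..max_length.

    Shorter connections contribute more because beta < 1 damps long walks.
    """
    n = len(adjacency)
    power = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    score = 0.0
    for length in range(1, max_length + 1):
        # power[i][j] now holds the number of walks of this length from i to j
        power = mat_mul(power, adjacency)
        score += (beta ** length) * power[source][target]
    return score

# Toy undirected graph (assumed edges, for illustration only):
# 0 = db:Pluto, 1 = db:Astronomical Objects, 2 = db:Sun, 3 = db:Astronomy
graph = [
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
    [0, 1, 1, 0],
]
pluto_sun = katz_connectivity(graph, 0, 2, beta=0.5, max_length=3)
```

In this toy graph db:Pluto and db:Sun are not directly connected, so the score comes entirely from indirect connections through shared neighbours.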
db:Sun
SCS = 0.32
CBM = 0.24
http://purl.org/vol/doc/
http://purl.org/vol/ns/
Entity linking: semantic relatedness
<sioc:Item 2139393292>
<title>Planetary motion
& gravity</title>
…
</sioc:Item 2139393292>
Slideset
<po:Programme519215>
<po:Series>Wonders of the Solar
System</po:Series>
<po:Episode>Emp. of the Sun</po:Episode>
<po:Actor>Brian Cox</po:Actor>
</po:Programme519215 >
Programme
Entity linking: evaluation
 Evaluation based on USA Today news items (80,000 entity pairs)
 Manually created gold standard
(1,000 entity pairs)
 Baseline: Explicit Semantic Analysis (ESA)
=> CBM/SCS: “relatedness”; ESA: “similarity”
Precision/Recall/F1 for SCS, CBM, ESA.
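For readers less familiar with the reported measures, the following sketch shows how precision, recall and F1 are computed for a set of predicted entity links against a gold standard. The entity pairs are invented for illustration and are not taken from the evaluation data.

```python
def precision_recall_f1(predicted, gold):
    """Return (precision, recall, F1) for predicted links against gold links."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical entity pairs judged as related.
predicted_links = {("Sun", "Pluto"), ("Sun", "Brian Cox"), ("Pluto", "Astronomy")}
gold_links = {("Sun", "Pluto"), ("Pluto", "Astronomy"),
              ("Sun", "Astronomy"), ("Brian Cox", "Astronomy")}
p, r, f1 = precision_recall_f1(predicted_links, gold_links)
```

Here two of the three predicted links are correct and two of the four gold links are found, so precision is 2/3, recall is 1/2 and F1 is 4/7.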
db:Astronomical Objects
db:Astronomy
db:Sun
 Extracting representative metadata („topic profile“) for each dataset
 Ranking of most representative (DBpedia) categories (= topics); applied to all responsive LOD datasets
 Scalability vs representativeness: sampling & ranking for good scalability/accuracy balance
DBpedia category graph
Dataset profiling: what’s the data about?
A Scalable Approach for Efficiently Generating
Structured Dataset Topic Profiles, Fetahu, B.,
Dietze, S., Nunes, B. P., Casanova, M. A., Nejdl, W.,
11th Extended Semantic Web Conference
(ESWC2014), Crete, Greece, (2014).
<po:Programme519215>
<po:Series>Wonders of the Solar
System</po:Series>
<po:Episode>Emp. of the Sun</po:Episode>
<po:Actor>Brian Cox</po:Actor>
</po:Programme519215 >
Programme
Dataset profiling: approach
1. Sampling of resource instances
(random sampling, weighted sampling, resource
centrality sampling)
2. Entity and topic extraction (NER via DBpedia
Spotlight, category mapping and expansion)
3. Normalisation and ranking (using graphical
models such as PageRank with Priors, HITS with
Priors and K-Step Markov)
=> Result: weighted dataset-topic profile graph
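The ranking step can be illustrated with a small prior-biased PageRank over an entity-to-category graph. This is a generic power-iteration sketch, not the implementation used in the cited work: the category names, the damping value and the use of sample frequencies as priors are assumptions made for the example.

```python
def pagerank_with_priors(edges, priors, damping=0.85, iterations=50):
    """Power iteration where random jumps follow the prior distribution."""
    nodes = sorted({n for edge in edges for n in edge} | set(priors))
    total_prior = sum(priors.values())
    prior = {n: priors.get(n, 0.0) / total_prior for n in nodes}
    outgoing = {n: [] for n in nodes}
    for source, target in edges:
        outgoing[source].append(target)

    rank = dict(prior)
    for _ in range(iterations):
        new_rank = {n: (1 - damping) * prior[n] for n in nodes}
        for n in nodes:
            if outgoing[n]:
                share = damping * rank[n] / len(outgoing[n])
                for target in outgoing[n]:
                    new_rank[target] += share
            else:
                # Dangling node: hand its mass back according to the priors.
                for m in nodes:
                    new_rank[m] += damping * rank[n] * prior[m]
        rank = new_rank
    return rank

# Hypothetical graph: sampled entities point to categories, categories to broader ones.
edges = [
    ("db:Pluto", "cat:Dwarf_planets"),
    ("db:Sun", "cat:Stars"),
    ("cat:Dwarf_planets", "cat:Astronomical_objects"),
    ("cat:Stars", "cat:Astronomical_objects"),
    ("cat:Astronomical_objects", "cat:Astronomy"),
]
# Priors: how often each entity was found in the sampled resources (made up).
scores = pagerank_with_priors(edges, {"db:Pluto": 2, "db:Sun": 1})
topic_ranking = sorted((n for n in scores if n.startswith("cat:")),
                       key=scores.get, reverse=True)
```

Categories reached from frequently observed entities receive more weight, which is the intuition behind using priors derived from the sampled resources.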
Dataset profiling: exploring LOD datasets/topics
in a nutshell http://data-observatory.org/lod-profiles/
 Automatic extraction of dataset “topics” [ESWC2014]
 Visualisation & exploration of dataset-topic graph
(datasets, topics, relationships)
 Includes all (responsive) datasets of LOD Cloud
Dataset profiling: results evaluation
NDCG (averaged over all datasets)
Datasets & Ground Truth
 Yovisto, Oxpoints, LAK Dataset, Semantic Web
Dogfood
 Crowd-sourced topic indicators from datasets
(keywords, tags)
 Manual mapping to entities & category extraction
(ranking according to frequency)
Baselines
 1) LDA, 2) tf/idf (applied to entire datasets)
 Topic extraction according to our approach,
weighting/ranking based on term weight
Measure
 NDCG @ rank l
 Performance (time/NDCG) for different sampling
strategies/sizes etc
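As a reminder of how the measure works, the sketch below computes NDCG at a rank cut-off for one ranked list of topics. It uses the common log2 discount and, as a simplifying assumption, derives the ideal ordering from the same list of relevance grades; the grades themselves are invented.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k relevance grades."""
    return sum(rel / math.log2(position + 2)
               for position, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_relevances, k):
    """DCG normalised by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0

# Relevance grades of the topics in the order a profile ranked them (made up).
perfect = ndcg_at_k([3, 2, 1], 3)
swapped = ndcg_at_k([1, 3], 2)
```

A ranking that already lists topics by decreasing relevance scores 1.0, while placing a weak topic above a strong one lowers the score.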
dbp:Category:Royal_Medal_winners
dbp:Category:1955_births
dbp:Category:People_from_London
dbp:Category:Buzzwords
dbp:Category:Web_Services
dbp:Category:HTTP
dbp:Category:Unitarian_Universalists
dbp:Category:World_Wide_Web
What do these categories have in common?
Diversity of category profile for a single paper
Berners-Lee, Tim; Hendler, James; Lassila, Ora (2001). "The Semantic Web".
Scientific American Magazine.
person
document
dbp:Tim_Berners-Lee
dbp:Category:1955_births
dbp:Category:People_from_London
dbp:Category:Buzzwords
dbp:Semantic_Web
dbp:Category:Semantic_Web
dbp:Category:Web_Services
dbp:Category:HTTP
dbp:Category:Unitarian_Universalists
first-level categories (dcterms:subject)
dbp:Category:World_Wide_Web
dbp:Category:Royal_Medal_winners
 DBpedia category graph not an ideal “topic” vocabulary:
 Broad and noisy
 “Categories” vs “topics” (for capturing disciplines, thesauri
like UMBEL or UNESCO Thesaurus seem better suited)
 Hierarchy ?
 Filtering of certain partitions of category graph (too generic
categories etc)
 Mixing categories across resource types (document, person)
creates “perceived noise”
 But: broadness is useful as general vocabulary for
categorisation of all sorts of resource types
Dataset profiling: some lessons learned
http://data-observatory.org/led-explorer/
 Type specific views on datasets/
categories
 “Document” (foaf:document)
 “Person” (foaf:person)
 “Course” (aaiso:course)
 Currently applied to datasets in
LinkedUp Catalog only (as
schema mappings already
available here)
Type-specific exploration of dataset categories
Dataset interlinking recommendation
Candidate datasets for interlinking?
t
Linkset1
Linkset2
Problem
 Given a target dataset t, rank the datasets di in D
according to a probability score (di, t) of
containing linking candidates (entities) for t
 Features:
 Vocabulary overlap
 Existing links (SNA)
 Datasets more likely to contain linking
candidates if they (a) share common
schema elements, or (b) already link to t
or datasets t links to (friend of a friend)
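The two kinds of evidence listed above can be combined in a toy ranking function. This is a sketch under loud assumptions: the cited papers evaluate a vocabulary-based and a link-based approach separately, whereas the weighted sum, the Jaccard overlap, the `weight` parameter and the example datasets below are all introduced here purely for illustration.

```python
def jaccard(a, b):
    """Jaccard overlap of two collections of vocabulary identifiers."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def rank_candidates(target, vocabularies, links, weight=0.5):
    """Rank datasets by vocabulary overlap with the target plus link evidence.

    links maps a dataset to the set of datasets it links to.
    """
    linked_from_target = links.get(target, set())
    scores = {}
    for candidate in vocabularies:
        if candidate == target:
            continue
        overlap = jaccard(vocabularies[target], vocabularies[candidate])
        candidate_links = links.get(candidate, set())
        # Friend-of-a-friend evidence: the candidate links to t,
        # or to a dataset that t also links to.
        related = target in candidate_links or bool(candidate_links & linked_from_target)
        scores[candidate] = weight * overlap + (1 - weight) * (1.0 if related else 0.0)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical vocabularies and links.
vocabularies = {
    "t": {"foaf", "bibo", "dcterms"},
    "DBLP": {"foaf", "bibo", "swrc"},
    "ACM": {"foaf", "dcterms"},
    "Geo": {"geo"},
}
links = {"t": {"DBpedia"}, "DBLP": {"DBpedia"}, "ACM": set(), "Geo": set()}
ranking = rank_candidates("t", vocabularies, links)
```

In this example DBLP ranks first because it both shares vocabularies with t and links to a dataset that t links to.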
Conclusions
 Roughly 60% MAP for both approaches
 Future work: quantity of links, more
remote links, extraction of dataset links
rather than data from DataHub
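MAP (mean average precision), the measure quoted in the conclusions, rewards rankings that place relevant datasets near the top. The sketch below shows the standard computation; the rankings and relevance judgements are invented and do not reproduce the reported results.

```python
def average_precision(ranked, relevant):
    """Average of precision values at each position where a relevant item appears."""
    relevant = set(relevant)
    hits, total = 0, 0.0
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / position
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """Mean of average precision over (ranking, relevant set) pairs."""
    if not runs:
        return 0.0
    return sum(average_precision(ranked, relevant) for ranked, relevant in runs) / len(runs)

# Hypothetical recommendation lists for two target datasets.
runs = [
    (["DBLP", "ACM", "OAI", "CiteSeer"], {"DBLP", "OAI"}),
    (["ACM", "DBLP"], {"DBLP"}),
]
map_score = mean_average_precision(runs)
```

The first list scores (1/1 + 2/3) / 2 and the second 1/2, giving a MAP of 2/3 for this toy input.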
Lopes, G.R., Paes Leme, L.A.P., Nunes, B.P., Casanova, M.A.,
Dietze, S., Recommending Tripleset Interlinking through a
Social Network Approach, The 14th International Conference
on Web Information System Engineering (WISE 2013),
Nanjing, China, 2013.
Paes Leme, L. A. P., Lopes, G. R., Nunes, B. P., Casanova,
M.A., Dietze, S., Identifying candidate datasets for data
interlinking, in Proceedings of the 13th International
Conference on Web Engineering, (2013).
Rank
1 DBLP
2 ACM
3 OAI
4 CiteSeer
5 IBM
6 Roma
7 IEEE
8 Ulm
9 Pisa
?
?
Success models:
data & applications
 LinkedUp Challenge
to identify innovative
tools & applications
 Evaluation methods
and approaches
“LinkedUp” – Linking Web Data (for Education)
Data linking & curation
Technology transfer
& community-building
 Collecting & exposing open
data
=> LinkedUp Data Catalog
 Profiling and linking of Web
Data for education
=> educational data graph
[ESWC2013], [ISWC2013]
 Disseminating knowledge &
building communities
(educators, computer
scientists, data engineers)
 Gathering stakeholder
feedback: use cases, and
requirements
http://linkedup-challenge.org/#usecases
http://linkedup-project.eu/events
http://www.linkedup-challenge.org/
http://data.linkededucation.org
European support action to
advance take-up of open
data & related technologies
http://www.linkedup-project.eu
Who we are
LinkedUp Network
LinkedUp Consortium
LinkedUp Advisory Board
LinkedUp Challenge: using open data (for learning)
 Open Data Competition to promote tools and applications that analyse / integrate (Linked)
Web data
 Organised by LinkedUp project over 2 years (“Veni”, “Vidi”, “Vici”) with 40,000 EUR in awards
 Veni Competition: 22 submissions, 8 shortlisted for presentation at Open Knowledge
Conference (17 September, Geneva, Switzerland)
http://linkedup-challenge.org
Veni (May – September 2013)
 Open track only
 Final events at OKCon 2013
(September 2013, Geneva)
Vidi (October 2013 – May 2014)
 Open & focused track(s)
 Final events at ESWC2014
(May, Crete)
Vici (May 2014 – October 2014)
 Open track & focused tracks
 Submission details and calls to be
released soon
 Final events at ISWC2014
(October, Riva del Garda, Italy)
The Veni shortlist & winners
DataConf.
KnowNodes
Mismuseos
ReCredible
YourHistory
PoliMedia: 1st prize (http://www.polimedia.nl/)
GlobeTown: 2nd prize (http://www.globe-town.org/)
WeShare: 3rd prize / people’s choice (http://seek.cloud.gsic.tel.uva.es/weshare/)
data.l3s.de – a DataHub for the L3S
Learning Analytics & Knowledge Dataset & Challenge
Facilitating Research on Learning Analytics and EDM
in a nutshell
http://lak.linkededucation.org/
LAK Dataset (450 publications in RDF/R)
 ACM International Conference on Learning Analytics and
Knowledge (LAK) (2011-13)
 International Conference on Educational Data Mining (2008-13)
 Journal of Educational Data Mining (2008-12)
LAK Data Challenge
 Analyse, explore, and correlate the LAK Dataset
 At ACM LAK 2014 (April 2014, Indianapolis)
KEYSTONE COST ACTION
http://www.keystone-cost.eu/
 Research network focused on distributed search,
dataset profiling, Semantic Web, databases, etc.
 Running 2013-2017
 WG1: Representation of structured data sources
 WG2: Keyword search
 WG3: User interaction and query interpretation
 WG4: Research integration, showcases,
benchmarks, and evaluations
 Open to new members (even beyond Europe)
 Joint workshops (e.g. PROFILES2014 @ ESWC2014)
Ongoing/future work … and some upcoming events
Linked Data evolution, preservation, consistency
 In RDF graphs (e.g. LOD Cloud), “all” nodes are connected
 LD preservation: which datasets to preserve (direct links
or even more distant neighbours)?
=> semantic relatedness as guidance for scalable
preservation strategies /data enrichment
 Link correctness in evolving LD
 Investigating impact of changes on link correctness
(weekly LOD crawls over 1 year time span)
 Application: informed preservation strategies
 Conflict detection and LD quality (link quality, impact of
conflicts in distant nodes)
 PROFILES workshop @ ESWC2014
(http://keystone-cost.eu/profiles2014)
 26 May 2014, Crete, Greece
 Linking User Data 2014 at UMAP2014
(http://liud.linkededucation.org)
 Deadline: 1 April
 Online Learning & LD Tutorial at WWW2014
(http://www2014.kr/)
 07 April, Seoul
Thank you!
WWW
See also (general)
 http://linkedup-project.eu
 http://linkededucation.org
 http://data.l3s.de
http://purl.org/dietze
See also (data)
 http://data.linkededucation.org
 http://data.linkededucation.org/linkedup/catalog/
 http://lak.linkededucation.org
 Besnik Fetahu (L3S)
 Bernardo Pereira Nunes (PUC Rio)
 Marco Casanova (PUC Rio)
 Luiz Andre Paes Leme (PUC Rio)
 Giseli Lopes (PUC Rio)
 Davide Taibi (CNR, IT)
 Mathieu d’Aquin (Open University, UK)
 and many more…
Acknowledgements
Contenu connexe

Tendances

Beyond Meta-Data: Nano-Publications Recording Scientific Endeavour
Beyond Meta-Data: Nano-Publications Recording Scientific EndeavourBeyond Meta-Data: Nano-Publications Recording Scientific Endeavour
Beyond Meta-Data: Nano-Publications Recording Scientific EndeavourKNOWeSCAPE2014
 
User Engagement in Research Data Curation
User Engagement in Research Data CurationUser Engagement in Research Data Curation
User Engagement in Research Data CurationUniversity of Edinburgh
 
Collaboration to Curation: The High Rise Project meets Edinburgh DataShare
Collaboration to Curation: The High Rise Project meets Edinburgh DataShare Collaboration to Curation: The High Rise Project meets Edinburgh DataShare
Collaboration to Curation: The High Rise Project meets Edinburgh DataShare University of Edinburgh
 
Semantic Web / Linked Data Technologies
Semantic Web / Linked Data TechnologiesSemantic Web / Linked Data Technologies
Semantic Web / Linked Data TechnologiesMathieu d'Aquin
 
Geospatial Metadata and Spatial Data: It's all Greek to me!
Geospatial Metadata and Spatial Data: It's all Greek to me!Geospatial Metadata and Spatial Data: It's all Greek to me!
Geospatial Metadata and Spatial Data: It's all Greek to me!EDINA, University of Edinburgh
 
Scott Edmunds at OASP Asia: Open (and Big) Data – the next challenge
Scott Edmunds at OASP Asia: Open (and Big) Data – the next challengeScott Edmunds at OASP Asia: Open (and Big) Data – the next challenge
Scott Edmunds at OASP Asia: Open (and Big) Data – the next challengeGigaScience, BGI Hong Kong
 
Describing Scholarly Contributions semantically with the Open Research Knowle...
Describing Scholarly Contributions semantically with the Open Research Knowle...Describing Scholarly Contributions semantically with the Open Research Knowle...
Describing Scholarly Contributions semantically with the Open Research Knowle...Sören Auer
 
WDAqua ITN – Answering Questions using Web Data
WDAqua ITN – Answering Questions using Web DataWDAqua ITN – Answering Questions using Web Data
WDAqua ITN – Answering Questions using Web DataChristoph Lange
 
Linking HPC to Data Management - EUDAT Summer School (Giuseppe Fiameni, CINECA)
Linking HPC to Data Management - EUDAT Summer School (Giuseppe Fiameni, CINECA)Linking HPC to Data Management - EUDAT Summer School (Giuseppe Fiameni, CINECA)
Linking HPC to Data Management - EUDAT Summer School (Giuseppe Fiameni, CINECA)EUDAT
 
Data Management Planning at the DCC: a human factor
Data Management Planning at the DCC: a human factorData Management Planning at the DCC: a human factor
Data Management Planning at the DCC: a human factorMartin Donnelly
 
EDF2014: Vedran Sabol, Head of the Knowledge Visualisation Area, Know-Center,...
EDF2014: Vedran Sabol, Head of the Knowledge Visualisation Area, Know-Center,...EDF2014: Vedran Sabol, Head of the Knowledge Visualisation Area, Know-Center,...
EDF2014: Vedran Sabol, Head of the Knowledge Visualisation Area, Know-Center,...European Data Forum
 
Data management plans – EUDAT Best practices and case study | www.eudat.eu
Data management plans – EUDAT Best practices and case study | www.eudat.euData management plans – EUDAT Best practices and case study | www.eudat.eu
Data management plans – EUDAT Best practices and case study | www.eudat.euEUDAT
 
Interpreting Data Mining Results with Linked Data for Learning Analytics
Interpreting Data Mining Results with Linked Data for Learning AnalyticsInterpreting Data Mining Results with Linked Data for Learning Analytics
Interpreting Data Mining Results with Linked Data for Learning AnalyticsMathieu d'Aquin
 
Towards an Open Research Knowledge Graph
Towards an Open Research Knowledge GraphTowards an Open Research Knowledge Graph
Towards an Open Research Knowledge GraphSören Auer
 
Long-term data curation, aka data preservation - EUDAT Summer School (Marjan ...
Long-term data curation, aka data preservation - EUDAT Summer School (Marjan ...Long-term data curation, aka data preservation - EUDAT Summer School (Marjan ...
Long-term data curation, aka data preservation - EUDAT Summer School (Marjan ...EUDAT
 
Frankfurt Big Data Lab & Refugee Projeect
Frankfurt Big Data Lab & Refugee ProjeectFrankfurt Big Data Lab & Refugee Projeect
Frankfurt Big Data Lab & Refugee ProjeectGoethe Univeristy
 

Tendances (20)

Beyond Meta-Data: Nano-Publications Recording Scientific Endeavour
Beyond Meta-Data: Nano-Publications Recording Scientific EndeavourBeyond Meta-Data: Nano-Publications Recording Scientific Endeavour
Beyond Meta-Data: Nano-Publications Recording Scientific Endeavour
 
Participatory Web
Participatory WebParticipatory Web
Participatory Web
 
User Engagement in Research Data Curation
User Engagement in Research Data CurationUser Engagement in Research Data Curation
User Engagement in Research Data Curation
 
Collaboration to Curation: The High Rise Project meets Edinburgh DataShare
Collaboration to Curation: The High Rise Project meets Edinburgh DataShare Collaboration to Curation: The High Rise Project meets Edinburgh DataShare
Collaboration to Curation: The High Rise Project meets Edinburgh DataShare
 
Semantic Web / Linked Data Technologies
Semantic Web / Linked Data TechnologiesSemantic Web / Linked Data Technologies
Semantic Web / Linked Data Technologies
 
Geospatial Metadata and Spatial Data: It's all Greek to me!
Geospatial Metadata and Spatial Data: It's all Greek to me!Geospatial Metadata and Spatial Data: It's all Greek to me!
Geospatial Metadata and Spatial Data: It's all Greek to me!
 
Scott Edmunds at OASP Asia: Open (and Big) Data – the next challenge
Scott Edmunds at OASP Asia: Open (and Big) Data – the next challengeScott Edmunds at OASP Asia: Open (and Big) Data – the next challenge
Scott Edmunds at OASP Asia: Open (and Big) Data – the next challenge
 
Describing Scholarly Contributions semantically with the Open Research Knowle...
Describing Scholarly Contributions semantically with the Open Research Knowle...Describing Scholarly Contributions semantically with the Open Research Knowle...
Describing Scholarly Contributions semantically with the Open Research Knowle...
 
WDAqua ITN – Answering Questions using Web Data
WDAqua ITN – Answering Questions using Web DataWDAqua ITN – Answering Questions using Web Data
WDAqua ITN – Answering Questions using Web Data
 
Linking HPC to Data Management - EUDAT Summer School (Giuseppe Fiameni, CINECA)
Linking HPC to Data Management - EUDAT Summer School (Giuseppe Fiameni, CINECA)Linking HPC to Data Management - EUDAT Summer School (Giuseppe Fiameni, CINECA)
Linking HPC to Data Management - EUDAT Summer School (Giuseppe Fiameni, CINECA)
 
Cognitive data
Cognitive dataCognitive data
Cognitive data
 
Data Management Planning at the DCC: a human factor
Data Management Planning at the DCC: a human factorData Management Planning at the DCC: a human factor
Data Management Planning at the DCC: a human factor
 
EDF2014: Vedran Sabol, Head of the Knowledge Visualisation Area, Know-Center,...
EDF2014: Vedran Sabol, Head of the Knowledge Visualisation Area, Know-Center,...EDF2014: Vedran Sabol, Head of the Knowledge Visualisation Area, Know-Center,...
EDF2014: Vedran Sabol, Head of the Knowledge Visualisation Area, Know-Center,...
 
Data management plans – EUDAT Best practices and case study | www.eudat.eu
Data management plans – EUDAT Best practices and case study | www.eudat.euData management plans – EUDAT Best practices and case study | www.eudat.eu
Data management plans – EUDAT Best practices and case study | www.eudat.eu
 
Interpreting Data Mining Results with Linked Data for Learning Analytics
Interpreting Data Mining Results with Linked Data for Learning AnalyticsInterpreting Data Mining Results with Linked Data for Learning Analytics
Interpreting Data Mining Results with Linked Data for Learning Analytics
 
Geospatial Metadata Workshop
Geospatial Metadata WorkshopGeospatial Metadata Workshop
Geospatial Metadata Workshop
 
Glasgow University Geo Metadata Workshop
Glasgow University Geo Metadata WorkshopGlasgow University Geo Metadata Workshop
Glasgow University Geo Metadata Workshop
 
Towards an Open Research Knowledge Graph
Towards an Open Research Knowledge GraphTowards an Open Research Knowledge Graph
Towards an Open Research Knowledge Graph
 
Long-term data curation, aka data preservation - EUDAT Summer School (Marjan ...
Long-term data curation, aka data preservation - EUDAT Summer School (Marjan ...Long-term data curation, aka data preservation - EUDAT Summer School (Marjan ...
Long-term data curation, aka data preservation - EUDAT Summer School (Marjan ...
 
Frankfurt Big Data Lab & Refugee Projeect
Frankfurt Big Data Lab & Refugee ProjeectFrankfurt Big Data Lab & Refugee Projeect
Frankfurt Big Data Lab & Refugee Projeect
 

En vedette

Presentation nokobit
Presentation nokobitPresentation nokobit
Presentation nokobitnetsoxx
 
A Scalable Approach for Efficiently Generating Structured Dataset Topic Profiles
A Scalable Approach for Efficiently Generating Structured Dataset Topic ProfilesA Scalable Approach for Efficiently Generating Structured Dataset Topic Profiles
A Scalable Approach for Efficiently Generating Structured Dataset Topic ProfilesBesnik Fetahu
 
DURAARK at Bibliotheksymposium Wildau
DURAARK at Bibliotheksymposium WildauDURAARK at Bibliotheksymposium Wildau
DURAARK at Bibliotheksymposium Wildaupanitzm
 
DURAARK presentation at DEDICATE final seminar, October 21st 2013, Michelle L...
DURAARK presentation at DEDICATE final seminar, October 21st 2013, Michelle L...DURAARK presentation at DEDICATE final seminar, October 21st 2013, Michelle L...
DURAARK presentation at DEDICATE final seminar, October 21st 2013, Michelle L...lindlar
 
Quality criteria for architectural 3D data in usage and preservation processes
Quality criteria for architectural 3D data in usage and preservation processesQuality criteria for architectural 3D data in usage and preservation processes
Quality criteria for architectural 3D data in usage and preservation processeslindlar
 
A Domain-driven Approach to Digital Curation and Preservation of 3D Architect...
A Domain-driven Approach to Digital Curation and Preservation of 3D Architect...A Domain-driven Approach to Digital Curation and Preservation of 3D Architect...
A Domain-driven Approach to Digital Curation and Preservation of 3D Architect...lindlar
 
Presentation of the DURAARK project at Ex Libris conference, Berlin, Germany.
Presentation of the DURAARK project at Ex Libris conference, Berlin, Germany.Presentation of the DURAARK project at Ex Libris conference, Berlin, Germany.
Presentation of the DURAARK project at Ex Libris conference, Berlin, Germany.Lena Lindbäck
 
DURAARK presentation CIB W78 "Applications of IT in AEC" conference Beijing 2...
DURAARK presentation CIB W78 "Applications of IT in AEC" conference Beijing 2...DURAARK presentation CIB W78 "Applications of IT in AEC" conference Beijing 2...
DURAARK presentation CIB W78 "Applications of IT in AEC" conference Beijing 2...Jakob Beetz
 
Towards preservation of semantically enriched architectural knowledge
Towards preservation of semantically enriched architectural knowledgeTowards preservation of semantically enriched architectural knowledge
Towards preservation of semantically enriched architectural knowledgeStefan Dietze
 
DURAARK at IGeLU 2014
DURAARK at IGeLU 2014DURAARK at IGeLU 2014
DURAARK at IGeLU 2014panitzm
 
Grapp2014 presentation
Grapp2014 presentationGrapp2014 presentation
Grapp2014 presentationnetsoxx
 
DURAARK at AUdS 2015
DURAARK at AUdS 2015DURAARK at AUdS 2015
DURAARK at AUdS 2015panitzm
 
Preservation of 3 d objects of buildings
Preservation of 3 d objects of buildingsPreservation of 3 d objects of buildings
Preservation of 3 d objects of buildingsnetsoxx
 

What's all the data about? - Linking and Profiling of Linked Datasets

  • 1. What's all the data about - profiling and interlinking Web datasets. Stefan Dietze, L3S Research Center, 27/03/14
  • 2. Introduction
      Research areas:
      - Web science, Information Retrieval, Semantic Web & Linked Data, data & knowledge integration (mapping, classification, interlinking)
      - Application domains: education/TEL, Web archiving, …
      Recent work on Linked Data exploration/discovery/search:
      - Entity interlinking & dataset interlinking recommendation
      - Dataset profiling
      - Data consistency & conflicts
      Some projects: http://www.l3s.de/
      See also: http://purl.org/dietze
  • 3. Linked Data is awesome, but...
      Promised properties (Hm, really?):
      - "HTTP-accessibility" (SPARQL, URI dereferencing)
      - "Structure" & "Semantics" (=> shared/linked vocabularies)
      - "Interlinked"
      - "Persistent"
      …why are so few datasets actually used?
      - Data reuse and in-links focus on trusted "reference graphs" such as DBpedia, Freebase etc.
      - Long tail of LD datasets which are neither reused nor linked to (LOD Cloud alone: 300+ datasets, 50 bn triples)
      - Explanations?
  • 4. Linked data is more diverse than we think
      Accessibility of datasets?
      - Less than 50% of all SPARQL endpoints are actually responsive at any given point in time [Buil-Aranda et al 2013]
      - "THE" SPARQL protocol? No, but many variants & subsets
      - …
      Shared vocabularies & schemas, but:
      - …still very heterogeneous [d'Aquin, WebSci13]
      - …data partially messy and not conformant (RDFS, schemas) [HoganJWS2012]
      - …even widely used reference datasets such as DBpedia are noisy [Paulheim2013]
      Figures:
      - SPARQL endpoint availability over time [Buil-Aranda et al 2013]
      - Co-occurrence graph of data types in 146 datasets: 144 vocabularies, 588 highly overlapping types, 719 properties
      References:
      - SPARQL Web-Querying Infrastructure: Ready for Action?, Buil-Aranda, C., Hogan, A., Umbrich, J., Vandenbussche, P.-Y., International Semantic Web Conference 2013 (ISWC2013).
      - Assessing the Educational Linked Data Landscape, D'Aquin, M., Adamou, A., Dietze, S., ACM Web Science 2013 (WebSci2013), Paris, France, May 2013.
      - Type Inference on Noisy RDF Data, Paulheim, H., Bizer, C., Semantic Web - ISWC 2013, Lecture Notes in Computer Science Volume 8218, 2013, pp. 510-525.
      - An empirical survey of Linked Data conformance, Hogan, A., Umbrich, J., Harth, A., Cyganiak, R., Polleres, A., Decker, S., Journal of Web Semantics 14, pp. 14-44, 2012.
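The availability claim on this slide boils down to a ratio per point in time: endpoints that answered divided by endpoints probed. A minimal sketch of that bookkeeping follows, assuming a probe log of `(day, endpoint, responded)` tuples. The log format, the `example.org` URLs, and the values are illustrative assumptions, not data from [Buil-Aranda et al 2013].

```python
from collections import defaultdict

# Hypothetical probe log: (day, endpoint URL, responded?). Values are made up.
probes = [
    ("2013-05-01", "http://example.org/a/sparql", True),
    ("2013-05-01", "http://example.org/b/sparql", False),
    ("2013-05-01", "http://example.org/c/sparql", False),
    ("2013-05-02", "http://example.org/a/sparql", True),
    ("2013-05-02", "http://example.org/b/sparql", True),
    ("2013-05-02", "http://example.org/c/sparql", False),
]

def availability_by_day(log):
    """Return {day: share of probed endpoints that responded on that day}."""
    seen = defaultdict(int)
    up = defaultdict(int)
    for day, _endpoint, ok in log:
        seen[day] += 1
        up[day] += int(ok)
    return {day: up[day] / seen[day] for day in seen}

availability = availability_by_day(probes)
```

Plotting `availability` over the probed days would give a curve of the same kind as the availability figure on the slide, under the stated assumptions.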
  • 5. What about data consistency?
      Reference:
      - Inconsistency and Incompleteness of Linked Datasets - a Case Study, Yuan, W., Demidova, E., Dietze, S., Zhu, X., Web Science 2014 (WebSci14), under review.
  • 6. Too many/diverse datasets, too little information
      - Which datasets are useful & trustworthy for case XY (e.g. "learning about the solar system")? Which topics are covered?
      - Types: which datasets describe statistics, videos, slides, publications etc.?
      - Currentness, dynamics, accessibility/reliability, data quantity & quality?
  • 7. Data curation and dataset profiling
      Dataset Catalog/Registry:
      - Catalog of data: classification of datasets according to resource types, disciplines/topics, data quality, accessibility, etc.
      - Infrastructure for distributed/federated querying
      The catalog describes the datasets with respect to:
      - Which datasets are useful & trustworthy for case XY (e.g. "learning about the solar system")? Which topics are covered?
      - Types: which datasets describe statistics, videos, slides, publications etc.?
      - Currentness, dynamics, accessibility/reliability, data quantity & quality?
  • 8. Dataset profiling: what's all the data about
      [Diagram] Dataset Catalog/Registry contains Dataset Metadata, built from:
      - Schema mappings [WebSci13]; types shown: yov:Video, po:Programme, bibo:Film; vocabularies shown: BIBO, AIISO, FOAF
      - Entity disambiguation & linking [ESWC13]
      - Topic profile extraction [WWW13, ESWC14]; topics shown: db:Astronomy, db:Astro. Objects
      Example resources:
      - BBC Programme: <po:Programme …> <po:Series>Wonders of the Solar System</.> <po:Actor>Brian Cox</…> </po:Programme…>
      - Yovisto Video: <yo:Video …> <dc:title>Pluto & the Dwarf Planets</dc:title> … </yo:Video…>
  • 9. Schemas/vocabularies on the Web: XKCD 927, https://xkcd.com/927/
  • 10. Schema assessment and mapping
      • Co-occurrence of data types (in 146 datasets: 144 vocabularies, 588 highly overlapping types, 719 properties)
      • Open question: po:Programme vs sioc:Item?
          • BBC Programme: <po:Programme …> <po:title>Secret Universe – The Life of the Cell</po:title> … </po:Programme…>
          • SlideShare Set: <sioc:Item …> <title>Viral diseases & bacteria</title> … </sioc:Item ….>
      • http://datahub.io/group/linked-education
      • Reference: Assessing the Educational Linked Data Landscape, D’Aquin, M., Adamou, A., Dietze, S., ACM Web Science 2013 (WebSci2013), Paris, France, May 2013.
  • 11. Schema assessment and mapping
      • Co-occurrence of data types (in 146 datasets: 144 vocabularies, 588 highly overlapping types, 719 properties)
      • Co-occurrence after mapping into the most frequent schemas (201 frequent types mapped into 79 classes)
      • Target classes shown: bibo:Slideshow, bibo:Film, bibo:Document
      • Source types shown: po:Programme, sioc:Item
          • BBC Programme: <po:Programme …> <po:title>Secret Universe – The Life of the Cell</po:title> … </po:Programme…>
          • SlideShare Set: <sioc:Item …> <title>Viral diseases & bacteria</title> … </sioc:Item ….>
      • Reference: Assessing the Educational Linked Data Landscape, D’Aquin, M., Adamou, A., Dietze, S., ACM Web Science 2013 (WebSci2013), Paris, France, May 2013.
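The idea of mapping many source types into a small set of frequent classes can be illustrated with a minimal Python sketch. The mapping table below is hypothetical and chosen only to mirror the types named on the slide; it is not the set of mappings from the WebSci2013 paper, and the function names are illustrative.

```python
from collections import Counter

# Hypothetical type mappings for illustration only
# (NOT the actual mappings published with the WebSci2013 paper).
TYPE_MAPPINGS = {
    "po:Programme": "bibo:Film",
    "yov:Video": "bibo:Film",
    "sioc:Item": "bibo:Slideshow",
}

def map_type(source_type, mappings=TYPE_MAPPINGS):
    """Return the target class for a type, or the type itself if unmapped."""
    return mappings.get(source_type, source_type)

def usage_after_mapping(dataset_types, mappings=TYPE_MAPPINGS):
    """Count, per mapped class, how many datasets use that class."""
    counts = Counter()
    for types in dataset_types.values():
        # A set ensures each dataset counts at most once per class.
        counts.update({map_type(t, mappings) for t in types})
    return counts

# Toy input: dataset name -> types observed in that dataset (made up).
datasets = {
    "bbc": ["po:Programme"],
    "yovisto": ["yov:Video"],
    "slideshare": ["sioc:Item"],
}
usage = usage_after_mapping(datasets)
```

After mapping, two datasets that used different vocabularies for video-like resources share one class, which is what makes a single federated query over both possible.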
  • 12. LinkedUp Data Catalog in a nutshell
      • http://datahub.io/group/linked-education
      • http://data.linkededucation.org/linkedup/catalog/
      • RDF (VoID) dataset catalog: browse & query distributed datasets
      • Live information about endpoint accessibility
      • Federated queries using type mappings
  • 13. Challenge: semantics of resources/datasets?
      • Topics/categories addressed?
      • Relatedness of resources/entities? (types, semantics)
      • Example resources:
          • Video: <yo:Video 8748720> <dc:title>Pluto & the Dwarf Planets</dc:title> … </yo:Video 8748720>
          • Slideset: <sioc:Item 2139393292> <title>Planetary motion & gravity</title> … </sioc:Item 2139393292>
          • Programme: <po:Programme519215> <po:Series>Wonders of the Solar System</po:Series> <po:Episode>Emp. of the Sun</po:Episode> <po:Actor>Brian Cox</po:Actor> </po:Programme519215>
      • References:
          • Combining a co-occurrence-based and a semantic measure for entity linking, Nunes, B. P., Dietze, S., Casanova, M. A., Kawase, R., Fetahu, B., Nejdl, W., ESWC 2013, 10th Extended Semantic Web Conference (May 2013).
          • A Scalable Approach for Efficiently Generating Structured Dataset Topic Profiles, Fetahu, B., Dietze, S., Nunes, B. P., Casanova, M. A., Nejdl, W., 11th Extended Semantic Web Conference (ESWC2014), Crete, Greece (2014).
  • 14. Data disambiguation (for linking & profiling)
      • Which entities are meant: Brian Cox? Sun? Pluto?
      • Example resources:
          • Video: <yo:Video 8748720> <dc:title>Pluto & the Dwarf Planets</dc:title> … </yo:Video 8748720>
          • Programme: <po:Programme519215> <po:Series>Wonders of the Solar System</po:Series> <po:Episode>Emp. of the Sun</po:Episode> <po:Actor>Brian Cox</po:Actor> </po:Programme519215>
  • 15. Data disambiguation using background knowledge
      • "Semantic relatedness" of resources?
      • Background knowledge entities shown: db:Pluto (Dwarf Planet), db:Sun, db:Astronomical Objects, db:Astronomy
      • Example resources:
          • Programme: <po:Programme519215> <po:Series>Wonders of the Solar System</po:Series> <po:Episode>Emp. of the Sun</po:Episode> <po:Actor>Brian Cox</po:Actor> </po:Programme519215>
          • Slideset: <sioc:Item 2139393292> <title>Planetary motion & gravity</title> … </sioc:Item 2139393292>
          • Video: <yo:Video 8748720> <dc:title>Pluto & the Dwarf Planets</dc:title> … </yo:Video 8748720>
  • 16. Entity linking: semantic relatedness
      • Computation of connectivity scores between resources/entities
      • Method: combination of
          • (i) a semantic (graph-based) connectivity score (SCS) with
          • (ii) a Web co-occurrence-based measure (CBM), similar to the Normalized Google Distance (NGD)
      • For (i): adaptation of the Katz index from social network analysis (SNA) for (linked) data graphs (considering path number and path lengths of transversal properties)
      • Example scores shown in the diagram: SCS = 0.32, CBM = 0.24
      • Entities shown: db:Pluto (Dwarf Planet), db:Sun, db:Astronomical Objects, db:Astronomy
      • Example resources:
          • Online Lecture: <yov:Lecture8748720> <title>Pluto & the Dwarf Planets</title> … </yov:Lecture8748720>
          • Slideset: <sioc:Item 2139393292> <title>Planetary motion & gravity</title> … </sioc:Item 2139393292>
          • Programme: <po:Programme519215> <po:Series>Wonders of the Solar System</po:Series> <po:Episode>Emp. of the Sun</po:Episode> <po:Actor>Brian Cox</po:Actor> </po:Programme519215>
      • http://purl.org/vol/doc/, http://purl.org/vol/ns/
      • Reference: Combining a co-occurrence-based and a semantic measure for entity linking, Nunes, B. P., Dietze, S., Casanova, M. A., Kawase, R., Fetahu, B., Nejdl, W., ESWC 2013, 10th Extended Semantic Web Conference (May 2013).
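The two ingredients named on this slide can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the SCS or CBM definitions from the ESWC 2013 paper: `katz_score` is a plain truncated Katz index that counts walks (so nodes may repeat), `ngd` is the standard Normalized Google Distance used here as a stand-in for the co-occurrence measure, and the toy graph, `beta`, `max_len` and all hit counts are made up.

```python
import math

def katz_score(graph, source, target, beta=0.5, max_len=3):
    """Truncated Katz index: sum over l of beta**l * (number of walks of
    length l from source to target), for l = 1..max_len."""
    walks = {source: 1}  # walks of the current length ending at each node
    score = 0.0
    for length in range(1, max_len + 1):
        next_walks = {}
        for node, count in walks.items():
            for neighbour in graph.get(node, ()):
                next_walks[neighbour] = next_walks.get(neighbour, 0) + count
        walks = next_walks
        score += (beta ** length) * walks.get(target, 0)
    return score

def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from hit counts fx and fy, joint hit
    count fxy and index size n (assumes n is larger than fx and fy)."""
    if fxy == 0:
        return math.inf  # the two terms never co-occur
    log_fx, log_fy = math.log(fx), math.log(fy)
    return (max(log_fx, log_fy) - math.log(fxy)) / (math.log(n) - min(log_fx, log_fy))

# Toy undirected graph, written as adjacency lists in both directions.
graph = {
    "db:Pluto": ["db:Astronomical_Objects"],
    "db:Sun": ["db:Astronomical_Objects"],
    "db:Astronomical_Objects": ["db:Pluto", "db:Sun", "db:Astronomy"],
    "db:Astronomy": ["db:Astronomical_Objects"],
}
pluto_sun = katz_score(graph, "db:Pluto", "db:Sun", beta=0.5, max_len=3)
```

Shorter and more numerous connections raise the Katz-style score, while a smaller NGD means the two terms co-occur more often relative to their individual frequencies. How the two signals are combined is left open here.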
  • 17. Entity linking: evaluation
      • Evaluation based on USA Today news items (80,000 entity pairs)
      • Manually created gold standard (1,000 entity pairs)
      • Baseline: Explicit Semantic Analysis (ESA) => CBM/SCS measure "relatedness"; ESA measures "similarity"
      • Figure: Precision/Recall/F1 for SCS, CBM, ESA
      • Reference: Combining a co-occurrence-based and a semantic measure for entity linking, Nunes, B. P., Dietze, S., Casanova, M. A., Kawase, R., Fetahu, B., Nejdl, W., ESWC 2013, 10th Extended Semantic Web Conference (May 2013).
  • 18. Dataset profiling: what’s the data about?
      • Extracting representative metadata ("topic profile") for each dataset
      • Ranking of most representative (DBpedia) categories (= topics); applied to all responsive LOD datasets
      • Scalability vs representativeness: sampling & ranking for a good scalability/accuracy balance
      • Diagram: DBpedia category graph with db:Astronomical Objects, db:Astronomy, db:Sun
      • Example resource (Programme): <po:Programme519215> <po:Series>Wonders of the Solar System</po:Series> <po:Episode>Emp. of the Sun</po:Episode> <po:Actor>Brian Cox</po:Actor> </po:Programme519215>
      • Reference: A Scalable Approach for Efficiently Generating Structured Dataset Topic Profiles, Fetahu, B., Dietze, S., Nunes, B. P., Casanova, M. A., Nejdl, W., 11th Extended Semantic Web Conference (ESWC2014), Crete, Greece (2014).
  • 19. Dataset profiling: approach
      • 1. Sampling of resource instances (random sampling, weighted sampling, resource centrality sampling)
      • 2. Entity and topic extraction (NER via DBpedia Spotlight, category mapping and expansion)
      • 3. Normalisation and ranking (using graphical models such as PageRank with Priors, HITS with Priors and K-Step Markov)
      • => Result: weighted dataset-topic profile graph
      • Reference: A Scalable Approach for Efficiently Generating Structured Dataset Topic Profiles, Fetahu, B., Dietze, S., Nunes, B. P., Casanova, M. A., Nejdl, W., 11th Extended Semantic Web Conference (ESWC2014), Crete, Greece (2014).
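The ranking step (3) can be illustrated with a small PageRank-with-priors power iteration in Python. This is a sketch under assumptions, not the pipeline from the ESWC2014 paper: the entity-to-category edges and prior weights are made up, the category name cat:Physicists is hypothetical, and mass at nodes without outgoing edges is simply returned to the prior distribution.

```python
def pagerank_with_priors(edges, priors, beta=0.15, iterations=50):
    """Power iteration for PageRank with priors.

    edges:  list of (source, target) pairs
    priors: node -> non-negative prior weight (normalised internally)
    beta:   probability of jumping back to the prior distribution
    """
    nodes = set(priors)
    for u, v in edges:
        nodes.update((u, v))
    total = sum(priors.values())
    prior = {n: priors.get(n, 0.0) / total for n in nodes}
    out = {n: [] for n in nodes}
    for u, v in edges:
        out[u].append(v)

    rank = dict(prior)
    for _ in range(iterations):
        nxt = {n: beta * prior[n] for n in nodes}
        for u in nodes:
            if out[u]:
                share = (1 - beta) * rank[u] / len(out[u])
                for v in out[u]:
                    nxt[v] += share
            else:
                # Assumption: dangling nodes hand their mass back to the priors.
                for n in nodes:
                    nxt[n] += (1 - beta) * rank[u] * prior[n]
        rank = nxt
    return rank

# Toy graph: sampled entities point to (hypothetical) categories.
edges = [
    ("db:Sun", "cat:Astronomical_objects"),
    ("db:Pluto", "cat:Astronomical_objects"),
    ("cat:Astronomical_objects", "cat:Astronomy"),
    ("db:Brian_Cox", "cat:Physicists"),
]
# Priors on the sampled entities, e.g. equal weight per sampled resource.
priors = {"db:Sun": 1.0, "db:Pluto": 1.0, "db:Brian_Cox": 1.0}
ranks = pagerank_with_priors(edges, priors)
```

Categories reachable from more of the sampled entities collect more rank, so sorting the category nodes by score yields a weighted topic profile for the dataset.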
  • 20. Dataset profiling: exploring LOD datasets/topics in a nutshell (http://data-observatory.org/lod-profiles/)
      • Automatic extraction of dataset "topics" [ESWC2014]
      • Visualisation & exploration of dataset-topic graph (datasets, topics, relationships)
      • Includes all (responsive) datasets of the LOD Cloud
  • 21. Dataset profiling: results evaluation
      • Datasets & ground truth:
          • Yovisto, Oxpoints, LAK Dataset, Semantic Web Dogfood
          • Crowd-sourced topic indicators from datasets (keywords, tags)
          • Manual mapping to entities & category extraction (ranking according to frequency)
      • Baselines:
          • 1) LDA, 2) tf/idf (applied to entire datasets)
          • Topic extraction according to our approach, weighting/ranking based on term weight
      • Measure:
          • NDCG @ rank l
          • Performance (time/NDCG) for different sampling strategies/sizes etc.
      • Figure: NDCG (averaged over all datasets)
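For readers unfamiliar with the measure, NDCG at rank l can be sketched in Python as below. This is the common textbook formulation with a log2 discount; whether the evaluation on the slide used exactly this gain and discount is an assumption, and the ideal ranking here is derived only from the relevance grades of the ranked items themselves.

```python
import math

def dcg(relevances, l):
    """Discounted cumulative gain over the first l positions (1-based ranks)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:l], start=1)
    )

def ndcg(relevances, l):
    """DCG at rank l, normalised by the DCG of the ideal (sorted) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), l)
    return dcg(relevances, l) / ideal if ideal > 0 else 0.0

# Relevance grades of the topics in the order a profile ranked them (made up).
perfect = ndcg([3, 2, 1], 3)
swapped = ndcg([1, 3], 2)
```

A ranking that already lists topics in decreasing relevance scores 1, and any misordering within the top l positions lowers the value.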
  • 23. Diversity of category profile for a single paper
      • Paper: Berners-Lee, Tim; Hendler, James; Lassila, Ora (2001). "The Semantic Web". Scientific American Magazine.
      • Resources: dbp:Tim_Berners-Lee (person), dbp:Semantic_Web (document)
      • First-level categories (dcterms:subject): dbp:Category:1955_births, dbp:Category:People_from_London, dbp:Category:Buzzwords, dbp:Category:Semantic_Web, dbp:Category:Web_Services, dbp:Category:HTTP, dbp:Category:Unitarian_Universalists, dbp:Category:World_Wide_Web, dbp:Category:Royal_Medal_winners
  • 24. Dataset profiling: some lessons learned
      • DBpedia category graph is not an ideal "topic" vocabulary:
          • Broad and noisy
          • "Categories" vs "topics" (for capturing disciplines, thesauri like UMBEL or the UNESCO Thesaurus seem better suited)
          • Hierarchy?
      • Filtering of certain partitions of the category graph (too generic categories etc.)
      • Mixing categories across resource types (document, person) creates "perceived noise"
      • But: broadness is useful as a general vocabulary for categorisation of all sorts of resource types
  • 25. Type-specific exploration of dataset categories (http://data-observatory.org/led-explorer/)
      • Type-specific views on datasets/categories:
          • "Document" (foaf:Document)
          • "Person" (foaf:Person)
          • "Course" (aiiso:Course)
      • Currently applied to datasets in the LinkedUp Catalog only (as schema mappings are already available there)
  • 26. Dataset interlinking recommendation
      • Candidate datasets for interlinking? (diagram: dataset t with Linkset1 and Linkset2)
      • Problem: given a dataset t, rank the datasets from D according to a probability score (di, t) of containing linking candidates (entities)
      • Features:
          • Vocabulary overlap
          • Existing links (SNA)
      • Datasets are more likely to contain linking candidates if they (a) share common schema elements, or (b) already link to t or to datasets t links to (friend of a friend)
      • Example ranking: 1 DBLP, 2 ACM, 3 OAI, 4 CiteSeer, 5 IBM, 6 Roma, 7 IEEE, 8 Ulm, 9 Pisa
      • Conclusions:
          • Roughly 60% MAP for both approaches
          • Future work: quantity of links, more remote links, extraction of dataset links rather than data from DataHub
      • References:
          • Lopes, G. R., Paes Leme, L. A. P., Nunes, B. P., Casanova, M. A., Dietze, S., Recommending Tripleset Interlinking through a Social Network Approach, The 14th International Conference on Web Information System Engineering (WISE 2013), Nanjing, China, 2013.
          • Paes Leme, L. A. P., Lopes, G. R., Nunes, B. P., Casanova, M. A., Dietze, S., Identifying candidate datasets for data interlinking, in Proceedings of the 13th International Conference on Web Engineering (2013).
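The vocabulary-overlap feature and the MAP-style evaluation can be sketched in Python as follows. This is only an illustration: Jaccard overlap of schema elements is used here as a simple stand-in for the probability score, which the cited papers define differently, and the dataset names, vocabularies and relevance judgements are made up.

```python
def jaccard(a, b):
    """Jaccard overlap of two collections of schema elements."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank_candidates(target_vocab, candidates):
    """Rank candidate datasets by vocabulary overlap with the target dataset."""
    return sorted(
        candidates,
        key=lambda name: jaccard(target_vocab, candidates[name]),
        reverse=True,
    )

def average_precision(ranking, relevant):
    """Average precision of one ranking against a set of relevant datasets."""
    hits, total = 0, 0.0
    for position, name in enumerate(ranking, start=1):
        if name in relevant:
            hits += 1
            total += hits / position
    return total / len(relevant) if relevant else 0.0

# Hypothetical target dataset t and candidate datasets with their vocabularies.
target = {"foaf:Person", "bibo:Document", "dc:title"}
candidates = {
    "dataset_a": {"foaf:Person", "bibo:Document"},
    "dataset_b": {"dc:title", "po:Programme"},
    "dataset_c": {"geo:lat"},
}
ranking = rank_candidates(target, candidates)
ap = average_precision(ranking, relevant={"dataset_a", "dataset_c"})
```

Mean average precision (MAP) is then the mean of `average_precision` over all target datasets in the evaluation.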
  • 27. "LinkedUp": Linking Web Data (for Education)
      • European support action to advance take-up of open data & related technologies (http://www.linkedup-project.eu)
      • Data linking & curation:
          • Collecting & exposing open data => LinkedUp Data Catalog
          • Profiling and linking of Web Data for education => educational data graph [ESWC2013], [ISWC2013]
      • Success models: data & applications:
          • LinkedUp Challenge to identify innovative tools & applications
          • Evaluation methods and approaches
      • Technology transfer & community-building:
          • Disseminating knowledge & building communities (educators, computer scientists, data engineers)
          • Gathering stakeholder feedback: use cases and requirements
      • Links: http://linkedup-challenge.org/#usecases, http://linkedup-project.eu/events, http://www.linkedup-challenge.org/, http://data.linkededucation.org
  • 28. Who we are: LinkedUp Network, LinkedUp Consortium, LinkedUp Advisory Board
  • 29. LinkedUp Challenge: using open data (for learning) (http://linkedup-challenge.org)
      • Open Data Competition to promote tools and applications that analyse/integrate (Linked) Web data
      • Organised by the LinkedUp project over 2 years ("Veni", "Vidi", "Vici") with 40,000 EUR in awards
      • Veni Competition: 22 submissions, 8 shortlisted for presentation at the Open Knowledge Conference (17 September, Geneva, Switzerland)
  • 30. Challenge timeline
      • May to September 2013: Open Track only; final events at OKCon 2013 (September 2013, Geneva)
      • October 2013 to May 2014: open & focused track(s); final events at ESWC2014 (May, Crete)
      • May 2014 to October 2014: open track & focused tracks; submission details and calls to be released soon; final events at ISWC2014 (October, Riva del Garda, Italy)
  • 31. The Veni shortlist & winners
      • Shortlist: DataConf, KnowNodes, Mismuseos, ReCredible, YourHistory
      • PoliMedia: 1st prize (http://www.polimedia.nl/)
      • GlobeTown: 2nd prize (http://www.globe-town.org/)
      • WeShare: 3rd prize / people’s choice (http://seek.cloud.gsic.tel.uva.es/weshare/)
  • 32. data.l3s.de – a DataHub for the L3S
  • 33. Learning Analytics & Knowledge Dataset & Challenge: facilitating research on Learning Analytics and EDM, in a nutshell (http://lak.linkededucation.org/)
      • LAK Dataset (450 publications in RDF/R):
          • ACM International Conference on Learning Analytics and Knowledge (LAK) (2011-13)
          • International Conference on Educational Data Mining (2008-13)
          • Journal of Educational Data Mining (2008-12)
      • LAK Data Challenge:
          • Analyse, explore, correlate the LAK Dataset
          • At ACM LAK 2014 (April 2014, Indianapolis)
  • 34. KEYSTONE COST ACTION (http://www.keystone-cost.eu/)
      • Research network focused on distributed search and dataset profiling, relating to Semantic Web, Databases, etc.
      • Running 2013-2017
      • WG1: Representation of structured data sources
      • WG2: Keyword search
      • WG3: User interaction and query interpretation
      • WG4: Research integration, showcases, benchmarks, and evaluations
      • Open to new members (even beyond Europe)
      • Joint workshops (e.g. PROFILES2014 @ ESWC2014)
  • 35. Ongoing/future work … and some upcoming events
      • Linked Data evolution, preservation, consistency:
          • In RDF graphs (e.g. LOD Cloud), "all" nodes are connected
          • LD preservation: which datasets to preserve (direct links or even more distant neighbours)? => semantic relatedness as guidance for scalable preservation strategies/data enrichment
      • Link correctness in evolving LD:
          • Investigating impact of changes on link correctness (weekly LOD crawls over a 1-year time span)
          • Application: informed preservation strategies
      • Conflict detection and LD quality (link quality, impact of conflicts in distant nodes)
      • Upcoming events:
          • PROFILES workshop @ ESWC2014 (http://keystone-cost.eu/profiles2014): 26 May 2014, Crete, Greece
          • Linking User Data 2014 at UMAP2014 (http://liud.linkededucation.org): deadline 1 April
          • Online Learning & LD Tutorial at WWW2014 (http://www2014.kr/): 07 April, Seoul
  • 36. Thank you!
      • See also (general): http://linkedup-project.eu, http://linkededucation.org, http://data.l3s.de, http://purl.org/dietze
      • See also (data): http://data.linkededucation.org, http://data.linkededucation.org/linkedup/catalog/, http://lak.linkededucation.org
      • Acknowledgements:
          • Besnik Fetahu (L3S)
          • Bernardo Pereira Nunes (PUC Rio)
          • Marco Casanova (PUC Rio)
          • Luiz Andre Paes Leme (PUC Rio)
          • Giseli Lopes (PUC Rio)
          • Davide Taibi (CNR, IT)
          • Mathieu d’Aquin (Open University, UK)
          • and many more…