Backup
Semantic Linking & Retrieval for Digital Libraries
Dr. Stefan Dietze
11.02.2016
Institut für Informatik/Universität Bonn
29/03/16 1Stefan Dietze
Overview: research/application context
Information (types)
 Bibliographic (meta)data
 Research information
 Educational (meta)data
 Web & social data
Stakeholders
 Archival organisations
 Digital libraries
 Publishers
 Resource providers/consumers
Domains
 Life Sciences
 Computer Science
 Learning Analytics
 ...
Data-centric tasks
 Publishing, preservation, annotation, crawling, search, retrieval ...
Overview: contents
Introduction & motivation
Publishing, linking and profiling
 Publishing & linking (bibliographic) data
 Dataset profiling & linking
Retrieval & search
 Entity retrieval in large graphs
 Embedded (bibliographic) Web data
 Entity summarisation from Web markup
Outlook and future directions
Introduction & motivation
Publishing, linking and profiling
 Publishing & linking (bibliographic) data
 Dataset profiling & linking
Retrieval & search
 Entity retrieval in large graphs
 Embedded (bibliographic) Web data
 Entity summarisation from Web markup
Outlook and future directions
Overview: contents
knowledge graphs and linked data
beyond LD: embedded semantics
[ESWC13, ESWC14]
[ISWC15]
[WebSci13, SWJ15]
[ongoing]
Linked Data diversity: example library & scholarly data
 Linked Data: W3C standards & de-facto standard for sharing data on the Web (roughly 1000 datasets, 100 bn
triples), adopted specifically by library/GLAM sector & life sciences
 Strong focus on established knowledge graphs, e.g. Yago, DBpedia, Freebase (still)
Vocabularies/Schemas
 BIBO, Bibliographic Ontology
 BIRO, Bibliographic Reference Ontology
 CITO, Citation Typing Ontology
 SPAR vocabularies (incl. CITO, BIRO)
 SWRC (Semantic Web Dogfood)
 Functional Req. for Bibliographic Records (FRBR)
 Nature Publishing Group Ontology
 mEducator Educational Resources
 ....
Datasets
 EUROPEANA
 British Library
 German, French and Spanish national libraries
 Nature Publishing Group
 Hochschulbibliothekszentrum NRW
 Elsevier Scholarly Publications
 TED Talks
 mEducator Linked Educational Resources
 Open Courseware Consortium
 LAK Dataset
 ...
Initiatives
 W3C Library Linked Data Incubator Group
 Linked Library Data group on DataHub
 LinkedUniversities.org
 LinkedEducation.org
 W3C Linked Open Education Community Group
 ...
Challenge: efficient search for suitable resources & datasets
 "Quality": currency, dynamics, accessibility [Buil-Aranda2013], correctness [Paulheim2013], schema compliance [Hogan2012]
 Domains/topics: which datasets/resources address topic XY (e.g. "microbiology")?
 Types: statistical data, bibliographic resources, AV resources, scholarly publications?
 Links: related datasets?
Data publishing, linking and profiling: LinkedUp
Dataset
Catalog/Registry
http://data.linkededucation.org/linkedup/catalog/
 LinkedUp project (FP7 project: L3S, OU, OKFN, Elsevier, Exact Learning solutions)
 LinkedUp Catalog: largest collection of LD/Open Data for educationally relevant resources (approx. 50 Datasets)
 Original datasets published with key content providers, automatically extracted metadata
Dietze, S., Kaldoudi, E., Dovrolis, E., Giordano, D.,
Spampinato, C., Hendrix, M., Protopsaltis, A., Taibi, D., Yu,
H. Q. (2013), Socio-semantic Integration of Educational
Resources – the Case of the mEducator Project, in
Journal of Universal Computer Science (J.UCS), Vol. 19,
No. 11, pp. 1543-1569.
Dietze, S., Taibi, D., Yu, H. Q., Dovrolis, N., A Linked
Dataset of Medical Educational Resources, British
Journal of Educational Technology (BJET), Volume 46,
Issue 5, pages 1123–1129, September 2015.
mEducator: medical educational resources
 EC-funded eContentPlus project (2009-2012)
 Exploratory search through semantic and clustering techniques
 Lifting/enriching/clustering medical metadata
 Common vocabularies (MeSH, SNOMED, BioPortal etc.)
 mEducator dataset: first Linked Data corpus of enriched OER metadata, used by a number of applications
LAK Dataset: facilitating scientometrics
Concept             Type                #
Reference           npg:Citation        7885
Author              foaf:Person         1214
Conference Paper    swrc:InProceedings  652
Organization        foaf:Organization   365
Journal Paper       bibo:Article        45
Proceedings Volume  swrc:Proceedings    15
Journal Volume      bibo:Journal        9
 Cooperation of
 Linked Data corpus of "Learning Analytics" publications of the last 5 years (~800 publications)
 Metadata, full-text & automated linking
(DBLP, SWDF, DBpedia)
 Wide adoption (http://lak.linkededucation.org)
1. Data extraction & vocabulary definition
2. Entity co-reference resolution & linking
3. Applications & analysis
Facilitating Scientometrics in Learning Analytics and
Educational Data Mining - the LAK Dataset, Dietze, S.,
Taibi, D., D’Aquin, M., Semantic Web Journal, 2015.
LinkedUp Catalog: dataset index & registry, federated search
 “Federated queries” through schema mappings
 Dataset accessibility
 Linking & topic profiling
Schema/Types
Co-occurrence of types (in 146 datasets: 144 vocabularies, 588 overlapping types, 719 predicates)
Assessing the Educational Linked Data Landscape,
D’Aquin, M., Adamou, A., Dietze, S., ACM Web Science
2013 (WebSci2013), Paris, France, May 2013.
po:Programme
yov:Video
?
bibo:Book
Schema analysis & mapping
typeX
typeX
Co-occurrence after mapping (201 frequently occurring types, mapped into 79 types)
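A minimal sketch of how a schema mapping can drive a "federated" type query: a shared type is expanded into all dataset-specific types mapped onto it, and the result is rendered as a SPARQL VALUES clause. The mapping table below is purely hypothetical (the slides do not state which of the listed types were mapped onto which), and the function names are illustrative only.

```python
# Hypothetical mapping table: dataset-specific type -> shared type.
# These pairs are invented for illustration, not taken from the LinkedUp Catalog.
TYPE_MAPPING = {
    "yov:Video": "bibo:Film",
    "po:Programme": "bibo:Film",
    "bibo:Book": "bibo:Document",
    "foaf:Document": "bibo:Document",
}


def expand_type(shared_type, mapping=TYPE_MAPPING):
    """Return the shared type plus every source type mapped onto it, sorted."""
    sources = {src for src, dst in mapping.items() if dst == shared_type}
    return sorted({shared_type} | sources)


def values_clause(shared_type, mapping=TYPE_MAPPING):
    """Render the expanded types as a SPARQL VALUES clause for ?type."""
    return "VALUES ?type { " + " ".join(expand_type(shared_type, mapping)) + " }"


if __name__ == "__main__":
    print(values_clause("bibo:Film"))
```

A query for one shared type can then be sent unchanged to every dataset, since the VALUES clause already lists each dataset's own type.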
bibo:Film
bibo:Document
po:Programme
bibo:Book
foaf:Document
yov:Video
typeX
Assessing the Educational Linked Data Landscape,
D’Aquin, M., Adamou, A., Dietze, S., ACM Web Science
2013 (WebSci2013), Paris, France, May 2013.
http://data.linkededucation.org/linkedup/catalog/
LinkedUp Catalog: dataset index & registry, federated search
 “Federated queries” through schema mappings
 Dataset accessibility
 Linking & topic profiling
Dataset topic
profiles
contains
yov:Video
<yo:Video …>
<dc:title> Lecture 29 –
Stem Cells </dc:title>
…
</yo:Video…>
Yovisto Video
db:Medicine
db:Rudolf
Virchow
db:Cell
Biology
 Linking entities/datasets through a combination of (i) a "semantic (graph-based) connectivity score (SCS)" (based on Katz centrality) and (ii) a "co-occurrence-based measure (CBM)" (similar to Normalised Google Distance)
 Evaluation: outperforming Explicit Semantic Analysis (ESA)
SCS = 0.32
CBM = 0.24
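The two kinds of measure named above can be illustrated with a small sketch. It is not the formulation from the cited ESWC 2013 paper: `katz_score` is a generic Katz-style score (walks between two entities, damped by length), and `normalised_distance` is the standard Normalised Google Distance formula, which the slide only says the CBM is "similar to". The toy graph, `beta` and `max_len` are assumptions.

```python
import math


def katz_score(adjacency, source, target, beta=0.5, max_len=4):
    """Katz-style connectivity: sum over lengths l of beta**l * (#walks of length l).

    adjacency maps each node to a list of neighbours; every neighbour must
    also be a key of the dictionary.
    """
    walks = {node: 0 for node in adjacency}
    walks[source] = 1  # one walk of length 0 ends at the source
    score = 0.0
    for length in range(1, max_len + 1):
        following = {node: 0 for node in adjacency}
        for node, count in walks.items():
            if count:
                for neighbour in adjacency[node]:
                    following[neighbour] += count
        walks = following
        score += (beta ** length) * walks[target]
    return score


def normalised_distance(fx, fy, fxy, n):
    """Normalised Google Distance from occurrence counts fx, fy, co-occurrence fxy
    and collection size n. Smaller means more closely related."""
    if fxy == 0:
        return math.inf
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


if __name__ == "__main__":
    # Toy graph, invented for illustration.
    graph = {
        "Virchow": ["Medicine", "Cell biology"],
        "Medicine": ["Virchow", "Stem cell"],
        "Cell biology": ["Virchow", "Stem cell"],
        "Stem cell": ["Medicine", "Cell biology"],
    }
    print(katz_score(graph, "Virchow", "Stem cell", beta=0.5, max_len=2))
    print(normalised_distance(100, 100, 10, 10000))
```

The graph score rewards entities connected by many short paths, while the co-occurrence distance needs only counts, so the two can cover for each other where either the graph or the text evidence is thin.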
Data(set) interlinking
bibo:Book
British Library Book
<bibo:Book …>
<bibo:title>Über den Hungertyphus</.>
<bibo:creator>Rudolf Virchov</…>
</bibo:Book…>
Combining a co-occurrence-based and a semantic
measure for entity linking, B. P. Nunes, S. Dietze, M.A.
Casanova, R. Kawase, B. Fetahu, and W. Nejdl., ESWC 2013
- 10th Extended Semantic Web Conference, (May 2013).
db:Cell (Biology)
db:Cell (Microprocessor)
db:Biology
db:Cell biology
Dataset
Catalog/Registry
yov:Video
<yo:Video …>
<dc:title>Lecture 29 –
Stem Cells</dc:title>
…
</yo:Video…>
Yovisto Video
 Extraction of representative (DBpedia) categories („topic profile“) for arbitrary datasets
 Technically trivial, but scalability issues: LOD Cloud 1000+ datasets with <100 billion RDF statements
 Efficient approach: sampling & ranking for balance between scalability and precision/recall
Scalable profiling of datasets
A Scalable Approach for Efficiently Generating
Structured Dataset Topic Profiles, Fetahu, B.,
Dietze, S., Nunes, B. P., Casanova, M. A., Nejdl, W.,
11th Extended Semantic Web Conference
(ESWC2014), Crete, Greece, (2014).
db:Cell
(Biology)
Efficient dataset profiling
1. Sampling of resources
(random sampling, weighted sampling, resource
centrality sampling)
2. Entity- & topic-extraction (NER via DBpedia Spotlight,
category mapping & -expansion)
3. Normalisation & ranking (graph-based models such as
PageRank with Priors, HITS with Priors & K-Step Markov)
 Result: weighted dataset-topic profile graph
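The sampling and ranking steps can be sketched as follows. This is only an illustration under stated assumptions: plain random sampling stands in for step 1, the entity and topic extraction of step 2 is assumed to have already produced a small dataset-to-topic graph, and `pagerank_with_priors` is a generic personalised PageRank, not the exact configuration evaluated in the cited paper. All names and the toy graph are hypothetical.

```python
import random


def sample_resources(resources, fraction=0.1, seed=0):
    """Step 1 (simplified): uniform random sample of a dataset's resources."""
    rng = random.Random(seed)
    k = max(1, int(len(resources) * fraction))
    return rng.sample(resources, k)


def pagerank_with_priors(edges, priors, damping=0.85, iterations=50):
    """Step 3 (simplified): PageRank whose teleport distribution is the prior.

    edges maps a node to its list of successors; priors maps nodes to
    non-negative weights (here: the dataset node being profiled).
    """
    nodes = set(edges) | {m for succ in edges.values() for m in succ} | set(priors)
    total = sum(priors.values())
    prior = {n: priors.get(n, 0.0) / total for n in nodes}
    rank = dict(prior)
    for _ in range(iterations):
        following = {n: (1 - damping) * prior[n] for n in nodes}
        for n in nodes:
            successors = edges.get(n, [])
            if successors:
                share = damping * rank[n] / len(successors)
                for m in successors:
                    following[m] += share
            else:
                # Dangling node: hand its mass back to the prior distribution.
                for m in nodes:
                    following[m] += damping * rank[n] * prior[m]
        rank = following
    return rank


if __name__ == "__main__":
    # Toy dataset-topic graph, invented for illustration.
    graph = {
        "dataset": ["Cell biology", "Medicine"],
        "Cell biology": ["Biology"],
        "Medicine": ["Biology"],
        "Biology": [],
    }
    ranks = pagerank_with_priors(graph, {"dataset": 1.0})
    for topic in sorted(ranks, key=ranks.get, reverse=True):
        print(topic, round(ranks[topic], 3))
```

The resulting scores play the role of the edge weights in a weighted dataset-topic profile: topics reachable from the dataset through several extracted entities rank higher.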
A Scalable Approach for Efficiently Generating
Structured Dataset Topic Profiles, Fetahu, B.,
Dietze, S., Nunes, B. P., Casanova, M. A., Nejdl, W.,
11th Extended Semantic Web Conference
(ESWC2014), Crete, Greece, (2014).
Search & exploration of datasets through topic profiles
in a nutshell Applied to entire LOD cloud/graph
 Visual exploration of extracted RDF dataset profiles
(datasets, topics, relationships)
 Evaluation results: K-Step Markov (10% sampling size)
outperforms baselines (LDA, tf/idf on entire datasets)
http://data-observatory.org/lod-profiles/
Search: entity retrieval on large structured datasets?
Challenges
 How to efficiently retrieve related entities/resources for given query ?
 Explicit entity links (owl:sameAs etc.) are sparse, yet important for state-of-the-art methods (e.g. BM25F; Blanco et al., ISWC2011)
 Query type affinity?
Large dataset/crawl
e.g. LinkedUp dataset graph, LIVIVO dataset, BTC2014
entities related to <James D. Watson>
Entity retrieval: approach
(I) Offline processing (clustering to address link sparsity)
1. Feature vectors (lexical and structural features)
2. Bucketing: per type (LSH algorithm)
3. Clustering: X-means & Spectral clustering per bucket
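The bucketing step can be illustrated with a small sketch. The slide only names "LSH" per type; using MinHash over lexical tokens, with the whole signature as a single bucket key, is an assumption made here for brevity, and the entity tuples are invented. The subsequent X-means or spectral clustering inside each bucket is not shown.

```python
import hashlib
from collections import defaultdict


def minhash_signature(tokens, num_hashes=4):
    """MinHash signature of a non-empty token set, using salted MD5 as hash family."""
    signature = []
    for i in range(num_hashes):
        signature.append(min(
            int(hashlib.md5(f"{i}:{token}".encode("utf-8")).hexdigest(), 16)
            for token in tokens
        ))
    return tuple(signature)


def bucket_entities(entities, num_hashes=4):
    """Group entities by (type, MinHash signature).

    entities is an iterable of (entity_id, entity_type, token_set) tuples.
    Entities of the same type with similar token sets are likely to collide.
    """
    buckets = defaultdict(list)
    for entity_id, entity_type, tokens in entities:
        key = (entity_type, minhash_signature(tokens, num_hashes))
        buckets[key].append(entity_id)
    return dict(buckets)


if __name__ == "__main__":
    # Hypothetical entity descriptions.
    entities = [
        ("e1", "Person", {"james", "watson"}),
        ("e2", "Person", {"watson", "james"}),
        ("e3", "Article", {"james", "watson"}),
    ]
    for key, members in bucket_entities(entities).items():
        print(key[0], members)
```

Bucketing first keeps the later clustering tractable, since the more expensive algorithms only ever run on one bucket at a time.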
Improving Entity Retrieval on Structured Data,
Fetahu, B., Gadiraju, U., Dietze, S., 14th International
Semantic Web Conference (ISWC2015), Bethlehem,
US, (2015).
(II) Online processing (retrieval)
1. Retrieval & expansion:
a) BM25F results
b) expansion from clusters (related entities)
2. Re-Ranking
(context terms & query type affinity)
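A minimal sketch of the online phase, assuming the baseline ranking (BM25F in the slides) and the offline clusters are already available. The scoring function is a stand-in chosen for illustration: term overlap with the query context plus a fixed bonus when the entity type matches the query type. It is not the re-ranking model from the cited paper, and all identifiers and data are hypothetical.

```python
def expand(baseline, clusters):
    """Add the cluster co-members of each baseline hit, keeping first-seen order."""
    seen, expanded = set(), []
    for hit in baseline:
        related = [member for cluster in clusters if hit in cluster for member in cluster]
        for entity in [hit] + related:
            if entity not in seen:
                seen.add(entity)
                expanded.append(entity)
    return expanded


def rerank(candidates, descriptions, context_terms, query_type, type_weight=0.5):
    """Order candidates by context-term overlap plus a query-type affinity bonus.

    descriptions maps an entity id to (set of terms, entity type).
    """
    def score(entity):
        terms, entity_type = descriptions[entity]
        overlap = len(terms & context_terms) / len(context_terms)
        return overlap + (type_weight if entity_type == query_type else 0.0)

    return sorted(candidates, key=score, reverse=True)


if __name__ == "__main__":
    baseline = ["dna_paper"]  # pretend output of the baseline retrieval
    clusters = [["dna_paper", "crick", "watson"]]
    descriptions = {
        "watson": ({"james", "watson", "dna"}, "Person"),
        "crick": ({"francis", "crick", "dna"}, "Person"),
        "dna_paper": ({"molecular", "structure", "nucleic", "acids"}, "Article"),
    }
    candidates = expand(baseline, clusters)
    print(rerank(candidates, descriptions, {"watson", "dna"}, "Person"))
```

Expansion brings in entities that no explicit link connects to the baseline hits, and re-ranking then pushes down the expanded candidates that do not fit the query.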
Dataset
 BTC2014 (1.4 billion triples)
 92 SemSearch queries
Methods
 Our approaches: XM (X-means), SP (spectral clustering)
 Baselines: B (BM25F), S1 (Tonon et al. [SIGIR12])
Conclusions
 XM & SP outperform baselines
 Clustering to remedy link sparsity
 Relevance to query crucial
Improving Entity Retrieval on Structured Data,
Fetahu, B., Gadiraju, U., Dietze, S., 14th International
Semantic Web Conference (ISWC2015), Bethlehem,
US, (2015).
Entity retrieval: evaluation
Introduction & motivation
Publishing, linking and profiling
 Publishing & linking (bibliographic) data
 Dataset profiling & linking
Retrieval & search
 Entity retrieval in large graphs
 Embedded (bibliographic) Web data
 Entity summarisation from Web markup
Outlook and future directions
Overview: contents so far
[ESWC13, ESWC14]
[ISWC15]
[WebSci13, SWJ15]
Outcomes & impact ?
Tangible outcomes / impact
Open Datasets
Applications
Vocabularies & Schemas
Initiatives & Working Groups
VOL
+ vocabularies for educational resource & service modeling
 W3C Community Group
„Open Linked Education“
 DCMI Task Force on LRMI
 W3C Schema Bib Extend Group
 Tutorial & workshop series on
Linked Data & Learning
 LinkedUniversities, LinkedEducation.org
 KEYSTONE WG „Search and Profiling of LD“
 ….
http://linkeduniversities.org
Introduction & motivation
Publishing, linking and profiling
 Publishing & linking (bibliographic) data
 Dataset profiling & linking
Retrieval & search
 Entity retrieval in large graphs
 Embedded (bibliographic) Web data
 Entity summarisation from Web markup
Outlook and future directions
Overview: contents
beyond LD: embedded semantics
 The Web: approx. 46,000,000,000,000 (46 trillion) Web pages indexed by Google
vs
 Linked Data: approx. 1000 datasets & 100 billion statements
- different order of magnitude wrt scale & dynamics
 Other „semantics“ (structured facts) on the Web?
The Web as a knowledge base: semantics on the Web?
 Embedded markup (RDFa, Microdata, Microformats) for
interpretation of Web documents (search, retrieval)
 Arbitrary vocabularies; schema.org used at scale:
(700 classes, 1000 predicates)
 Adoption on the Web: 26 %
(2014 Google study of 12 bn Web pages)
 “Web Data Commons” (Meusel et al. [ISWC2014])
• Markup from Common Crawl (2.2 billion pages):
17 billion RDF quads
• Markup in 26% of pages, 14% of PLDs in 2013
(increase from 6% in 2011)
 Same order of magnitude as “the Web”
Embedded semantics: Web page markup & schema.org
<div itemscope itemtype="http://schema.org/Movie">
<h1 itemprop="name">Forrest Gump</h1>
<span>Actor: <span itemprop="actor">Tom Hanks</span></span>
<span itemprop="genre">Drama</span>
...
</div>
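To show how such markup turns into statements, here is a minimal sketch using only Python's standard `html.parser`. It handles a single flat item with text-valued `itemprop`s and no void elements such as `<meta>` or `<br>`; it is not a conforming Microdata parser, and the node label `node1` and the `type` predicate are illustrative choices.

```python
from html.parser import HTMLParser

SAMPLE = """
<div itemscope itemtype="http://schema.org/Movie">
<h1 itemprop="name">Forrest Gump</h1>
<span>Actor: <span itemprop="actor">Tom Hanks</span></span>
<span itemprop="genre">Drama</span>
</div>
"""


class MicrodataExtractor(HTMLParser):
    """Collects (subject, predicate, value) triples from simple Microdata."""

    def __init__(self, subject="node1"):
        super().__init__()
        self.subject = subject
        self.statements = []
        self._open = []  # one (itemprop or None, text buffer) per open element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs and "itemtype" in attrs:
            self.statements.append((self.subject, "type", attrs["itemtype"]))
        self._open.append((attrs.get("itemprop"), []))

    def handle_data(self, data):
        # Text belongs to every enclosing element that carries an itemprop.
        for prop, buffer in self._open:
            if prop:
                buffer.append(data)

    def handle_endtag(self, tag):
        if not self._open:
            return
        prop, buffer = self._open.pop()
        if prop:
            self.statements.append((self.subject, prop, "".join(buffer).strip()))


def extract(html, subject="node1"):
    parser = MicrodataExtractor(subject)
    parser.feed(html)
    return parser.statements


if __name__ == "__main__":
    for statement in extract(SAMPLE):
        print(statement)
```

Each page yields its own small, unlinked description of this kind, which is where the coreference and redundancy issues of markup data come from.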
RDF statements
node1 actor _node-x
node1 actor Robin Wright
node1 genre Comedy
node2 actor T. Hanks
node2 distributed by Paramount Pic.
node3 actor Tom Cruise
node3 distributed by Paramount Pic.
Characteristics: Example
Coreferences: 18,000 results for <"Iphone 6", type, s:Product> (8.6 quads on average)
Redundancy: <s, schema:name, "Iphone 6"> occurring 1000 times in WDC2013
Lack of links: largely unlinked entity descriptions / subgraphs
Errors (typos & schema violations, see Meusel et al. [ESWC2015]):
 Wrong namespaces, such as http://schma.org
 Undefined types & predicates: 9.7% in WDC, less common than in LOD
 Confusion of datatype and object properties: <s1, s:publisher, "Springer">, 24.35% object property issues vs 8% in LOD
 Data property range violations: e.g. literals vs numbers (12.6% in WDC vs 4.6% in LOD)
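As an illustration of how one of these error classes might be repaired, the sketch below maps a misspelled namespace to its closest known namespace by string similarity, using `difflib` from the standard library. It is a generic approach assumed here for illustration, not the heuristics of Meusel et al.; the list of known namespaces and the cutoff are hypothetical.

```python
import difflib

# Assumption: the namespaces regarded as valid are known in advance.
KNOWN_NAMESPACES = ["http://schema.org"]


def normalise_namespace(namespace, known=KNOWN_NAMESPACES, cutoff=0.85):
    """Return the closest known namespace, or the input unchanged if none is close."""
    if namespace in known:
        return namespace
    matches = difflib.get_close_matches(namespace, known, n=1, cutoff=cutoff)
    return matches[0] if matches else namespace


if __name__ == "__main__":
    print(normalise_namespace("http://schma.org"))
```

A high cutoff keeps the repair conservative: namespaces that merely share the `http://` prefix with a known vocabulary are left untouched.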
Using markup as global knowledge base - state of the art
 Glimmer (http://glimmer.research.yahoo.com):
entity retrieval (BM25F) on WDC dataset
[Blanco, Mika & Vigna, ISWC2011]
 Challenges: specific characteristics of markup data
 Goal: obtaining entity summary (or entity-centric knowledge graph) for given query ?
 Tasks: document annotation, knowledge base augmentation, semantic enrichments
Using markup as global knowledge base/graph?
Web page
markup
Query
Nucleic Acids, type:(Article)
Entity Summary/Graph
Name
Molecular structure of nucleic
acids
author
James D. Watson
Francis Crick
publisher Nature
datePublished 1953
Web crawls, WDC or large (domain-specific) crawls:
e.g. publishers, universities, libraries etc
Candidate Facts
node1 name
Molecular structure
of nucleic acids
node1 author James D. Watson
node1 publisher Nature
node1 datePublished 1956
node1 datePublished 1953
node2 name Francis Crick
node2 name Cricks
 Extract (domain-specific) knowledge bases and knowledge graphs for digital libraries
 Experiments on WDC data: 87.6% MAP; coverage: on average 57% additional facts compared to DBpedia
Ongoing work: entity summarisation from markup data
Query
Nucleic Acids, type:(Article) 1. Retrieval
2. Fact selection
Entity Summary/Graph
Name
Molecular structure of nucleic
acids
author
James D. Watson
Francis Crick
publisher Nature
datePublished 1953
New Queries
James D. Watson, type:(Person)
Francis Crick, type:(Person)
Nature, type:(Organization)
Web crawls, WDC or large (domain-specific) crawls:
e.g. publishers, universities, libraries etc
Web page
markup
(clustering, heuristics, trained classifier)
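The fact selection step can be illustrated with the simplest possible heuristic: majority voting over candidate values. This is a toy stand-in for the clustering, heuristics and trained classifier mentioned on the slide, not the approach behind the reported results. The candidate counts, the `MULTI_VALUED` set and `min_support` are hypothetical.

```python
from collections import Counter, defaultdict

# Assumption: predicates that may legitimately keep several values.
MULTI_VALUED = {"author"}


def select_facts(candidates, multi_valued=MULTI_VALUED, min_support=2):
    """Pick facts for an entity summary from (predicate, value) candidates.

    Single-valued predicates keep their most frequent value; multi-valued
    predicates keep every value seen at least min_support times.
    """
    counts = defaultdict(Counter)
    for predicate, value in candidates:
        counts[predicate][value] += 1

    summary = {}
    for predicate, values in counts.items():
        if predicate in multi_valued:
            summary[predicate] = sorted(v for v, c in values.items() if c >= min_support)
        else:
            summary[predicate] = values.most_common(1)[0][0]
    return summary


if __name__ == "__main__":
    # Hypothetical candidate facts gathered from several pages.
    candidates = (
        [("datePublished", "1953")] * 3
        + [("datePublished", "1956")]
        + [("author", "James D. Watson")] * 2
        + [("author", "Francis Crick")] * 2
        + [("author", "Cricks")]
        + [("publisher", "Nature")] * 2
    )
    print(select_facts(candidates))
```

Voting exploits the redundancy of markup data: a value repeated across many independent pages is more likely to be correct than a one-off such as a mistyped year or name.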
[Figure: number of entities and number of statements per PLD (PLDs ranked), count on log scale]
Unprecedented source of bibliographic data
 Metadata about scholarly articles
(s:ScholarlyArticle): 6,793,764 quads, 1,184,623 entities, 429 distinct predicates (in WDC, for this one type alone)
 Top 5 domains: Springer, MDPI, BMJ,
diabetesjournals.org, mendeley.com,
Biodiversitylibrary.org
Domains, topics, disciplines?
 Life Sciences and Computer Science predominant
 Top-10 article titles
 Most important publishers/journals, libraries
represented
=> Domain-specific & targeted crawls
= unprecedented source of data
Embedded data for digital libraries / life sciences?
Knowledge graphs and LD
(Yago, Freebase, Pubmed, DBLP etc)
Entity
node1 name
Molecular structure of
nucleic acids
node1 author James D. Watson
node1 publisher Nature
node1 datePublished 1956
node1 datePublished 1953
Future work: improving entity-centric tasks for digital libraries
Entity
node2 name Francis Crick
node2 name Cricks
node2 born 1916
• Web data as knowledge resource
• Background knowledge/structured data
• Training data & ground truths
• ....
Embedded
data
Unstructured (Web)
documents
Linked Data
Improving data-centric tasks for large
(bibliographic/life sciences) corpora, eg LIVIVO
• KB construction & augmentation
• Document annotation
• Entity recognition, disambiguation, interlinking
• Search & retrieval ...
Acknowledgements: team
 Besnik Fetahu (L3S)
 Ivana Marenzi (L3S)
 Ujwal Gadiraju (L3S)
 Eelco Herder (L3S)
 Ran Yu (L3S)
 Ricardo Kawase (L3S)
 Pracheta Sahoo (L3S, IIT India)
 Bernardo Pereira Nunes (L3S, PUC Rio)
+ external collaborators
References (presented work)
Dietze, S., Taibi, D., D’Aquin, M., Facilitating Scientometrics in Learning Analytics and Educational Data Mining - the LAK Dataset,
Semantic Web Journal, 2016.
Dietze, S., Kaldoudi, E., Dovrolis, E., Giordano, D., Spampinato, C., Hendrix, M., Protopsaltis, A., Taibi, D., Yu, H. Q. (2013), Socio-
semantic Integration of Educational Resources – the Case of the mEducator Project, in Journal of Universal Computer Science (J.UCS),
Vol. 19, No. 11, pp. 1543-1569.
Dietze, S., Taibi, D., Yu, H. Q., Dovrolis, N., A Linked Dataset of Medical Educational Resources, British Journal of Educational
Technology (BJET), Volume 46, Issue 5, pages 1123–1129, September 2015.
Gadiraju, U., Demartini, G., Kawase, R., Dietze, S. Human beyond the Machine: Challenges and Opportunities of Microtask
Crowdsourcing. In: IEEE Intelligent Systems, Volume 30 Issue 4 – Jul/Aug 2015.
Gadiraju, U., Kawase, R., Dietze, S, Demartini, G., Understanding Malicious Behavior in Crowdsourcing Platforms: The Case of Online
Surveys. ACM CHI Conference on Human Factors in Computing Systems (CHI2015), April 18-23, Seoul, Korea.
Fetahu, B., Gadiraju, U., Dietze, S., Improving Entity Retrieval on Structured Data, 14th International Semantic Web Conference
(ISWC2015), Bethlehem, US, (2015).
Fetahu, B., Dietze, S., Nunes, B. P., Casanova, M. A., Nejdl, W., A Scalable Approach for Efficiently Generating Structured Dataset Topic
Profiles, 11th Extended Semantic Web Conference (ESWC2014), Crete, Greece, (2014).
D’Aquin, M., Adamou, A., Dietze, S., Assessing the Educational Linked Data Landscape, ACM Web Science 2013 (WebSci2013), Paris,
France, May 2013.
Nunes, B. P., Dietze, S., Casanova, M.A., Kawase, R., Fetahu, B., Nejdl, W., Combining a co-occurrence-based and a semantic measure
for entity linking, in: The Semantic Web: Semantics and Big Data, Proceedings of the 10th Extended Semantic Web Conference
(ESWC2013), Lecture Notes in Computer Science Vol. 7882, Springer Berlin Heidelberg, 2013.
http://www.stefandietze.net
Selected related work
Entity retrieval
 Alberto Tonon, Gianluca Demartini, and Philippe Cudré-Mauroux. Combining Inverted Indices and Structured
Search for Ad-hoc Object Retrieval. In: 35th Annual ACM SIGIR Conference (SIGIR 2012), Portland, Oregon,
USA, August 2012.
 Roi Blanco, Peter Mika, Sebastiano Vigna: Effective and Efficient Entity Search in RDF Data. International
Semantic Web Conference (ISWC) 2011, pages 83-97.
Embedded markups & Web Data Commons
 Robert Meusel, Petar Petrovski, Christian Bizer: The WebDataCommons Microdata, RDFa and Microformat
Dataset Series. Proceedings of the 13th International Semantic Web Conference (ISWC 2014), RBDS Track,
Trentino, Italy, October 2014.
 Robert Meusel and Heiko Paulheim: Heuristics for Fixing Common Errors in Deployed schema.org Microdata.
Proceedings of the 12th Extended Semantic Web Conference (ESWC 2015), Portoroz, Slovenia, May 2015
Linked Data quality
 Carlos Buil-Aranda, Aidan Hogan, Jürgen Umbrich, Pierre-Yves Vandenbussche, SPARQL Web-Querying
Infrastructure: Ready for Action?, International Semantic Web Conference 2013, (ISWC2013).
 Paulheim H., Bizer, C., Type Inference on Noisy RDF Data, Semantic Web – ISWC 2013, Lecture Notes in
Computer Science Volume 8218, 2013, pp 510-525
 Hogan, A., Umbrich, J., Harth, A., Cyganiak, R., Polleres, A., Decker., S., An empirical survey of Linked Data
conformance. Journal of Web Semantics 14, 2012
Thank you
• http://stefandietze.net
• http://data.l3s.de
• http://data.linkededucation.org/linkedup/catalog

Contenu connexe

Tendances

The library in the life of the user
The library in the life of the userThe library in the life of the user
The library in the life of the userlisld
 
Library futures: converging and diverging directions for public and academic ...
Library futures: converging and diverging directions for public and academic ...Library futures: converging and diverging directions for public and academic ...
Library futures: converging and diverging directions for public and academic ...lisld
 
Full Spectrum Stewardship of the Scholarly Record by Brian E. C. Schottlaende...
Full Spectrum Stewardship of the Scholarly Record by Brian E. C. Schottlaende...Full Spectrum Stewardship of the Scholarly Record by Brian E. C. Schottlaende...
Full Spectrum Stewardship of the Scholarly Record by Brian E. C. Schottlaende...Charleston Conference
 
The Library in the Life of the User: Two Collection Directions
The Library in the Life of the User: Two Collection DirectionsThe Library in the Life of the User: Two Collection Directions
The Library in the Life of the User: Two Collection Directionslisld
 
The facilitated collection: collections and collecting in a network environment
The facilitated collection: collections and collecting in a network environmentThe facilitated collection: collections and collecting in a network environment
The facilitated collection: collections and collecting in a network environmentlisld
 
The Evolving Scholarly Record Framing the Landscape
The Evolving Scholarly Record Framing the LandscapeThe Evolving Scholarly Record Framing the Landscape
The Evolving Scholarly Record Framing the LandscapeOCLC
 
Social metadata for libraries, archives and museums: Research findings from t...
Social metadata for libraries, archives and museums: Research findings from t...Social metadata for libraries, archives and museums: Research findings from t...
Social metadata for libraries, archives and museums: Research findings from t...Rose Holley
 
Collections unbound: collection directions and the RLUK collective collection
Collections unbound: collection directions and the RLUK collective collectionCollections unbound: collection directions and the RLUK collective collection
Collections unbound: collection directions and the RLUK collective collectionlisld
 
Library collections and the emerging scholarly record
Library collections and the emerging scholarly recordLibrary collections and the emerging scholarly record
Library collections and the emerging scholarly recordlisld
 
Working collaboratively: scaling infrastructure, services, learning and innov...
Working collaboratively: scaling infrastructure, services, learning and innov...Working collaboratively: scaling infrastructure, services, learning and innov...
Working collaboratively: scaling infrastructure, services, learning and innov...lisld
 
Libraries: technology as artifact and technology in practice
Libraries: technology as artifact and technology in practiceLibraries: technology as artifact and technology in practice
Libraries: technology as artifact and technology in practicelisld
 
Virtual Research Networks : Towards Research 2.0
Virtual Research Networks : Towards Research 2.0Virtual Research Networks : Towards Research 2.0
Virtual Research Networks : Towards Research 2.0Guus van den Brekel
 
Library discovery: past, present and some futures
Library discovery: past, present and some futuresLibrary discovery: past, present and some futures
Library discovery: past, present and some futureslisld
 
Challenges and opportunities for academic libraries
Challenges and opportunities for academic librariesChallenges and opportunities for academic libraries
Challenges and opportunities for academic librarieslisld
 
Describing Theses and Dissertations Using Schema.org
Describing Theses and Dissertations Using Schema.orgDescribing Theses and Dissertations Using Schema.org
Describing Theses and Dissertations Using Schema.orgOCLC
 
Environmental trends and OCLC Research, a presentation at the University of N...
Environmental trends and OCLC Research, a presentation at the University of N...Environmental trends and OCLC Research, a presentation at the University of N...
Environmental trends and OCLC Research, a presentation at the University of N...lisld
 
The Inside Out Library.
The Inside Out Library. The Inside Out Library.
The Inside Out Library. lisld
 
December 2, 2015: NISO/NFAIS Virtual Conference: Semantic Web: What's New and...
December 2, 2015: NISO/NFAIS Virtual Conference: Semantic Web: What's New and...December 2, 2015: NISO/NFAIS Virtual Conference: Semantic Web: What's New and...
December 2, 2015: NISO/NFAIS Virtual Conference: Semantic Web: What's New and...DeVonne Parks, CEM
 
OUR space: the new world of metadata
OUR space: the new world of metadataOUR space: the new world of metadata
OUR space: the new world of metadataKaren S Calhoun
 

Tendances (20)

The library in the life of the user
The library in the life of the userThe library in the life of the user
The library in the life of the user
 
Library futures: converging and diverging directions for public and academic ...
Library futures: converging and diverging directions for public and academic ...Library futures: converging and diverging directions for public and academic ...
Library futures: converging and diverging directions for public and academic ...
 
Full Spectrum Stewardship of the Scholarly Record by Brian E. C. Schottlaende...
Full Spectrum Stewardship of the Scholarly Record by Brian E. C. Schottlaende...Full Spectrum Stewardship of the Scholarly Record by Brian E. C. Schottlaende...
Full Spectrum Stewardship of the Scholarly Record by Brian E. C. Schottlaende...
 
The Library in the Life of the User: Two Collection Directions
The Library in the Life of the User: Two Collection DirectionsThe Library in the Life of the User: Two Collection Directions
The Library in the Life of the User: Two Collection Directions
 
The facilitated collection: collections and collecting in a network environment
The facilitated collection: collections and collecting in a network environmentThe facilitated collection: collections and collecting in a network environment
The facilitated collection: collections and collecting in a network environment
 
The Evolving Scholarly Record Framing the Landscape
The Evolving Scholarly Record Framing the LandscapeThe Evolving Scholarly Record Framing the Landscape
The Evolving Scholarly Record Framing the Landscape
 
Social metadata for libraries, archives and museums: Research findings from t...
Social metadata for libraries, archives and museums: Research findings from t...Social metadata for libraries, archives and museums: Research findings from t...
Social metadata for libraries, archives and museums: Research findings from t...
 
Collections unbound: collection directions and the RLUK collective collection
Collections unbound: collection directions and the RLUK collective collectionCollections unbound: collection directions and the RLUK collective collection
Collections unbound: collection directions and the RLUK collective collection
 
Library collections and the emerging scholarly record
Library collections and the emerging scholarly recordLibrary collections and the emerging scholarly record
Library collections and the emerging scholarly record
 
Working collaboratively: scaling infrastructure, services, learning and innov...
Working collaboratively: scaling infrastructure, services, learning and innov...Working collaboratively: scaling infrastructure, services, learning and innov...
Working collaboratively: scaling infrastructure, services, learning and innov...
 
Libraries: technology as artifact and technology in practice
Libraries: technology as artifact and technology in practiceLibraries: technology as artifact and technology in practice
Libraries: technology as artifact and technology in practice
 
Virtual Research Networks : Towards Research 2.0
Virtual Research Networks : Towards Research 2.0Virtual Research Networks : Towards Research 2.0
Virtual Research Networks : Towards Research 2.0
 
Library discovery: past, present and some futures
Library discovery: past, present and some futuresLibrary discovery: past, present and some futures
Library discovery: past, present and some futures
 
Challenges and opportunities for academic libraries
Challenges and opportunities for academic librariesChallenges and opportunities for academic libraries
Challenges and opportunities for academic libraries
 
Describing Theses and Dissertations Using Schema.org
Describing Theses and Dissertations Using Schema.orgDescribing Theses and Dissertations Using Schema.org
Describing Theses and Dissertations Using Schema.org
 
Environmental trends and OCLC Research, a presentation at the University of N...
Environmental trends and OCLC Research, a presentation at the University of N...Environmental trends and OCLC Research, a presentation at the University of N...
Environmental trends and OCLC Research, a presentation at the University of N...
 
The Inside Out Library.
The Inside Out Library. The Inside Out Library.
The Inside Out Library.
 
December 2, 2015: NISO/NFAIS Virtual Conference: Semantic Web: What's New and...
December 2, 2015: NISO/NFAIS Virtual Conference: Semantic Web: What's New and...December 2, 2015: NISO/NFAIS Virtual Conference: Semantic Web: What's New and...
December 2, 2015: NISO/NFAIS Virtual Conference: Semantic Web: What's New and...
 
Redefining the Academic Library
Redefining the Academic LibraryRedefining the Academic Library
Redefining the Academic Library
 
OUR space: the new world of metadata
OUR space: the new world of metadataOUR space: the new world of metadata
OUR space: the new world of metadata
 

Semantic Linking & Retrieval for Digital Libraries

  • 1. Backup. Semantic Linking & Retrieval for Digital Libraries. Dr. Stefan Dietze, 11.02.2016, Institut für Informatik/Universität Bonn. Slide footer: 29/03/16, Stefan Dietze.
  • 2. Overview: research/application context
      Information (types): Bibliographic (meta)data; Research information; Educational (meta)data; Web & social data
      Stakeholders: Archival organisations; Digital libraries; Publishers; Resource providers/consumers
      Domains: Life Sciences; Computer Science; Learning Analytics; ...
      Data-centric tasks: Publishing, preservation, annotation, crawling, search, retrieval ...
  • 3. Overview: contents
      Introduction & motivation
      Publishing, linking and profiling
        - Publishing & linking (bibliographic) data
        - Dataset profiling & linking
      Retrieval & search
        - Entity retrieval in large graphs
        - Embedded (bibliographic) Web data
        - Entity summarisation from Web markup
      Outlook and future directions
  • 4. Overview: contents (annotated)
      Introduction & motivation
      Publishing, linking and profiling
        - Publishing & linking (bibliographic) data
        - Dataset profiling & linking
      Retrieval & search
        - Entity retrieval in large graphs
        - Embedded (bibliographic) Web data
        - Entity summarisation from Web markup
      Outlook and future directions
      Annotations on the slide: knowledge graphs and linked data; beyond LD: embedded semantics
      References on the slide: [ESWC13, ESWC14], [ISWC15], [WebSci13, SWJ15], [ongoing]
  • 5. Linked Data diversity: example library & scholarly data
      - Linked Data: W3C standards & de-facto standard for sharing data on the Web (roughly 1000 datasets, 100 bn triples), adopted specifically by the library/GLAM sector & life sciences
      - Strong focus (still) on established knowledge graphs, e.g. Yago, DBpedia, Freebase
      Vocabularies/Schemas: BIBO (Bibliographic Ontology); BIRO (Bibliographic Reference Ontology); CITO (Citation Typing Ontology); SPAR vocabularies (incl. CITO, BIRO); SWRC (Semantic Web Dogfood); Functional Requirements for Bibliographic Records (FRBR); Nature Publishing Group Ontology; mEducator Educational Resources; ...
      Datasets: EUROPEANA; British Library; German, French and Spanish national libraries; Nature Publishing Group; Hochschulbibliothekszentrum NRW; Elsevier Scholarly Publications; TED Talks; mEducator Linked Educational Resources; Open Courseware Consortium; LAK Dataset; ...
      Initiatives: W3C Library Linked Data Incubator Group; Linked Library Data group on DataHub; LinkedUniversities.org; LinkedEducation.org; W3C Linked Open Education Community Group; ...
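To make the vocabulary list above concrete, here is a minimal sketch of how one bibliographic record might be written as Linked Data using BIBO together with DCMI Terms and FOAF. It emits N-Triples with plain string handling. The resource URIs under `http://example.org/`, the sample title and author name, and the helper names (`uri`, `literal`, `describe_article`) are illustrative assumptions, not taken from the slides.

```python
# Namespaces of the vocabularies used (BIBO, DCMI Terms, FOAF, RDF).
BIBO = "http://purl.org/ontology/bibo/"
DCT = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def uri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return "<" + value + ">"


def literal(value):
    """Return a plain string literal with backslashes and quotes escaped."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def describe_article(article_uri, title, author_uri, author_name):
    """Return N-Triples lines describing one article and its author."""
    triples = [
        (uri(article_uri), uri(RDF_TYPE), uri(BIBO + "Article")),
        (uri(article_uri), uri(DCT + "title"), literal(title)),
        (uri(article_uri), uri(DCT + "creator"), uri(author_uri)),
        (uri(author_uri), uri(RDF_TYPE), uri(FOAF + "Person")),
        (uri(author_uri), uri(FOAF + "name"), literal(author_name)),
    ]
    return [" ".join(triple) + " ." for triple in triples]


# Hypothetical example record; all identifiers below are made up.
lines = describe_article(
    "http://example.org/paper/1",
    "An example paper",
    "http://example.org/person/1",
    "Example Author",
)
for line in lines:
    print(line)
```

A production setup would more likely use an RDF library and persistent identifiers; the point here is only that each statement is a subject, predicate, object triple whose predicate and type come from a shared vocabulary.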
  • 6. Challenge: efficient search for suitable resources & datasets
      - "Quality": currency, dynamics, accessibility [Buil-Aranda2013], correctness [Paulheim2013], schema compliance [Hogan2012]
      - Domains/topics: which datasets/resources address topic XY (e.g. "microbiology")?
      - Types: statistical data, bibliographic resources, AV resources, scholarly publications?
      - Links: related datasets?
  • 7. Data publishing, linking and profiling: LinkedUp Dataset Catalog/Registry
      http://data.linkededucation.org/linkedup/catalog/
      - LinkedUp project (FP7 project: L3S, OU, OKFN, Elsevier, Exact Learning solutions)
      - LinkedUp Catalog: largest collection of LD/Open Data for educationally relevant resources (approx. 50 datasets)
      - Original datasets published with key content providers, automatically extracted metadata
  • 8. mEducator: medical educational resources
      - EC-funded eContentPlus project (2009-2012)
      - Exploratory search through semantic and clustering techniques
      - Lifting/enriching/clustering medical metadata
      - Common vocabularies (MESH, SNOMED, Bioportal etc.)
      - mEducator dataset: first Linked Data corpus of enriched OER metadata, used by a number of applications
      References:
      - Dietze, S., Kaldoudi, E., Dovrolis, E., Giordano, D., Spampinato, C., Hendrix, M., Protopsaltis, A., Taibi, D., Yu, H. Q. (2013), Socio-semantic Integration of Educational Resources - the Case of the mEducator Project, Journal of Universal Computer Science (J.UCS), Vol. 19, No. 11, pp. 1543-1569.
      - Dietze, S., Taibi, D., Yu, H. Q., Dovrolis, N. (2015), A Linked Dataset of Medical Educational Resources, British Journal of Educational Technology (BJET), Vol. 46, Issue 5, pp. 1123-1129, September 2015.
  • 9. LAK Dataset: facilitating scientometrics
      Concept             Type                 #
      Reference           npg:Citation         7885
      Author              foaf:Person          1214
      Conference Paper    swrc:InProceedings   652
      Organization        foaf:Organization    365
      Journal Paper       bibo:Article         45
      Proceedings Volume  swrc:Proceedings     15
      Journal Volume      bibo:Journal         9
      - Linked Data corpus of "Learning Analytics" publications of the last 5 years (~ 800 publications)
      - Metadata, full-text & automated linking (DBLP, SWDF, DBpedia)
      - Wide adoption (http://lak.linkededucation.org)
      Workflow
      1. Data extraction & vocabulary definition
      2. Entity co-reference resolution & linking
      3. Applications & analysis
      Facilitating Scientometrics in Learning Analytics and Educational Data Mining - the LAK Dataset, Dietze, S., Taibi, D., D’Aquin, M., Semantic Web Journal, 2015.
  • 10. LinkedUp Catalog: dataset index & registry, federated search (in a nutshell)
      - "Federated queries" through schema mappings
      - Dataset accessibility
      - Linking & topic profiling
      Next: Schema/Types
  • 11. Schema analysis & mapping
      - Co-occurrence of types (in 146 datasets: 144 vocabularies, 588 overlapping types, 719 predicates)
      - Example types in the graph: po:Programme, yov:Video, bibo:Book
      Assessing the Educational Linked Data Landscape, D’Aquin, M., Adamou, A., Dietze, S., ACM Web Science 2013 (WebSci2013), Paris, France, May 2013.
  • 12. Schema analysis & mapping
      - Co-occurrence of types (in 146 datasets: 144 vocabularies, 588 overlapping types, 719 predicates)
      - Co-occurrence after mapping (201 frequently occurring types, mapped into 79 types)
      - Example types in the graph: bibo:Film, bibo:Document, po:Programme, bibo:Book, foaf:Document, yov:Video, typeX
      Assessing the Educational Linked Data Landscape, D’Aquin, M., Adamou, A., Dietze, S., ACM Web Science 2013 (WebSci2013), Paris, France, May 2013.
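A minimal sketch of why mapping types changes the co-occurrence picture. The mapping table and the three toy datasets below are made up for illustration; the slide does not state which type maps onto which.

```python
from collections import Counter
from itertools import combinations

# Hypothetical mapping of source types onto shared target types (illustrative only).
TYPE_MAPPING = {
    "yov:Video": "bibo:Film",
    "po:Programme": "bibo:Film",
    "bibo:Book": "bibo:Document",
    "foaf:Document": "bibo:Document",
}

def cooccurrence(datasets, mapping=None):
    """Count how often two (optionally mapped) types are used in the same dataset."""
    mapping = mapping or {}
    counts = Counter()
    for types in datasets.values():
        mapped = sorted({mapping.get(t, t) for t in types})
        for pair in combinations(mapped, 2):
            counts[pair] += 1
    return counts

# Toy input: dataset name -> types it uses (made-up data).
DATASETS = {
    "video_portal": {"yov:Video", "foaf:Document"},
    "broadcaster": {"po:Programme", "bibo:Book"},
    "library": {"bibo:Book", "foaf:Document"},
}

before = cooccurrence(DATASETS)               # three unrelated type pairs
after = cooccurrence(DATASETS, TYPE_MAPPING)  # one shared pair, seen twice
```

Before mapping, no two datasets share a type pair; after mapping, two of them overlap on the same pair, which is what makes federated queries across them feasible.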
  • 13. LinkedUp Catalog: dataset index & registry, federated search (in a nutshell)
      http://data.linkededucation.org/linkedup/catalog/
      - "Federated queries" through schema mappings
      - Dataset accessibility
      - Linking & topic profiling
      Next: Dataset topic profiles
  • 14. Data(set) interlinking
      - Example 1, Yovisto Video (yov:Video): <yo:Video …> <dc:title>Lecture 29 – Stem Cells</dc:title> … </yo:Video…>
      - Example 2, British Library Book (bibo:Book): <bibo:Book …> <bibo:title>Über den Hungertyphus</.> <bibo:creator>Rudolf Virchow</…> </bibo:Book…>
      - Linked DBpedia entities: db:Medicine, db:Rudolf Virchow, db:Cell Biology; ambiguous candidates: db:Cell (Biology) vs db:Cell (Microprocessor)
      - Linking entities/datasets through a combination of (i) a "semantic (graph-based) connectivity score (SCS)" (based on Katz centrality) and (ii) a "co-occurrence-based measure (CBM)" (similar to Normalised Google Distance)
      - Example scores: SCS = 0.32, CBM = 0.24
      - Evaluation: outperforming Explicit Semantic Analysis (ESA)
      Combining a co-occurrence-based and a semantic measure for entity linking, B. P. Nunes, S. Dietze, M.A. Casanova, R. Kawase, B. Fetahu, and W. Nejdl, ESWC 2013 - 10th Extended Semantic Web Conference (May 2013).
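The two kinds of measure named on this slide can be sketched roughly as follows. This is not the formula from the cited paper: the first function is a plain Katz-style walk count with a damping factor, the second is the standard Normalised Google Distance. The toy graph (including the node db:Stem_cell), `beta` and `max_length` are assumptions for illustration.

```python
import math

def katz_connectivity(adjacency, source, target, beta=0.5, max_length=3):
    """Sum beta**l * (number of walks of length l from source to target)."""
    walks = {source: 1}   # walks[node] = number of walks of the current length
    score = 0.0
    for length in range(1, max_length + 1):
        next_walks = {}
        for node, count in walks.items():
            for neighbour in adjacency.get(node, ()):
                next_walks[neighbour] = next_walks.get(neighbour, 0) + count
        walks = next_walks
        score += (beta ** length) * walks.get(target, 0)
    return score

def normalised_google_distance(fx, fy, fxy, n):
    """Standard NGD from counts fx, fy, co-occurrence count fxy and corpus size n."""
    if fxy == 0:
        return float("inf")
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))

# Made-up graph around the entities shown on the slide.
GRAPH = {
    "db:Rudolf_Virchow": ["db:Cell_biology", "db:Medicine"],
    "db:Cell_biology": ["db:Rudolf_Virchow", "db:Stem_cell"],
    "db:Medicine": ["db:Rudolf_Virchow", "db:Stem_cell"],
    "db:Stem_cell": ["db:Cell_biology", "db:Medicine"],
}
scs = katz_connectivity(GRAPH, "db:Rudolf_Virchow", "db:Stem_cell")
```

Short connecting walks dominate the graph score because longer walks are damped by `beta`; the distance is 0 for terms that always co-occur and grows as they co-occur less often.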
  • 15. Scalable profiling of datasets
      - Example: Yovisto Video (yov:Video) in the Dataset Catalog/Registry: <yo:Video …> <dc:title>Lecture 29 – Stem Cells</dc:title> … </yo:Video…>, profiled with db:Cell (Biology), db:Cell biology, db:Biology
      - Extraction of representative (DBpedia) categories ("topic profile") for arbitrary datasets
      - Technically trivial, but scalability issues: LOD Cloud 1000+ datasets with <100 billion RDF statements
      - Efficient approach: sampling & ranking for balance between scalability and precision/recall
      A Scalable Approach for Efficiently Generating Structured Dataset Topic Profiles, Fetahu, B., Dietze, S., Nunes, B. P., Casanova, M. A., Nejdl, W., 11th Extended Semantic Web Conference (ESWC2014), Crete, Greece, (2014).
  • 16. Efficient dataset profiling
      1. Sampling of resources (random sampling, weighted sampling, resource centrality sampling)
      2. Entity- & topic-extraction (NER via DBpedia Spotlight, category mapping & -expansion)
      3. Normalisation & ranking (graph-based models such as PageRank with Priors, HITS with Priors & K-Step Markov)
      - Result: weighted dataset-topic profile graph
      A Scalable Approach for Efficiently Generating Structured Dataset Topic Profiles, Fetahu, B., Dietze, S., Nunes, B. P., Casanova, M. A., Nejdl, W., 11th Extended Semantic Web Conference (ESWC2014), Crete, Greece, (2014).
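The ranking step (3) can be illustrated with a small PageRank-with-priors power iteration. This is a sketch under assumptions, not the pipeline of the cited paper: the toy graph, the damping factor and the choice to put all prior mass on the dataset node are illustrative.

```python
def pagerank_with_priors(graph, priors, damping=0.85, iterations=50):
    """Power iteration where the random jump follows `priors` instead of a uniform distribution."""
    nodes = list(graph)
    total = sum(priors.get(n, 0.0) for n in nodes)
    prior = {n: priors.get(n, 0.0) / total for n in nodes}
    rank = dict(prior)
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * prior[n] for n in nodes}
        for node in nodes:
            targets = graph[node]
            if targets:
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node: hand its mass back according to the priors.
                for n in nodes:
                    new_rank[n] += damping * rank[node] * prior[n]
        rank = new_rank
    return rank

# Made-up graph: a dataset node, the topics found in its sampled resources,
# and a broader category that both topics expand to.
TOPIC_GRAPH = {
    "dataset": ["db:Cell_biology", "db:Medicine"],
    "db:Cell_biology": ["db:Biology"],
    "db:Medicine": ["db:Biology"],
    "db:Biology": [],
}
ranks = pagerank_with_priors(TOPIC_GRAPH, {"dataset": 1.0})
profile = sorted((n for n in ranks if n != "dataset"), key=ranks.get, reverse=True)
```

Biasing the random jump towards the dataset node keeps the ranking dataset-specific: topics score highly only if they are well connected to that dataset's sampled resources.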
  • 17. Search & exploration of datasets through topic profiles (in a nutshell)
      http://data-observatory.org/lod-profiles/
      - Applied to entire LOD cloud/graph
      - Visual exploration of extracted RDF dataset profiles (datasets, topics, relationships)
      - Evaluation results: K-Step Markov (10% sampling size) outperforms baselines (LDA, tf/idf on entire datasets)
  • 18. Search: entity retrieval on large structured datasets? (in a nutshell)
      Challenges
      - How to efficiently retrieve related entities/resources for a given query?
      - Explicit entity links (owl:sameAs etc.) are sparse yet important to facilitate state-of-the-art methods (e.g. BM25F, Blanco et al., ISWC2011)
      - Query type affinity?
      Example: entities related to <James D. Watson> in a large dataset/crawl, e.g. LinkedUp dataset graph, LIVIVO dataset, BTC2014
  • 19. Entity retrieval: approach (in a nutshell)
      (I) Offline processing (clustering to address link sparsity)
      1. Feature vectors (lexical and structural features)
      2. Bucketing: per type (LSH algorithm)
      3. Clustering: X-means & Spectral clustering per bucket
      (II) Online processing (retrieval)
      1. Retrieval & expansion: a) BM25F results b) expansion from clusters (related entities)
      2. Re-ranking (context terms & query type affinity)
      Improving Entity Retrieval on Structured Data, Fetahu, B., Gadiraju, U., Dietze, S., 14th International Semantic Web Conference (ISWC2015), Bethlehem, US, (2015).
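A rough sketch of the bucketing and expansion idea. The slide only says "LSH algorithm"; MinHash with banding is one common LSH variant and is an assumption here, as are the toy entities. The buckets stand in for the clusters that the approach computes per bucket, and BM25F retrieval itself is not shown (the initial result list is given).

```python
import hashlib
from collections import defaultdict

def minhash_signature(tokens, num_hashes=8):
    """MinHash signature: for each seeded hash function keep the minimum hash value."""
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int(hashlib.md5(f"{seed}:{tok}".encode("utf-8")).hexdigest(), 16)
            for tok in tokens
        ))
    return tuple(signature)

def lsh_buckets(entities, num_hashes=8, bands=4):
    """Entities whose signatures agree on at least one band share a bucket."""
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for entity_id, tokens in entities.items():
        sig = minhash_signature(tokens, num_hashes)
        for band in range(bands):
            key = (band, sig[band * rows:(band + 1) * rows])
            buckets[key].add(entity_id)
    return buckets

def expand(results, buckets):
    """Append entities that share a bucket with a retrieved entity (retrieved ones stay first)."""
    expanded = list(results)
    retrieved = set(results)
    for members in buckets.values():
        if members & retrieved:
            for entity_id in sorted(members):
                if entity_id not in expanded:
                    expanded.append(entity_id)
    return expanded

# Made-up entity descriptions as token sets.
ENTITIES = {
    "e1": {"james", "watson", "biologist"},
    "e2": {"james", "watson", "biologist"},   # co-referent description without an explicit link
    "e3": {"paramount", "pictures", "studio"},
}
buckets = lsh_buckets(ENTITIES)
expanded = expand(["e1"], buckets)   # pretend "e1" came back from the initial retrieval
```

The second description is reached through the shared bucket even though no owl:sameAs link exists, which is the link-sparsity problem the offline step is meant to remedy.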
  • 20. Entity retrieval: evaluation
      Dataset
      - BTC2014 (1.4 billion triples)
      - 92 SemSearch queries
      Methods
      - Our approaches: XM: X-means, SP: Spectral
      - Baselines: B: BM25F, S1: Tonon et al. [SIGIR12]
      Conclusions
      - XM & SP outperform baselines
      - Clustering to remedy link sparsity
      - Relevance to query crucial
      Improving Entity Retrieval on Structured Data, Fetahu, B., Gadiraju, U., Dietze, S., 14th International Semantic Web Conference (ISWC2015), Bethlehem, US, (2015).
  • 21. Overview: contents so far
      Introduction & motivation
      Publishing, linking and profiling
      - Publishing & linking (bibliographic) data [WebSci13, SWJ15]
      - Dataset profiling & linking [ESWC13, ESWC14]
      Retrieval & search
      - Entity retrieval in large graphs [ISWC15]
      - Embedded (bibliographic) Web data
      - Entity summarisation from Web markup
      Outlook and future directions
      Outcomes & impact?
  • 22. Tangible outcomes / impact
      Open Datasets
      Applications
      Vocabularies & Schemas
      - VOL + vocabularies for educational resource & service modeling
      Initiatives & Working Groups
      - W3C Community Group "Open Linked Education"
      - DCMI Task Force on LRMI
      - W3C Schema Bib Extend Group
      - Tutorial & workshop series on Linked Data & Learning
      - LinkedUniversities, LinkedEducation.org
      - KEYSTONE WG "Search and Profiling of LD"
      - ...
      http://linkeduniversities.org
  • 23. Overview: contents
      Introduction & motivation
      Publishing, linking and profiling
      - Publishing & linking (bibliographic) data
      - Dataset profiling & linking
      Retrieval & search
      - Entity retrieval in large graphs
      - Embedded (bibliographic) Web data
      - Entity summarisation from Web markup
      Outlook and future directions
      Next: beyond LD: embedded semantics
  • 24. The Web as a knowledge base: semantics on the Web?
      - The Web: approx. 46,000,000,000,000 (46 trillion) Web pages indexed by Google
      - vs Linked Data: approx. 1000 datasets & 100 billion statements, a different order of magnitude with respect to scale & dynamics
      - Other "semantics" (structured facts) on the Web?
  • 25. Embedded semantics: Web page markup & schema.org
      - Embedded markup (RDFa, Microdata, Microformats) for interpretation of Web documents (search, retrieval)
      - Arbitrary vocabularies; schema.org used at scale (700 classes, 1000 predicates)
      - Adoption on the Web: 26% (2014 Google study of 12 bn Web pages)
      - "Web Data Commons" (Meusel & Paulheim [ISWC2014])
          - Markup from Common Crawl (2.2 billion pages): 17 billion RDF quads
          - Markup in 26% of pages, 14% of pay-level domains (PLDs) in 2013 (increase from 6% in 2011)
      - Same order of magnitude as "the Web"
      Example markup
      <div itemscope itemtype="http://schema.org/Movie">
        <h1 itemprop="name">Forrest Gump</h1>
        <span>Actor: <span itemprop="actor">Tom Hanks</span></span>
        <span itemprop="genre">Drama</span>
        ...
      </div>
      RDF statements
      node1  actor           _node-x
      node1  actor           Robin Wright
      node1  genre           Comedy
      node2  actor           T. Hanks
      node2  distributed by  Paramount Pic.
      node3  actor           Tom Cruise
      node3  distributed by  Paramount Pic.
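How such markup turns into statements can be sketched with Python's standard `html.parser`. This is a deliberately simplified extractor, not the Web Data Commons tooling: it assumes flat Microdata (no nested itemscope), takes property values from element text only, and the node naming (`node1`, ...) is an illustrative convention.

```python
from html.parser import HTMLParser

class MicrodataExtractor(HTMLParser):
    """Collects (subject, property, value) statements from flat Microdata markup."""

    def __init__(self):
        super().__init__()
        self.statements = []
        self._item_count = 0
        self._subject = None
        self._pending = None   # itemprop whose text content is still expected

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            # A new item starts: mint a blank-node style identifier for it.
            self._item_count += 1
            self._subject = f"node{self._item_count}"
            if "itemtype" in attrs:
                self.statements.append((self._subject, "type", attrs["itemtype"]))
        if "itemprop" in attrs and self._subject is not None:
            self._pending = attrs["itemprop"]

    def handle_data(self, data):
        text = data.strip()
        if self._pending and text:
            self.statements.append((self._subject, self._pending, text))
            self._pending = None

HTML = """
<div itemscope itemtype="http://schema.org/Movie">
  <h1 itemprop="name">Forrest Gump</h1>
  <span>Actor: <span itemprop="actor">Tom Hanks</span></span>
  <span itemprop="genre">Drama</span>
</div>
"""

parser = MicrodataExtractor()
parser.feed(HTML)
parser.close()
statements = parser.statements
```

Note that the actor ends up as a plain string rather than a linked entity, which is exactly the kind of unlinked, literal-heavy description that makes markup data harder to use than curated Linked Data.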
  • 26. Using markup as global knowledge base - state of the art
      - Glimmer (http://glimmer.research.yahoo.com): entity retrieval (BM25F) on WDC dataset [Blanco, Mika & Vigna, ISWC2011]
      - Challenges: specific characteristics of markup data
      Characteristics and examples
      - Coreferences: 18,000 results for <"Iphone 6", type, s:Product> (8.6 quads on average)
      - Redundancy: <s, schema:name, "Iphone 6"> occurring 1000 times in WDC2013
      - Lack of links: largely unlinked entity descriptions / subgraphs
      - Errors (typos & schema violations, see Meusel et al. [ESWC2015]):
          - Wrong namespaces, such as http://schma.org
          - Undefined types & predicates: 9.7% in WDC, less common than in LOD
          - Confusion of datatype and object properties: <s1, s:publisher, "Springer">, 24.35% object property issues vs 8% in LOD
          - Data property range violations: e.g. literals vs numbers (12.6% in WDC vs 4.6% in LOD)
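Two of the listed characteristics (misspelled namespaces and redundancy) lend themselves to a small clean-up sketch. This is not the set of heuristics from Meusel & Paulheim [ESWC2015]; the fuzzy matching via `difflib`, the `cutoff` value and the list of known namespaces are assumptions for illustration.

```python
import difflib
from urllib.parse import urlsplit

# Assumption: the namespaces we are willing to correct towards.
KNOWN_NAMESPACES = ["http://schema.org/"]

def fix_namespace(type_iri, cutoff=0.9):
    """Replace a near-miss namespace (e.g. a typo) with the closest known one."""
    parts = urlsplit(type_iri)
    namespace = f"{parts.scheme}://{parts.netloc}/"
    local_name = parts.path.lstrip("/")
    match = difflib.get_close_matches(namespace, KNOWN_NAMESPACES, n=1, cutoff=cutoff)
    return (match[0] if match else namespace) + local_name

def deduplicate(statements):
    """Drop exact repeats of a statement while keeping first-seen order."""
    return list(dict.fromkeys(statements))

fixed = fix_namespace("http://schma.org/Product")
unique = deduplicate([
    ("s", "schema:name", "Iphone 6"),
    ("s", "schema:name", "Iphone 6"),
    ("s", "schema:brand", "Apple"),   # made-up statement
])
```

A high cutoff keeps the correction conservative, so unrelated namespaces are left untouched rather than silently rewritten.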
  • 27. Using markup as global knowledge base/graph?
      - Goal: obtaining an entity summary (or entity-centric knowledge graph) for a given query
      - Tasks: document annotation, knowledge base augmentation, semantic enrichments
      - Source: Web page markup from Web crawls, WDC or large (domain-specific) crawls, e.g. publishers, universities, libraries etc.
      Query: Nucleic Acids, type:(Article)
      Entity Summary/Graph
      name           Molecular structure of nucleic acids
      author         James D. Watson, Francis Crick
      publisher      Nature
      datePublished  1953
  • 28. Ongoing work: entity summarisation from markup data
      - Extract (domain-specific) knowledge bases and knowledge graphs for digital libraries
      - Experiments on WDC data: 87.6% MAP; coverage: on average 57% additional facts compared to DBpedia
      - Source: Web page markup from Web crawls, WDC or large (domain-specific) crawls, e.g. publishers, universities, libraries etc.
      Query: Nucleic Acids, type:(Article)
      1. Retrieval
      2. Fact selection (clustering, heuristics, trained classifier)
      Candidate Facts
      node1  name           Molecular structure of nucleic acids
      node1  author         James D. Watson
      node1  publisher      Nature
      node1  datePublished  1956
      node1  datePublished  1953
      node2  name           Francis Crick
      node2  name           Cricks
      Entity Summary/Graph
      name           Molecular structure of nucleic acids
      author         James D. Watson, Francis Crick
      publisher      Nature
      datePublished  1953
      New Queries
      - James D. Watson, type:(Person)
      - Francis Crick, type:(Person)
      - Nature, type:(Organization)
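The fact-selection step can be illustrated with a much simpler stand-in than the clustering, heuristics and trained classifier named on the slide: plain support counting across sources. The source domains, the `min_support` threshold and the set of single-valued properties are assumptions made for this sketch.

```python
from collections import Counter, defaultdict

# Assumption: properties treated as single-valued in this sketch.
SINGLE_VALUED = {"name", "publisher", "datePublished"}

def select_facts(candidates, min_support=2):
    """candidates: iterable of (property, value, source). Returns property -> list of values."""
    support = defaultdict(Counter)
    seen = set()
    for prop, value, source in candidates:
        if (prop, value, source) in seen:   # count each source once per fact
            continue
        seen.add((prop, value, source))
        support[prop][value] += 1
    summary = {}
    for prop, counts in support.items():
        if prop in SINGLE_VALUED:
            # Keep only the best-supported value.
            summary[prop] = [counts.most_common(1)[0][0]]
        else:
            # Multi-valued property: keep every value with enough support.
            summary[prop] = sorted(v for v, c in counts.items() if c >= min_support)
    return summary

# Candidate facts from the slide, with made-up source domains.
CANDIDATES = [
    ("name", "Molecular structure of nucleic acids", "a.example"),
    ("name", "Molecular structure of nucleic acids", "b.example"),
    ("author", "James D. Watson", "a.example"),
    ("author", "James D. Watson", "b.example"),
    ("author", "Francis Crick", "a.example"),
    ("author", "Francis Crick", "c.example"),
    ("author", "Cricks", "c.example"),
    ("datePublished", "1953", "a.example"),
    ("datePublished", "1953", "b.example"),
    ("datePublished", "1956", "c.example"),
    ("publisher", "Nature", "a.example"),
]
summary = select_facts(CANDIDATES)
```

Voting resolves the conflicting publication dates and drops the poorly supported name variant; a trained classifier would additionally use features such as source quality, which simple counting ignores.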
  • 29. Embedded data for digital libraries / life sciences?
      [Figure: number of entities and number of statements per PLD (ranked), count on log scale]
      Unprecedented source of bibliographic data
      - Metadata about scholarly articles (s:ScholarlyArticle): 6,793,764 quads, 1,184,623 entities, 429 distinct predicates (in WDC, for this one type alone)
      - Top 5 domains: Springer, MDPI, BMJ, diabetesjournals.org, mendeley.com, Biodiversitylibrary.org
      Domains, topics, disciplines?
      - Life Sciences and Computer Science predominant
      - Top-10 article titles
      - Most important publishers/journals, libraries represented
      => Domain-specific & targeted crawls = unprecedented source of data
  • 30. Future work: improving entity-centric tasks for digital libraries
      Sources
      - Knowledge graphs and Linked Data (Yago, Freebase, Pubmed, DBLP etc.)
      - Embedded data
      - Unstructured (Web) documents
      Example entities
      node1  name           Molecular structure of nucleic acids
      node1  author         James D. Watson
      node1  publisher      Nature
      node1  datePublished  1956
      node1  datePublished  1953
      node2  name           Francis Crick
      node2  name           Cricks
      node2  born           1916
      Role of Web data
      - Web data as knowledge resource
      - Background knowledge/structured data
      - Training data & ground truths
      - ...
      Improving data-centric tasks for large (bibliographic/life sciences) corpora, e.g. LIVIVO
      - KB construction & augmentation
      - Document annotation
      - Entity recognition, disambiguation, interlinking
      - Search & retrieval
      - ...
  • 31. Acknowledgements: team
      - Besnik Fetahu (L3S)
      - Ivana Marenzi (L3S)
      - Ujwal Gadiraju (L3S)
      - Eelco Herder (L3S)
      - Ran Yu (L3S)
      - Ricardo Kawase (L3S)
      - Pracheta Sahoo (L3S, IIT India)
      - Bernardo Pereira Nunes (L3S, PUC Rio)
      + external collaborators
  • 32. References (presented work)
      - Dietze, S., Taibi, D., D’Aquin, M., Facilitating Scientometrics in Learning Analytics and Educational Data Mining - the LAK Dataset, Semantic Web Journal, 2016.
      - Dietze, S., Kaldoudi, E., Dovrolis, E., Giordano, D., Spampinato, C., Hendrix, M., Protopsaltis, A., Taibi, D., Yu, H. Q. (2013), Socio-semantic Integration of Educational Resources – the Case of the mEducator Project, in Journal of Universal Computer Science (J.UCS), Vol. 19, No. 11, pp. 1543-1569.
      - Dietze, S., Taibi, D., Yu, H. Q., Dovrolis, N., A Linked Dataset of Medical Educational Resources, British Journal of Educational Technology (BJET), Volume 46, Issue 5, pages 1123–1129, September 2015.
      - Gadiraju, U., Demartini, G., Kawase, R., Dietze, S., Human beyond the Machine: Challenges and Opportunities of Microtask Crowdsourcing. In: IEEE Intelligent Systems, Volume 30, Issue 4, Jul/Aug 2015.
      - Gadiraju, U., Kawase, R., Dietze, S., Demartini, G., Understanding Malicious Behavior in Crowdsourcing Platforms: The Case of Online Surveys. ACM CHI Conference on Human Factors in Computing Systems (CHI2015), April 18-23, Seoul, Korea.
      - Fetahu, B., Gadiraju, U., Dietze, S., Improving Entity Retrieval on Structured Data, 14th International Semantic Web Conference (ISWC2015), Bethlehem, US, (2015).
      - Fetahu, B., Dietze, S., Nunes, B. P., Casanova, M. A., Nejdl, W., A Scalable Approach for Efficiently Generating Structured Dataset Topic Profiles, 11th Extended Semantic Web Conference (ESWC2014), Crete, Greece, (2014).
      - D’Aquin, M., Adamou, A., Dietze, S., Assessing the Educational Linked Data Landscape, ACM Web Science 2013 (WebSci2013), Paris, France, May 2013.
      - Nunes, B. P., Dietze, S., Casanova, M.A., Kawase, R., Fetahu, B., Nejdl, W., Combining a co-occurrence-based and a semantic measure for entity linking, in: The Semantic Web: Semantics and Big Data, Proceedings of the 10th Extended Semantic Web Conference (ESWC2013), Lecture Notes in Computer Science Vol. 7882, Springer Berlin Heidelberg, 2013.
      http://www.stefandietze.net
  • 33. Selected related work
      Entity retrieval
      - Alberto Tonon, Gianluca Demartini, and Philippe Cudré-Mauroux: Combining Inverted Indices and Structured Search for Ad-hoc Object Retrieval. In: 35th Annual ACM SIGIR Conference (SIGIR 2012), Portland, Oregon, USA, August 2012.
      - Roi Blanco, Peter Mika, Sebastiano Vigna: Effective and Efficient Entity Search in RDF Data. International Semantic Web Conference (ISWC) 2011, pages 83-97.
      Embedded markups & Web Data Commons
      - Robert Meusel, Petar Petrovski, Christian Bizer: The WebDataCommons Microdata, RDFa and Microformat Dataset Series. Proceedings of the 13th International Semantic Web Conference (ISWC 2014), RBDS Track, Trentino, Italy, October 2014.
      - Robert Meusel and Heiko Paulheim: Heuristics for Fixing Common Errors in Deployed schema.org Microdata. Proceedings of the 12th Extended Semantic Web Conference (ESWC 2015), Portoroz, Slovenia, May 2015.
      Linked Data quality
      - Carlos Buil-Aranda, Aidan Hogan, Jürgen Umbrich, Pierre-Yves Vandenbussche: SPARQL Web-Querying Infrastructure: Ready for Action?, International Semantic Web Conference 2013 (ISWC2013).
      - Paulheim, H., Bizer, C.: Type Inference on Noisy RDF Data, Semantic Web – ISWC 2013, Lecture Notes in Computer Science Volume 8218, 2013, pp 510-525.
      - Hogan, A., Umbrich, J., Harth, A., Cyganiak, R., Polleres, A., Decker, S.: An empirical survey of Linked Data conformance. Journal of Web Semantics 14, 2012.
  • 34. Thank you
      - http://stefandietze.net
      - http://data.l3s.de
      - http://data.linkededucation.org/linkedup/catalog