LOD2: State of Play WP3A - Knowledge Base Creation, Enrichment and Repair
1. Creating Knowledge out of Interlinked Data
LOD2 Paris Meeting:
WP3 Overview
Knowledge Base Creation, Enrichment and Repair
Jens Lehmann
AKSW, Universität Leipzig
LOD2 Presentation . 02.09.2010 . Page http://lod2.eu
2. Creating Knowledge out of Interlinked Data
Outline
• General WP3 Overview (Jens Lehmann)
• WP structure
• Deliverables
• Progress
• Task 3.2 Report: NLP2RDF + NIF (Sebastian Hellmann)
LOD2 Event . 06.09.2010 . Page 2 http://lod2.eu
3. Creating Knowledge out of Interlinked Data
WP 3 Task Overview
• Research WP, 76 PMs, InfAI (37), NUIG (10), FUB (17),
OpenLink (5), Exalead (7)
• 3.1: Provenance-Aware Extraction of Linked Data from Existing
Structured Formats
• 3.2: Provenance-Aware Extraction of Linked Data from
Unstructured and Semi-Structured Sources
• 3.3: Knowledge Base Schema Enrichment
• 3.4: Knowledge Base Repair
• 3.5: Web Linkage Validator
4. Creating Knowledge out of Interlinked Data
WP 3 Goals
• General Goal: creation, improvement, repair of knowledge bases
• Focus: very large knowledge bases, diverse knowledge,
web/linked data
• Refine existing triplification approaches (Virtuoso Sponger, RDF
Views, Triplify, D2R)
• Improve the schema of knowledge bases based on their data
• Fix problems in knowledge bases, e.g. inconsistencies
• Techniques: Semi-automatic machine learning, ontology
debugging, NLP, shallow parsing etc.
5. Creating Knowledge out of Interlinked Data
WP 3 Task 3.1
• Provenance-Aware Extraction of Linked Data from Existing
Structured Formats (spreadsheets, relational databases, CMS,
logs, XML documents)
• Partners: FUB, InfAI, OpenLink, Exalead
• Provide: process description + tools
• Standardisation of RDB2RDF mapping
• Draws on existing tools/frameworks:
• D2R (FUB)
• Triplify (InfAI)
• Virtuoso Sponger (OpenLink)
• Deliverables: State-of-the-Art Report (M6), D2R release (M20),
Triplify release (M20)
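As a rough illustration of what extraction from a relational source involves, the sketch below maps one table row to N-Triples. It is not the D2R or Triplify mapping language; the base URI, the `row_to_triples` helper, and the URI layout are assumptions made for this example only.

```python
# Hypothetical illustration of row-to-triple mapping; not D2R or Triplify syntax.
BASE = "http://example.org/"  # assumed base URI for minted resources


def escape_literal(value):
    """Escape backslashes and double quotes for an N-Triples literal."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"')


def row_to_triples(table, key_column, row):
    """Map one relational row (a dict) to a list of N-Triples lines.

    The primary key value becomes part of the subject URI; every other
    non-NULL column becomes a property with a plain literal value.
    """
    subject = f"<{BASE}{table}/{row[key_column]}>"
    triples = []
    for column, value in row.items():
        if column == key_column or value is None:
            continue  # skip the key itself and NULL cells
        predicate = f"<{BASE}vocab/{table}#{column}>"
        triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


row = {"id": 7, "name": "Leipzig", "country": None}
for line in row_to_triples("city", "id", row):
    print(line)
```

NULL cells are skipped rather than emitted as empty literals, which is one common design choice; a real mapping would also handle datatypes, foreign keys, and provenance metadata.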
6. Creating Knowledge out of Interlinked Data
WP 3 Task 3.1 - Progress
• D2R Server MetaData Extension (allows adding licensing and
provenance output to D2R Server)
• Deliverable 3.1.1 completed: state of the art report about
knowledge extraction from structured sources
• 200+ tools collected at http://data.lod2.eu/2011/tools/
• http://en.wikipedia.org/wiki/Knowledge_extraction created
• Addition of RDF2Triggers to RDF Views in Virtuoso: enables
materialisation and synchronisation of RDF views as physical
triples
• Virtuoso sponger cartridges extended
7. Creating Knowledge out of Interlinked Data
WP 3 Task 3.2
• Provenance-Aware Extraction of Linked Data from Unstructured
and Semi-Structured Sources (HTML, PDF and Office documents with
metadata)
• Partners: FUB, InfAI, OpenLink, Exalead
• NLP techniques / text understanding (combine approaches, not
invent them)
• Draws on existing tools:
• NLP2RDF (InfAI)
• Stanford Parser, ASV toolkit, Zemanta, Ontos API (all external)
• DBpedia (FUB, InfAI, OpenLink)
• Deliverables: NLP2RDF release (M8), DBpedia Live (M8), DBpedia
Framework Extension (M20)
8. Creating Knowledge out of Interlinked Data
WP 3 Task 3.2 - Progress
• NLP2RDF + NIF: presented by Sebastian
• DBpedia Live:
• New server acquired
• Running at http://live.dbpedia.org/sparql/ (beta version)
• DBpedia I18N committee founded and multi-language support
extended
• DBpedia Spotlight released (http://dbpedia.org/spotlight): tool for
annotating mentions of DBpedia resources in text
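A minimal sketch of how a client might address the DBpedia Live endpoint named above. It only builds the request URL and sends nothing. The `query` parameter comes from the SPARQL protocol; the `format` parameter is assumed to follow the Virtuoso convention, and the beta endpoint may no longer be available at this address.

```python
from urllib.parse import urlencode

# Endpoint as given on the slide; availability is not guaranteed.
ENDPOINT = "http://live.dbpedia.org/sparql/"


def build_query_url(query, result_format="application/sparql-results+json"):
    """Return a GET URL for a SPARQL query (no network access here).

    `format` is an assumed Virtuoso-style parameter; other endpoints
    may expect content negotiation via the Accept header instead.
    """
    return ENDPOINT + "?" + urlencode({"query": query, "format": result_format})


query = "SELECT ?s WHERE { ?s a <http://dbpedia.org/ontology/City> } LIMIT 5"
url = build_query_url(query)
print(url)
```

The resulting URL can be passed to any HTTP client; keeping URL construction separate from the request makes the query easy to inspect before it is sent.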
9. Creating Knowledge out of Interlinked Data
WP 3 Task 3.3
• Knowledge Base Schema Enrichment
• Partners: InfAI
• Suggests OWL Schema Axioms to Knowledge Base Maintainers
(Definitions, Super Classes, Disjointness)
• Tightly coupled to Task 3.4
• Adapts existing approaches to work with very large Linked Data
knowledge bases
• Uses DL-Learner (InfAI) and external ontology learning approaches
• Deliverables: Enrichment Method Report (M12), User Interface
(M24), Evaluation (M36)
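To make the idea of data-driven axiom suggestion concrete, here is a toy heuristic for disjointness: two classes are proposed as disjoint if both have enough instances and share none. This is not DL-Learner's algorithm; the function name, the threshold, and the sample data are assumptions for illustration.

```python
from itertools import combinations


def suggest_disjointness(instances_by_class, min_instances=2):
    """Propose candidate disjointness axioms from instance data.

    A pair of classes is suggested when both have at least
    `min_instances` members and no member in common. The output is a
    list of suggestions for a maintainer to review, not asserted facts.
    """
    suggestions = []
    for a, b in combinations(sorted(instances_by_class), 2):
        members_a = instances_by_class[a]
        members_b = instances_by_class[b]
        if len(members_a) < min_instances or len(members_b) < min_instances:
            continue  # too little evidence either way
        if not (members_a & members_b):
            suggestions.append((a, b))
    return suggestions


data = {
    "Person": {"alice", "bob", "carol"},
    "City": {"leipzig", "paris"},
    "Author": {"alice", "carol"},
}
print(suggest_disjointness(data))
# → [('Author', 'City'), ('City', 'Person')]
```

Absence of shared instances is only weak evidence under the open-world assumption, which is one reason such axioms are suggested to maintainers rather than added automatically.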
10. Creating Knowledge out of Interlinked Data
WP 3 Task 3.4
• Knowledge Base Repair
• Partners: InfAI, NUIG
• Fix inconsistent knowledge bases, unsatisfiable classes, (some)
modelling errors, (some) reasoning performance problems
• Draws on a lot of existing work in ontology debugging and
extends it to knowledge bases in the LOD cloud
• Related to quality measures in WP4
• Result: ORE tool (together with Task 3.3)
• Deliverables: Report on Modelling Errors/Problems (M6), 1st ORE
Release (M28), 2nd ORE Release (M40)
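One kind of inconsistency mentioned above can be sketched without a reasoner: an individual asserted to belong to two classes that are declared disjoint. The code below is a simplified illustration, not how ORE works; it checks asserted types only and performs no subclass inference. All names and data are hypothetical.

```python
def find_disjointness_violations(types_by_individual, disjoint_pairs):
    """Return (individual, class_a, class_b) for each violated disjointness.

    Only explicitly asserted types are checked; a real debugger would
    use a reasoner to include inferred class memberships as well.
    """
    violations = []
    for individual in sorted(types_by_individual):
        types = types_by_individual[individual]
        for class_a, class_b in disjoint_pairs:
            if class_a in types and class_b in types:
                violations.append((individual, class_a, class_b))
    return violations


types = {
    "leipzig": {"City", "Person"},  # faulty assertion
    "alice": {"Person"},
}
disjoint = [("City", "Person")]
print(find_disjointness_violations(types, disjoint))
# → [('leipzig', 'City', 'Person')]
```

Each reported tuple points to a small set of assertions a maintainer could retract, which is the kind of repair suggestion a debugging tool aims to provide.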
11. Creating Knowledge out of Interlinked Data
WP 3 Task 3.4 - Progress
• Google Code project for ORE (Ontology Repair and Enrichment)
tool started: http://code.google.com/p/ore/
• Domain http://ore-tool.net/ with basic instructions
• ORE 0.2 released (desktop version – web version in development
at http://web.ore-tool.net)
• ORE paper accepted at ISWC
• Deliverable 3.4.1 completed (state of the art report on
detectable errors in knowledge bases)
• Preliminary work on algorithms for supporting debugging
SPARQL endpoints and Linked Data
12. Creating Knowledge out of Interlinked Data
WP 3 Task 3.5
• Web Linkage Validator
• Partners: NUIG
• Tightly coupled to Task 4.2 (Unsupervised Interlinking)
• Creates linkage reports for knowledge base maintainers
• Could suggest adding further properties, more specific property
values, and better-specified classes/properties for knowledge base
entities
• Deliverables: Initial Release (M18), LOD2 Stack Component
Release (M28)
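A linkage report could, at its simplest, count outgoing links per entity and flag targets that cannot be found. The sketch below assumes links are given as (source, target) pairs and that a set of known target URIs is available; it is a hypothetical illustration, not the validator developed in this task.

```python
from collections import defaultdict


def linkage_report(links, known_targets):
    """Summarise outgoing links per source entity.

    For each source, count its links and list the targets that are
    missing from `known_targets` (treated here as dangling links).
    """
    report = defaultdict(lambda: {"links": 0, "dangling": []})
    for source, target in links:
        entry = report[source]
        entry["links"] += 1
        if target not in known_targets:
            entry["dangling"].append(target)
    return dict(report)


links = [
    ("ex:Leipzig", "dbpedia:Leipzig"),
    ("ex:Leipzig", "geo:missing-id"),
    ("ex:Paris", "dbpedia:Paris"),
]
known = {"dbpedia:Leipzig", "dbpedia:Paris"}
for source, entry in sorted(linkage_report(links, known).items()):
    print(source, entry)
```

In practice, `known_targets` would come from dereferencing the target URIs or querying the target dataset, and the report could add checks on property values and class assignments.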
13. Creating Knowledge out of Interlinked Data
Thanks for your attention!