Learning Conflict Resolution Strategies for Cross-Language Wikipedia Data Fusion
Volha Bryl, Christian Bizer
Data and Web Science Research Group
University of Mannheim
Germany
WebQuality @ WWW’2014, Seoul, Korea, April 7, 2014
Cross-Language Wikipedia Data Fusion, Volha Bryl, Chris Bizer 2
Outline
1. Motivation
• Linked Open Data fusion
• Wikipedia/DBpedia data fusion
2. Extracting provenance metadata
3. Data fusion with Sieve
4. Learning data fusion policies
5. Cross-language DBpedia use case
6. Conclusion
Motivation: Linked Open Data Integration
• LOD: publishing and interlinking open datasets on the web
• Tens of billions of facts (RDF subject-predicate-object triples)
• Huge potential for applications
• Problem: varying quality, lack of data consistency
• Solution: data integration
• Our focus: data fusion
• Create a consistent representation of a real-world entity based on
multiple heterogeneous data sources
• Challenge: conflict resolution for improving data quality
Motivation: Fusing Wikipedia Data
[Figure: Wikipedia language editions fr, en, de]
• Wikipedia contains lots of data conflicts across languages
• Improving its quality is crucial
• Identity resolution is solved by inter-language links
• Our focus is DBpedia: Wikipedia’s structured twin
DBpedia: Wikipedia’s Structured Twin
• Extracts structured, multilingual, cross-domain knowledge from Wikipedia
• Crowd-sourced community project: http://dbpedia.org
• Provides querying and search capabilities over Wikipedia data
• Follows Linked Open Data principles
• Data is freely available, software is open-source
• Started in 2006, currently at version 3.9 (September 2013)
• 119 languages, 2.46 billion triples, 12.6 million unique things
• LOD cloud’s interlinking hub
DBpedia: Data Conflicts
● Querying DBpedia: What is the population of Seoul?
Seoul dbp-ont:populationTotal "10,447,719"@bg
"9,794,304"@ca
"10,400,000"@cs
"10,464,051"@el
"10,581,728"@en
"9,794,304"@eu
"10,464,051"@id
"14,794,304"@it
"10,528,774"@ko
"10,581,728"@pt
"10,464,051"@ru
"10,581,728"@sl
"24,500,000"@tr
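The extent of the conflict can be made concrete with a few lines of Python. This is a minimal sketch that only counts the values listed on the slide; the variable names are illustrative and not part of DBpedia or Sieve.

```python
from collections import Counter

# Seoul populationTotal per language edition, copied from the slide.
seoul_population = {
    "bg": 10_447_719, "ca": 9_794_304, "cs": 10_400_000, "el": 10_464_051,
    "en": 10_581_728, "eu": 9_794_304, "id": 10_464_051, "it": 14_794_304,
    "ko": 10_528_774, "pt": 10_581_728, "ru": 10_464_051, "sl": 10_581_728,
    "tr": 24_500_000,
}

# How many editions report each distinct value.
frequency = Counter(seoul_population.values())
distinct_values = len(frequency)
lowest = min(seoul_population.values())
highest = max(seoul_population.values())

print(f"{len(seoul_population)} editions, {distinct_values} distinct values")
print(f"range: {lowest:,} to {highest:,}")
for value, count in frequency.most_common():
    print(f"{value:>12,}  x{count}")
```

No value is shared by more than three of the 13 editions, and two different values tie at three each, so a simple vote does not settle the question on its own.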
DBpedia: Data Conflicts
[Figure callouts on two of the Seoul values: Edited 2012-09-28T11:59:15Z; Edited 2013-01-22T09:43:20Z]
DBpedia: Extracting Provenance Metadata
● Provenance is crucial for data fusion
● Allows assessing data quality, e.g. deciding whether a fact is up to date or
comes from a trusted source or author
● No provenance metadata provided for DBpedia at the moment
● Idea
● Extract provenance metadata from Wikipedia revision history
● Implementation
● https://github.com/VolhaBryl/DBpedia-provenance
● Extraction performed for 610K populated places in 10 languages
DBpedia: Extracting Provenance Metadata
● Two types of metadata are extracted: when and how often a fact was edited, and by whom
● Change history: last edit timestamp of a triple, number of edits
  – Retrieved from Wikipedia revision dumps
    ● April 2013, corresponds to the DBpedia 3.9 release
  – Challenge: revision dumps are huge
    ● e.g. >6 TB for English, >2 TB for German
  – We extracted metadata for geographical entities for the top 10 languages
    ● 425K entities for English, ~150K for other languages
● Author: reputation-related metadata
  – edit count, registration date, blocked or not, etc.
  – Retrieved via MediaWiki API
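Author metadata of this kind can be requested from the MediaWiki Action API through its `list=users` query. The sketch below only builds the request URL and sends nothing; the helper name `author_metadata_url` and the user name are made up for illustration, and the slides do not say which API calls the extraction framework actually issues.

```python
from urllib.parse import urlencode


def author_metadata_url(language: str, usernames: list[str]) -> str:
    """Build a MediaWiki Action API URL asking for reputation-related user fields."""
    params = {
        "action": "query",
        "list": "users",
        "ususers": "|".join(usernames),  # several users can share one request
        "usprop": "editcount|registration|blockinfo",
        "format": "json",
    }
    return f"https://{language}.wikipedia.org/w/api.php?{urlencode(params)}"


# "ExampleUser" is a placeholder, not an account referenced in the slides.
url = author_metadata_url("en", ["ExampleUser"])
print(url)
```

The three `usprop` fields correspond to the edit count, registration date, and block status listed above.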
DBpedia: Extracting Provenance Metadata
[Figure: provenance metadata per triple]
[Figure: edit traces]
Data Fusion with Sieve
● Our starting point
Sieve – Linked Data Quality Assessment and Fusion tool
http://sieve.wbsg.de/
● Functionality
● Input: RDF data + provenance metadata
● User creates an XML specification with
● quality assessment metrics (e.g. recency or trust in source)
● conflict resolution functions (e.g. vote, take maximum,
average, most recent, most trusted source)
● Fused dataset is produced
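The two ingredients of a Sieve specification, a quality assessment metric and a conflict resolution function, can be sketched in plain Python. This is not Sieve's XML format or API: the `Candidate` type, the linear `recency_score`, the trust weights, and all values and timestamps are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Candidate:
    value: int
    source: str          # e.g. the language edition the value came from
    last_edit: datetime  # provenance: when the fact was last edited


def recency_score(c: Candidate, now: datetime, time_span_days: float = 365.0) -> float:
    """Quality metric in [0, 1]: 1.0 if edited just now, 0.0 once older than the time span."""
    age_days = (now - c.last_edit).total_seconds() / 86_400
    return max(0.0, 1.0 - age_days / time_span_days)


def keep_most_recent(candidates, now):
    """Conflict resolution: keep the candidate with the highest recency score."""
    return max(candidates, key=lambda c: recency_score(c, now))


def keep_most_trusted(candidates, trust):
    """Conflict resolution: keep the candidate whose source has the highest trust weight."""
    return max(candidates, key=lambda c: trust.get(c.source, 0.0))


# Hypothetical candidates for one property of one entity.
now = datetime(2013, 4, 1, tzinfo=timezone.utc)
candidates = [
    Candidate(1_000_000, "en", datetime(2012, 9, 28, tzinfo=timezone.utc)),
    Candidate(1_020_000, "de", datetime(2013, 1, 22, tzinfo=timezone.utc)),
    Candidate(950_000, "fr", datetime(2011, 1, 1, tzinfo=timezone.utc)),
]
trust = {"en": 0.9, "de": 0.8, "fr": 0.7}  # made-up trust weights

print(keep_most_recent(candidates, now).value)     # the value from "de"
print(keep_most_trusted(candidates, trust).value)  # the value from "en"
```

The two functions disagree on this toy input, which is the situation a fusion specification has to decide per property.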
<dbp:Amsterdam> <dbp-ont:populationTotal> "820654" <en.wikipedia.org/wiki/Amsterdam:populationTotal:1> .
<dbp:Amsterdam> <dbp-ont:populationTotal> "790044" <ru.wikipedia.org/wiki/Amsterdam:populationTotal:1> .
<dbp:Amsterdam> <dbp-ont:populationTotal> "762" <es.wikipedia.org/wiki/Amsterdam:populationTotal:1> .
<dbp:Amsterdam> <dbp-ont:populationTotal> "57" <es.wikipedia.org/wiki/Amsterdam:populationTotal:2> .
<dbp:Amsterdam> <dbp-ont:populationTotal> "799406" <nl.wikipedia.org/wiki/Amsterdam:populationTotal:1> .
<dbp:Amsterdam> <dbp-ont:populationTotal> "1364422" <pt.wikipedia.org/wiki/Amsterdam:populationTotal:1> .
<dbp:Amsterdam> <dbp-ont:populationTotal> "758198" <ca.wikipedia.org/wiki/Amsterdam:populationTotal:1> .
<dbp:Amsterdam> <dbp-ont:populationTotal> "820654" <it.wikipedia.org/wiki/Amsterdam:populationTotal:1> .
<dbp:Vienna> <dbp-ont:populationTotal> "1731236" <en.wikipedia.org/wiki/Vienna:populationTotal:1> .
<dbp:Vienna> <dbp-ont:populationTotal> "1730278" <ru.wikipedia.org/wiki/Vienna:populationTotal:1> .
<dbp:Vienna> <dbp-ont:populationTotal> "1731236" <es.wikipedia.org/wiki/Vienna:populationTotal:1> .
<dbp:Vienna> <dbp-ont:populationTotal> "1680266" <ca.wikipedia.org/wiki/Vienna:populationTotal:1> .
<dbp:Vienna> <dbp-ont:populationTotal> "1731236" <it.wikipedia.org/wiki/Vienna:populationTotal:1> .
<dbp:Vienna> <dbp-ont:populationTotal> "1731286" <fr.wikipedia.org/wiki/Vienna:populationTotal:1> .
<dbp:Paris> <dbp-ont:populationTotal> "2234105" <en.wikipedia.org/wiki/Paris:populationTotal:1> .
<dbp:Paris> <dbp-ont:populationTotal> "2268265" <ru.wikipedia.org/wiki/Paris:populationTotal:1> .
<dbp:Paris> <dbp-ont:populationTotal> "2257981" <es.wikipedia.org/wiki/Paris:populationTotal:1> .
<dbp:Paris> <dbp-ont:populationTotal> "2257981" <nl.wikipedia.org/wiki/Paris:populationTotal:1> .
<dbp:Paris> <dbp-ont:populationTotal> "2257981" <nl.wikipedia.org/wiki/Paris:populationTotal:2> .
<dbp:Paris> <dbp-ont:populationTotal> "2211297" <pt.wikipedia.org/wiki/Paris:populationTotal:1> .
<dbp:Paris> <dbp-ont:populationTotal> "2257981" <it.wikipedia.org/wiki/Paris:populationTotal:1> .
<dbp:Paris> <dbp-ont:populationTotal> "2243833" <fr.wikipedia.org/wiki/Paris:populationTotal:1> .
<dbp:Paris> <dbp-ont:populationTotal> "10413386" <de.wikipedia.org/wiki/Paris:populationTotal:1> .
INPUT (Data Fusion with Sieve by Example): the 23 values above, only 3 needed
Data Fusion with Sieve by Example
[Figure: Sieve specification that keeps the most recent population value]
RESULT (selected most recent population value)
<dbp:Vienna> <dbp-ont:populationTotal> "1730278" <ru.wikipedia.org/wiki/Vienna:populationTotal:1> .
<dbp:Paris> <dbp-ont:populationTotal> "2257981" <nl.wikipedia.org/wiki/Paris:populationTotal:2> .
<dbp:Amsterdam> <dbp-ont:populationTotal> "57" <es.wikipedia.org/wiki/Amsterdam:populationTotal:2> .
Was it wrong to keep the most recent value?
The Amsterdam input values, for comparison with the selected "57":
<dbp:Amsterdam> <dbp-ont:populationTotal> "820654" <en.wikipedia.org/wiki/Amsterdam:populationTotal:1> .
<dbp:Amsterdam> <dbp-ont:populationTotal> "790044" <ru.wikipedia.org/wiki/Amsterdam:populationTotal:1> .
<dbp:Amsterdam> <dbp-ont:populationTotal> "762" <es.wikipedia.org/wiki/Amsterdam:populationTotal:1> .
<dbp:Amsterdam> <dbp-ont:populationTotal> "57" <es.wikipedia.org/wiki/Amsterdam:populationTotal:2> .
<dbp:Amsterdam> <dbp-ont:populationTotal> "799406" <nl.wikipedia.org/wiki/Amsterdam:populationTotal:1> .
<dbp:Amsterdam> <dbp-ont:populationTotal> "1364422" <pt.wikipedia.org/wiki/Amsterdam:populationTotal:1> .
<dbp:Amsterdam> <dbp-ont:populationTotal> "758198" <ca.wikipedia.org/wiki/Amsterdam:populationTotal:1> .
<dbp:Amsterdam> <dbp-ont:populationTotal> "820654" <it.wikipedia.org/wiki/Amsterdam:populationTotal:1> .
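One way to see the problem is to run other conflict resolution functions over the same Amsterdam values. The sketch below uses only Python's standard library, not Sieve, and the choice of vote and median is an illustration rather than the policy the authors settled on.

```python
from collections import Counter
from statistics import median

# Amsterdam populationTotal values from the input, in slide order.
amsterdam = [820654, 790044, 762, 57, 799406, 1364422, 758198, 820654]

most_recent = 57  # the value kept under the "most recent" policy, per the RESULT slide

# Vote: the value reported by the largest number of sources.
voted, votes = Counter(amsterdam).most_common(1)[0]

# Median: robust to a few extreme values on either side.
middle = median(amsterdam)

print(f"most recent: {most_recent}")
print(f"vote:        {voted} ({votes} sources)")
print(f"median:      {middle}")
```

Both alternatives land in the range reported by most editions, whereas the most recent edit happens to be one of the two very small values from the Spanish edition.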
Learning Conflict Resolution Strategies
• Problem
• In Sieve, fusion functions for each property are defined manually
→ a good understanding of the input data is required
→ an optimal result is not guaranteed
• Solution
• Fusion Policy Learner
• Extension of Sieve for automatically learning conflict resolution
strategies based on a gold standard
• http://sieve.wbsg.de/FPL.html
• Learning algorithms
  – Numeric properties
    – Minimize the mean absolute error, or
    – Maximize the number of correct values, where a correct value deviates from the gold standard value by no more than a predefined threshold (e.g. 5%)
  – Nominal properties (strings or URIs)
    – Maximize the number of exact matches
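The selection step for numeric properties can be sketched as follows. This is a simplified reading of the slide, not the Fusion Policy Learner's actual code: the set of candidate functions, all function names, and the toy data are assumptions made for illustration.

```python
from collections import Counter
from statistics import mean, median


def vote(values):
    """Most frequent value."""
    return Counter(values).most_common(1)[0][0]


# Candidate fusion functions the learner may choose from (illustrative set).
FUSION_FUNCTIONS = {"vote": vote, "maximum": max, "average": mean, "median": median}


def mean_absolute_error(fuse, conflicts, gold):
    """Average absolute difference between fused and gold values."""
    return mean([abs(fuse(conflicts[e]) - gold[e]) for e in gold])


def correct_values(fuse, conflicts, gold, threshold=0.05):
    """Count entities whose fused value is within `threshold` (relative) of the gold value."""
    return sum(
        abs(fuse(conflicts[e]) - gold[e]) <= threshold * abs(gold[e]) for e in gold
    )


def learn_fusion_function(conflicts, gold, criterion="mae"):
    """Return the name of the fusion function that scores best on the gold standard."""
    if criterion == "mae":
        return min(
            FUSION_FUNCTIONS,
            key=lambda n: mean_absolute_error(FUSION_FUNCTIONS[n], conflicts, gold),
        )
    return max(
        FUSION_FUNCTIONS,
        key=lambda n: correct_values(FUSION_FUNCTIONS[n], conflicts, gold),
    )


# Toy data invented for this example: conflicting values per entity and a gold value.
conflicts = {"x": [100, 100, 120, 130, 500], "y": [200, 200, 230, 20]}
gold = {"x": 120, "y": 200}

print(learn_fusion_function(conflicts, gold, "mae"))
print(learn_fusion_function(conflicts, gold, "correct"))
```

For nominal properties the same loop would apply with a scoring function that counts exact matches against the gold standard.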
Sieve Fusion Policy Learner: Example
[Figure: input specification. Callouts: "Learn which fusion functions to use"; "Select one of these functions"; "Based on the gold standard"]
Fusing DBpedia Data
● Top 10 DBpedia language editions, 610,017 entities
● 30% described in > 3 languages
● Gold standard: GeoNames - www.geonames.org
* selection method for population: minimize mean absolute error
[Results table; callout: "Based on provenance metadata"]
Summary
• Motivation
• Data integration is crucial for boosting the quality and usage of LOD
• Objective
• Fusing Wikipedia/DBpedia data across languages
• Starting point
• Sieve, LOD quality assessment and fusion tool
• Results
• Fusion Policy Learner extension of Sieve for automatically learning
optimal conflict resolution strategies
• Fusing data about 610K populated places from 10 DBpedia language
editions
• Framework for DBpedia provenance metadata extraction
Future Work
• Experimenting with other learning techniques
• regression for numerical values
• decision trees to learn complex fusion strategies,
e.g. choose the most recent among the most frequent values
• active learning when no or not enough labeled data available
• DBpedia use case
• Is there a cross-domain up-to-date gold standard?
• Gap filling, conflict resolution and data debugging on a large scale
• Other LOD use cases
• Allows DBpedia to be used for training
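The composite strategy named above, choosing the most recent among the most frequent values, could look like the following sketch. The function name, values, and timestamps are invented for illustration; nothing here is part of Sieve or the Fusion Policy Learner.

```python
from collections import Counter
from datetime import date


def most_recent_among_most_frequent(observations):
    """observations: list of (value, last_edit) pairs for one property of one entity."""
    counts = Counter(value for value, _ in observations)
    top = max(counts.values())
    # Keep only observations whose value is tied for the highest frequency.
    finalists = [(v, t) for v, t in observations if counts[v] == top]
    # Among those, the most recently edited one wins.
    return max(finalists, key=lambda vt: vt[1])[0]


# Hypothetical observations: two values tie on frequency, an outlier is newest overall.
observations = [
    (100, date(2012, 5, 1)),
    (100, date(2012, 9, 28)),
    (110, date(2013, 1, 22)),
    (110, date(2012, 3, 14)),
    (900, date(2013, 3, 30)),
]

print(most_recent_among_most_frequent(observations))  # 110
```

Frequency filters out the newest but isolated value, and recency then breaks the tie between the two remaining candidates.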

Contenu connexe

Tendances

Linked Data (1st Linked Data Meetup Malmö)
Linked Data (1st Linked Data Meetup Malmö)Linked Data (1st Linked Data Meetup Malmö)
Linked Data (1st Linked Data Meetup Malmö)Anja Jentzsch
 
Illuminating DSpace's Linked Data Support
Illuminating DSpace's Linked Data SupportIlluminating DSpace's Linked Data Support
Illuminating DSpace's Linked Data SupportPascal-Nicolas Becker
 
Cogapp Open Studios 2012 - Adventures with Linked Data
Cogapp Open Studios 2012 - Adventures with Linked DataCogapp Open Studios 2012 - Adventures with Linked Data
Cogapp Open Studios 2012 - Adventures with Linked DataCogapp
 
They have left the building: The Web Route to Library Users
They have left the building: The Web Route to Library UsersThey have left the building: The Web Route to Library Users
They have left the building: The Web Route to Library UsersRichard Wallis
 
Linked Data in Libraries
Linked Data in LibrariesLinked Data in Libraries
Linked Data in LibrariesRichard Wallis
 
The Web of Data is Our Opportunity
The Web of Data is Our OpportunityThe Web of Data is Our Opportunity
The Web of Data is Our OpportunityRichard Wallis
 
semantic markup using schema.org
semantic markup using schema.orgsemantic markup using schema.org
semantic markup using schema.orgJoshua Shinavier
 
Discovering Scholarly Orphans Using ORCID
Discovering Scholarly Orphans Using ORCIDDiscovering Scholarly Orphans Using ORCID
Discovering Scholarly Orphans Using ORCIDMartin Klein
 
London HUG
London HUGLondon HUG
London HUGBoudicca
 
Contextual Computing: Laying a Global Data Foundation
Contextual Computing: Laying a Global Data FoundationContextual Computing: Laying a Global Data Foundation
Contextual Computing: Laying a Global Data FoundationRichard Wallis
 
Visualising the Australian open data and research data landscape
Visualising the Australian open data and research data landscapeVisualising the Australian open data and research data landscape
Visualising the Australian open data and research data landscapeJonathan Yu
 
Telling the World and Our Users What We Have
Telling the World and Our Users What We HaveTelling the World and Our Users What We Have
Telling the World and Our Users What We HaveRichard Wallis
 
How much does $1.7 billion buy?
How much does $1.7 billion buy?How much does $1.7 billion buy?
How much does $1.7 billion buy?Martin Klein
 
nanopub-java: A Java Library for Nanopublications
nanopub-java: A Java Library for Nanopublicationsnanopub-java: A Java Library for Nanopublications
nanopub-java: A Java Library for NanopublicationsTobias Kuhn
 
Profiling Web Archives
Profiling Web ArchivesProfiling Web Archives
Profiling Web ArchivesMichael Nelson
 
Schema.org - An Extending Influence
Schema.org - An Extending InfluenceSchema.org - An Extending Influence
Schema.org - An Extending InfluenceRichard Wallis
 

Tendances (20)

Linked Data (1st Linked Data Meetup Malmö)
Linked Data (1st Linked Data Meetup Malmö)Linked Data (1st Linked Data Meetup Malmö)
Linked Data (1st Linked Data Meetup Malmö)
 
Signposting Overview
Signposting OverviewSignposting Overview
Signposting Overview
 
Illuminating DSpace's Linked Data Support
Illuminating DSpace's Linked Data SupportIlluminating DSpace's Linked Data Support
Illuminating DSpace's Linked Data Support
 
Cogapp Open Studios 2012 - Adventures with Linked Data
Cogapp Open Studios 2012 - Adventures with Linked DataCogapp Open Studios 2012 - Adventures with Linked Data
Cogapp Open Studios 2012 - Adventures with Linked Data
 
Probabilistic Topic models
Probabilistic Topic modelsProbabilistic Topic models
Probabilistic Topic models
 
They have left the building: The Web Route to Library Users
They have left the building: The Web Route to Library UsersThey have left the building: The Web Route to Library Users
They have left the building: The Web Route to Library Users
 
Linked Data in Libraries
Linked Data in LibrariesLinked Data in Libraries
Linked Data in Libraries
 
The Web of Data is Our Opportunity
The Web of Data is Our OpportunityThe Web of Data is Our Opportunity
The Web of Data is Our Opportunity
 
semantic markup using schema.org
semantic markup using schema.orgsemantic markup using schema.org
semantic markup using schema.org
 
Discovering Scholarly Orphans Using ORCID
Discovering Scholarly Orphans Using ORCIDDiscovering Scholarly Orphans Using ORCID
Discovering Scholarly Orphans Using ORCID
 
London HUG
London HUGLondon HUG
London HUG
 
Graph databases
Graph databasesGraph databases
Graph databases
 
Contextual Computing: Laying a Global Data Foundation
Contextual Computing: Laying a Global Data FoundationContextual Computing: Laying a Global Data Foundation
Contextual Computing: Laying a Global Data Foundation
 
Visualising the Australian open data and research data landscape
Visualising the Australian open data and research data landscapeVisualising the Australian open data and research data landscape
Visualising the Australian open data and research data landscape
 
Telling the World and Our Users What We Have
Telling the World and Our Users What We HaveTelling the World and Our Users What We Have
Telling the World and Our Users What We Have
 
How much does $1.7 billion buy?
How much does $1.7 billion buy?How much does $1.7 billion buy?
How much does $1.7 billion buy?
 
nanopub-java: A Java Library for Nanopublications
nanopub-java: A Java Library for Nanopublicationsnanopub-java: A Java Library for Nanopublications
nanopub-java: A Java Library for Nanopublications
 
Profiling Web Archives
Profiling Web ArchivesProfiling Web Archives
Profiling Web Archives
 
Schema.org - An Extending Influence
Schema.org - An Extending InfluenceSchema.org - An Extending Influence
Schema.org - An Extending Influence
 
PID Signposting Pattern
PID Signposting PatternPID Signposting Pattern
PID Signposting Pattern
 

En vedette

Chapter 4: Northern Ireland - Causes and Impacts
Chapter 4: Northern Ireland - Causes and ImpactsChapter 4: Northern Ireland - Causes and Impacts
Chapter 4: Northern Ireland - Causes and ImpactsGoh Bang Rui
 
strategies of conflict resolution
strategies of conflict resolutionstrategies of conflict resolution
strategies of conflict resolutionShahirah Zafirah
 
Sec3 chapter4 conflict in multi-ethnic societies (sri lanka)_slideshare
Sec3 chapter4 conflict in multi-ethnic societies (sri lanka)_slideshareSec3 chapter4 conflict in multi-ethnic societies (sri lanka)_slideshare
Sec3 chapter4 conflict in multi-ethnic societies (sri lanka)_slideshareAdrian Peeris
 
Consequences Of Conflict In Sri Lanka
Consequences Of Conflict In Sri LankaConsequences Of Conflict In Sri Lanka
Consequences Of Conflict In Sri Lankamissfoo
 
Impact Of Conflict In Northern Ireland
Impact Of Conflict In Northern IrelandImpact Of Conflict In Northern Ireland
Impact Of Conflict In Northern Irelandmissfoo
 
02a types of international conflict
02a types of international conflict02a types of international conflict
02a types of international conflictfatima d
 
The technology zeitgeist
The technology zeitgeistThe technology zeitgeist
The technology zeitgeistMartin Geddes
 
Streamlining MT for Asian Languages, by Natsuki Wakabayashi, ISE and Tetsuzo...
 Streamlining MT for Asian Languages, by Natsuki Wakabayashi, ISE and Tetsuzo... Streamlining MT for Asian Languages, by Natsuki Wakabayashi, ISE and Tetsuzo...
Streamlining MT for Asian Languages, by Natsuki Wakabayashi, ISE and Tetsuzo...TAUS - The Language Data Network
 
CAMPO DUNAR DE CONCÓN , GESTION DE PROYECTOS
CAMPO DUNAR DE CONCÓN , GESTION DE PROYECTOSCAMPO DUNAR DE CONCÓN , GESTION DE PROYECTOS
CAMPO DUNAR DE CONCÓN , GESTION DE PROYECTOSVictor Orellana Fredes
 
Cypress/VSAC Presentation at HIMSS13
Cypress/VSAC Presentation at HIMSS13Cypress/VSAC Presentation at HIMSS13
Cypress/VSAC Presentation at HIMSS13Saul Kravitz
 
"Basket-case" to Miracle? Bangladesh 1971-2021, June 2013
"Basket-case" to Miracle?  Bangladesh 1971-2021,  June 2013"Basket-case" to Miracle?  Bangladesh 1971-2021,  June 2013
"Basket-case" to Miracle? Bangladesh 1971-2021, June 2013Robert C. Terry
 
Aasav wines - business plan - winery - entrepreneurship - 2009
Aasav wines - business plan - winery - entrepreneurship - 2009Aasav wines - business plan - winery - entrepreneurship - 2009
Aasav wines - business plan - winery - entrepreneurship - 2009Sandeep Vadnere
 
01 Why Belize - Our Story and Vision - Mayan Plantation, Belize
01 Why Belize - Our Story and Vision - Mayan Plantation,  Belize01 Why Belize - Our Story and Vision - Mayan Plantation,  Belize
01 Why Belize - Our Story and Vision - Mayan Plantation, BelizeGerhart W. Walch, AMDP
 
Características del Cerrajero Profesional
Características del Cerrajero ProfesionalCaracterísticas del Cerrajero Profesional
Características del Cerrajero Profesionalsovillegasc
 
2010 January 2010 The Inside Pitch
2010 January 2010 The Inside Pitch2010 January 2010 The Inside Pitch
2010 January 2010 The Inside PitchCentral Virginia ASA
 
Tour fotográfico
Tour fotográficoTour fotográfico
Tour fotográficomyrmulrom
 
Avances y desafiìos en el cultivo de embriones de rumiantes final
Avances y desafiìos en el cultivo de embriones de rumiantes finalAvances y desafiìos en el cultivo de embriones de rumiantes final
Avances y desafiìos en el cultivo de embriones de rumiantes finalAlfredo Chica Arrieta
 

En vedette (20)

Chapter 4: Northern Ireland - Causes and Impacts
Chapter 4: Northern Ireland - Causes and ImpactsChapter 4: Northern Ireland - Causes and Impacts
Chapter 4: Northern Ireland - Causes and Impacts
 
strategies of conflict resolution
strategies of conflict resolutionstrategies of conflict resolution
strategies of conflict resolution
 
Sec3 chapter4 conflict in multi-ethnic societies (sri lanka)_slideshare
Sec3 chapter4 conflict in multi-ethnic societies (sri lanka)_slideshareSec3 chapter4 conflict in multi-ethnic societies (sri lanka)_slideshare
Sec3 chapter4 conflict in multi-ethnic societies (sri lanka)_slideshare
 
Consequences Of Conflict In Sri Lanka
Consequences Of Conflict In Sri LankaConsequences Of Conflict In Sri Lanka
Consequences Of Conflict In Sri Lanka
 
Impact Of Conflict In Northern Ireland
Impact Of Conflict In Northern IrelandImpact Of Conflict In Northern Ireland
Impact Of Conflict In Northern Ireland
 
02a types of international conflict
02a types of international conflict02a types of international conflict
02a types of international conflict
 
Causes of the Conflict in Sri Lanka
Causes of the Conflict in Sri LankaCauses of the Conflict in Sri Lanka
Causes of the Conflict in Sri Lanka
 
Chapter 4 Ethnic Conflict
Chapter 4 Ethnic ConflictChapter 4 Ethnic Conflict
Chapter 4 Ethnic Conflict
 
The technology zeitgeist
The technology zeitgeistThe technology zeitgeist
The technology zeitgeist
 
Streamlining MT for Asian Languages, by Natsuki Wakabayashi, ISE and Tetsuzo...
 Streamlining MT for Asian Languages, by Natsuki Wakabayashi, ISE and Tetsuzo... Streamlining MT for Asian Languages, by Natsuki Wakabayashi, ISE and Tetsuzo...
Streamlining MT for Asian Languages, by Natsuki Wakabayashi, ISE and Tetsuzo...
 
CAMPO DUNAR DE CONCÓN , GESTION DE PROYECTOS
CAMPO DUNAR DE CONCÓN , GESTION DE PROYECTOSCAMPO DUNAR DE CONCÓN , GESTION DE PROYECTOS
CAMPO DUNAR DE CONCÓN , GESTION DE PROYECTOS
 
Cypress/VSAC Presentation at HIMSS13
Cypress/VSAC Presentation at HIMSS13Cypress/VSAC Presentation at HIMSS13
Cypress/VSAC Presentation at HIMSS13
 
"Basket-case" to Miracle? Bangladesh 1971-2021, June 2013
"Basket-case" to Miracle?  Bangladesh 1971-2021,  June 2013"Basket-case" to Miracle?  Bangladesh 1971-2021,  June 2013
"Basket-case" to Miracle? Bangladesh 1971-2021, June 2013
 
Aasav wines - business plan - winery - entrepreneurship - 2009
Aasav wines - business plan - winery - entrepreneurship - 2009Aasav wines - business plan - winery - entrepreneurship - 2009
Aasav wines - business plan - winery - entrepreneurship - 2009
 
01 Why Belize - Our Story and Vision - Mayan Plantation, Belize
01 Why Belize - Our Story and Vision - Mayan Plantation,  Belize01 Why Belize - Our Story and Vision - Mayan Plantation,  Belize
01 Why Belize - Our Story and Vision - Mayan Plantation, Belize
 
Características del Cerrajero Profesional
Características del Cerrajero ProfesionalCaracterísticas del Cerrajero Profesional
Características del Cerrajero Profesional
 
2010 January 2010 The Inside Pitch
2010 January 2010 The Inside Pitch2010 January 2010 The Inside Pitch
2010 January 2010 The Inside Pitch
 
Tour fotográfico
Tour fotográficoTour fotográfico
Tour fotográfico
 
CURRICULUM VITAE with SIGNATURE - UPDATED 11.29.16
CURRICULUM VITAE with SIGNATURE - UPDATED 11.29.16CURRICULUM VITAE with SIGNATURE - UPDATED 11.29.16
CURRICULUM VITAE with SIGNATURE - UPDATED 11.29.16
 
Avances y desafiìos en el cultivo de embriones de rumiantes final
Avances y desafiìos en el cultivo de embriones de rumiantes finalAvances y desafiìos en el cultivo de embriones de rumiantes final
Avances y desafiìos en el cultivo de embriones de rumiantes final
 

Similaire à Learning Conflict Resolution Strategies for Cross-Language Wikipedia Data Fusion

DBpedia - An Interlinking Hub in the Web of Data
DBpedia - An Interlinking Hub in the Web of DataDBpedia - An Interlinking Hub in the Web of Data
DBpedia - An Interlinking Hub in the Web of DataChris Bizer
 
DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin
DBpedia Mappings Wiki, SMWCon Fall 2013, BerlinDBpedia Mappings Wiki, SMWCon Fall 2013, Berlin
DBpedia Mappings Wiki, SMWCon Fall 2013, BerlinAnja Jentzsch
 
NHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-LifeNHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-LifeEdward Baker
 
NHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-LifeNHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-LifeVince Smith
 
DBpedia talk at Fjord Berlin
DBpedia talk at Fjord BerlinDBpedia talk at Fjord Berlin
DBpedia talk at Fjord BerlinGeorgi Kobilarov
 
“Library 2.0: Let's get connected!”
“Library 2.0: Let's get connected!”“Library 2.0: Let's get connected!”
“Library 2.0: Let's get connected!”bridgingworlds2008
 
Lifting the Lid on Linked Data
Lifting the Lid on Linked DataLifting the Lid on Linked Data
Lifting the Lid on Linked DataJane Stevenson
 
Gathering Alternative Surface Forms for DBpedia Entities
Gathering Alternative Surface Forms for DBpedia EntitiesGathering Alternative Surface Forms for DBpedia Entities
Gathering Alternative Surface Forms for DBpedia EntitiesHeiko Paulheim
 
Sw 3 bizer etal-d bpedia-crystallization-point-jws-preprint
Sw 3 bizer etal-d bpedia-crystallization-point-jws-preprintSw 3 bizer etal-d bpedia-crystallization-point-jws-preprint
Sw 3 bizer etal-d bpedia-crystallization-point-jws-preprintokeee
 
Introduction to Linked Data
Introduction to Linked DataIntroduction to Linked Data
Introduction to Linked DataThomas Meehan
 
Das Semantische Daten Web für Unternehmen
Das Semantische Daten Web für UnternehmenDas Semantische Daten Web für Unternehmen
Das Semantische Daten Web für UnternehmenSören Auer
 
Creating Visualizations with Linked Open Data
Creating Visualizations with Linked Open DataCreating Visualizations with Linked Open Data
Creating Visualizations with Linked Open DataAlvaro Graves
 
Linked Data Implementations—Who, What and Why?
Linked Data Implementations—Who, What and Why?Linked Data Implementations—Who, What and Why?
Linked Data Implementations—Who, What and Why?OCLC
 
The Semantic Web Exists. What Next?
The Semantic Web Exists. What Next?The Semantic Web Exists. What Next?
The Semantic Web Exists. What Next?Anna Fensel
 
DBpedia 2014: Highlights and Issues of the New Release
DBpedia 2014: Highlights and Issues of the New ReleaseDBpedia 2014: Highlights and Issues of the New Release
DBpedia 2014: Highlights and Issues of the New ReleaseVolha Bryl
 
Methodological Guidelines for Publishing Linked Data
Methodological Guidelines for Publishing Linked DataMethodological Guidelines for Publishing Linked Data
Methodological Guidelines for Publishing Linked DataBoris Villazón-Terrazas
 

Similaire à Learning Conflict Resolution Strategies for Cross-Language Wikipedia Data Fusion (20)

DBpedia - An Interlinking Hub in the Web of Data
DBpedia - An Interlinking Hub in the Web of DataDBpedia - An Interlinking Hub in the Web of Data
DBpedia - An Interlinking Hub in the Web of Data
 
DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin
DBpedia Mappings Wiki, SMWCon Fall 2013, BerlinDBpedia Mappings Wiki, SMWCon Fall 2013, Berlin
DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin
 
NISO/NFAIS Joint Virtual Conference: Connecting the Library to the Wider Worl...
NISO/NFAIS Joint Virtual Conference: Connecting the Library to the Wider Worl...NISO/NFAIS Joint Virtual Conference: Connecting the Library to the Wider Worl...
NISO/NFAIS Joint Virtual Conference: Connecting the Library to the Wider Worl...
 
NHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-LifeNHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-Life
 
NHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-LifeNHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-Life
 
Linked Data
Linked DataLinked Data
Linked Data
 
The Semantic Data Web, Sören Auer, University of Leipzig
The Semantic Data Web, Sören Auer, University of LeipzigThe Semantic Data Web, Sören Auer, University of Leipzig
Learning Conflict Resolution Strategies for Cross-Language Wikipedia Data Fusion

  • 1. Learning Conflict Resolution Strategies for Cross-Language Wikipedia Data Fusion Volha Bryl, Christian Bizer Data and Web Science Research Group University of Mannheim Germany WebQuality @ WWW’2014, Seoul, Korea, April 7, 2014
  • 2. Cross-Language Wikipedia Data Fusion, Volha Bryl, Chris Bizer 2 Outline 1. Motivation • Linked Open Data fusion • Wikipedia/DBpedia data fusion 2. Extracting provenance metadata 3. Data fusion with Sieve 4. Learning data fusion policies 5. Cross-language DBpedia use case 6. Conclusion
  • 3. Motivation: Linked Open Data Integration • LOD: publishing and interlinking open datasets on the web • Tens of billions of facts (RDF subject-predicate-object triples) • Huge potential for applications • Problem: varying quality, lack of data consistency • Solution: data integration • Our focus: data fusion • Create a consistent representation of a real-world entity based on multiple heterogeneous data sources • Challenge: conflict resolution for improving data quality
  • 4. Motivation: Fusing Wikipedia Data [figure: fr, en and de Wikipedia language editions] • Wikipedia contains lots of data conflicts across languages • Improving its quality is crucial • Identity resolution is solved by inter-language links • Our focus is DBpedia: Wikipedia’s structured twin
  • 5. DBpedia: Wikipedia’s Structured Twin • Extracts structured, multilingual, cross-domain knowledge from Wikipedia • Crowd-sourced community project: http://dbpedia.org • Provides querying and search capabilities over Wikipedia data • Follows Linked Open Data principles • Data is freely available, software is open-source • Started in 2006, currently at version 3.9 (September 2013) • 119 languages, 2.46 billion triples, 12.6 million unique things • LOD cloud’s interlinking hub
  • 8. DBpedia: Data Conflicts ● Querying DBpedia: What is the population of Seoul? ● Seoul dbp-ont:populationTotal "10,447,719"@bg "9,794,304"@ca "10,400,000"@cs "10,464,051"@el "10,581,728"@en "9,794,304"@eu "10,464,051"@id "14,794,304"@it "10,528,774"@ko "10,581,728"@pt "10,464,051"@ru "10,581,728"@sl "24,500,000"@tr [figure annotations: Edited 2012-09-28T11:59:15Z, Edited 2013-01-22T09:43:20Z]
  • 9. DBpedia: Extracting Provenance Metadata ● Provenance is crucial for data fusion ● It allows assessing data quality, e.g. deciding whether a fact is up to date or comes from a trusted source or author ● No provenance metadata is provided for DBpedia at the moment ● Idea ● Extract provenance metadata from the Wikipedia revision history ● Implementation ● https://github.com/VolhaBryl/DBpedia-provenance ● Extraction performed for 610K populated places in 10 languages
  • 10. DBpedia: Extracting Provenance Metadata ● Two types of metadata extracted: when and how many times a fact was edited, and by whom ● Change history: last edit timestamp of a triple, number of edits – Retrieved from Wikipedia revision dumps ● April 2013, corresponds to the DBpedia 3.9 release – Challenge: revision dumps are huge ● e.g. >6 TB for English, >2 TB for German – We extracted metadata for geographical entities for the top 10 languages ● 425K entities for English, ~150K for other languages ● Author: reputation-related metadata – edit count, registration date, blocked or not, etc. – Retrieved via the MediaWiki API
  • 11. DBpedia: Extracting Provenance Metadata [figure: provenance metadata per triple]
  • 12. DBpedia: Extracting Provenance Metadata [figure: edit traces]
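The author metadata named on slide 10 (edit count, registration date, blocked or not) can be requested from the MediaWiki API with a `list=users` query. The sketch below is only an illustration of that idea, not the code from the DBpedia-provenance repository: the helper names `author_metadata_url` and `parse_authors` are hypothetical, and the user names and response shown in the usage note are invented. It only builds the request URL and parses an already-decoded JSON response, so nothing is sent over the network.

```python
from urllib.parse import urlencode


def author_metadata_url(lang, usernames):
    """Build a MediaWiki API URL asking for reputation-related user data.

    Hypothetical helper for illustration; `lang` is a Wikipedia language
    code such as "en", `usernames` a list of account names.
    """
    params = {
        "action": "query",
        "list": "users",
        "ususers": "|".join(usernames),
        "usprop": "editcount|registration|blockinfo",
        "format": "json",
    }
    return f"https://{lang}.wikipedia.org/w/api.php?{urlencode(params)}"


def parse_authors(response):
    """Turn a decoded JSON response into {name: metadata}.

    Accounts flagged as missing or invalid are skipped. A user is treated
    as blocked when the response carries a "blockid" field for them.
    """
    authors = {}
    for user in response.get("query", {}).get("users", []):
        if "missing" in user or "invalid" in user:
            continue
        authors[user["name"]] = {
            "edit_count": user.get("editcount", 0),
            "registered": user.get("registration"),
            "blocked": "blockid" in user,
        }
    return authors
```

A call such as `author_metadata_url("en", ["Alice", "Bob"])` yields a URL that could be fetched with any HTTP client; the decoded JSON is then passed to `parse_authors`. Requests for many authors would need batching and rate limiting, which this sketch leaves out.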
  • 13. Data Fusion with Sieve ● Our starting point: Sieve, a Linked Data quality assessment and fusion tool, http://sieve.wbsg.de/ ● Functionality ● Input: RDF data + provenance metadata ● The user creates an XML specification with ● quality assessment metrics (e.g. recency or trust in source) ● conflict resolution functions (e.g. vote, take maximum, average, most recent, most trusted source) ● A fused dataset is produced
  • 14. Data Fusion with Sieve by Example ● INPUT: 23 values, only 3 needed
      <dbp:Amsterdam> <dbp-ont:populationTotal> "820654" <en.wikipedia.org/wiki/Amsterdam:populationTotal:1> .
      <dbp:Amsterdam> <dbp-ont:populationTotal> "790044" <ru.wikipedia.org/wiki/Amsterdam:populationTotal:1> .
      <dbp:Amsterdam> <dbp-ont:populationTotal> "762" <es.wikipedia.org/wiki/Amsterdam:populationTotal:1> .
      <dbp:Amsterdam> <dbp-ont:populationTotal> "57" <es.wikipedia.org/wiki/Amsterdam:populationTotal:2> .
      <dbp:Amsterdam> <dbp-ont:populationTotal> "799406" <nl.wikipedia.org/wiki/Amsterdam:populationTotal:1> .
      <dbp:Amsterdam> <dbp-ont:populationTotal> "1364422" <pt.wikipedia.org/wiki/Amsterdam:populationTotal:1> .
      <dbp:Amsterdam> <dbp-ont:populationTotal> "758198" <ca.wikipedia.org/wiki/Amsterdam:populationTotal:1> .
      <dbp:Amsterdam> <dbp-ont:populationTotal> "820654" <it.wikipedia.org/wiki/Amsterdam:populationTotal:1> .
      <dbp:Vienna> <dbp-ont:populationTotal> "1731236" <en.wikipedia.org/wiki/Vienna:populationTotal:1> .
      <dbp:Vienna> <dbp-ont:populationTotal> "1730278" <ru.wikipedia.org/wiki/Vienna:populationTotal:1> .
      <dbp:Vienna> <dbp-ont:populationTotal> "1731236" <es.wikipedia.org/wiki/Vienna:populationTotal:1> .
      <dbp:Vienna> <dbp-ont:populationTotal> "1680266" <ca.wikipedia.org/wiki/Vienna:populationTotal:1> .
      <dbp:Vienna> <dbp-ont:populationTotal> "1731236" <it.wikipedia.org/wiki/Vienna:populationTotal:1> .
      <dbp:Vienna> <dbp-ont:populationTotal> "1731286" <fr.wikipedia.org/wiki/Vienna:populationTotal:1> .
      <dbp:Paris> <dbp-ont:populationTotal> "2234105" <en.wikipedia.org/wiki/Paris:populationTotal:1> .
      <dbp:Paris> <dbp-ont:populationTotal> "2268265" <ru.wikipedia.org/wiki/Paris:populationTotal:1> .
      <dbp:Paris> <dbp-ont:populationTotal> "2257981" <es.wikipedia.org/wiki/Paris:populationTotal:1> .
      <dbp:Paris> <dbp-ont:populationTotal> "2257981" <nl.wikipedia.org/wiki/Paris:populationTotal:1> .
      <dbp:Paris> <dbp-ont:populationTotal> "2257981" <nl.wikipedia.org/wiki/Paris:populationTotal:2> .
      <dbp:Paris> <dbp-ont:populationTotal> "2211297" <pt.wikipedia.org/wiki/Paris:populationTotal:1> .
      <dbp:Paris> <dbp-ont:populationTotal> "2257981" <it.wikipedia.org/wiki/Paris:populationTotal:1> .
      <dbp:Paris> <dbp-ont:populationTotal> "2243833" <fr.wikipedia.org/wiki/Paris:populationTotal:1> .
      <dbp:Paris> <dbp-ont:populationTotal> "10413386" <de.wikipedia.org/wiki/Paris:populationTotal:1> .
  • 15. Data Fusion with Sieve by Example [figure: Sieve specification that keeps the most recent population value]
  • 16. Data Fusion with Sieve by Example ● RESULT (selected most recent population value)
      <dbp:Vienna> <dbp-ont:populationTotal> "1730278" <ru.wikipedia.org/wiki/Vienna:populationTotal:1> .
      <dbp:Paris> <dbp-ont:populationTotal> "2257981" <nl.wikipedia.org/wiki/Paris:populationTotal:2> .
      <dbp:Amsterdam> <dbp-ont:populationTotal> "57" <es.wikipedia.org/wiki/Amsterdam:populationTotal:2> .
  • 17. Data Fusion with Sieve by Example ● RESULT (selected most recent population value)
      <dbp:Vienna> <dbp-ont:populationTotal> "1730278" <ru.wikipedia.org/wiki/Vienna:populationTotal:1> .
      <dbp:Paris> <dbp-ont:populationTotal> "2257981" <nl.wikipedia.org/wiki/Paris:populationTotal:2> .
      <dbp:Amsterdam> <dbp-ont:populationTotal> "57" <es.wikipedia.org/wiki/Amsterdam:populationTotal:2> .
    ● Input values for Amsterdam, for comparison:
      <dbp:Amsterdam> <dbp-ont:populationTotal> "820654" <en.wikipedia.org/wiki/Amsterdam:populationTotal:1> .
      <dbp:Amsterdam> <dbp-ont:populationTotal> "790044" <ru.wikipedia.org/wiki/Amsterdam:populationTotal:1> .
      <dbp:Amsterdam> <dbp-ont:populationTotal> "762" <es.wikipedia.org/wiki/Amsterdam:populationTotal:1> .
      <dbp:Amsterdam> <dbp-ont:populationTotal> "57" <es.wikipedia.org/wiki/Amsterdam:populationTotal:2> .
      <dbp:Amsterdam> <dbp-ont:populationTotal> "799406" <nl.wikipedia.org/wiki/Amsterdam:populationTotal:1> .
      <dbp:Amsterdam> <dbp-ont:populationTotal> "1364422" <pt.wikipedia.org/wiki/Amsterdam:populationTotal:1> .
      <dbp:Amsterdam> <dbp-ont:populationTotal> "758198" <ca.wikipedia.org/wiki/Amsterdam:populationTotal:1> .
      <dbp:Amsterdam> <dbp-ont:populationTotal> "820654" <it.wikipedia.org/wiki/Amsterdam:populationTotal:1> .
    ● Was it wrong to keep the most recent value?
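To make the question on slide 17 concrete, the sketch below applies three of the conflict resolution functions named on slide 13 (vote, take maximum, average) to the eight Amsterdam values from slide 14. It is a plain-Python illustration, not Sieve code: the function names are placeholders, and "most recent" is left out because the slides do not show the edit timestamps it would need.

```python
from collections import Counter

# Amsterdam populationTotal values from slide 14, keyed by language edition.
amsterdam = [
    ("en", 820654), ("ru", 790044), ("es", 762), ("es", 57),
    ("nl", 799406), ("pt", 1364422), ("ca", 758198), ("it", 820654),
]
values = [value for _, value in amsterdam]


def vote(vals):
    """Return the most frequent value (ties go to the value seen first)."""
    return Counter(vals).most_common(1)[0][0]


def maximum(vals):
    """Return the largest value."""
    return max(vals)


def average(vals):
    """Return the arithmetic mean of the values."""
    return sum(vals) / len(vals)


print(vote(values))     # 820654 (reported by both en and it)
print(maximum(values))  # 1364422 (the pt value)
print(average(values))  # 669274.625, pulled down by the es values 762 and 57
```

Each function returns a different answer, and "most recent" returned "57" on slide 16, so the choice of fusion function clearly matters for this property.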
  • 18. Learning Conflict Resolution Strategies • Problem • In Sieve, the fusion function for each property is defined manually, so a good understanding of the input data is required and an optimal result is not guaranteed • Solution • Fusion Policy Learner • Extension of Sieve for automatically learning conflict resolution strategies based on a gold standard • http://sieve.wbsg.de/FPL.html
  • 19. Learning Conflict Resolution Strategies • Fusion Policy Learner • Extension of Sieve for automatically learning conflict resolution strategies based on a gold standard • http://sieve.wbsg.de/FPL.html • Learning algorithms • Numeric properties: minimize the mean absolute error, or maximize the number of correct values, where a correct value deviates from the gold standard value by no more than a predefined threshold (e.g. 5%) • Nominal properties (strings or URIs): maximize the number of exact matches
  • 20. Sieve Fusion Policy Learner: Example [figure: input specification, annotated "Learn which fusion functions to use", "Select one of these functions" and "Based on the gold standard"]
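The selection step described on slides 19 and 20 can be sketched as follows: given candidate fusion functions and a gold standard, pick the function that minimizes the mean absolute error, or the one that maximizes the number of values within a relative threshold of the gold value. This is a simplified stand-in, not the Fusion Policy Learner itself, which works from a Sieve XML specification. All names are hypothetical, and the gold values in the usage note are invented round numbers, not GeoNames figures.

```python
from collections import Counter


def vote(vals):
    """Most frequent value (ties go to the value seen first)."""
    return Counter(vals).most_common(1)[0][0]


def maximum(vals):
    return max(vals)


def average(vals):
    return sum(vals) / len(vals)


# Candidate fusion functions the learner may choose from (assumed set).
CANDIDATES = {"vote": vote, "maximum": maximum, "average": average}


def mean_absolute_error(fuse, conflicts, gold):
    """Mean |fused - gold| over entities present in both inputs."""
    errors = [abs(fuse(conflicts[e]) - gold[e]) for e in gold if e in conflicts]
    return sum(errors) / len(errors)


def correct_count(fuse, conflicts, gold, threshold=0.05):
    """Number of fused values within `threshold` (relative) of gold."""
    return sum(
        1 for e in gold
        if e in conflicts
        and abs(fuse(conflicts[e]) - gold[e]) <= threshold * abs(gold[e])
    )


def learn_policy(conflicts, gold, criterion="mae", threshold=0.05):
    """Return the name of the best candidate under the chosen criterion."""
    if criterion == "mae":
        return min(CANDIDATES, key=lambda n: mean_absolute_error(
            CANDIDATES[n], conflicts, gold))
    return max(CANDIDATES, key=lambda n: correct_count(
        CANDIDATES[n], conflicts, gold, threshold))
```

With the slide 14 values as `conflicts` and placeholder gold values of 800000 (Amsterdam), 1730000 (Vienna) and 2250000 (Paris), both criteria select `"vote"`; the outliers in the Spanish and German editions push `maximum` and `average` far from the gold values.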
  • 21. Fusing DBpedia Data ● Top 10 DBpedia language editions, 610,017 entities ● 30% described in > 3 languages ● Gold standard: GeoNames, www.geonames.org ● [results table not captured in this transcript] * selection method for population: minimize mean absolute error
  • 22. Fusing DBpedia Data ● Top 10 DBpedia language editions, 610,017 entities ● 30% described in > 3 languages ● Gold standard: GeoNames, www.geonames.org ● [results table not captured in this transcript; annotation: "Based on provenance metadata"] * selection method for population: minimize mean absolute error
  • 23. Summary • Motivation • Data integration is crucial for boosting the quality and usage of LOD • Objective • Fusing Wikipedia/DBpedia data across languages • Starting point • Sieve, LOD quality assessment and fusion tool • Results • Fusion Policy Learner extension of Sieve for automatically learning optimal conflict resolution strategies • Fusing data about 610K populated places from 10 DBpedia language editions • Framework for DBpedia provenance metadata extraction
  • 24. Future Work • Experimenting with other learning techniques • regression for numerical values • decision trees to learn complex fusion strategies, e.g. choose the most recent among the most frequent values • active learning when no or not enough labeled data is available • DBpedia use case • Is there a cross-domain up-to-date gold standard? • Gap filling, conflict resolution and data debugging on a large scale • Other LOD use cases • Allows DBpedia to be used for training