DRETA: EXTRACTING RDF FROM WIKITABLES
Emir Muñoz, Aidan Hogan, Alessandra Mileo
National University of Ireland, Galway

WIKITABLE SURVEY

TABLE TAXONOMY: [figure not recovered]

DISTRIBUTIONS: [figure not recovered]

MOTIVATION

QUERY:
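PREFIX dbp: <http://dbpedia.org/property/>
# Full IRI used for the club: a prefixed name cannot end in "."
SELECT ?player
WHERE {
  ?player dbp:currentclub <http://dbpedia.org/resource/Manchester_United_F.C.> .
}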
player
http://dbpedia.org/resource/David_de_Gea
http://dbpedia.org/resource/Rafael_Pereira_da_Silva_(footballer_born_1990)
http://dbpedia.org/resource/Patrice_Evra
…
http://dbpedia.org/resource/Fabio_Pereira_da_Silva
http://dbpedia.org/resource/Tom_Cleverley
http://dbpedia.org/resource/Darren_Fletcher
… INCOMPLETE RESULTS!

RESULTS

(1) EXTRACTED 34.9 MILLION UNIQUE & NOVEL TRIPLES
FROM 1.14 MILLION WIKITABLES
(8 MACHINES: 4GB RAM, 2.2 GHZ SINGLE CORE; 12 DAYS)

(2) INITIAL EVALUATION:
(MANUAL ANNOTATION; THREE JUDGES; 750 TRIPLES EACH)

(3) MACHINE LEARNING CLASSIFIERS:
(CONSENSUS GOLD STANDARD; VARIETY OF FEATURES)
BAGGING DECISION TREES: 7.9 MILLION TRIPLES @81.5% PREC. FROM 1.14 MILLION WIKITABLES
SUPPORT VECTOR MACHINES: 15.3 MILLION TRIPLES @72.4% PREC. FROM 1.14 MILLION WIKITABLES

PROPOSAL

[FIGURE: TABLE ROWS LINKED TO DBPEDIA RESOURCES; LAYOUT LOST IN EXTRACTION. EACH ROW IS FOLLOWED BY THE RELATION LABELS EXTRACTED ALONGSIDE IT.]

http://dbpedia.org/resource/Manchester_United_F.C.

http://dbpedia.org/resource/David_de_Gea | http://dbpedia.org/resource/Spain | http://dbpedia.org/resource/Goalkeeper_(association_football)
(RELATION LABELS: dbp:currentclub, dbp:position)
…
http://dbpedia.org/resource/Wayne_Rooney | http://dbpedia.org/resource/England | http://dbpedia.org/resource/Forward_(association_football)
(RELATION LABELS: dbo:birthPlace, dbp:position)
…
http://dbpedia.org/resource/Fabio_Pereira_da_Silva | http://dbpedia.org/resource/Brazil | http://dbpedia.org/resource/Defender_(association_football)
(RELATION LABELS: dbp:position)

SUGGESTED TRIPLES:
(1) dbr:David_de_Gea dbo:birthPlace dbr:Spain .
(2) dbr:Fabio_Pereira_da_Silva dbo:birthPlace dbr:Brazil .
(3) dbr:Fabio_Pereira_da_Silva dbp:currentclub <http://dbpedia.org/resource/Manchester_United_F.C.> .
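
The SUGGESTED TRIPLES example above can be read as follows: a relation that the knowledge base already records between two columns in some rows is proposed for the rows where it is missing. The Python sketch below illustrates that reading only; it is not the authors' implementation. The function `suggest_triples`, the `min_support` parameter, the toy `KB` set and the choice to model the club as a fourth column are all assumptions made for this illustration, and the identifiers are plain string labels rather than parsed RDF terms.

```python
from collections import Counter
from itertools import permutations

CLUB = "dbr:Manchester_United_F.C."

# Toy, hand-written knowledge base (assumption): NOT a DBpedia dump.
KB = {
    ("dbr:David_de_Gea", "dbp:position", "dbr:Goalkeeper_(association_football)"),
    ("dbr:David_de_Gea", "dbp:currentclub", CLUB),
    ("dbr:Wayne_Rooney", "dbo:birthPlace", "dbr:England"),
    ("dbr:Wayne_Rooney", "dbp:position", "dbr:Forward_(association_football)"),
    ("dbr:Wayne_Rooney", "dbp:currentclub", CLUB),
    ("dbr:Fabio_Pereira_da_Silva", "dbp:position", "dbr:Defender_(association_football)"),
}

# Table rows with cells already linked to resources (assumption: the club
# is modelled as an extra column so that club relations can be proposed).
ROWS = [
    ("dbr:David_de_Gea", "dbr:Spain", "dbr:Goalkeeper_(association_football)", CLUB),
    ("dbr:Wayne_Rooney", "dbr:England", "dbr:Forward_(association_football)", CLUB),
    ("dbr:Fabio_Pereira_da_Silva", "dbr:Brazil", "dbr:Defender_(association_football)", CLUB),
]


def suggest_triples(rows, kb, min_support=1):
    """Propose (subject, predicate, object) triples that are missing from kb."""
    # Index the knowledge base by (subject, object) -> set of predicates.
    predicates = {}
    for s, p, o in kb:
        predicates.setdefault((s, o), set()).add(p)

    # Count in how many rows each (column i, predicate, column j) pattern holds.
    support = Counter()
    for row in rows:
        for i, j in permutations(range(len(row)), 2):
            for p in predicates.get((row[i], row[j]), ()):
                support[(i, p, j)] += 1

    # Apply every sufficiently supported pattern to rows where it is missing.
    suggestions = []
    for row in rows:
        for (i, p, j), count in sorted(support.items()):
            triple = (row[i], p, row[j])
            if count >= min_support and triple not in kb:
                suggestions.append(triple)
    return suggestions


for triple in suggest_triples(ROWS, KB):
    print(" ".join(triple), ".")
```

With this toy data the sketch proposes the same three triples listed under SUGGESTED TRIPLES. Candidates produced this way can be wrong (a pattern seen in one row need not hold in every row), which is presumably why the poster goes on to filter them with classifiers; that filtering step is not modelled here.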

DEMO … http://emunoz.org/wikitables
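
The two classifier figures reported above trade volume against precision. As a rough back-of-envelope reading, the Python sketch below multiplies each triple count by its estimated precision to get an expected number of correct and incorrect triples. It assumes the precision estimate applies uniformly to the whole output, which the poster does not state; the helper name `expected_split` is introduced here for illustration only.

```python
def expected_split(total_triples, precision):
    """Return (expected correct, expected incorrect) triple counts."""
    correct = total_triples * precision
    return correct, total_triples - correct


# Figures as printed on the poster: (number of triples, estimated precision).
OPERATING_POINTS = {
    "7.9M @ 81.5%": (7_900_000, 0.815),
    "15.3M @ 72.4%": (15_300_000, 0.724),
}

for label, (total, precision) in OPERATING_POINTS.items():
    correct, incorrect = expected_split(total, precision)
    print(f"{label}: ~{correct / 1e6:.2f}M correct, ~{incorrect / 1e6:.2f}M incorrect")
```

Under that assumption the larger set contains roughly 11.08 million correct triples against roughly 6.44 million for the smaller one, but also about 4.22 million incorrect triples against about 1.46 million.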
Enabling Networked Knowledge

ACKNOWLEDGEMENTS: This work was funded in part by Science Foundation Ireland under Grant No. SFI/08/CE/I1380 (Lion-2).
