Extending DBpedia (LOD) using
         WikiTables

              Emir Muñoz
   Unit for Reasoning and Querying
         emir.munoz@deri.org
Linked Open Data




Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

                                                      October 12, 2012 -- E. Muñoz
Linked Open Data

• DBpedia, an export of Wikipedia’s structured data




DBpedia provides an RDF version of all of Wikipedia's structured data (infoboxes)



        But there is not yet an RDF version of the ordinary Wikipedia tables (wikitables)

Tables as a source of LOD
• Tables are inherently concise as well as information rich
• Column headers represent types of information
• The values represent instances of those types
• Infoboxes hold attribute-value pairs
• A caption can appear as another row




     http://en.wikipedia.org/wiki/Dublin

                                                                    http://en.wikipedia.org/wiki/Galway


Reasoning over Wikipedia Tables

   Recovering Table Semantics …
Dublin is twinned with the following places:
                                                                  http://en.wikipedia.org/wiki/Dublin




Reasoning over Wikipedia Tables

 Entity annotation for cells: mappings to DBpedia resources
 http://en.wikipedia.org/wiki/Dublin

 Column headers:
   dbpedia.org/property/city
   dbpedia.org/property/nation
   dbpedia.org/property/since   (values typed as xsd:integer)

 city                                         nation
 dbpedia.org/resource/San_Jose,_California    dbpedia.org/resource/United_States
 dbpedia.org/resource/Liverpool               dbpedia.org/resource/United_Kingdom
 dbpedia.org/resource/Matsue,_Shimane         dbpedia.org/resource/Japan
 dbpedia.org/resource/Barcelona               dbpedia.org/resource/Spain
 dbpedia.org/resource/Beijing                 dbpedia.org/resource/People's_Republic_of_China
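The cell-to-resource mapping above can be sketched in a few lines. This is only an illustration, assuming that a cell's link to an English Wikipedia article shares its title with the DBpedia resource; the function name `wiki_link_to_dbpedia` is hypothetical and not taken from the talk, and real DBpedia URI encoding has more cases than are handled here.

```python
from typing import Optional
from urllib.parse import unquote, urlparse

WIKI_PREFIX = "/wiki/"
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"


def wiki_link_to_dbpedia(url: str) -> Optional[str]:
    """Map an English Wikipedia article URL to a candidate DBpedia resource URI.

    Returns None for links that do not point to an en.wikipedia.org article.
    """
    parsed = urlparse(url)
    if parsed.netloc != "en.wikipedia.org":
        return None
    if not parsed.path.startswith(WIKI_PREFIX):
        return None
    # Decode percent-escapes (e.g. %27 -> ') and normalise spaces to underscores.
    title = unquote(parsed.path[len(WIKI_PREFIX):]).replace(" ", "_")
    if not title:
        return None
    return DBPEDIA_RESOURCE + title


if __name__ == "__main__":
    for link in (
        "http://en.wikipedia.org/wiki/Liverpool",
        "http://en.wikipedia.org/wiki/People%27s_Republic_of_China",
        "http://lod-cloud.net/",
    ):
        print(link, "->", wiki_link_to_dbpedia(link))
```

The candidate URI may still be a redirect or a disambiguation page, so it would need to be checked against DBpedia before use.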
Reasoning over Wikipedia Tables

 Extracting relations
 http://en.wikipedia.org/wiki/Dublin

 Relations found between the city column and the nation column:
   dbpedia.org/ontology/country
   dbpedia.org/property/subdivisionName

 Read in the other direction, each nation "is dbpedia.org/ontology/country of" its city.
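A minimal sketch of this relation-extraction step, assuming that the relations already known to DBpedia between two entities of the same row are looked up and counted per column pair. The `KNOWN` set below is a tiny hypothetical stand-in for DBpedia (a real system would query the dataset), and the helper names are illustrative.

```python
from collections import Counter

DBR = "http://dbpedia.org/resource/"
DBO = "http://dbpedia.org/ontology/"
DBP = "http://dbpedia.org/property/"

# Hypothetical stand-in for DBpedia: a few (subject, predicate, object) triples.
KNOWN = {
    (DBR + "Liverpool", DBO + "country", DBR + "United_Kingdom"),
    (DBR + "Barcelona", DBO + "country", DBR + "Spain"),
    (DBR + "Barcelona", DBP + "subdivisionName", DBR + "Spain"),
}


def relations_between(subject, obj, triples):
    """Return the set of predicates p such that (subject, p, obj) is known."""
    return {p for (s, p, o) in triples if s == subject and o == obj}


def count_column_relations(rows, triples):
    """Count, over all rows, how often each predicate links the two cells."""
    counts = Counter()
    for city, nation in rows:
        counts.update(relations_between(city, nation, triples))
    return counts


if __name__ == "__main__":
    rows = [
        (DBR + "Liverpool", DBR + "United_Kingdom"),
        (DBR + "Barcelona", DBR + "Spain"),
        (DBR + "Beijing", DBR + "People's_Republic_of_China"),
    ]
    for predicate, freq in count_column_relations(rows, KNOWN).most_common():
        print(freq, predicate)
```

Predicates that hold for many rows are candidates for the relation between the two columns, and can then be proposed for the rows where DBpedia has no triple yet.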
Reasoning over Wikipedia Tables

•   <http://dbpedia.org/resource/San_Jose,_California>
    <http://dbpedia.org/property/subdivisionName>
    <http://dbpedia.org/resource/United_States> .
•   <http://dbpedia.org/resource/San_Jose,_California>
    <http://dbpedia.org/ontology/country>
    <http://dbpedia.org/resource/United_States> .
•   <http://dbpedia.org/resource/Liverpool>
    <http://dbpedia.org/property/subdivisionName>
    <http://dbpedia.org/resource/United_Kingdom> .
•   <http://dbpedia.org/resource/Liverpool>
    <http://dbpedia.org/ontology/country>
    <http://dbpedia.org/resource/United_Kingdom> .
•   <http://dbpedia.org/resource/Matsue,_Shimane>
    <http://dbpedia.org/property/subdivisionName>
    <http://dbpedia.org/resource/Japan> .
•   <http://dbpedia.org/resource/Matsue,_Shimane>
    <http://dbpedia.org/ontology/country>
    <http://dbpedia.org/resource/Japan> .
•   <http://dbpedia.org/resource/Barcelona>
    <http://dbpedia.org/property/subdivisionName>
    <http://dbpedia.org/resource/Spain> .
•   <http://dbpedia.org/resource/Barcelona>
    <http://dbpedia.org/ontology/country>
    <http://dbpedia.org/resource/Spain> .
•   <http://dbpedia.org/resource/Beijing>
    <http://dbpedia.org/property/subdivisionName>
    <http://dbpedia.org/resource/People's_Republic_of_China> .
•   <http://dbpedia.org/resource/Beijing>
    <http://dbpedia.org/ontology/country>
    <http://dbpedia.org/resource/People's_Republic_of_China> .
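Candidate triples like the ones listed above could be generated by pairing every row with every extracted predicate. The sketch below assumes the rows are already annotated with DBpedia URIs; `to_ntriples` is a hypothetical helper name, and no escaping of special characters is attempted.

```python
DBR = "http://dbpedia.org/resource/"
DBO = "http://dbpedia.org/ontology/"
DBP = "http://dbpedia.org/property/"


def to_ntriples(rows, predicates):
    """Emit one N-Triples line per (row, predicate) combination."""
    lines = []
    for subject, obj in rows:
        for predicate in predicates:
            lines.append(f"<{subject}> <{predicate}> <{obj}> .")
    return lines


if __name__ == "__main__":
    rows = [
        (DBR + "San_Jose,_California", DBR + "United_States"),
        (DBR + "Liverpool", DBR + "United_Kingdom"),
    ]
    predicates = [DBP + "subdivisionName", DBO + "country"]
    print("\n".join(to_ntriples(rows, predicates)))
```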
Reasoning over Wikipedia Tables

• Let’s analyze these cases …

• Liverpool

• Matsue

• Beijing


Not that simple…

• Web tables usually don’t have explicit semantics
  by themselves.
• Main issues:
  –   Complex tables with spans
  –   Captions inside the table as another row
  –   Not well-formed tables (i.e., not a matrix)
  –   We need filters (e.g., min 2 columns, 2 rows)
• We are extracting relations at row level and
  between the main entity and the table resources
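The filters mentioned above can be sketched as follows, assuming a table has already been parsed into a list of rows, each row being a list of cell strings. The thresholds follow the slide (at least 2 columns and 2 rows); the function names are illustrative, and the actual filter may apply further conditions.

```python
def is_well_formed(table):
    """True if the table is non-empty and every row has the same number of cells."""
    return len({len(row) for row in table}) == 1


def passes_filter(table, min_rows=2, min_cols=2):
    """Keep only matrix-shaped tables with at least min_rows rows and min_cols columns."""
    if not is_well_formed(table):
        return False
    return len(table) >= min_rows and len(table[0]) >= min_cols


if __name__ == "__main__":
    good = [["City", "Nation"], ["Liverpool", "United Kingdom"]]
    ragged = [["City", "Nation"], ["Liverpool"]]
    print(passes_filter(good), passes_filter(ragged))
```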

Parsing: Extracting Tables

First step: parsing Wiki format
http://en.wikipedia.org/wiki/People%27s_Republic_of_China

Issues in this example:
• Caption as another row
• Rowspans with pictures
• Table split
Parsing: Extracting Tables

• Problems with parsing the cell’s content

                                         http://en.wikipedia.org/wiki/Danny_Kaye




Parsing: Extracting Tables

http://en.wikipedia.org/wiki/List_of_animated_television_series_of_the_1990s

Issues in this example:
• Same-page links
• Many different formats
• Anchor text vs. content text
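The anchor-text issue can be illustrated on wiki markup, where an internal link has the form `[[Target|anchor text]]`. The sketch below is an assumption about how a cell could be scanned, not the parser used in this work; it handles only simple internal links and flags same-page links (targets starting with `#`).

```python
import re

# Matches [[Target]] and [[Target|anchor text]]; nested brackets are not handled.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")


def parse_wikilinks(cell):
    """Return (target, anchor_text, is_same_page) for each internal link in a cell."""
    links = []
    for match in WIKILINK.finditer(cell):
        target = match.group(1).strip()
        label = match.group(2)
        # Without an explicit label, the anchor text is the target itself.
        anchor = label.strip() if label else target
        links.append((target, anchor, target.startswith("#")))
    return links


if __name__ == "__main__":
    cell = "[[San Jose, California|San Jose]], [[United States]], [[#Notes|see notes]]"
    for link in parse_wikilinks(cell):
        print(link)
```

The target, not the anchor text, is what should be mapped to a DBpedia resource; the anchor text alone can be ambiguous.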
Extracting Relations

                                                http://en.wikipedia.org/wiki/AFC_Ajax

• A table containing tables
Extracting Relations

• Also relations between the main entity and
  the entities in the table
  http://en.wikipedia.org/wiki/AFC_Ajax

Main entity: dbpedia.org/resource/AFC_Ajax
Table: 16 players

freq  relationship
14    dbpedia.org/ontology/team
14    dbpedia.org/property/clubs
11    dbpedia.org/property/currentclub
3     dbpedia.org/property/youthclubs

In one player's DBpedia page there is no mention
of AFC Ajax
dbpedia.org/resource/Christian_Eriksen




                                                               http://en.wikipedia.org/wiki/AFC_Ajax
Disambiguation page: dbpedia.org/resource/Ajax
Our Dataset

• enwiki dump from 2012-09-03 02:17:37
• 8.6 GB of Wikipedia pages that comprise
  – 10,531,986 documents (HTML pages)
  – Only 413,256 HTML pages contain tables
  – 2,989,098 tables
  – 905,929 tables after the filter
     • 27.7% of all tables
  – 0.46 tables per page (or 2.15 discarding pages
    without tables)
Methodology




Ranking of Relationships

• The current ranking function is naïve:

  $\mathit{score} = \frac{f_{rel}}{n_{rows}}$

  Example: http://en.wikipedia.org/wiki/AFC_Ajax (16 players)

freq   relationship                          score
14     dbpedia.org/ontology/team             0.875
14     dbpedia.org/property/clubs            0.875
11     dbpedia.org/property/currentclub      0.6875
3      dbpedia.org/property/youthclubs       0.1875
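The ranking function above is a simple ratio and can be sketched directly. The frequencies are the ones from the AFC Ajax example; the function names `score` and `rank` are illustrative only.

```python
def score(freq, n_rows):
    """Naive score: frequency of a relationship divided by the number of rows."""
    if n_rows <= 0:
        raise ValueError("n_rows must be positive")
    return freq / n_rows


def rank(freqs, n_rows):
    """Return (relationship, score) pairs, highest score first, ties by name."""
    scored = [(rel, score(freq, n_rows)) for rel, freq in freqs.items()]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    freqs = {
        "dbpedia.org/ontology/team": 14,
        "dbpedia.org/property/clubs": 14,
        "dbpedia.org/property/currentclub": 11,
        "dbpedia.org/property/youthclubs": 3,
    }
    for rel, value in rank(freqs, n_rows=16):
        print(f"{value:.4f}  {rel}")
```

Nothing in this ratio bounds the frequency by the number of rows, so the score is only guaranteed to lie in [0, 1] when each row contributes a relationship at most once.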




Ranking of Relationships

• In cases like this one the function does not work well, and 𝑠𝑐𝑜𝑟𝑒 ∉ [0,1]

                                         http://en.wikipedia.org/wiki/Danny_Kaye




Ongoing Work and Challenges

• Improve the ranking function for relations.
• Store the 5.5M DBpedia (transitive) redirects
  locally (optimizing time).
• Statistical analysis of Wikipedia tables
  – Number of columns, rows
  – Headers, Captions
  – External and internal links
• The next big challenge is the evaluation.
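Keeping the redirects locally amounts to following a chain of redirect entries in memory until a non-redirect resource is reached. The sketch below assumes the redirects are loaded into a dictionary from source URI to target URI; the example entries are hypothetical placeholders, not real DBpedia redirects, and `max_hops` is an arbitrary safety limit.

```python
def resolve_redirect(uri, redirects, max_hops=10):
    """Follow redirects transitively and return the final URI.

    Stops on a cycle or after max_hops steps and returns the last URI reached.
    """
    seen = {uri}
    current = uri
    for _ in range(max_hops):
        target = redirects.get(current)
        if target is None:
            return current          # not a redirect: final resource
        if target in seen:
            return current          # cycle guard
        seen.add(target)
        current = target
    return current


if __name__ == "__main__":
    # Hypothetical redirect chain: A -> B -> C
    redirects = {
        "http://example.org/resource/A": "http://example.org/resource/B",
        "http://example.org/resource/B": "http://example.org/resource/C",
    }
    print(resolve_redirect("http://example.org/resource/A", redirects))
```

Resolving the chain once and storing source-to-final-target pairs would make each later lookup a single dictionary access.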

What’s next?

• Some ideas in mind:
  – Use the extracted relations to classify WikiTables
  – Define a similarity function for WikiTables




                     English       Italian


What’s next?

http://en.wikipedia.org/wiki/Electronegativity




                   What does this number mean?
                   Here there is no reference to those numbers!
What’s next?
                                                                      http://dbpedia.org/page/Chlorous_acid


http://en.wikipedia.org/wiki/Electronegativity




                                             Chlorous acid is a chlorite


                                                                           http://en.wikipedia.org/wiki/Chlorine




Open problems

•   Handle multiple entities in the same cell
•   Improve the ranking function
•   Handle redirects before querying DBpedia
•   How to evaluate the outcome

                                             Thanks!
                                               Q&A

                                           Emir Muñoz
                                Unit for Reasoning and Querying
                                      emir.munoz@deri.org


                    October 12, 2012 -- E. Muñoz

More Related Content

Similar to WikiTables DERI Talk

Demo: Profiling & Exploration of Linked Open Data
Demo: Profiling & Exploration of Linked Open DataDemo: Profiling & Exploration of Linked Open Data
Demo: Profiling & Exploration of Linked Open DataStefan Dietze
 
Linking Open Data with Drupal
Linking Open Data with DrupalLinking Open Data with Drupal
Linking Open Data with Drupalemmanuel_jamin
 
Web Science Synergies: Exploring Web Knowledge through the Semantic Web
Web Science Synergies: Exploring Web Knowledge through the Semantic WebWeb Science Synergies: Exploring Web Knowledge through the Semantic Web
Web Science Synergies: Exploring Web Knowledge through the Semantic WebStefan Dietze
 
06 gioca-ontologies
06 gioca-ontologies06 gioca-ontologies
06 gioca-ontologiesnidzokus
 
Extracting Data from Historical Documents: Crowdsourcing Annotations on Wikis...
Extracting Data from Historical Documents: Crowdsourcing Annotations on Wikis...Extracting Data from Historical Documents: Crowdsourcing Annotations on Wikis...
Extracting Data from Historical Documents: Crowdsourcing Annotations on Wikis...Gaurav Vaidya
 
Metadata as Linked Data for Research Data Repositories
Metadata as Linked Data for Research Data RepositoriesMetadata as Linked Data for Research Data Repositories
Metadata as Linked Data for Research Data Repositoriesandrea huang
 
Consuming Linked Data by Machines - WWW2010
Consuming Linked Data by Machines - WWW2010Consuming Linked Data by Machines - WWW2010
Consuming Linked Data by Machines - WWW2010Juan Sequeda
 
TEDDY - Thesaurus Editor: Design and Definition Yarn
TEDDY - Thesaurus Editor: Design and Definition YarnTEDDY - Thesaurus Editor: Design and Definition Yarn
TEDDY - Thesaurus Editor: Design and Definition YarnClaudiu Mihăilă
 
Wikipedia as source of collaboratively created Knowledge Organization Systems
Wikipedia as source of collaboratively created Knowledge Organization SystemsWikipedia as source of collaboratively created Knowledge Organization Systems
Wikipedia as source of collaboratively created Knowledge Organization SystemsJakob .
 
Sw 3 bizer etal-d bpedia-crystallization-point-jws-preprint
Sw 3 bizer etal-d bpedia-crystallization-point-jws-preprintSw 3 bizer etal-d bpedia-crystallization-point-jws-preprint
Sw 3 bizer etal-d bpedia-crystallization-point-jws-preprintokeee
 
Wikipedia as Knowledge Organization System
Wikipedia as Knowledge Organization SystemWikipedia as Knowledge Organization System
Wikipedia as Knowledge Organization SystemJakob .
 
Wikipedia takes angkor ppt & demo - final 20121003
Wikipedia takes angkor   ppt & demo - final 20121003Wikipedia takes angkor   ppt & demo - final 20121003
Wikipedia takes angkor ppt & demo - final 20121003Kounila Keo
 
121004 linking open_data_with_drupal_v1
121004 linking open_data_with_drupal_v1121004 linking open_data_with_drupal_v1
121004 linking open_data_with_drupal_v1manujam
 
Archives & the Semantic Web
Archives & the Semantic WebArchives & the Semantic Web
Archives & the Semantic WebMark Matienzo
 
¿ARCHIVO?
¿ARCHIVO?¿ARCHIVO?
¿ARCHIVO?ESPOL
 
que hisciste el verano pasado
que hisciste el verano pasadoque hisciste el verano pasado
que hisciste el verano pasadoespol
 

Similar to WikiTables DERI Talk (20)

Demo: Profiling & Exploration of Linked Open Data
Demo: Profiling & Exploration of Linked Open DataDemo: Profiling & Exploration of Linked Open Data
Demo: Profiling & Exploration of Linked Open Data
 
Linking Open Data with Drupal
Linking Open Data with DrupalLinking Open Data with Drupal
Linking Open Data with Drupal
 
Web Science Synergies: Exploring Web Knowledge through the Semantic Web
Web Science Synergies: Exploring Web Knowledge through the Semantic WebWeb Science Synergies: Exploring Web Knowledge through the Semantic Web
Web Science Synergies: Exploring Web Knowledge through the Semantic Web
 
06 gioca-ontologies
06 gioca-ontologies06 gioca-ontologies
06 gioca-ontologies
 
Extracting Data from Historical Documents: Crowdsourcing Annotations on Wikis...
Extracting Data from Historical Documents: Crowdsourcing Annotations on Wikis...Extracting Data from Historical Documents: Crowdsourcing Annotations on Wikis...
Extracting Data from Historical Documents: Crowdsourcing Annotations on Wikis...
 
Metadata as Linked Data for Research Data Repositories
Metadata as Linked Data for Research Data RepositoriesMetadata as Linked Data for Research Data Repositories
Metadata as Linked Data for Research Data Repositories
 
Consuming Linked Data by Machines - WWW2010
Consuming Linked Data by Machines - WWW2010Consuming Linked Data by Machines - WWW2010
Consuming Linked Data by Machines - WWW2010
 
TEDDY - Thesaurus Editor: Design and Definition Yarn
TEDDY - Thesaurus Editor: Design and Definition YarnTEDDY - Thesaurus Editor: Design and Definition Yarn
TEDDY - Thesaurus Editor: Design and Definition Yarn
 
Social Work Subject Guide
Social Work Subject GuideSocial Work Subject Guide
Social Work Subject Guide
 
Resources, resources, resources: the three rs of the Web
Resources, resources, resources: the three rs of the WebResources, resources, resources: the three rs of the Web
Resources, resources, resources: the three rs of the Web
 
Wikipedia as source of collaboratively created Knowledge Organization Systems
Wikipedia as source of collaboratively created Knowledge Organization SystemsWikipedia as source of collaboratively created Knowledge Organization Systems
Wikipedia as source of collaboratively created Knowledge Organization Systems
 
Sw 3 bizer etal-d bpedia-crystallization-point-jws-preprint
Sw 3 bizer etal-d bpedia-crystallization-point-jws-preprintSw 3 bizer etal-d bpedia-crystallization-point-jws-preprint
Sw 3 bizer etal-d bpedia-crystallization-point-jws-preprint
 
Towards Knowledge-Enabled Society
Towards Knowledge-Enabled SocietyTowards Knowledge-Enabled Society
Towards Knowledge-Enabled Society
 
Wikipedia as Knowledge Organization System
Wikipedia as Knowledge Organization SystemWikipedia as Knowledge Organization System
Wikipedia as Knowledge Organization System
 
Linked Open Data stuff
Linked Open Data stuffLinked Open Data stuff
Linked Open Data stuff
 
Wikipedia takes angkor ppt & demo - final 20121003
Wikipedia takes angkor   ppt & demo - final 20121003Wikipedia takes angkor   ppt & demo - final 20121003
Wikipedia takes angkor ppt & demo - final 20121003
 
121004 linking open_data_with_drupal_v1
121004 linking open_data_with_drupal_v1121004 linking open_data_with_drupal_v1
121004 linking open_data_with_drupal_v1
 
Archives & the Semantic Web
Archives & the Semantic WebArchives & the Semantic Web
Archives & the Semantic Web
 
¿ARCHIVO?
¿ARCHIVO?¿ARCHIVO?
¿ARCHIVO?
 
que hisciste el verano pasado
que hisciste el verano pasadoque hisciste el verano pasado
que hisciste el verano pasado
 

More from Emir Muñoz

A Linked Data-Based Decision Tree Classifier to Review Movies
A Linked Data-Based Decision Tree Classifier to Review MoviesA Linked Data-Based Decision Tree Classifier to Review Movies
A Linked Data-Based Decision Tree Classifier to Review MoviesEmir Muñoz
 
Web Intelligence - 2010
Web Intelligence - 2010Web Intelligence - 2010
Web Intelligence - 2010Emir Muñoz
 
μRaptor: A DOM-based system with appetite for hCard elements
μRaptor: A DOM-based system with appetite for hCard elementsμRaptor: A DOM-based system with appetite for hCard elements
μRaptor: A DOM-based system with appetite for hCard elementsEmir Muñoz
 
Learning Content Patterns from Linked Data
Learning Content Patterns from Linked DataLearning Content Patterns from Linked Data
Learning Content Patterns from Linked DataEmir Muñoz
 
Claves XML: Una Implementación de Algoritmos de Implicación y Validación
Claves XML: Una Implementación de Algoritmos de Implicación y ValidaciónClaves XML: Una Implementación de Algoritmos de Implicación y Validación
Claves XML: Una Implementación de Algoritmos de Implicación y ValidaciónEmir Muñoz
 
Using Linked Data to Mine RDF from Wikipedia's Tables
Using Linked Data to Mine RDF from Wikipedia's TablesUsing Linked Data to Mine RDF from Wikipedia's Tables
Using Linked Data to Mine RDF from Wikipedia's TablesEmir Muñoz
 
Reading Group 2014
Reading Group 2014Reading Group 2014
Reading Group 2014Emir Muñoz
 
Soft Cardinality Constraints on XML Data
Soft Cardinality Constraints on XML DataSoft Cardinality Constraints on XML Data
Soft Cardinality Constraints on XML DataEmir Muñoz
 
DRETa: Extracting RDF From Wikitables
DRETa: Extracting RDF From WikitablesDRETa: Extracting RDF From Wikitables
DRETa: Extracting RDF From WikitablesEmir Muñoz
 

More from Emir Muñoz (10)

A Linked Data-Based Decision Tree Classifier to Review Movies
A Linked Data-Based Decision Tree Classifier to Review MoviesA Linked Data-Based Decision Tree Classifier to Review Movies
A Linked Data-Based Decision Tree Classifier to Review Movies
 
Web Intelligence - 2010
Web Intelligence - 2010Web Intelligence - 2010
Web Intelligence - 2010
 
μRaptor: A DOM-based system with appetite for hCard elements
μRaptor: A DOM-based system with appetite for hCard elementsμRaptor: A DOM-based system with appetite for hCard elements
μRaptor: A DOM-based system with appetite for hCard elements
 
Learning Content Patterns from Linked Data
Learning Content Patterns from Linked DataLearning Content Patterns from Linked Data
Learning Content Patterns from Linked Data
 
Claves XML: Una Implementación de Algoritmos de Implicación y Validación
Claves XML: Una Implementación de Algoritmos de Implicación y ValidaciónClaves XML: Una Implementación de Algoritmos de Implicación y Validación
Claves XML: Una Implementación de Algoritmos de Implicación y Validación
 
Using Linked Data to Mine RDF from Wikipedia's Tables
Using Linked Data to Mine RDF from Wikipedia's TablesUsing Linked Data to Mine RDF from Wikipedia's Tables
Using Linked Data to Mine RDF from Wikipedia's Tables
 
Reading Group 2014
Reading Group 2014Reading Group 2014
Reading Group 2014
 
Soft Cardinality Constraints on XML Data
Soft Cardinality Constraints on XML DataSoft Cardinality Constraints on XML Data
Soft Cardinality Constraints on XML Data
 
DRETa: Extracting RDF From Wikitables
DRETa: Extracting RDF From WikitablesDRETa: Extracting RDF From Wikitables
DRETa: Extracting RDF From Wikitables
 
DEXA 2012 Talk
DEXA 2012 TalkDEXA 2012 Talk
DEXA 2012 Talk
 

Recently uploaded

BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting DataJhengPantaleon
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxRoyAbrique
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppCeline George
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991RKavithamani
 
Micromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersMicromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersChitralekhaTherkar
 

Recently uploaded (20)

BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website App
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
 
Micromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersMicromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of Powders
 

WikiTables DERI Talk

  • 1. Extending DBpedia (LOD) using WikiTables Emir Muñoz Unit for Reasoning and Querying emir.munoz@deri.org
  • 2. Linked Open Data Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/ October 12, 2012 -- E. Muñoz
  • 3. Linked Open Data • DBpedia, an export of Wikipedia’s structured data DBpedia provides RDF version of all wikipedia structured data (infoboxes) October 12, 2012 -- E. Muñoz
  • 4. Linked Open Data • DBpedia, an export of Wikipedia’s structured data DBpedia provides RDF version of all wikipedia structured data (infoboxes) But not yet a version of all normal Wikipedia tables or wikitables October 12, 2012 -- E. Muñoz
  • 5. Tables as a source of LOD
      – Tables are inherently concise as well as information rich
      – Column headers represent types of information
      – The values represent instances of those types
      – Infoboxes (attr-value)
      – Caption as another row
      – Examples: http://en.wikipedia.org/wiki/Dublin, http://en.wikipedia.org/wiki/Galway
  • 6. Reasoning over Wikipedia Tables: recovering table semantics … Example caption: “Dublin is twinned with the following places:” (http://en.wikipedia.org/wiki/Dublin)
  • 7. Reasoning over Wikipedia Tables: entity annotation for cells, mappings to DBpedia resources (http://en.wikipedia.org/wiki/Dublin)
      Column headers: dbpedia.org/property/city, dbpedia.org/property/nation, dbpedia.org/property/since
      Cell resources, one row per line:
        dbpedia.org/resource/San_Jose,_California | dbpedia.org/resource/United_States
        dbpedia.org/resource/Liverpool | dbpedia.org/resource/United_Kingdom
        dbpedia.org/resource/Matsue,_Shimane | dbpedia.org/resource/Japan
        dbpedia.org/resource/Barcelona | dbpedia.org/resource/Spain
        dbpedia.org/resource/Beijing | dbpedia.org/resource/People's_Republic_of_China
      Literal cells: (xsd:integer)
  • 8. Reasoning over Wikipedia Tables: extracting relations (http://en.wikipedia.org/wiki/Dublin)
      Relations found: dbpedia.org/ontology/country, dbpedia.org/property/subdivisionName
      Column headers: dbpedia.org/property/city, dbpedia.org/property/nation, dbpedia.org/property/since
      Cell resources, one row per line:
        dbpedia.org/resource/San_Jose,_California | dbpedia.org/resource/United_States
        dbpedia.org/resource/Liverpool | dbpedia.org/resource/United_Kingdom
        dbpedia.org/resource/Matsue,_Shimane | dbpedia.org/resource/Japan
        dbpedia.org/resource/Barcelona | dbpedia.org/resource/Spain
        dbpedia.org/resource/Beijing | dbpedia.org/resource/People's_Republic_of_China
      Literal cells: (xsd:integer)
      Reading: is dbpedia.org/ontology/country of
  • 9. Reasoning over Wikipedia Tables
      <http://dbpedia.org/resource/San_Jose,_California> <http://dbpedia.org/property/subdivisionName> <http://dbpedia.org/resource/United_States> .
      <http://dbpedia.org/resource/San_Jose,_California> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/United_States> .
      <http://dbpedia.org/resource/Liverpool> <http://dbpedia.org/property/subdivisionName> <http://dbpedia.org/resource/United_Kingdom> .
      <http://dbpedia.org/resource/Liverpool> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/United_Kingdom> .
      <http://dbpedia.org/resource/Matsue,_Shimane> <http://dbpedia.org/property/subdivisionName> <http://dbpedia.org/resource/Japan> .
      <http://dbpedia.org/resource/Matsue,_Shimane> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/Japan> .
      <http://dbpedia.org/resource/Barcelona> <http://dbpedia.org/property/subdivisionName> <http://dbpedia.org/resource/Spain> .
      <http://dbpedia.org/resource/Barcelona> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/Spain> .
      <http://dbpedia.org/resource/Beijing> <http://dbpedia.org/property/subdivisionName> <http://dbpedia.org/resource/People's_Republic_of_China> .
      <http://dbpedia.org/resource/Beijing> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/People's_Republic_of_China> .
  • 11. Reasoning over Wikipedia Tables • Let’s analyze these cases: Liverpool, Matsue, Beijing
  • 12. Not that simple…
      • Web tables usually don’t have explicit semantics by themselves.
      • Main issues:
        – Complex tables with spans
        – Captions inside the table as another row
        – Not well-formed tables (i.e., not a matrix)
        – We need filters (e.g., min 2 columns, 2 rows)
      • We are extracting relations at row level and between the main entity and the table resources
  • 13. Parsing: Extracting Tables. First step: parsing Wiki format. Difficulties shown on http://en.wikipedia.org/wiki/People%27s_Republic_of_China: caption as another row; rowspans; table split with pictures
  • 14. Parsing: Extracting Tables • Problems with parsing the cell’s content (http://en.wikipedia.org/wiki/Danny_Kaye)
  • 16. Parsing: Extracting Tables. Issues shown on http://en.wikipedia.org/wiki/List_of_animated_television_series_of_the_1990s: same-page links; many different formats; anchor text vs. content text
  • 17. Extracting Relations: a table containing tables (http://en.wikipedia.org/wiki/AFC_Ajax)
  • 18. Extracting Relations • Also relations between the main entity and the entities in the table
      Page: http://en.wikipedia.org/wiki/AFC_Ajax (16 players); main entity: dbpedia.org/resource/AFC_Ajax
      freq  relationship
      14    dbpedia.org/ontology/team
      14    dbpedia.org/property/clubs
      11    dbpedia.org/property/currentclub
      3     dbpedia.org/property/youthclubs
      In his DBpedia page there is no mention of AFC Ajax
  • 19. dbpedia.org/resource/Christian_Eriksen; http://en.wikipedia.org/wiki/AFC_Ajax; disambiguation page: dbpedia.org/resource/Ajax
  • 20. Our Dataset
      • enwiki dump from 2012-09-03 02:17:37
      • 8.6 GB of Wikipedia pages, comprising:
        – 10,531,986 documents (HTML pages)
        – Only 413,256 HTML pages contain tables
        – 2,989,098 tables
        – 905,929 tables after the filter (27.7% of all tables)
        – 0.46 tables per page (or 2.15 discarding pages without tables)
  • 21. Methodology
  • 22. Ranking of Relationships • The current ranking function is naïve: $\mathit{score} = \frac{f_{rel}}{n_{rows}}$
      Example: http://en.wikipedia.org/wiki/AFC_Ajax (16 players)
      freq  relationship                        score
      14    dbpedia.org/ontology/team           0.875
      14    dbpedia.org/property/clubs          0.875
      11    dbpedia.org/property/currentclub    0.6875
      3     dbpedia.org/property/youthclubs     0.1875
  • 23. Ranking of Relationships • For cases like this one the function does not work well, and 𝑠𝑐𝑜𝑟𝑒 ∉ [0,1] (http://en.wikipedia.org/wiki/Danny_Kaye)
  • 24. Ongoing Work and Challenges
      • Improve the ranking function for relations.
      • Store the 5.5M DBpedia (transitive) redirects locally (optimizing time).
      • Statistical analysis of Wikipedia tables:
        – Number of columns, rows
        – Headers, captions
        – External and internal links
      • The next big challenge is the evaluation.
  • 25. What’s next? • Some ideas in mind:
        – Use the extracted relations to classify WikiTables
        – Define a similarity function for WikiTables
      Figure labels: English, Italian
  • 26. What’s next? http://en.wikipedia.org/wiki/Electronegativity: What does this number mean? Here there is no reference to those numbers!
  • 27. What’s next? http://dbpedia.org/page/Chlorous_acid; http://en.wikipedia.org/wiki/Electronegativity; “Chlorous acid is a chlorite”; http://en.wikipedia.org/wiki/Chlorine
  • 28. Open problems
      • Handle multiple entities in the same cell
      • Improve the ranking function
      • Handle redirects before querying DBpedia
      • How to evaluate the outcome
      Thanks! Q&A. Emir Muñoz, Unit for Reasoning and Querying, emir.munoz@deri.org
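
The naïve ranking function from slide 22 (relation frequency divided by the number of rows) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used in the talk: the function name `rank_relations` and the input shape (one list of relation URIs per table row) are hypothetical. The example data reproduces the AFC Ajax frequencies from the slides.

```python
from collections import Counter


def rank_relations(relations_per_row):
    """Score each relation as (times seen) / (number of rows).

    relations_per_row: one list of relation URIs per table row
    (a hypothetical input shape chosen for this sketch).
    Returns (relation, score) pairs, highest score first.
    """
    n_rows = len(relations_per_row)
    if n_rows == 0:
        return []
    freq = Counter()
    for row in relations_per_row:
        # Every occurrence is counted, so a relation that appears
        # twice in one row contributes 2 to its frequency.
        freq.update(row)
    scored = [(rel, count / n_rows) for rel, count in freq.items()]
    # Sort by descending score, then by URI for a stable order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Frequencies from the AFC Ajax slide: 16 player rows.
ajax_rows = []
for i in range(16):
    row = []
    if i < 14:
        row += ["dbpedia.org/ontology/team", "dbpedia.org/property/clubs"]
    if i < 11:
        row.append("dbpedia.org/property/currentclub")
    if i < 3:
        row.append("dbpedia.org/property/youthclubs")
    ajax_rows.append(row)

ranking = rank_relations(ajax_rows)
for rel, score in ranking:
    print(f"{score:.4f}  {rel}")
```

Because every occurrence is counted, a relation found more than once in a single row pushes its frequency above the row count. That is one plausible way for the score to leave [0,1], and it would fit the open problem of multiple entities in the same cell, but the slides do not state the cause.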