Extending DBpedia
with Wikipedia List Pages

10/22/13
Heiko Paulheim, Simone Paolo Ponzetto

Disclaimer
•

This presentation shows an idea
– after all, it says “position paper”
– We don't know if it works!
– (but we are quite confident)

Lists in Wikipedia
•

Wikipedia loves lists

•

As of June 2013, there are almost 600,000 list pages

•

Lists organize Wikipedia pages
– that correspond to DBpedia instances

•

Example:
– List of African-American writers

Lists in Wikipedia
•

Different types of lists
– simple bullet point lists
– broken bullet point lists (i.e., different sections)
• sometimes, the sections are semantically meaningful
– tables
– ...

(Chart: share of list pages by type. Categories: Simple Bullet List, Broken Bullet List, Table, Other.)
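
To make the list types above concrete, here is a minimal sketch of instance extraction for simple and broken bullet lists. It assumes raw wikitext as input and uses a heuristic that is not from the slides: the first link of a bullet item is taken as the listed instance, and the enclosing section heading is kept because sections can be semantically meaningful. The function name and the sample text are illustrative only; tables would need separate handling.

```python
import re

# Matches [[Target]], [[Target|label]] and [[Target#anchor|label]];
# group 1 is the link target.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")
HEADING = re.compile(r"=+\s*(.*?)\s*=+")


def extract_bullet_instances(wikitext):
    """Return (section, target) pairs, one per bullet item that has a link.

    Assumption: the first link in a bullet item is the listed instance.
    """
    section = None
    results = []
    for raw_line in wikitext.splitlines():
        line = raw_line.strip()
        heading = HEADING.fullmatch(line)
        if heading:
            section = heading.group(1)
        elif line.startswith("*"):
            match = WIKILINK.search(line)
            if match:
                target = match.group(1).strip().replace(" ", "_")
                results.append((section, target))
    return results


# Hypothetical excerpt of a broken bullet list (sections A and B).
sample_wikitext = """\
== A ==
* [[Maya Angelou]], poet
* [[James Baldwin|Baldwin, James]], born in [[New York City]]
== B ==
See also [[African-American literature]].
* [[Amiri Baraka]]
"""

instances = extract_bullet_instances(sample_wikitext)
```

Links outside bullet items, and second links inside an item such as New York City, are ignored on purpose; whether that heuristic is robust enough is exactly the open extraction question.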

Lists in Wikipedia
•

What information is in a list?
– the linked things have the same “type”

•

The type can be a complex construct
– e.g., Writer ∩ ∀nationality.{United States} ∩ ∀ethnicity.{African American}

•

Sometimes, a list carries additional pieces of information
– e.g., birth dates for persons

Extracting Information from Lists
•

Goal:
– find the common characteristics of all things in the list

•

Example: African-American writers
– all instances are writers (25%)
– all instances have nationality=United_States (12%)
– all instances have ethnicity=African_American (3%)

•

Information in DBpedia is far from complete
– makes extraction difficult
– but: big potential to add information to DBpedia

Extracting Information from Lists
•

Possible approach: finding characteristics with high TF-IDF
– TF: percentage of instances in the list that carry characteristic
– IDF: 1 / (percentage of all DBpedia instances that carry characteristic)

•

Rationale: only going by frequency would rate owl:Thing the highest

•

Example: African-American writers
– type=Writer: 0.608 (maximal across all possible classes)
– nationality=United_States: 0.277
– ethnicity=African_American: 0.127

•

But:
– deathPlace=New_York_City: 0.157 :-(
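
The scoring idea above can be sketched as follows. This is a toy illustration, not the authors' implementation: it applies the TF and IDF definitions literally as fractions, the instance names and characteristics are invented, and it assumes every list member is also contained in the overall instance set. The literal product can exceed 1, so the scores quoted on the slide presumably use a scaling that the slides do not spell out.

```python
from collections import Counter


def tf_idf_scores(list_members, all_instances):
    """Score every characteristic carried by at least one list member.

    Both arguments map an instance name to a set of characteristics,
    written here as (property, value) tuples.
    TF  = fraction of list members that carry the characteristic
    IDF = 1 / (fraction of all instances that carry the characteristic)
    """
    in_list = Counter(c for chars in list_members.values() for c in chars)
    overall = Counter(c for chars in all_instances.values() for c in chars)
    scores = {}
    for characteristic, count in in_list.items():
        tf = count / len(list_members)
        df = overall[characteristic] / len(all_instances)
        scores[characteristic] = tf * (1 / df)
    return scores


# Invented toy knowledge base: four instances, two of them on the list page.
THING = ("type", "owl:Thing")
WRITER = ("type", "Writer")
US = ("nationality", "United_States")

all_instances = {
    "A": {THING, WRITER, US},
    "B": {THING, WRITER},
    "C": {THING, US},
    "D": {THING},
}
list_members = {name: all_instances[name] for name in ("A", "B")}

scores = tf_idf_scores(list_members, all_instances)
best = max(scores, key=scores.get)
```

In this toy data owl:Thing is carried by every list member but also by every instance overall, so its score stays at 1.0 while type=Writer scores 2.0, which is the effect the rationale bullet describes.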

Extracting Information from Lists
•

Example: African-American writers
– ethnicity=African_American: 0.127
– deathPlace=New_York_City: 0.157

•

Exploit further information from list page
– e.g., wiki:African_American is linked from page, New_York_City is not
– e.g., analyze list page title, e.g., using DBpedia Spotlight
• African_American is recognized as an entity
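
One hedged way to combine the scores with evidence from the list page is to rank candidates that are supported by the page ahead of unsupported ones. The sketch below uses the two scores from the slide; the function name and the ranking rule are assumptions, and the evidence set is given by hand here rather than obtained from page links or from DBpedia Spotlight.

```python
def rank_with_page_evidence(scores, evidence_entities):
    """Order candidates: supported by the page first, then by score.

    scores maps (property, value) to a score; evidence_entities is the set
    of entities linked from the list page or recognized in its title.
    """
    return sorted(
        scores.items(),
        key=lambda item: (item[0][1] in evidence_entities, item[1]),
        reverse=True,
    )


# Scores as quoted on the slide.
candidates = {
    ("deathPlace", "New_York_City"): 0.157,
    ("ethnicity", "African_American"): 0.127,
}
# African_American is linked from the page, New_York_City is not.
evidence = {"African_American"}

ranked = rank_with_page_evidence(candidates, evidence)
```

With this rule ethnicity=African_American moves ahead of deathPlace=New_York_City despite its lower score. A weighted combination of both signals would be an equally plausible design; the slides leave that choice open.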

Lists of Lists in Wikipedia
•

Wikipedia also knows ~600 lists of lists
– organize lists
– form a hierarchy

•

E.g.:
– Lists of Writers
– Lists of American writers
– List of African American writers

From Lists of Lists to an Extended Ontology
•

Idea:
– find corresponding “Lists of ...” pages for DBpedia classes
– extend hierarchy

(Diagram: the DBpedia Ontology chain owl:Thing, ..., Agent, ..., Person, Artist, ..., Writer is continued in the Extended Ontology by American Writer and African-American Writer. Corresponding Wikipedia pages: Lists of Writers for Writer, Lists of American Writers for American Writer, List of African-American Writers for African-American Writer.)

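
The hierarchy extension can be sketched as a mapping from list pages to classes plus the nesting of the list pages. This is an illustration under assumptions: the function name, the class identifiers with underscores, and the hand-written nesting pairs are hypothetical, and finding the page-to-class correspondence is the actual open problem.

```python
def subclass_axioms(list_nesting, page_to_class):
    """Turn (parent page, child page) pairs into (subclass, superclass) pairs.

    Pairs that involve a page without a known class are skipped.
    """
    return [
        (page_to_class[child], page_to_class[parent])
        for parent, child in list_nesting
        if parent in page_to_class and child in page_to_class
    ]


# Nesting of the list pages named on the slides, written by hand.
list_nesting = [
    ("Lists of Writers", "Lists of American Writers"),
    ("Lists of American Writers", "List of African-American Writers"),
]
# Writer exists in the DBpedia Ontology; the other two classes would be new.
page_to_class = {
    "Lists of Writers": "Writer",
    "Lists of American Writers": "American_Writer",
    "List of African-American Writers": "African-American_Writer",
}

axioms = subclass_axioms(list_nesting, page_to_class)
```

Each resulting pair could then be stated as a subclass relation in the extended ontology, attaching the new classes below the existing Writer class.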
Potential of the Idea
•

Given that we extract everything correctly from
List of African American writers, we get
– 814 new type statements (only DBpedia ontology)
– 1409 new property assertions
– two entirely new instances

•

...and there are ~600,000 list pages
– extrapolation: we can roughly double the information in DBpedia

•

Many list pages contain extra information
– e.g., birth places and birth dates of persons
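
How the counts of new statements could be obtained is sketched below: every accepted characteristic of a list is asserted for each member that does not carry it yet. The data is an invented toy example, not the figures from the slide, and the function name is hypothetical.

```python
def new_statements(list_members, accepted):
    """Return (instance, property, value) triples the list implies
    but the knowledge base does not contain yet."""
    return [
        (instance, prop, value)
        for instance, chars in list_members.items()
        for (prop, value) in accepted
        if (prop, value) not in chars
    ]


# Invented toy data: A is already typed as Writer, B has no statements.
list_members = {
    "A": {("type", "Writer")},
    "B": set(),
}
accepted = [("type", "Writer"), ("nationality", "United_States")]

added = new_statements(list_members, accepted)
new_types = [s for s in added if s[1] == "type"]
new_properties = [s for s in added if s[1] != "type"]
```

Splitting the result into type statements and other property assertions mirrors the two counts reported for the example list.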

Challenges
•

Robust extraction of instances
– from different kinds of list pages
– e.g., picking the right column in a table
– tables and bullet point lists already account for 75% of list pages

•

Picking good scoring functions
– TF-IDF does not seem bad at first glance

•

Combining statistical and textual evidence

•

Scalable implementation
– Advantage: perfectly parallelizable

