But what do we actually know?
On knowledge base recall
Simon Razniewski
Free University of Bozen-Bolzano, Italy
Background
• TU Dresden, Germany: Diplom (Master) 2010
• Free University of Bozen-Bolzano, Italy
• 2011 - 2014: PhD (on reasoning about data completeness)
• 2014 - now: Fixed term assistant professor
• Research visits at UCSD (2012), AT&T Labs-Research (2013), UQ (2015)
Bolzano: trilingual, Ötzi, 1/8th of EU apples
How complete are knowledge bases?
= recall
KBs are pretty incomplete
• DBpedia: contains 6 out of 35 Dijkstra Prize winners
• YAGO: the average number of children per person is 0.02
• Google Knowledge Graph: “Points of Interest” – Completeness?
KBs are pretty complete
• Wikidata: 2 out of 2 children of Obama
• Google Knowledge Graph: 36 out of 48 Tarantino movies
• DBpedia: 167 out of 199 Nobel laureates in Physics
So, how complete are KBs?
[Dong et al., KDD 2014]
KB engineers have only tried to make KBs bigger. The point, however, is to understand what they are actually trying to approximate.
There are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns – the ones we don't know we don't know.
Knowledge Bases as seen by [Rumsfeld, 2002]
Known knowns: The plain facts in a KB
• Trump’s birth date
• Hillary’s nationality
• …
Known unknowns: The easy stuff
• NULL values/blank nodes
• Missing functional/mandatory values
Unknown unknowns: The interesting rest
• Are all children of John there?
• Does Mary play a musical instrument?
• Does Bob have other nationalities?
• …
Not KB completion!
• What other children does John have?
• Which instruments does Mary play?
• Which nationalities does Bob have?
Outline
1. Assessing completeness from inside the KB
a) Rule mining
b) Classification
2. Assessing completeness using text
c) Cardinalities
d) Recall-aware information extraction
3. Presenting the completeness of KBs
4. The meaning of it all
e) When is an entity complete?
f) When is an entity more complete than another?
g) Are interesting facts complete?
1. Assessing completeness from inside the KB
1a) Rule Mining [Galarraga et al., WSDM 2017]
hockeyPlayer(x) ⇒ Incomplete(x, hasChild)
scientist(x), hasWonNobelPrize(x) ⇒ Complete(x, graduatedFrom)
Challenge: No proper theory for consensus across multiple rules
human(x) ⇒ Complete(x, graduatedFrom)
teacher(x) ⇒ Incomplete(x, graduatedFrom)
professor(x) ⇒ Complete(x, graduatedFrom)
John ∈ (human, teacher, professor) ⇒ Complete(John, graduatedFrom)?
⇒ Maybe the wrong approach?
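The conflict between the three graduatedFrom rules can be made concrete with a small sketch. The rule table is taken from the slide; the majority vote is a hypothetical tie-breaker added only for illustration, not a consensus theory from the cited work.

```python
from collections import Counter

# Rules from the slide: (class, predicate) -> verdict mined for that class.
RULES = {
    ("human", "graduatedFrom"): "Complete",
    ("teacher", "graduatedFrom"): "Incomplete",
    ("professor", "graduatedFrom"): "Complete",
}


def verdicts(classes, predicate):
    """Collect the verdict of every rule whose class the entity belongs to."""
    return {c: RULES[(c, predicate)] for c in classes if (c, predicate) in RULES}


def majority(votes):
    """Hypothetical tie-breaker: the most common verdict, or None on a tie."""
    counts = Counter(votes.values()).most_common()
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


john = ["human", "teacher", "professor"]
votes = verdicts(john, "graduatedFrom")
print(votes)            # the three rules disagree
print(majority(votes))  # a vote picks one side but has no principled basis
```

A vote treats the very general human rule and the more specific teacher and professor rules as equally informative, which is one way to see why a proper consensus theory is missing.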
1b) A classification problem
Input: entity e, predicate p (example: Obama, hasChild)
Question: Are all triples (e, p, _) in the KB? (example: (Obama, hasChild, _))
Output: Yes/No (example: Yes (Wikidata))
Features: Facts, popularity measures, textual context, …
Training data: Crowdsourcing under constraints, deletion, popularity, …
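The slide lists "deletion" among the sources of training data. One plausible reading, assumed here, is that facts known to be complete give positive examples, and removing one object from them gives negative examples. The function name, the data layout and the placeholder child names are hypothetical.

```python
import random


def make_examples(complete_facts, rng):
    """Build labelled examples from (entity, predicate) pairs known to be complete.

    complete_facts maps (entity, predicate) to the full list of objects.
    Each entry yields one positive example (label True) and, by deleting
    one object at random, one negative example (label False).
    """
    examples = []
    for (entity, predicate), objects in complete_facts.items():
        examples.append(((entity, predicate, list(objects)), True))
        if objects:
            kept = list(objects)
            kept.remove(rng.choice(kept))
            examples.append(((entity, predicate, kept), False))
    return examples


# Placeholder object names; only the count (2 out of 2) comes from the slides.
complete_facts = {("Obama", "hasChild"): ["child_1", "child_2"]}
examples = make_examples(complete_facts, random.Random(0))
for (entity, predicate, objects), label in examples:
    print(entity, predicate, objects, label)
```

The resulting pairs could then be fed, together with features such as popularity measures, to any binary classifier.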
2. Assessing completeness using text
2c) Cardinality extraction [Mirza et al., Poster@ISWC 2016]
Text: “Barack and Michelle have two children, and […]”
Manually created patterns to extract children cardinalities from Wikipedia
⇒ Found that about 2k entities have complete children, 84k have incomplete children
⇒ Found evidence for 178% more children than currently in Wikidata
• Especially intriguing for long-tail entities
Open: Automation, other relations
Example (text states two children):
• KB: 0 children ⇒ Recall: 0%
• KB: 1 child ⇒ Recall: 50%
• KB: 2 children ⇒ Recall: 100%
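A minimal sketch of the idea, assuming a single hand-written pattern and a small number-word table (both hypothetical, not the patterns used in the cited poster): extract the stated cardinality from the sentence, then compare it with the number of children in the KB.

```python
import re

NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}

# Hypothetical hand-written pattern, in the spirit of manually created ones.
CHILDREN_PATTERN = re.compile(r"\bha(?:s|ve) (\w+) children\b")


def stated_children(text):
    """Return the number of children stated in the text, or None."""
    match = CHILDREN_PATTERN.search(text)
    if match is None:
        return None
    return NUMBER_WORDS.get(match.group(1).lower())


def recall(kb_count, stated_count):
    """Fraction of the stated children that the KB contains, capped at 1."""
    return min(kb_count, stated_count) / stated_count


stated = stated_children("Barack and Michelle have two children, and [...]")
for kb_count in (0, 1, 2):
    print(f"KB: {kb_count}  Recall: {recall(kb_count, stated):.0%}")
```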
2d) Recall-aware Information Extraction
Textual information extraction is usually precision-aware
“John was born in Malmö, Sweden, on […].” ⇒ citizenship(John, Sweden) – precision 95%
“John grew up in Malmö, Sweden and […]” ⇒ citizenship(John, Sweden) – precision 80%
What about making it recall-aware?
“John has a son, Tom, and a daughter, Susan.” ⇒ hc(John, Tom), hc(John, Susan) – recall?
“John brought his children Susan and Tom to school.” ⇒ hc(John, Tom), hc(John, Susan) – recall?
3. Presenting the completeness of KBs
How complete is Wikidata for children?
Select attribute to analyse:
• hasChild
• date of birth
• party membership
• …
Facets
Occupation
• Politician 7.5%
• Soccer player 3.3%
• Lawyer 8.1%
• Other 2.2%
Nationality
• USA 3.8%
• India 2.7%
• China 2.2%
• England 5.5%
• …
Century of birth
• <15th century 1.1%
• 16th century 1.4%
• …
Gender
• Male 4.3%
• Female 3.9%
Extrapolated completeness: 30.8%
Known completeness: 2.7%
Based on:
• There are 5371 people of this kind
• For these, 231 have children
• For these, Wikipedia says there should be 750 children
• Average number of children of complete entities is 2.3
• Average number of children of unknown people is 0.01
• …
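The slide does not spell out how the extrapolated figure is computed. One reading consistent with its numbers, assumed here, is that 231 children are recorded in the KB where Wikipedia text suggests 750. The sketch below only checks that arithmetic; the function name is hypothetical.

```python
def extrapolated_completeness(children_in_kb, children_expected):
    """Share of the expected children that are recorded in the KB."""
    return children_in_kb / children_expected


value = extrapolated_completeness(231, 750)
print(f"Extrapolated completeness: {value:.1%}")
```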
4. The meaning of it all
4e) When is data about an entity complete?
Complete(Justin Bieber)?
• Musician: birth date, musical instrument played, band
• Scientist: alma mater, field, advisor, awards
• Politician: Party, public positions held
⇒ What about musicians playing in an orchestra?
⇒ What about scientists that are also engaged in politics?
…
• Interestingness is relative (“birth date more interesting than handedness”)
• Long tail of rare properties
⇒ Some work on ranking predicates by relevance
Shortcoming: Mostly descriptive (see e.g. Wikidata Property Suggester)
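A descriptive ranking of this kind can be sketched by counting how often each predicate occurs among entities of the same class. This is an assumed, simplified frequency measure, not the method of any particular tool; the sample data, function names and the 0.5 threshold are hypothetical.

```python
from collections import Counter


def rank_predicates(entities):
    """Rank predicates by the share of entities in the class that have them."""
    counts = Counter(p for predicates in entities.values() for p in set(predicates))
    total = len(entities)
    return sorted(
        ((p, n / total) for p, n in counts.items()),
        key=lambda item: (-item[1], item[0]),
    )


def missing_predicates(entity_predicates, ranking, threshold=0.5):
    """Predicates that are frequent in the class but absent for this entity."""
    present = set(entity_predicates)
    return [p for p, share in ranking if share >= threshold and p not in present]


# Made-up sample of three musicians and the predicates they have.
musicians = {
    "musician_a": ["birth date", "instrument", "band"],
    "musician_b": ["birth date", "instrument"],
    "musician_c": ["birth date", "handedness"],
}
ranking = rank_predicates(musicians)
print(ranking)
print(missing_predicates(["birth date"], ranking))
```

Such a ranking only describes what the KB already contains, so rare properties such as handedness stay at the bottom regardless of whether they matter for a given entity.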
4f) When is an entity more complete than another?
Is data about Obama more complete than about Trump?
Goal: A notion of relative completeness
Is data about Ronaldo more complete than about Justin Bieber?
…
Crowd studies: relative completeness = fact count?
Available as user script for Wikidata (Recoin - Relative Completeness Indicator): https://www.wikidata.org/wiki/User:Ls1g/Recoin
4g) Are interesting facts complete?
LIGO:
Detected the gravitational waves that Einstein predicted about 100 years ago
Galileo Galilei:
Contrary to the dogma of the time, postulated that the Earth orbits the Sun
Reinhold Messner:
First person to climb all mountains above 8000 m without supplemental oxygen
These are not elementary triples
FirstPersonToClimbAllMountainsAbove8000Without(Supplemental oxygen, Reinhold Messner)
1. What are these? Events? Sets of triples? Queries?
2. Where can we get the interestingness score from? Entropy? Pagerank? Text frequency?
3. Completeness depends on completeness of context!
Summary
1. Assessing completeness from inside the KB
a) Rule mining
b) Classification
2. Assessing completeness using text
c) Cardinalities
d) Recall-aware information extraction
3. Presenting the completeness of KBs
4. The meaning of it all
e) When is an entity complete?
f) When is an entity more complete than another?
g) Are interesting facts complete?
…meet me in room 416

Editor's notes

  1. VLDB, EDBT 2017 nearby
  2. Marx point: see what you are actually trying to approximate
  3. Show on the drawing what is the difference – see whether reached what aimed to approximate
  4. Mountains without toothbrush – not really interesting