Cardinal Virtues: Extracting Relation Cardinalities from Text
Paramita Mirza1, Simon Razniewski2, Fariz Darari2 and Gerhard Weikum1
1 Max Planck Institute for Informatics, Germany
2 Free University of Bozen-Bolzano, Italy
1. Overview
• IE has largely focused on answering “Who has won which award?”
• However, some facts are never fully mentioned and no IE method has perfect recall
• Sentences like “John lives with his spouse and 5 children on a farm in Alabama” are
much more frequent in texts.
• We focus instead on answering “How many awards has someone won?”
• Useful for aggregate query answering, e.g., “Who won the most awards?”
• Contributions:
• We introduce the problem of Relation Cardinality Extraction
• We present a distant supervision method using Conditional Random Fields
• We discuss specific challenges that set it apart from standard IE
Relation Cardinality
A mention that expresses relation cardinality is a cardinal number that states the number of objects that stand in a specific relation with a certain subject.
“Barack and Michelle Obama have two children, which are currently ….”
Acknowledgment
This work has been partially supported by the project “TCFR - The Call for Recall”, funded by the Free University of Bozen-Bolzano.
2. Motivation A: Knowledge Base (KB) curation
KB recall varies widely and is mostly unknown
“Barack and Michelle Obama have two children, which are currently ….”
4. Relation Cardinality Extraction
“Given a well-defined relation/predicate p, a subject s and a corresponding text about s, we try to estimate the relation cardinality, i.e., the count of <s, p, *> triples.”
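As a minimal sketch of the quantity being estimated, the count of <s, p, *> triples can be computed over a toy triple list. The function name and the toy data below are illustrative assumptions, not part of the poster or of any Wikidata API.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]


def relation_cardinality(triples: Iterable[Triple], s: str, p: str) -> int:
    """Count the distinct objects o such that <s, p, o> is in the triple set."""
    return len({o for (subj, pred, o) in triples if subj == s and pred == p})


# Toy triples; plain names are used instead of real Wikidata identifiers.
kb = [
    ("Barack Obama", "child", "Malia Obama"),
    ("Barack Obama", "child", "Sasha Obama"),
    ("Barack Obama", "spouse", "Michelle Obama"),
]

print(relation_cardinality(kb, "Barack Obama", "child"))  # 2
```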
Methodology
• Sequence labelling problem:
Tokens: Barack and Michelle Obama have two children , which are currently ….
Lemmas: Barack and Michelle Obama have _num_ child , which be currently …
Labels: O O O O O CHILD O O O O O
• Conditional Random Fields (CRF) model using CRF++ (Kudo, 2005)
• Feature set: lemma of observed token t, context lemmas (window size = 5),
bigrams and trigrams containing t
• Distant supervision for generating training data
• Given an <s, p> pair we identify:
‐ the triple count |<s, p, *>| from Wikidata (Vrandečić and Krötzsch, 2014);
‐ candidate sentences from the English Wikipedia article of s; and
‐ candidate numbers (not labelled as TEMPORAL, MONEY or PERCENT) in each sentence (if any)
• We generate training examples by labelling a candidate number n with p if n = |<s, p, *>|; otherwise, it is labelled as O, like the rest of the non-number tokens
• Prediction
• From the sentences annotated by the CRF-based model, the relation cardinality for a given <s, p> pair is the candidate number labelled with p that has the highest confidence score (i.e., the marginal probability of the token being labelled as such, resulting from forward-backward inference)
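The labelling and prediction rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the CRF itself is left out, `WORD_NUMBERS` is a tiny hypothetical lookup, and the marginal probabilities in the usage example are made up to show the selection step.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical word-to-number lookup; a real pipeline would use a number
# normaliser and an NE tagger to skip TEMPORAL, MONEY and PERCENT expressions.
WORD_NUMBERS: Dict[str, int] = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


def parse_number(token: str) -> Optional[int]:
    """Return the integer value of a candidate number token, or None."""
    t = token.lower()
    if t.isdecimal():
        return int(t)
    return WORD_NUMBERS.get(t)


def label_sentence(tokens: List[str], p: str, kb_count: int) -> List[str]:
    """Distant supervision: label a number n with p if n == |<s, p, *>|, else O."""
    labels = []
    for tok in tokens:
        n = parse_number(tok)
        labels.append(p if n is not None and n == kb_count else "O")
    return labels


def predict_cardinality(tagged: List[Tuple[str, str, float]], p: str) -> Optional[int]:
    """Pick the number labelled p with the highest marginal probability.

    `tagged` holds (token, predicted_label, marginal_probability) triples
    pooled over all sentences about the subject.
    """
    best, best_score = None, -1.0
    for tok, label, prob in tagged:
        n = parse_number(tok)
        if label == p and n is not None and prob > best_score:
            best, best_score = n, prob
    return best


tokens = "Barack and Michelle Obama have two children , which are currently".split()
print(" ".join(label_sentence(tokens, "CHILD", 2)))  # O O O O O CHILD O O O O O

# Made-up tagger output, only to show how the best candidate is selected.
tagged = [("two", "CHILD", 0.81), ("four", "CHILD", 0.35), ("five", "O", 0.90)]
print(predict_cardinality(tagged, "CHILD"))  # 2
```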
Experiments
• Evaluation on manually annotated, randomly sampled subjects for 4 Wikidata properties:
20 (has part), 100 (contains admin.) and 200 (child and spouse)
• baseline: randomly select a number from a pool of numbers in text
• only nummod: consider only candidate numbers that modify a noun
KB: 0, Recall: 0%
KB: 1, Recall: 50%
KB: 2, Recall: 100%
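The KB and recall pairs above can be read against the example sentence, which states two children: recall is the number of objects present in the KB divided by the cardinality mentioned in the text. A minimal sketch, where the function name and the cap at 1.0 are assumptions:

```python
def kb_recall(kb_count: int, text_cardinality: int) -> float:
    """Estimate KB recall for one <s, p> pair from a cardinality found in text."""
    if text_cardinality <= 0:
        raise ValueError("text_cardinality must be positive")
    # Capping at the text cardinality is an assumption, so recall never exceeds 1.
    return min(kb_count, text_cardinality) / text_cardinality


for kb_count in (0, 1, 2):
    print(kb_count, kb_recall(kb_count, 2))  # 0 0.0, then 1 0.5, then 2 1.0
```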
3. Motivation B: Disregarded by state-of-the-art (Open) IE systems
Despite its frequency 
• Open IE (Mausam et al. 2012; Del Corro and Gemulla, 2013)
• No way to interpret the numeric expression in the Object slot, e.g., <Obama, has, two children>
• KB-population IE, e.g., NELL (Mitchell et al., 2015)
• Knows 13 relations about the number of casualties and injuries in disasters, e.g.,
<Berlin2016attack, hasNumOfVictims, 32>
• Contains only seed facts and no learned facts
Stanford Named Entity (NE) tagger on cardinal numbers in 10K Wikipedia articles
July 30-August 4, 2017 ∙ Vancouver, Canada
• DBpedia currently contains only 6 out of 35 Dijkstra Prize winners
• According to YAGO, the average number of children per person is 0.02
• 167 out of 199 Nobel laureates in Physics are in DBpedia ☺
• 2 out of 2 children of Obama are in Wikidata ☺
5. Challenges in Relation Cardinality Extraction
Quality of Training Data
• Distant supervision from a highly incomplete KB
• e.g., manual annotation of the child evaluation set → Wikidata is only ±50% accurate.
• Unlike in classical IE, missing ground truth may lead to false positives as well.
• Possible approaches:
• Filtering ground truth → consider only popular entities for training.
• Incompleteness-resilient distant supervision → label all numbers equal to or higher than the KB count as positive examples.
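The incompleteness-resilient labelling rule can be sketched as follows. It is a minimal illustration that assumes the candidate numbers of a sentence have already been extracted as integers; the function name is hypothetical.

```python
from typing import List


def label_resilient(numbers: List[int], p: str, kb_count: int) -> List[str]:
    """Label every candidate number >= the KB count as a positive example.

    The KB count is treated as a lower bound because the KB may be incomplete.
    """
    return [p if n >= kb_count else "O" for n in numbers]


# The KB lists 2 children, while the text mentions the numbers 1, 4 and 2.
print(label_resilient([1, 4, 2], "CHILD", 2))  # ['O', 'CHILD', 'CHILD']
```

Compared with exact matching, this keeps a correct mention such as 4 when the KB only knows 2 objects, but it may also label unrelated larger numbers as positive.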
Compositionality
• “They have two sons and one daughter together; he has four children from his first wife.”
• 16% of false positives in extracting child cardinalities
• Possible approaches:
• Aggregating numbers → in training data generation, label a sequence of numbers as correct cardinalities if the sum is equal to the KB count; in the prediction step, sum up all consecutive cardinalities.
• Learning composition rules → e.g., children are composed of sons and daughters.
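One possible reading of the aggregation approach is sketched below. It assumes that "sequence" means a contiguous run within the list of candidate numbers of a sentence, that intervening non-number tokens are ignored, and that numbers are non-negative; both function names are hypothetical.

```python
from typing import List


def label_summing_runs(numbers: List[int], p: str, kb_count: int) -> List[str]:
    """Training side: label contiguous runs of numbers that sum to the KB count."""
    labels = ["O"] * len(numbers)
    for start in range(len(numbers)):
        total = 0
        for end in range(start, len(numbers)):
            total += numbers[end]
            if total == kb_count:
                for i in range(start, end + 1):
                    labels[i] = p
            if total > kb_count:  # valid because numbers are assumed non-negative
                break
    return labels


def sum_consecutive(numbers: List[int], labels: List[str], p: str) -> List[int]:
    """Prediction side: sum each maximal run of consecutive numbers labelled p."""
    sums: List[int] = []
    run = None
    for n, label in zip(numbers, labels):
        if label == p:
            run = n if run is None else run + n
        elif run is not None:
            sums.append(run)
            run = None
    if run is not None:
        sums.append(run)
    return sums


# "They have two sons and one daughter together; he has four children ..."
numbers = [2, 1, 4]
print(label_summing_runs(numbers, "CHILD", 3))  # ['CHILD', 'CHILD', 'O']
print(sum_consecutive(numbers, ["CHILD", "CHILD", "O"], "CHILD"))  # [3]
```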
Linguistic Variance
• Ordinals are quite common to express lower bounds, e.g., “John’s first wife, Mary, …”.
• Relation cardinalities are sometimes expressed with non-numerals, e.g., “He never married”,
“They have a daughter together”, “The book is a trilogy”.
• Possible approaches:
• Translation to numbers → translate certain kinds of negation and indefinite articles into expressions containing 0 and 1.
• Word similarity with cardinals → consider words that bear high similarity with cardinal numbers, possibly in other languages such as Latin or Greek.
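The translation-to-numbers idea can be illustrated with a couple of rewrite rules. The patterns and replacement phrasings below are hypothetical examples chosen for the poster's sample sentences, not rules taken from the authors' system.

```python
import re

# Hypothetical rewrite rules: a negation becomes 0, an indefinite article
# before a kinship noun becomes 1.
RULES = [
    (re.compile(r"\bnever married\b", re.IGNORECASE), "married 0 times"),
    (re.compile(r"\b(?:a|an)\b(?=\s+(?:daughter|son|child)\b)", re.IGNORECASE), "1"),
]


def translate_to_numbers(sentence: str) -> str:
    """Rewrite selected non-numeral expressions so they contain 0 or 1."""
    for pattern, replacement in RULES:
        sentence = pattern.sub(replacement, sentence)
    return sentence


print(translate_to_numbers("He never married"))              # He married 0 times
print(translate_to_numbers("They have a daughter together"))  # They have 1 daughter together
```

After such a rewrite, the sentences contain explicit numbers that a number-based tagger can treat as candidates.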
p                                #s train   baseline   vanilla               only nummod
                                            P          P      R      F1      P      R      F1
has part (creative work series)  261        .050       .333   .316   .324    .353   .316   .333
contains admin                   18,000     .034       .390   .188   .254    .548   .200   .293
spouse                           45,917     0          .014   .011   .013    .028   .017   .021
child                            35,057     .112       .151   .129   .139    .320   .219   .260
child (manual ground truth)      6,408                 .374   .309   .338    .452   .315   .317
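The F1 columns are the harmonic mean of precision (P) and recall (R). As a quick check, the sketch below recomputes two F1 values from the P and R figures reported in the table; the function name is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# vanilla, has part (creative work series): P = .333, R = .316
print(round(f1_score(0.333, 0.316), 3))  # 0.324
# only nummod, contains admin: P = .548, R = .200
print(round(f1_score(0.548, 0.200), 3))  # 0.293
```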
Further Reading
• Predicting Completeness in Knowledge Bases, Luis Galárraga, Simon Razniewski, Antoine Amarilli,
Fabian M. Suchanek, WSDM, Cambridge, UK, 2017
• Expanding Wikidata’s Parenthood Information by 178%, or How To Mine Relation Cardinalities,
Paramita Mirza, Simon Razniewski, Werner Nutt, ISWC Poster, Osaka, Japan, 2016
• But What Do We Actually Know?, Simon Razniewski, Fabian Suchanek, Werner Nutt, AKBC
workshop at NAACL, San Diego, USA, 2016
• Identifying the Extent of Completeness of Query Answers over Partially Complete Databases, Simon
Razniewski, Flip Korn, Werner Nutt, Divesh Srivastava, SIGMOD, Melbourne, Australia, 2015
• A tool for crowdsourced completeness annotations for Wikidata: http://cool-wd.inf.unibz.it/
18.86%

This PowerPoint helps students to consider the concept of infinity.
 
Vishram Singh - Textbook of Anatomy Upper Limb and Thorax.. Volume 1 (1).pdf
Vishram Singh - Textbook of Anatomy  Upper Limb and Thorax.. Volume 1 (1).pdfVishram Singh - Textbook of Anatomy  Upper Limb and Thorax.. Volume 1 (1).pdf
Vishram Singh - Textbook of Anatomy Upper Limb and Thorax.. Volume 1 (1).pdf
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 

Paramita Mirza - 2017 - Cardinal Virtues: Extracting Relation Cardinalities from Text

Cardinal Virtues: Extracting Relation Cardinalities from Text
Paramita Mirza1, Simon Razniewski2, Fariz Darari2 and Gerhard Weikum1
1 Max Planck Institute for Informatics, Germany
2 Free University of Bozen-Bolzano, Italy
July 30-August 4, 2017 ∙ Vancouver, Canada

1. Overview
• IE has largely focused on answering “Who has won which award?”
• However, some facts are never fully mentioned and no IE method has perfect recall
• Sentences like “John lives with his spouse and 5 children on a farm in Alabama” are much more frequent in texts.
• We focus instead on answering “How many awards has someone won?”
• Useful for aggregate query answering, e.g., “Who won the most awards?”
• Contributions:
  • We introduce the problem of Relation Cardinality Extraction
  • We present a distant supervision method using Conditional Random Fields
  • We discuss specific challenges that set it apart from standard IE

Relation Cardinality
A mention that expresses relation cardinality is a cardinal number that states the number of objects that stand in a specific relation with a certain subject.
“Barack and Michelle Obama have two children, which are currently ….”

2. Motivation A: Knowledge Base (KB) curation
KB recall is highly variant and mostly unknown ☹
• DBpedia currently contains only 6 out of 35 Dijkstra Prize winners ☹
• According to YAGO, the average number of children per person is 0.02 ☹
• 167 out of 199 Nobel laureates in Physics are in DBpedia ☺
• 2 out of 2 children of Obama are in Wikidata ☺
“Barack and Michelle Obama have two children, which are currently ….”
[Figure: KB: 0, KB: 1, KB: 2 shown against Recall: 0%, Recall: 50%, Recall: 100%]

3. Motivation B: Disregarded by state-of-the-art (Open) IE systems
Despite its frequency ☹
• Open IE (Mausam et al., 2012; Del Corro and Gemulla, 2013)
  • No way to interpret the numeric expression in the Object slot, e.g., <Obama, has, two children>
• KB-population IE, e.g., NELL (Mitchell et al., 2015)
  • Knows 13 relations about the number of casualties and injuries in disasters, e.g., <Berlin2016attack, hasNumOfVictims, 32>
  • Contains only seed facts and no learned facts
[Figure: Stanford Named Entity (NE) tagger on cardinal numbers in 10K Wikipedia articles]

4. Relation Cardinality Extraction
“Given a well defined relation/predicate p, a subject s and a corresponding text about s, we try to estimate the relation cardinality, i.e., the count of <s, p, *> triples”

Methodology
• Sequence labelling problem:
  Barack and Michelle Obama have two children , which are currently ….
  Barack and Michelle Obama have _num_ child , which be currently …   (lemma)
  O O O O O CHILD O O O O O
• Conditional Random Fields (CRF) model using CRF++ (Kudo, 2005)
• Feature set: lemma of observed token t, context lemmas (window size = 5), bigrams and trigrams containing t
• Distant supervision for generating training data
  • Given an <s, p> pair we identify:
    ‐ the triple count |<s, p, *>| from Wikidata (Vrandečić and Krötzsch, 2014); and
    ‐ candidate sentences from the English Wikipedia article of s
    ‐ candidate numbers (not labelled as TEMPORAL, MONEY or PERCENT) in each sentence (if any)
  • We generate training examples by labelling a candidate number n with p if n = |<s, p, *>|; otherwise, it is labelled as O, like the rest of the non-number tokens
• Prediction
  • Given the sentences annotated by the CRF-based model, the relation cardinality for a given <s, p> pair is the candidate number labelled with p that has the highest confidence score (i.e., the marginal probability of a token labelled as such, resulting from forward-backward inference)

Experiments
• Evaluation on manually annotated, randomly sampled subjects for 4 Wikidata properties: 20 (has part), 100 (contains admin.) and 200 (child and spouse)
• baseline: randomly select a number from a pool of numbers in text
• only nummod: consider only candidate numbers that modify a noun

p                                 #s train   baseline   vanilla              only nummod
                                             P          P     R     F1       P     R     F1
has part (creative work series)        261   .050       .333  .316  .324     .353  .316  .333
contains admin                      18,000   .034       .390  .188  .254     .548  .200  .293
spouse                              45,917   0          .014  .011  .013     .028  .017  .021
child                               35,057   .112       .151  .129  .139     .320  .219  .260
child (manual ground truth)          6,408              .374  .309  .338     .452  .315  .317

5. Challenges in Relation Cardinality Extraction

Quality of Training Data
• Distant supervision from a highly incomplete KB
  • e.g., manual annotation on child evaluation set → Wikidata is only ±50% accurate.
• Unlike in classical IE, missing ground truth may lead to false positives as well.
• Possible approaches:
  • Filtering ground truth → consider only popular entities for training.
  • Incompleteness-resilient distant supervision → label all numbers equal to or higher than the KB count as positive examples.

Compositionality
• “They have two sons and one daughter together; he has four children from his first wife.”
• 16% of false positives in extracting child cardinalities
• Possible approaches:
  • Aggregating numbers → in training data generation, label a sequence of numbers as correct cardinalities if the sum is equal to the KB count; in the prediction step, sum up all consecutive cardinalities.
  • Learning composition rules → e.g., children are composed of sons and daughters.

Linguistic Variance
• Ordinals are quite common to express lower bounds, e.g., “John’s first wife, Mary, …”.
• Relation cardinalities are sometimes expressed with non-numerals, e.g., “He never married”, “They have a daughter together”, “The book is a trilogy”.
• Possible approaches:
  • Translation to numbers → translate certain kinds of negation and indefinite articles into expressions containing 0 and 1.
  • Word similarity with cardinals → consider words that bear high similarity with cardinal numbers, possibly in other languages such as Latin or Greek.

Further Reading
• Predicting Completeness in Knowledge Bases, Luis Galárraga, Simon Razniewski, Antoine Amarilli, Fabian M. Suchanek, WSDM, Cambridge, UK, 2017
• Expanding Wikidata’s Parenthood Information by 178%, or How To Mine Relation Cardinalities, Paramita Mirza, Simon Razniewski, Werner Nutt, ISWC Poster, Osaka, Japan, 2016
• But What Do We Actually Know?, Simon Razniewski, Fabian Suchanek, Werner Nutt, AKBC workshop at NAACL, San Diego, USA, 2016
• Identifying the Extent of Completeness of Query Answers over Partially Complete Databases, Simon Razniewski, Flip Korn, Werner Nutt, Divesh Srivastava, SIGMOD, Melbourne, Australia, 2015
• A tool for crowdsourced completeness annotations for Wikidata: http://cool-wd.inf.unibz.it/

Acknowledgment
This work has been partially supported by the project “TCFR - The Call for Recall”, funded by the Free University of Bozen-Bolzano.
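
Illustrative sketch (not the authors' code): the distant-supervision labelling rule and the prediction rule from the Methodology section could look roughly like the Python below. The names label_sentence, predict_cardinality, EXCLUDED_TYPES and the token tuple layout are hypothetical and chosen only for this example. The CRF itself is not implemented here; the marginal probabilities passed to predict_cardinality are assumed to come from a trained sequence model such as CRF++. The resilient flag corresponds to the "incompleteness-resilient" variant listed under Challenges.

```python
from typing import List, Optional, Sequence, Tuple

# Entity types that the poster excludes from the pool of candidate numbers.
EXCLUDED_TYPES = {"TEMPORAL", "MONEY", "PERCENT"}

# Hypothetical token layout: (lemma, numeric value or None, entity type).
Token = Tuple[str, Optional[int], str]


def label_sentence(tokens: Sequence[Token], relation: str, kb_count: int,
                   resilient: bool = False) -> List[str]:
    """Label a candidate number with `relation` if it equals the KB triple
    count |<s, p, *>|; every other token gets the label O.

    With resilient=True, numbers higher than the KB count are also treated
    as positive examples (incompleteness-resilient distant supervision).
    """
    labels = []
    for _lemma, value, ne_type in tokens:
        is_candidate = value is not None and ne_type not in EXCLUDED_TYPES
        matches = is_candidate and (
            value == kb_count or (resilient and value > kb_count)
        )
        labels.append(relation if matches else "O")
    return labels


def predict_cardinality(scored: Sequence[Tuple[int, float]]) -> Optional[int]:
    """Return the candidate number with the highest marginal probability.

    `scored` holds (number, marginal probability) pairs for the tokens that
    the model labelled with the relation; the probabilities are assumed to
    be supplied by a trained CRF.
    """
    if not scored:
        return None
    best_value, _ = max(scored, key=lambda pair: pair[1])
    return best_value


# Lemmatised version of the poster's example sentence (shortened).
sentence: List[Token] = [
    ("Barack", None, "PERSON"), ("and", None, "O"),
    ("Michelle", None, "PERSON"), ("Obama", None, "PERSON"),
    ("have", None, "O"), ("_num_", 2, "NUMBER"), ("child", None, "O"),
]
labels = label_sentence(sentence, "CHILD", kb_count=2)
print(labels)
```

For the example sentence and a Wikidata child count of 2, only the number token receives the CHILD label, which matches the label row shown in the Methodology section. A number tagged as TEMPORAL, MONEY or PERCENT is never labelled, even in the resilient variant.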