Oana Inel and Lora Aroyo
LDK 2019
Validation Methodology for Expert-Annotated Datasets: Event Annotation Case Study
@oana_inel
https://oana-inel.github.io
crowdtruth.org
!1
Event Annotation
• Traditionally performed by experts
• Difficult to define who the experts are when it comes to events
• Typical evaluation: inter-annotator agreement
However ...
!2
Insurers could see claims totaling
nearly $1 billion from the San
Francisco earthquake, far less than
the $4 billion from Hurricane Hugo.
Which are the events in this sentence?
!3
Insurers could see claims totaling
nearly $1 billion from the San
Francisco earthquake, far less than
the $4 billion from Hurricane Hugo.
Which are the right
events here?
Insurers could see claims totaling
nearly $1 billion from the San
Francisco earthquake, far less than
the $4 billion from Hurricane Hugo.
!4
Annotation Guidelines Are Difficult to Define
How can we ensure consistent
annotation of events?
How do we know when we have the
complete set of events?
!5
… let’s check some expert-annotated datasets
How well do they do in terms of consistency and
completeness?
!6
TempEval-3
SemEval 2013: Temporal Annotation Task
TempEval-3 Gold:
● 256 documents, 3953 sentences
● 11,129 events
● 1,822 time expressions
TempEval-3 Platinum:
● 20 documents, 273 sentences
● 746 events
● 138 time expressions
Experiments
https://www.cs.york.ac.uk/semeval-2013/task1/
UzZaman, N., Llorens, H., Derczynski, L., Allen, J., Verhagen, M. and Pustejovsky, J., 2013. SemEval-2013 Task 1: TempEval-3: Evaluating Time Expressions, Events, and Temporal Relations. SemEval 2013, Vol. 2, pp. 1-9.
!7
Tokens have different types across datasets
His appointment to that post, which has senior administrative, staff and policy responsibilities, followed a several-year tenure as Reuters's editor in chief.
During his tenure, he has increasingly unleashed biting comedic barbs against his critics and political adversaries.
EVENT in
TempEval-3 Gold
TIME EXPRESSION in
TempEval-3 Platinum
Consistency
!8
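The cross-dataset type check on this slide can be sketched as follows. The sketch assumes each dataset is available as a list of (token, entity type) pairs. That input format, the function names and the toy labels are hypothetical and are not taken from the project's code. A token is flagged when both datasets annotate it but never with the same type.

```python
from collections import defaultdict


def types_by_token(annotations):
    """Map each lowercased token to the set of entity types it received."""
    table = defaultdict(set)
    for token, entity_type in annotations:
        table[token.lower()].add(entity_type)
    return table


def cross_dataset_conflicts(dataset_a, dataset_b):
    """Return tokens annotated in both datasets but never with a shared type.

    dataset_a and dataset_b are hypothetical lists of (token, type) pairs.
    """
    a, b = types_by_token(dataset_a), types_by_token(dataset_b)
    return {
        token: (a[token], b[token])
        for token in a.keys() & b.keys()
        if not a[token] & b[token]
    }


# Toy input, not drawn from TempEval-3.
gold = [("Meeting", "EVENT"), ("rose", "EVENT")]
platinum = [("meeting", "TIMEX"), ("rose", "EVENT")]
print(cross_dataset_conflicts(gold, platinum))
```

Requiring an empty intersection only flags outright disagreements. A looser variant could also flag tokens whose type sets merely differ.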
Annotation guidelines for events are not
used consistently
Consistency
Hungary's accession to NATO has brought about new changes to the political
balance in central and eastern Europe, shattering the world configuration under
the Yalta accord since World War II.
Only single-token EVENTS
in TempEval-3 Gold
The report estimated that the number of passenger cars in China was on track to
hit 400 million by 2030, up from 90 million now.
Also multi-token EVENTS in
TempEval-3 Platinum
EVENTS composed of NUMERALS were removed
!9
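The span differences on this slide can be surfaced with a simple count. The sketch assumes event annotations are lists of tokens, which is a hypothetical format. A span counts as numeral-only when every token is a digit string or one of a few number words. That numeral test is an illustrative assumption, not the rule actually used to clean the datasets.

```python
NUMBER_WORDS = {"hundred", "thousand", "million", "billion"}


def is_numeral(token):
    """Assumed numeral test: digits (with , or . separators) or a number word."""
    digits = token.replace(",", "").replace(".", "")
    return digits.isdigit() or token.lower() in NUMBER_WORDS


def span_report(events):
    """Count single-token, multi-token and numeral-only event spans."""
    report = {"single": 0, "multi": 0, "numeral_only": 0}
    for span in events:
        report["single" if len(span) == 1 else "multi"] += 1
        if all(is_numeral(token) for token in span):
            report["numeral_only"] += 1
    return report


def drop_numeral_events(events):
    """Remove event spans made up entirely of numerals."""
    return [span for span in events if not all(is_numeral(t) for t in span)]


# Toy event spans for illustration only.
events = [["accession"], ["took", "place"], ["90", "million"], ["shattering"]]
print(span_report(events))
print(drop_numeral_events(events))
```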
Occurrences of the same previously
annotated event/time token are not
annotated by experts
Travelers Corp.'s third-quarter net income rose 11%, even though claims stemming from
Hurricane Hugo reduced results $40 million.
Insurers could see claims totaling nearly $1 billion from the San Francisco earthquake,
far less than the $4 billion from Hurricane Hugo.
EVENT in
TempEval-3 Gold
NOT an EVENT in
TempEval-3 Gold
Completeness
!10
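The completeness gap on this slide can be checked mechanically. The check looks for every occurrence of a token that is annotated elsewhere but left unannotated here. This is a minimal sketch: the sentence and position formats and the function name are hypothetical. The output is a list of candidates for review, not guaranteed events, since another occurrence of the same token can legitimately carry a different sense.

```python
def missing_occurrences(sentences, annotated):
    """List unannotated occurrences of tokens that are annotated elsewhere.

    sentences: list of sentences, each a list of tokens (hypothetical format).
    annotated: set of (sentence_index, token_index) positions marked as events.
    """
    event_tokens = {sentences[s][t].lower() for s, t in annotated}
    missing = []
    for s, tokens in enumerate(sentences):
        for t, token in enumerate(tokens):
            if token.lower() in event_tokens and (s, t) not in annotated:
                missing.append((s, t, token))
    return missing


# Toy input: "rose" is annotated in the first sentence only.
sentences = [["Sales", "rose"], ["Profits", "rose", "again"]]
annotated = {(0, 1)}
print(missing_occurrences(sentences, annotated))
```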
Occurrences of the same previously
annotated event/time lemma are not
annotated by experts
The Windows-ready Kinect sensor, is currently selling for $250 through Microsoft, more
than twice what Microsoft charges for the gaming-only Xbox version.
Starting next year, the law will block insurers from refusing to sell coverage or setting
premiums based on people’s health histories.
EVENT in
TempEval-3 Platinum
NOT an EVENT in
TempEval-3 Platinum
Completeness
!11
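This slide's case differs from the token-level one: the missed occurrence shares a lemma with an annotated event, not its surface form. A lemma-based variant of the check is sketched below. It uses a hypothetical toy lemma table in place of a real lemmatizer, so the table, the function names and the data are all illustrative assumptions.

```python
# Hypothetical toy lemma table; a real pipeline would call a lemmatizer instead.
LEMMAS = {"rose": "rise", "rising": "rise", "rises": "rise"}


def lemma(token):
    """Look up the lemma of a token, defaulting to the lowercased token."""
    return LEMMAS.get(token.lower(), token.lower())


def missing_lemma_occurrences(sentences, annotated):
    """List unannotated tokens whose lemma matches an annotated event's lemma."""
    event_lemmas = {lemma(sentences[s][t]) for s, t in annotated}
    return [
        (s, t, token)
        for s, tokens in enumerate(sentences)
        for t, token in enumerate(tokens)
        if lemma(token) in event_lemmas and (s, t) not in annotated
    ]


# Toy input: "rising" is annotated, "rose" shares its lemma but is not.
sentences = [["Prices", "are", "rising"], ["Costs", "rose"]]
annotated = {(0, 2)}
print(missing_lemma_occurrences(sentences, annotated))
```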
Expert Datasets
… and by computing this in a systematic way, we observe that expert-annotated datasets ...
● are not always consistently annotated
● may not always contain all events
Can the crowd help us improve these datasets?
!12
Hypotheses
● Annotation Guidelines: providing explicit definitions increases the accuracy of the results
● Annotation Value: motivating the choice increases the accuracy of the results
● Number of Annotators: a large number of crowd workers performs as well as experts
● Input Entity Values: the crowd provides consistent annotations when asked to validate and add missing events
!13
Crowdsourcing Event Annotations
● Pilot experiments to determine the optimal crowdsourcing setting, covering the annotation guidelines, annotation value, input entity values and number of annotators
● Evaluate the pilot experiments
● Main experiment with the optimal crowdsourcing setting: annotate events in sentences
!14
16 PILOT EXPERIMENTS with 50 sentences from the TempEval-3 Platinum (P) dataset
Which is the optimal setting?
Input Data
● Entity type: Time Expression, Event
● Dataset: Platinum
● Entity values: Expert (G+P) & Tools & Missing; Expert (P) & Tools
Crowdsourcing Template
● Annotation guidelines: Implicit Definition; Explicit Definition
● Annotation value: Entities; Entities + Motivation (NONE); Entities + Motivation (ALL); Entities + Motivation (ALL) + Highlight
Controlled Variables
● Annotators: 20 workers, Figure Eight, English speaking
!15
OVERVIEW OF ALL PILOT EXPERIMENT SETTINGS
Eight event-annotation settings are shown. All use entity type Event, the Platinum dataset and the same annotators (Figure Eight, 20 workers, English speaking). Across the eight settings:
● Entity values: Expert (G+P) & Tools & Missing (2 settings) or Expert (P) & Tools (6 settings)
● Annotation guidelines: Implicit Definition (3 settings) or Explicit Definition (5 settings)
● Annotation value: Entities (2), Entities + Motivation (NONE) (2), Entities + Motivation (ALL) (3), Entities + Motivation (ALL) + Highlight (1)
!16
F1-score / TP (true positives) of the eight event pilot settings on the Platinum dataset:
0.89 / 154, 0.88 / 159, 0.89 / 157, 0.89 / 152, 0.88 / 164, 0.90 / 161, 0.84 / 156, 0.83 / 155
crowd performs better when they are provided with explicit definitions
!17
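F1 combines precision and recall, and the TP count is the number of crowd-selected items that also appear in the reference set. The sketch below assumes the comparison is between sets of selected event tokens and an expert reference set. The slide does not state how the reference was built, so the set representation and the function name are assumptions.

```python
def tp_and_f1(predicted, reference):
    """Return (true positives, F1) of a predicted set against a reference set."""
    tp = len(predicted & reference)
    fp = len(predicted - reference)
    fn = len(reference - predicted)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return tp, 0.0
    return tp, 2 * precision * recall / (precision + recall)


# Toy token ids: 3 hits, 1 false positive, 2 misses.
tp, f1 = tp_and_f1({1, 2, 3, 4}, {2, 3, 4, 5, 6})
print(tp, round(f1, 3))
```

Equivalently, F1 = 2·TP / (2·TP + FP + FN), which here is 6 / 9.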
crowd performs better when answers are motivated
!18
How many workers give accurate results?
● event annotation: 12 workers perform well when compared to the experts
● time expression annotation: 15 workers perform well when compared to the experts
Setting: Platinum dataset, time expressions and events, Expert (G+P) & Tools & Missing, Explicit Definition, Entities + Motivation (ALL) + Highlight
!19
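One way to ask "how many workers are enough" is to repeatedly draw k workers, aggregate only their answers and score the result against a reference. The sketch below does that with a plain fraction-of-workers vote and a 0.5 threshold. That is a simplification which ignores the worker and annotation quality weighting of the CrowdTruth metrics. The function names, data and worker counts are hypothetical, and this is not the procedure behind the numbers on the slide.

```python
import random


def crowd_labels(judgments, threshold=0.5):
    """Items selected by at least `threshold` of the workers in `judgments`.

    judgments: list of sets, one set of selected item ids per worker.
    """
    counts = {}
    for selected in judgments:
        for item in selected:
            counts[item] = counts.get(item, 0) + 1
    return {item for item, c in counts.items() if c / len(judgments) >= threshold}


def f1(predicted, reference):
    """F1 = 2 * TP / (|predicted| + |reference|)."""
    if not predicted and not reference:
        return 1.0
    return 2 * len(predicted & reference) / (len(predicted) + len(reference))


def f1_by_worker_count(judgments, reference, sizes, trials=50, seed=0):
    """Average F1 of the crowd labels when only k randomly drawn workers are used."""
    rng = random.Random(seed)
    result = {}
    for k in sizes:
        scores = [
            f1(crowd_labels(rng.sample(judgments, k)), reference)
            for _ in range(trials)
        ]
        result[k] = sum(scores) / trials
    return result


# Toy data: 15 workers agree with the reference, 5 select a wrong item.
workers = [{1, 2, 3}] * 15 + [{4}] * 5
print(f1_by_worker_count(workers, {1, 2, 3}, sizes=[3, 11, 20]))
```

With few workers, a handful of wrong answers can dominate the vote. Past a certain crowd size, the average F1 stops changing.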
MAIN EXPERIMENTS FOR EVENT ANNOTATION
with 4,202 sentences from the TempEval-3 Gold (G) and TempEval-3 Platinum (P) datasets
● Annotators: Figure Eight, 15 workers, English speaking, 4 cents per annotation
● Dataset: Gold & Platinum
● Entity type: Event
● Entity values: Expert (G+P) & Tools & Missing
● Annotation guidelines: Explicit Definition
● Annotation value: Entities + Motivation (ALL) + Highlight
!20
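A rough budget for the main experiment follows from the numbers on this slide. The estimate rests on two assumptions: one paid annotation means one worker judging one sentence, and platform fees are ignored. The slide does not define the billing unit, so treat the total as an estimate only.

```python
SENTENCES = 4202
WORKERS_PER_SENTENCE = 15
CENTS_PER_ANNOTATION = 4  # assumed: one annotation = one worker judging one sentence

judgments = SENTENCES * WORKERS_PER_SENTENCE
total_cents = judgments * CENTS_PER_ANNOTATION  # integer cents avoid float rounding
dollars, cents = divmod(total_cents, 100)
print(f"{judgments:,} judgments, about ${dollars:,}.{cents:02d} before fees")
```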
Train and Evaluate ClearTK with Expert & Crowd Events
● Train on experts, test on experts: the F1-score of ClearTK is 0.788
● Train on experts, test on crowd: the F1-score of ClearTK is significantly better, around 0.83
● Train on crowd, test on experts: the F1-score of ClearTK is nearly as good (0.77)
!21
Training and Evaluating ClearTK with Crowd Events
Train on Crowd and Test on Crowd: F1-score of the ClearTK tool reaches a maximum of 0.83
ClearTK performs well when trained and evaluated at similar crowd event-sentence score thresholds
!22
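Training or testing at a given crowd event-sentence score threshold means converting graded crowd scores into the binary labels a tagger such as ClearTK consumes. A minimal sketch of that step is shown below. It assumes each candidate event in a sentence has a crowd score in [0, 1], for example one derived with the CrowdTruth metrics listed under Resources. The data layout and function name are hypothetical. Raising the threshold keeps fewer, more agreed-upon events, so label sets built at different thresholds disagree on what counts as an event.

```python
def events_at_threshold(scores, threshold):
    """Keep candidate events whose crowd score is at least `threshold`.

    scores: dict mapping (sentence_id, token) -> crowd event-sentence score in [0, 1].
    """
    return {key for key, score in scores.items() if score >= threshold}


# Toy scores for three candidate events.
scores = {("s1", "rose"): 0.9, ("s1", "said"): 0.4, ("s2", "hit"): 0.6}
for threshold in (0.3, 0.5, 0.7):
    print(threshold, sorted(events_at_threshold(scores, threshold)))
```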
Contributions and Future Work
Contributions
● data-agnostic validation methodology of expert-annotated datasets in terms of
consistency and completeness
● 4,202 crowd-annotated English sentences from the TempEval-3 Gold and
TempEval-3 Platinum datasets with events
● 121 crowd-annotated sentences from the TempEval-3 Platinum dataset with time
expressions
● training and evaluating ClearTK with crowd-driven event annotations
Future Work
● replicate crowdsourcing experiment on time expressions
● investigate the role of sentence and event ambiguity in the training and evaluation of event extraction systems
!23
Resources
Data & Code
https://github.com/CrowdTruth/Event-Extraction
CrowdTruth Metrics
https://github.com/CrowdTruth/CrowdTruth-core
CrowdTruth Tutorial
http://crowdtruth.org/tutorial/
!24

Contenu connexe

Similaire à LDK2019: Validation methodology for expert-annotated datasets: Events Annotation Case Study

ROSeAnn Presentation
ROSeAnn PresentationROSeAnn Presentation
ROSeAnn PresentationDBOnto
 
Tech M&A Monthly: 2017 Midyear Report
Tech M&A Monthly: 2017 Midyear ReportTech M&A Monthly: 2017 Midyear Report
Tech M&A Monthly: 2017 Midyear ReportCorum Group
 
Philippe Borremans - How To Automate Boring Tasks & Increase Productivity In PR
Philippe Borremans - How To Automate Boring Tasks & Increase Productivity In PRPhilippe Borremans - How To Automate Boring Tasks & Increase Productivity In PR
Philippe Borremans - How To Automate Boring Tasks & Increase Productivity In PRNorsk kommunikasjonsforening
 
Cheap Or Expensive Custo. Online assignment writing service.
Cheap Or Expensive Custo. Online assignment writing service.Cheap Or Expensive Custo. Online assignment writing service.
Cheap Or Expensive Custo. Online assignment writing service.Tiffany Miller
 
Feeding the Beast-How Fraud Tools Bring Context into Authentication (Gartner ...
Feeding the Beast-How Fraud Tools Bring Context into Authentication (Gartner ...Feeding the Beast-How Fraud Tools Bring Context into Authentication (Gartner ...
Feeding the Beast-How Fraud Tools Bring Context into Authentication (Gartner ...TransUnion
 
“Real Time Machine Learning Architecture and Sentiment Analysis Applied to Fi...
“Real Time Machine Learning Architecture and Sentiment Analysis Applied to Fi...“Real Time Machine Learning Architecture and Sentiment Analysis Applied to Fi...
“Real Time Machine Learning Architecture and Sentiment Analysis Applied to Fi...Quantopian
 
Haskel - Spillovers from public intangibles
Haskel - Spillovers from public intangiblesHaskel - Spillovers from public intangibles
Haskel - Spillovers from public intangiblesinnovationoecd
 
OECD Blue Skies Conference. Sept 2016
OECD Blue Skies Conference. Sept 2016OECD Blue Skies Conference. Sept 2016
OECD Blue Skies Conference. Sept 2016SPINTAN
 
Global IP Market Quick Update on the Secondary Market for Patents
Global IP Market Quick Update on the Secondary Market for PatentsGlobal IP Market Quick Update on the Secondary Market for Patents
Global IP Market Quick Update on the Secondary Market for PatentsErik Oliver
 
Price optimization for high-mix, low-volume environments | Using R and Tablea...
Price optimization for high-mix, low-volume environments | Using R and Tablea...Price optimization for high-mix, low-volume environments | Using R and Tablea...
Price optimization for high-mix, low-volume environments | Using R and Tablea...Wil Davis
 
SEE Gemeentedag: Petra de West, Roeland van Oers en Gökhan Tuna - Van selfser...
SEE Gemeentedag: Petra de West, Roeland van Oers en Gökhan Tuna - Van selfser...SEE Gemeentedag: Petra de West, Roeland van Oers en Gökhan Tuna - Van selfser...
SEE Gemeentedag: Petra de West, Roeland van Oers en Gökhan Tuna - Van selfser...TOPdesk
 
Example Of A Thesis Statement In An Expository Essay
Example Of A Thesis Statement In An Expository EssayExample Of A Thesis Statement In An Expository Essay
Example Of A Thesis Statement In An Expository EssayJill Swenson
 
Qraft company deck_202003_rf
Qraft company deck_202003_rfQraft company deck_202003_rf
Qraft company deck_202003_rf형식 김
 
Carolina Scarton - ESR 7 - USFD
Carolina Scarton - ESR 7 - USFD  Carolina Scarton - ESR 7 - USFD
Carolina Scarton - ESR 7 - USFD RIILP
 
2022 apidays LIVE Helsinki & North_Future proofing API Security
2022 apidays LIVE Helsinki & North_Future proofing API Security2022 apidays LIVE Helsinki & North_Future proofing API Security
2022 apidays LIVE Helsinki & North_Future proofing API Securityapidays
 
State of the US VC Market
State of the US VC MarketState of the US VC Market
State of the US VC MarketGGV Capital
 

Similaire à LDK2019: Validation methodology for expert-annotated datasets: Events Annotation Case Study (20)

ROSeAnn Presentation
ROSeAnn PresentationROSeAnn Presentation
ROSeAnn Presentation
 
Tech M&A Monthly: 2017 Midyear Report
Tech M&A Monthly: 2017 Midyear ReportTech M&A Monthly: 2017 Midyear Report
Tech M&A Monthly: 2017 Midyear Report
 
Philippe Borremans - How To Automate Boring Tasks & Increase Productivity In PR
Philippe Borremans - How To Automate Boring Tasks & Increase Productivity In PRPhilippe Borremans - How To Automate Boring Tasks & Increase Productivity In PR
Philippe Borremans - How To Automate Boring Tasks & Increase Productivity In PR
 
Cheap Or Expensive Custo. Online assignment writing service.
Cheap Or Expensive Custo. Online assignment writing service.Cheap Or Expensive Custo. Online assignment writing service.
Cheap Or Expensive Custo. Online assignment writing service.
 
Feeding the Beast-How Fraud Tools Bring Context into Authentication (Gartner ...
Feeding the Beast-How Fraud Tools Bring Context into Authentication (Gartner ...Feeding the Beast-How Fraud Tools Bring Context into Authentication (Gartner ...
Feeding the Beast-How Fraud Tools Bring Context into Authentication (Gartner ...
 
“Real Time Machine Learning Architecture and Sentiment Analysis Applied to Fi...
“Real Time Machine Learning Architecture and Sentiment Analysis Applied to Fi...“Real Time Machine Learning Architecture and Sentiment Analysis Applied to Fi...
“Real Time Machine Learning Architecture and Sentiment Analysis Applied to Fi...
 
Haskel - Spillovers from public intangibles
Haskel - Spillovers from public intangiblesHaskel - Spillovers from public intangibles
Haskel - Spillovers from public intangibles
 
OECD Blue Skies Conference. Sept 2016
OECD Blue Skies Conference. Sept 2016OECD Blue Skies Conference. Sept 2016
OECD Blue Skies Conference. Sept 2016
 
Global IP Market Quick Update on the Secondary Market for Patents
Global IP Market Quick Update on the Secondary Market for PatentsGlobal IP Market Quick Update on the Secondary Market for Patents
Global IP Market Quick Update on the Secondary Market for Patents
 
Price optimization for high-mix, low-volume environments | Using R and Tablea...
Price optimization for high-mix, low-volume environments | Using R and Tablea...Price optimization for high-mix, low-volume environments | Using R and Tablea...
Price optimization for high-mix, low-volume environments | Using R and Tablea...
 
SEE Gemeentedag: Petra de West, Roeland van Oers en Gökhan Tuna - Van selfser...
SEE Gemeentedag: Petra de West, Roeland van Oers en Gökhan Tuna - Van selfser...SEE Gemeentedag: Petra de West, Roeland van Oers en Gökhan Tuna - Van selfser...
SEE Gemeentedag: Petra de West, Roeland van Oers en Gökhan Tuna - Van selfser...
 
Example Of A Thesis Statement In An Expository Essay
Example Of A Thesis Statement In An Expository EssayExample Of A Thesis Statement In An Expository Essay
Example Of A Thesis Statement In An Expository Essay
 
Present Naked
Present NakedPresent Naked
Present Naked
 
OnTheMove2015
OnTheMove2015OnTheMove2015
OnTheMove2015
 
Qraft company deck_202003_rf
Qraft company deck_202003_rfQraft company deck_202003_rf
Qraft company deck_202003_rf
 
Carolina Scarton - ESR 7 - USFD
Carolina Scarton - ESR 7 - USFD  Carolina Scarton - ESR 7 - USFD
Carolina Scarton - ESR 7 - USFD
 
Whbm14
Whbm14Whbm14
Whbm14
 
Whbm14
Whbm14Whbm14
Whbm14
 
2022 apidays LIVE Helsinki & North_Future proofing API Security
2022 apidays LIVE Helsinki & North_Future proofing API Security2022 apidays LIVE Helsinki & North_Future proofing API Security
2022 apidays LIVE Helsinki & North_Future proofing API Security
 
State of the US VC Market
State of the US VC MarketState of the US VC Market
State of the US VC Market
 

Plus de oanainel

Eliciting User Preferences for Personalized Explanations for Video Summaries
Eliciting User Preferences for Personalized Explanations for Video SummariesEliciting User Preferences for Personalized Explanations for Video Summaries
Eliciting User Preferences for Personalized Explanations for Video Summariesoanainel
 
Harnessing diversity in crowds and machines for better ner performance
Harnessing diversity in crowds and machines for better ner performanceHarnessing diversity in crowds and machines for better ner performance
Harnessing diversity in crowds and machines for better ner performanceoanainel
 
Boosting Named Entity Extraction through Crowdsourcing
Boosting Named Entity Extraction through CrowdsourcingBoosting Named Entity Extraction through Crowdsourcing
Boosting Named Entity Extraction through Crowdsourcingoanainel
 
Dive+@ICTOpen2017
Dive+@ICTOpen2017Dive+@ICTOpen2017
Dive+@ICTOpen2017oanainel
 
Towards Better Media Understanding and Searchability
Towards Better Media Understanding and SearchabilityTowards Better Media Understanding and Searchability
Towards Better Media Understanding and Searchabilityoanainel
 
ESWC - PhD Symposium 2016
ESWC - PhD Symposium 2016ESWC - PhD Symposium 2016
ESWC - PhD Symposium 2016oanainel
 
Harnessing the Power of Machines & Crowds for Event Extraction
Harnessing the Power of Machines & Crowds for Event ExtractionHarnessing the Power of Machines & Crowds for Event Extraction
Harnessing the Power of Machines & Crowds for Event Extractionoanainel
 

Plus de oanainel (7)

Eliciting User Preferences for Personalized Explanations for Video Summaries
Eliciting User Preferences for Personalized Explanations for Video SummariesEliciting User Preferences for Personalized Explanations for Video Summaries
Eliciting User Preferences for Personalized Explanations for Video Summaries
 
Harnessing diversity in crowds and machines for better ner performance
Harnessing diversity in crowds and machines for better ner performanceHarnessing diversity in crowds and machines for better ner performance
Harnessing diversity in crowds and machines for better ner performance
 
Boosting Named Entity Extraction through Crowdsourcing
Boosting Named Entity Extraction through CrowdsourcingBoosting Named Entity Extraction through Crowdsourcing
Boosting Named Entity Extraction through Crowdsourcing
 
Dive+@ICTOpen2017
Dive+@ICTOpen2017Dive+@ICTOpen2017
Dive+@ICTOpen2017
 
Towards Better Media Understanding and Searchability
Towards Better Media Understanding and SearchabilityTowards Better Media Understanding and Searchability
Towards Better Media Understanding and Searchability
 
ESWC - PhD Symposium 2016
ESWC - PhD Symposium 2016ESWC - PhD Symposium 2016
ESWC - PhD Symposium 2016
 
Harnessing the Power of Machines & Crowds for Event Extraction
Harnessing the Power of Machines & Crowds for Event ExtractionHarnessing the Power of Machines & Crowds for Event Extraction
Harnessing the Power of Machines & Crowds for Event Extraction
 

Dernier

Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...shambhavirathore45
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Delhi Call girls
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 

Dernier (20)

Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 

LDK2019: Validation methodology for expert-annotated datasets: Events Annotation Case Study

  • 1. Oana Inel and Lora Aroyo LDK 2019 Validation Methodology for Expert-Annotated Datasets: Event Annotation Case Study @oana_inel https://oana-inel.github.io crowdtruth.org !1
  • 2. • Traditionally performed by experts • Difficult to define who are the experts when it comes to events • Typical evaluation: inter-annotator agreement However ... Event Annotation !2
  • 3. Insurers could see claims totaling nearly $1 billion from the San Francisco earthquake, far less than the $4 billion from Hurricane Hugo. Which are the events in this sentence? !3
  • 4. Insurers could see claims totaling nearly $1 billion from the San Francisco earthquake, far less than the $4 billion from Hurricane Hugo. Which are the right events here? Insurers could see claims totaling nearly $1 billion from the San Francisco earthquake, far less than the $4 billion from Hurricane Hugo. !4
  • 5. Annotation Guidelines Difficult to Define How can we ensure consistent annotation of events? How do we know when we have the complete set of events? !5
  • 6. … let’s check some expert-annotated datasets How well do they do in terms of consistency and completeness? !6
  • 7. TempEval-3 SemEval 2013: Temporal Annotation Task TempEval-3 Gold: ● 256 documents, 3953 sentences ● 11.129 events ● 1.822 time expressions TempEval-3 Platinum: ● 20 documents, 273 sentences ● 746 events ● 138 time expressions Experiments https://www.cs.york.ac.uk/semeval-2013/task1/ UzZaman, N., Llorens, H., Derczynski, L., Allen, J., Verhagen, M. and Pustejovsky, J., 2013. Semeval-2013 task 1: Tempeval-3: Evaluating time expressions, events, and temporal relations. SemEval 2013, Vol. 2, pp. 1-9. !7
  • 8. Tokens have different types across datasets His appointment to that post, which has senior administrative, staff and policy responsibilities, followed a several-year tenure as Reuters's editor in chief . During his tenure, he has increasingly unleashed biting comedic barbs against his critics and political adversaries . EVENT in TempEval-3 Gold TIME EXPRESSION in TempEval-3 Platinum Consistency !8
  • 9. Annotation guidelines for events are not used consistently Consistency Hungary's accession to NATO has brought about new changes to the political balance in central and eastern Europe, shattering the world configuration under the Yalta accord since World War II. Only single-token EVENTS in TempEval-3 Gold The report estimated that the number of passenger cars in China was on track to hit 400 million by 2030, up from 90 million now. Also multi-token EVENTS in TempEval-3 Platinum EVENTS composed of NUMERALS were removed !9
  • 10. Occurrences of the same previously annotated event/time token are not annotated by experts Travelers Corp.'s third-quarter net income rose 11%, even though claims stemming from Hurricane Hugo reduced results $40 million. Insurers could see claims totaling nearly $1 billion from the San Francisco earthquake, far less than the $4 billion from Hurricane Hugo. EVENT in TempEval-3 Gold NOT an EVENT in TempEval-3 Gold Completeness !10
  • 11. Occurrences of the same previously annotated event/time lemma are not annotated by experts The Windows-ready Kinect sensor, is currently selling for $250 through Microsoft, more than twice what Microsoft charges for the gaming-only Xbox version. Starting next year, the law will block insurers from refusing to sell coverage or setting premiums based on people’s health histories. EVENT in TempEval-3 Platinum NOT an EVENT in TempEval-3 Platinum Completeness !11
  • 12. Computing this in a systematic way, we observe that expert-annotated datasets:
    ● are not always consistently annotated
    ● may not always contain all events
    Can the crowd help us improve these datasets?
  • 13. Hypotheses
    ● Annotation Guidelines: providing explicit definitions increases the accuracy of the results
    ● Annotation Value: asking workers to motivate their choice increases the accuracy of the results
    ● Number of Annotators: a large number of crowd workers performs as well as experts
    ● Input Entity Values: the crowd provides consistent annotations when asked to validate existing events and add missing ones
  • 14. Crowdsourcing Event Annotations
    ● Pilot experiments to determine the optimal crowdsourcing setting, varying Annotation Guidelines, Annotation Value, Input Entity Values and Number of Annotators
    ● Evaluate the pilot experiments
    ● Main experiment with the optimal crowdsourcing setting: annotate events in sentences
  • 15. 16 pilot experiments with 50 sentences from the TempEval-3 Platinum (P) dataset: which is the optimal setting?
    ● Input data: Entity Type (Event, Time Expression); Dataset: Platinum; Entity Values: Expert (G+P) & Tools & Missing, or Expert (P) & Tools
    ● Controlled variables (annotators): 20 English-speaking workers on Figure Eight
    ● Crowdsourcing template: Annotation Guidelines (Implicit Definition or Explicit Definition); Annotation Value (Entities; Entities + Motivation (NONE); Entities + Motivation (ALL); Entities + Motivation (ALL) + Highlight)
  • 16. Overview of all pilot experiment settings (event annotation)
    All eight settings use Entity Type Event, the Platinum dataset and 20 English-speaking workers on Figure Eight, and combine:
    ● Entity Values: Expert (G+P) & Tools & Missing (2 settings), Expert (P) & Tools (6 settings)
    ● Annotation Guidelines: Implicit Definition (3 settings), Explicit Definition (5 settings)
    ● Annotation Value: Entities + Motivation (NONE) (2 settings), Entities (2 settings), Entities + Motivation (ALL) (3 settings), Entities + Motivation (ALL) + Highlight (1 setting)
  • 17. Pilot results, F1-score / TP (true positives), for the eight event settings listed on slide 16: 0.89 / 154, 0.88 / 159, 0.89 / 157, 0.89 / 152, 0.88 / 164, 0.90 / 161, 0.84 / 156, 0.83 / 155
    Takeaway: the crowd performs better when provided with explicit definitions.
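The deck reports F1-score and true positives without spelling out the matching criterion. This is a minimal sketch, assuming exact token-level matching of crowd events against the expert events; `f1_and_tp` and the toy sets are hypothetical.

```python
def f1_and_tp(crowd, expert):
    """Score crowd event annotations against an expert reference.

    crowd, expert: sets of (sentence_id, token_index) pairs marked as events.
    Returns (f1, tp), where tp is the number of true positives.
    """
    tp = len(crowd & expert)
    if tp == 0:
        return 0.0, 0
    precision = tp / len(crowd)
    recall = tp / len(expert)
    return 2 * precision * recall / (precision + recall), tp


crowd = {("s1", 0), ("s1", 3), ("s2", 1), ("s2", 4)}
expert = {("s1", 3), ("s2", 1), ("s2", 4), ("s3", 2), ("s3", 5)}
f1, tp = f1_and_tp(crowd, expert)
print(round(f1, 3), tp)  # 0.667 3
```

Reporting TP next to F1, as the slide does, shows whether two settings with similar F1 recover different numbers of events.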
  • 18. Same settings and F1-score / TP results as slide 17.
    Takeaway: the crowd performs better when answers are motivated.
  • 19. How many workers are needed for accurate results?
    ● Event annotation: 12 workers perform well compared to the experts
    ● Time expression annotation: 15 workers perform well compared to the experts
    Setting: Platinum dataset; Time Expression and Event; Expert (G+P) & Tools & Missing; Explicit Definition; Entities + Motivation (ALL) + Highlight
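One way to approach "how many workers?" is to subsample workers and track agreement with the experts as the sample grows. This is a minimal sketch under stated assumptions: workers are aggregated by a simple fraction-of-workers vote (a simplification; the deck points to the CrowdTruth metrics, which weight workers by quality), and F1 is computed against the expert events. All names and the toy data are hypothetical.

```python
import random
from collections import Counter


def aggregate(judgments, threshold=0.5):
    """Keep the tokens selected by at least `threshold` of the workers.

    judgments: list with one set of selected tokens per worker.
    """
    counts = Counter(token for selected in judgments for token in selected)
    return {token for token, c in counts.items() if c / len(judgments) >= threshold}


def f1(pred, gold):
    # Equivalent to the harmonic mean of precision and recall.
    tp = len(pred & gold)
    return 0.0 if tp == 0 else 2 * tp / (len(pred) + len(gold))


def worker_curve(judgments, expert, sizes, trials=20, seed=0):
    """Average F1 against the experts when only k random workers are used."""
    rng = random.Random(seed)
    curve = {}
    for k in sizes:
        scores = [f1(aggregate(rng.sample(judgments, k)), expert) for _ in range(trials)]
        curve[k] = sum(scores) / trials
    return curve


# Toy judgments from five hypothetical workers.
judgments = [{"quake", "claims"}, {"quake"}, {"quake", "see"}, {"claims"}, {"quake", "claims"}]
expert = {"quake", "claims"}
curve = worker_curve(judgments, expert, sizes=[1, 3, 5])
print(curve)
```

The point where the curve stops improving is a candidate for the number of workers to pay for; averaging over several random samples keeps a single lucky or unlucky draw from dominating.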
  • 20. Main experiments for event annotation, with 4,202 sentences from the TempEval-3 Gold (G) and TempEval-3 Platinum (P) datasets
    ● Dataset: Gold & Platinum; Entity Type: Event; Entity Values: Expert (G+P) & Tools & Missing
    ● Annotation Guidelines: Explicit Definition; Annotation Value: Entities + Motivation (ALL) + Highlight
    ● Annotators: 15 English-speaking workers on Figure Eight, 4 cents per annotation
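As a back-of-the-envelope budget check: assuming "4 cents per annotation" means one worker annotating one sentence, and excluding any platform fees (neither point is stated on the slide), the main experiment comes to 4,202 × 15 = 63,030 annotations. A minimal sketch with a hypothetical helper:

```python
def campaign_cost_cents(sentences, workers_per_sentence, cents_per_annotation):
    """Total payment in whole cents, assuming one paid annotation per worker per sentence."""
    # Integer cents avoid floating-point rounding in the total.
    return sentences * workers_per_sentence * cents_per_annotation


total = campaign_cost_cents(4202, 15, 4)
print(f"${total // 100:,}.{total % 100:02d}")  # $2,521.20
```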
  • 21. Training and evaluating ClearTK with expert and crowd events
    ● Train on experts, test on experts: ClearTK reaches an F1-score of 0.788
    ● Train on experts, test on crowd: the F1-score is significantly better, around 0.83
    ● Train on crowd, test on experts: the F1-score is almost as good (0.77)
  • 22. Training and evaluating ClearTK with crowd events
    ● Train on crowd, test on crowd: the F1-score of ClearTK reaches a maximum of 0.83
    ● ClearTK performs well when trained and evaluated at similar crowd event-sentence score thresholds
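The thresholding behind this slide can be sketched as turning crowd scores into binary event labels with the same cut-off for training and test data. This is a minimal sketch, assuming a simplified, unweighted event-sentence score (the fraction of a sentence's workers who selected a token) in place of the quality-weighted CrowdTruth score; feeding the label sets to ClearTK is not shown, and all function names are hypothetical.

```python
from collections import Counter


def event_scores(judgments):
    """Simplified event-sentence score: for every (sentence_id, token) pair,
    the fraction of that sentence's workers who selected the token.

    judgments: dict mapping sentence_id -> list with one set of selected
    tokens per worker (hypothetical input format).
    """
    scores = {}
    for sentence_id, workers in judgments.items():
        counts = Counter(token for selected in workers for token in selected)
        for token, count in counts.items():
            scores[(sentence_id, token)] = count / len(workers)
    return scores


def label_at(scores, threshold):
    """Binary event labels: every pair whose score reaches the threshold."""
    return {pair for pair, score in scores.items() if score >= threshold}


def matched_threshold_splits(train_scores, test_scores, thresholds):
    """Yield (threshold, train_labels, test_labels) with the same cut-off on both sides."""
    for threshold in thresholds:
        yield threshold, label_at(train_scores, threshold), label_at(test_scores, threshold)


train_scores = event_scores({"s1": [{"quake", "see"}, {"quake"}, {"quake", "claims"}, {"claims"}]})
test_scores = event_scores({"s2": [{"sell"}, {"sell"}, {"block"}]})
for threshold, train_labels, test_labels in matched_threshold_splits(train_scores, test_scores, [0.5, 0.7]):
    print(threshold, sorted(train_labels), sorted(test_labels))
```

Using one list of thresholds for both sides mirrors the matched-threshold condition on the slide; mismatched train/test pairs could be generated with a nested loop instead.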
  • 23. Contributions and Future Work
    Contributions
    ● a data-agnostic methodology for validating expert-annotated datasets in terms of consistency and completeness
    ● 4,202 crowd-annotated English sentences from the TempEval-3 Gold and TempEval-3 Platinum datasets with events
    ● 121 crowd-annotated sentences from the TempEval-3 Platinum dataset with time expressions
    ● training and evaluating ClearTK with crowd-driven event annotations
    Future Work
    ● replicate the crowdsourcing experiment on time expressions
    ● investigate the role of sentence and event ambiguity in the training and evaluation of event extraction systems
  • 24. Resources
    ● Data & Code: https://github.com/CrowdTruth/Event-Extraction
    ● CrowdTruth Metrics: https://github.com/CrowdTruth/CrowdTruth-core
    ● CrowdTruth Tutorial: http://crowdtruth.org/tutorial/