Machine-Crowd Annotation Workflow
for Event Understanding
across Collections & Domains
Oana Inel Extended Semantic Web Conference
PhD Symposium
May 30th 2016
Too much information ...
e.g., if you are interested in the topic of “whaling”
… and after a while it all looks the same
it is difficult to form a global picture on a topic
… thus, content without context is difficult to process
events can help create context around content
…, but events are not easy to deal with
• Events are vague
• Event semantics are difficult
• Events can be viewed and interpreted from multiple perspectives
e.g., a participant's interpretation: The mayor of the city called the celebration a success.
• Events can be presented at different levels of granularity
e.g., spatial disagreement: The celebration took place in every city in the Netherlands.
• People are not consistent in the way they talk about or use events
e.g.: The celebration took place last week, fireworks shows were held everywhere.
… a lot of ground truth is needed to learn event specifics
• Traditional ground truth collection doesn’t scale:
• there is not really ‘one type of expert’ when it comes to events
• the annotation guidelines for events are difficult to define
• the annotation of events can be a tedious process
• all of the above can result in high inter-annotator disagreement
• Crowdsourcing could be an alternative
• but is still not a robust & replicable approach
… let’s look at some examples
According to department policy prosecutors must make
a strong showing that lawyers' fees came from assets
tainted by illegal profits before any attempts at seizure
are made.
The unit makes intravenous pumps used by hospitals
and had more than $110 million in sales last year
according to Advanced Medical.
… here is what experts annotate on these sentences
[According] to department policy prosecutors must make
a strong [showing] that lawyers' fees [came] from assets
tainted by illegal profits before any [attempts] at [seizure]
are [made].
The unit makes intravenous pumps used by hospitals
and [had] more than $110 million in [sales] last year
according to Advanced Medical.
… here is what the crowd annotates on them
According to department policy prosecutors must make
a [strong [showing]] that lawyers' fees [[came] from
assets] [tainted] by illegal profits before any [attempts] at
[seizure] are [made].
The unit [makes] intravenous pumps [used] by hospitals
and [[had] more than $110 million in [sales]] last year
according to Advanced Medical.
… here is what the machines can detect
According to department policy prosecutors must [make]
a strong showing that lawyers' fees [came] from assets
[tainted] by illegal profits before any attempts at seizure
are made.
The unit [makes] intravenous pumps [used] by hospitals
and [had] more than $110 million in sales last year
according to Advanced Medical.
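The three annotation sets above can be compared mechanically. The following is a minimal sketch, assuming exact string match per sentence as the matching criterion (the slides do not say how spans are matched) and using the expert spans only as a point of reference, not as a definitive ground truth.

```python
# Spans copied from the bracketed examples above, keyed by sentence number.
EXPERT = {(1, "According"), (1, "showing"), (1, "came"), (1, "attempts"),
          (1, "seizure"), (1, "made"), (2, "had"), (2, "sales")}
CROWD = {(1, "strong showing"), (1, "showing"), (1, "came from assets"),
         (1, "came"), (1, "tainted"), (1, "attempts"), (1, "seizure"),
         (1, "made"), (2, "makes"), (2, "used"),
         (2, "had more than $110 million in sales"), (2, "had"), (2, "sales")}
MACHINE = {(1, "make"), (1, "came"), (1, "tainted"),
           (2, "makes"), (2, "used"), (2, "had")}


def precision_recall(predicted, reference):
    """Exact-match precision and recall of predicted spans against reference spans."""
    hits = len(predicted & reference)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


for name, spans in (("crowd", CROWD), ("machine", MACHINE)):
    p, r = precision_recall(spans, EXPERT)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

Under this exact-match assumption, the crowd spans cover 7 of the 8 expert spans on these two sentences, while the machine spans cover 2 of 8.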
Research Questions
• Can crowdsourcing help in improving event detection?
• Can we provide reliable crowdsourced training data?
• Can we optimize the crowdsourcing process by using results from
NLP tools?
• Can we achieve a replicable data collection process across different
data types and use cases?
Current Hypothesis:
Disagreement-based approach to crowdsource ground truth
is reliable and produces quality results
Preliminary Results - Crowd vs. Experts
● 200 news snippets from TimeBank
● 3019 tweets published in 2014
● potentially relevant tweets for events such as ‘whaling’,
‘Davos 2014’ among others
CrowdTruth approach outperforms state-of-the-art
crowdsourcing approaches such as single annotator and
majority vote
The crowd performs almost as well as the experts due to
very linguistically specialized guidelines for expert annotators
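To illustrate how a disagreement-based aggregation differs from majority vote, here is a minimal sketch. The slides do not give the metric definitions, so it assumes a simplified cosine-style score: each span's count divided by the norm of the unit's count vector. The five worker judgments are hypothetical, and the function names are not the CrowdTruth library API.

```python
import math
from collections import Counter

# Hypothetical judgments from five workers on the "intravenous pumps" sentence.
WORKERS = [
    {"had", "sales"},
    {"makes", "had", "sales"},
    {"makes", "used", "had"},
    {"had more than $110 million in sales"},
    {"makes", "had", "sales"},
]


def span_counts(worker_judgments):
    """Count how many workers selected each span."""
    counts = Counter()
    for spans in worker_judgments:
        counts.update(set(spans))
    return counts


def majority_vote(counts, n_workers):
    """Keep only spans chosen by more than half of the workers."""
    return {span for span, c in counts.items() if c > n_workers / 2}


def span_scores(counts):
    """Graded score per span: its count over the norm of the count vector."""
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {span: c / norm for span, c in counts.items()}


counts = span_counts(WORKERS)
kept = majority_vote(counts, len(WORKERS))
scores = span_scores(counts)
for span, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{score:.2f}  {'kept' if span in kept else 'dropped'}  {span}")
```

Majority vote returns a flat set and drops "used" and the long span entirely; the graded scores keep them with low values, so the minority readings stay available for later analysis.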
Current Hypothesis:
Disagreement-based approach to crowdsource ground truth
can be optimised by using results from NLP tools
Preliminary Results - Hybrid Workflow
• Entity extraction
• Events crowdsourcing and linking to concepts
• Segmentation & keyframes
• Linking events and concepts to keyframes
diveplus.beeldengeluid.nl
Preliminary Results - Hybrid Workflow Outcome
diveplus.beeldengeluid.nl
Approach: Disagreement is Signal
Principles for disagreement-based
crowdsourcing
• Do not enforce agreement
• Capture a multitude of views
• Take advantage of existing
tools, reuse their functionality
This results in teaching machines to reason in
the disagreement space
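One way to read these principles: machine output seeds the candidate set, crowd judgments add graded support, and no candidate is discarded for lack of agreement. The sketch below follows that reading. The function name, record fields, and crowd counts are hypothetical and are not taken from the CrowdTruth or DIVE+ code; the machine spans are those shown for the "intravenous pumps" sentence.

```python
def merge_candidates(machine_spans, crowd_counts, n_workers):
    """Build one record per span from either source, keeping provenance
    and the fraction of workers who selected it. Nothing is filtered out."""
    records = {}
    for span in set(machine_spans) | set(crowd_counts):
        records[span] = {
            "from_machine": span in machine_spans,
            "crowd_support": crowd_counts.get(span, 0) / n_workers,
        }
    return records


# Machine spans from the example slide; crowd counts are hypothetical.
machine_spans = {"makes", "used", "had"}
crowd_counts = {
    "had": 4,
    "sales": 3,
    "makes": 3,
    "used": 1,
    "had more than $110 million in sales": 1,
}

records = merge_candidates(machine_spans, crowd_counts, n_workers=5)
for span, rec in sorted(records.items(), key=lambda kv: -kv[1]["crowd_support"]):
    print(f"{rec['crowd_support']:.1f}  machine={rec['from_machine']}  {span}")
```

Keeping both fields lets a downstream model see where tool and crowd diverge, for example "sales", which the crowd selects but the tool misses.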
Overall Methodology
1. Instantiate the research methodology with specific data, domain
• Video synopsis, news
2. Identify state-of-the-art IE approaches that can be used
• NER tools for identifying events and their participating entities in the video synopsis
3. Evaluate IE approaches and identify their drawbacks
• Poor performance in extracting events
4. Combine IE with crowdsourcing tasks in a complementary way
• Use crowdsourcing for identifying the events and linking them with their participating entities
5. Evaluate crowdsourcing results with CrowdTruth disagreement-first approach
• Evaluate the input unit, the workers and the annotations
6. Instantiate the same workflow with different data and/or different domain
• Tweets, Twitter
7. Perform cross-domain analysis
• Event extraction in video synopsis vs. event extraction in tweets
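Step 5 evaluates the input unit, the workers and the annotations. The slides do not define these measures, so the sketch below assumes simplified cosine-based versions: a worker's agreement is the cosine between that worker's vector and the sum of all other workers' vectors on the same unit, and a unit's clarity is its highest span score. Worker names and judgments are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def unit_vector(workers):
    """Sum of all workers' binary span vectors for one input unit."""
    total = Counter()
    for spans in workers.values():
        total.update(set(spans))
    return total


def worker_agreement(workers):
    """Per worker: cosine between own vector and the rest of the crowd."""
    total = unit_vector(workers)
    scores = {}
    for name, spans in workers.items():
        own = Counter(set(spans))
        rest = total - own  # Counter subtraction drops zero counts
        scores[name] = cosine(own, rest)
    return scores


def unit_clarity(workers):
    """Highest span score on the unit: max count over the vector norm."""
    total = unit_vector(workers)
    norm = math.sqrt(sum(c * c for c in total.values()))
    return max(total.values()) / norm if norm else 0.0


# Hypothetical judgments on one sentence.
workers = {
    "w1": {"had", "sales"},
    "w2": {"makes", "had", "sales"},
    "w3": {"makes", "used", "had"},
    "w4": {"had more than $110 million in sales"},
    "w5": {"makes", "had", "sales"},
}

agreement = worker_agreement(workers)
clarity = unit_clarity(workers)
print({name: round(score, 2) for name, score in agreement.items()})
print(round(clarity, 2))
```

A low agreement value flags a worker for inspection rather than automatic rejection: w4 scores zero here only because the selected span is longer than everyone else's, which is itself a signal about granularity.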
Project Websites
http://CrowdTruth.org
http://diveproject.beeldengeluid.nl
Tools & Code
http://dev.CrowdTruth.org
http://github.com/CrowdTruth
http://diveplus.beeldengeluid.nl
Data
http://data.crowdtruth.org
http://data.dive.beeldengeluid.nl
ESWC - PhD Symposium 2016

  • 1. Machine-Crowd Annotation Workflow for Event Understanding across Collections & Domains Oana Inel Extended Semantic Web Conference PhD Symposium May 30th 2016
  • 2. Too much information ... e.g., if you are interested in the topic of “whaling” 2
  • 3. … and after a while it all looks the same it is difficult to form a global picture on a topic 3
  • 4. … thus, content without context is difficult to process events can help create context around content 4
  • 5. …, but events are not easy to deal with • Events are vague • Event semantics are difficult • Events can be viewed and interpreted from multiple perspectives and interpretations e.g. of participants interpretation: The mayor of the city called the celebration a success. • Events can be presented at different levels of granularities e.g. of spatial disagreement: The celebration took place in every city in the Netherlands. • People are not consistent in the way they talk about or use events e.g.: The celebration took place last week, fireworks shows were held everywhere. 5
  • 6. … a lot of ground truth is needed to learn event specifics • Traditional ground truth collection doesn’t scale: • there is not really ‘one type of experts’ when it comes to events • the annotation guidelines for events are difficult to define • the annotation of events can be a tedious process • all of the above can result in high inter-annotator disagreement • Crowdsourcing could be an alternative • but is still not a robust & replicable approach 6
  • 7. … let’s look at some examples
      • According to department policy prosecutors must make a strong showing that lawyers' fees came from assets tainted by illegal profits before any attempts at seizure are made.
      • The unit makes intravenous pumps used by hospitals and had more than $110 million in sales last year according to Advanced Medical.
  • 8. … here is what experts annotate on these sentences
      • [According] to department policy prosecutors must make a strong [showing] that lawyers' fees [came] from assets tainted by illegal profits before any [attempts] at [seizure] are [made].
      • The unit makes intravenous pumps used by hospitals and [had] more than $110 million in [sales] last year according to Advanced Medical.
  • 9. … here is what the crowd annotates on them
      • According to department policy prosecutors must make a [strong [showing]] that lawyers' fees [[came] from assets] [tainted] by illegal profits before any [attempts] at [seizure] are [made].
      • The unit [makes] intravenous pumps [used] by hospitals and [[had] more than $110 million in [sales]] last year according to Advanced Medical.
  • 10. … here is what the machines can detect
      • According to department policy prosecutors must [make] a strong showing that lawyers' fees [came] from assets [tainted] by illegal profits before any attempts at seizure are made.
      • The unit [makes] intravenous pumps [used] by hospitals and [had] more than $110 million in sales last year according to Advanced Medical.
  • 11. Research Questions
      • Can crowdsourcing help in improving event detection?
      • Can we provide reliable crowdsourced training data?
      • Can we optimize the crowdsourcing process by using results from NLP tools?
      • Can we achieve a replicable data collection process across different data types and use cases?
  • 12. Current Hypothesis: Disagreement-based approach to crowdsource ground truth is reliable and produces quality results
  • 13. Preliminary Results - Crowd vs. Experts
      • 200 news snippets from TimeBank
      • 3019 tweets published in 2014
      • potentially relevant tweets for events such as ‘whaling’ and ‘Davos 2014’, among others
      • The CrowdTruth approach outperforms state-of-the-art crowdsourcing approaches such as single annotator and majority vote
      • The crowd performs almost as well as the experts, due to very linguistically specialized guidelines for expert annotators
  • 14. Current Hypothesis: Disagreement-based approach to crowdsource ground truth can be optimised by using results from NLP tools
  • 15. Preliminary Results - Hybrid Workflow (diagram stages: entity extraction; events crowdsourcing and linking to concepts; segmentation & keyframes; linking events and concepts to keyframes). diveplus.beeldengeluid.nl
  • 16. Preliminary Results - Hybrid Workflow Outcome. diveplus.beeldengeluid.nl
  • 17. Approach: Disagreement is Signal
      Principles for disagreement-based crowdsourcing:
      • Do not enforce agreement
      • Capture a multitude of views
      • Take advantage of existing tools, reuse their functionality
      This results in teaching machines to reason in the disagreement space
  • 18. Overall Methodology
      1. Instantiate the research methodology with specific data, domain
          • Video synopsis, news
      2. Identify state-of-the-art IE approaches that can be used
          • NER tools for identifying events and their participating entities in the video synopsis
      3. Evaluate IE approaches and identify their drawbacks
          • Poor performance in extracting events
      4. Combine IE with crowdsourcing tasks in a complementary way
          • Use crowdsourcing for identifying the events and linking them with their participating entities
      5. Evaluate crowdsourcing results with CrowdTruth disagreement-first approach
          • Evaluate the input unit, the workers and the annotations
      6. Instantiate the same workflow with different data and/or different domain
          • Tweets, Twitter
      7. Perform cross-domain analysis
          • Event extraction in video synopsis vs. event extraction in tweets
  • 19. Links
      • Project Websites: http://CrowdTruth.org, http://diveproject.beeldengeluid.nl
      • Tools & Code: http://dev.CrowdTruth.org, http://github.com/CrowdTruth, http://diveplus.beeldengeluid.nl
      • Data: http://data.crowdtruth.org, http://data.dive.beeldengeluid.nl
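
The three annotation layers on slides 8 to 10 can be compared directly. The Python sketch below is an illustration only and not part of the original slides: it transcribes by hand the innermost bracketed token of each annotated span and counts how many expert events the crowd and the machine also mark. The helper name `recall` and the decision to compare innermost tokens only are assumptions.

```python
# Innermost bracketed tokens, transcribed by hand from slides 8-10.
# "make", "made" and "makes" are three different tokens in the two sentences.
expert = {"According", "showing", "came", "attempts",
          "seizure", "made", "had", "sales"}
crowd = {"showing", "came", "tainted", "attempts", "seizure",
         "made", "makes", "used", "had", "sales"}
machine = {"make", "came", "tainted", "makes", "used", "had"}


def recall(candidate, reference):
    """Fraction of reference events that also appear in candidate."""
    return len(candidate & reference) / len(reference)


crowd_recall = recall(crowd, expert)      # share of expert events found by the crowd
machine_recall = recall(machine, expert)  # share of expert events found by the machine
crowd_only = sorted(crowd - expert)       # events marked by the crowd but not by experts

print(f"crowd vs. experts:   {crowd_recall:.3f}")
print(f"machine vs. experts: {machine_recall:.3f}")
print(f"crowd-only events:   {crowd_only}")
```

On these two sentences the crowd marks 7 of the 8 expert events (87.5%), which is in line with the overlap of about 88% mentioned in the notes below, although the slides do not say how that figure was computed.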

Editor's notes

  1. Massive amount of information. One of the main characteristics of today is the massive, even overwhelming, amount of information around us. Just think of all the videos, images, and the endless web pages and tweets that you get as search results when you want to learn about a topic.
  2. However, this inconceivable amount of information starts to ‘look all the same’ to the users, and they are not able to properly consume the information and get an overview of the topic.
  3. This happens because content without context is difficult to process. But events can help create context around content.
  4. Experts can be inconsistent, despite the traditional belief that they are always right.
  5. The crowd overlaps with the experts by 88%, i.e. it detects almost the same events as the experts. The added value is that the crowd finds even more events and is more specific. Another point is that the crowd seems to be more consistent :-)
  6. And look how little the machines are able to detect from this. They need to learn more, so more training data is needed for them.
  7. Majority vote: the answer that was picked by the majority of the workers, and all the answers that were picked by at least half of the total number of workers. Single annotator: a judgment randomly sampled from the set of workers annotating the unit, to show that having more annotators generates better-quality data. CrowdTruth (CT) scores consistently above majority vote and single annotator, and its performance is also comparable to that of domain experts. Crowdsourcing tasks where workers choose annotations from a fixed number of options (e.g. Twitter event extraction) perform better at higher thresholds, whereas open annotation tasks (event extraction) perform better when the threshold is at its lowest, which ensures that the most diverse opinions are accounted for in the resulting ground truth.
  8. Message of the results. Data on which the experiments were performed.
  9. Have two hypotheses for this.
  10. Experts are inconsistent
  11. Automatic tools detect less; it is difficult to see what the focus is. The crowd is much more specific than the experts. The crowd overlaps a lot with the experts. Experts have some difficult events. Experts are not consistent.
  12. Automatic tools detect less; it is difficult to see what the focus is. The crowd is much more specific than the experts. The crowd overlaps a lot with the experts. Experts have some difficult events. Experts are not consistent.
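
The three aggregation strategies compared in note 7 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used in the experiments: the worker judgments are invented toy data, all function names are hypothetical, and the thresholded score is a simple cosine between the vector of per-annotation vote counts and the unit vector of one annotation, chosen here as a stand-in for a disagreement-aware score. The exact metric behind the reported results is not given in the slides.

```python
import math
import random
from collections import Counter


def vote_counts(judgments):
    """Count how many workers selected each annotation."""
    return Counter(a for worker in judgments for a in set(worker))


def majority_vote(judgments):
    """Keep the annotations picked by at least half of the workers."""
    half = len(judgments) / 2
    return {a for a, c in vote_counts(judgments).items() if c >= half}


def single_annotator(judgments, rng):
    """Use the judgment of one randomly sampled worker."""
    return set(rng.choice(judgments))


def annotation_scores(judgments):
    """Cosine between the vote-count vector and each annotation's unit vector."""
    counts = vote_counts(judgments)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {a: c / norm for a, c in counts.items()}


def threshold_select(judgments, threshold):
    """Keep the annotations whose score reaches the threshold."""
    scores = annotation_scores(judgments)
    return {a for a, s in scores.items() if s >= threshold}


# Toy data (invented): five workers marking events in one sentence.
judgments = [
    {"makes", "had", "sales"},
    {"had", "sales"},
    {"makes", "used", "had"},
    {"had"},
    {"sales", "used"},
]

print("majority vote:  ", sorted(majority_vote(judgments)))
print("single worker:  ", sorted(single_annotator(judgments, random.Random(0))))
print("threshold 0.3:  ", sorted(threshold_select(judgments, 0.3)))
print("threshold 0.6:  ", sorted(threshold_select(judgments, 0.6)))
```

With a low threshold every annotation that any worker proposed survives, which mirrors the point about open annotation tasks keeping the most diverse opinions. A higher threshold keeps only the annotations with strong support.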