Generating Ground Truth for Music Mood Classification Using Mechanical Turk
Jin Ha Lee & Xiao Hu
JCDL 2012
Mood: a relatively long-lasting and stable emotional state (Meyer, 1956)
Emotion? Affect?
Music mood
• Has recently received a lot of attention in the MIR (Music Information Retrieval) domain
• "Audio Music Mood Classification" (AMC) task in MIREX (Music Information Retrieval Evaluation eXchange), running since 2007
• Critical for developing MDL
• Evaluation is based on ground truth
[Figure: music clips labeled with human mood judgments, e.g., "Passionate" and "Bittersweet"]
More is better!
However, generating ground truth based on human input is expensive and time-consuming.
How is it done in MIREX?
• A web-based survey system called Evalutron 6000 (E6K)
• Invitations posted to the MIREX and music-ir mailing lists to recruit volunteers
Can we use the CROWD instead of MUSIC EXPERTS?
Is there a better way?
1. How do music mood classification results obtained from Mechanical Turk compare to those collected from music experts in MIREX?
2. How different or similar are the evaluation outcomes for the MIREX AMC task when based on ground truth collected from Mechanical Turk vs. E6K?
[Diagram: a Task Requester posts tasks on Amazon Mechanical Turk (MTurk), where they are completed by Workers (Turkers)]
Cluster 1: passionate, rousing, confident, boisterous, rowdy
Cluster 2: cheerful, fun, rollicking, sweet, amiable/good natured
Cluster 3: bittersweet, poignant, wistful, literate, autumnal, brooding
Cluster 4: humorous, silly, campy, quirky, whimsical, witty, wry
Cluster 5: aggressive, intense, fiery, tense/anxious, volatile, visceral
TASK: Listen to 30-second music clips → Select one of the five mood clusters
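The five clusters form a fixed label vocabulary. As a minimal sketch (the names `MOOD_CLUSTERS`, `ADJECTIVE_TO_CLUSTER` and `cluster_of` are ours, not part of MIREX, E6K or MTurk), they can be held as a lookup table, for example to find which cluster a descriptive adjective belongs to:

```python
# The five mood clusters, transcribed from the slide (hypothetical structure).
MOOD_CLUSTERS = {
    1: ["passionate", "rousing", "confident", "boisterous", "rowdy"],
    2: ["cheerful", "fun", "rollicking", "sweet", "amiable/good natured"],
    3: ["bittersweet", "poignant", "wistful", "literate", "autumnal", "brooding"],
    4: ["humorous", "silly", "campy", "quirky", "whimsical", "witty", "wry"],
    5: ["aggressive", "intense", "fiery", "tense/anxious", "volatile", "visceral"],
}

# Reverse index: adjective -> cluster number. Each adjective appears in exactly one cluster.
ADJECTIVE_TO_CLUSTER = {
    adjective: cluster
    for cluster, adjectives in MOOD_CLUSTERS.items()
    for adjective in adjectives
}


def cluster_of(adjective: str) -> int:
    """Return the cluster number for a mood adjective (case-insensitive); KeyError if unknown."""
    return ADJECTIVE_TO_CLUSTER[adjective.strip().lower()]


if __name__ == "__main__":
    print(cluster_of("Bittersweet"))  # 3
```

Workers choose a cluster, not an adjective; the adjectives only describe what each cluster means.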
• Qualification test
• Consistency check
• Review process
Basic Stats
• 1,250 songs × 2 judgments = 2,500 unique mood judgments
• 1 HIT = 25 songs
• 186 HITs collected - 86 HITs rejected = 100 HITs accepted
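The basic stats are internally consistent: 100 accepted HITs of 25 songs each yield the same 2,500 judgments as 1,250 songs × 2. A small Python sketch re-deriving both totals (variable names are ours; it assumes each accepted HIT contributes one judgment per song):

```python
# Inputs copied from the "Basic Stats" slide.
SONGS = 1250
JUDGMENTS_PER_SONG = 2
SONGS_PER_HIT = 25
HITS_COLLECTED = 186
HITS_REJECTED = 86


def basic_stats() -> dict:
    """Re-derive the slide's totals from its inputs."""
    total_judgments = SONGS * JUDGMENTS_PER_SONG          # 1,250 songs x 2 judgments
    hits_accepted = HITS_COLLECTED - HITS_REJECTED        # 186 collected - 86 rejected
    judgments_from_hits = hits_accepted * SONGS_PER_HIT   # assumes one judgment per song per HIT
    rejection_rate = HITS_REJECTED / HITS_COLLECTED       # derived, not reported on the slide
    return {
        "total_judgments": total_judgments,
        "hits_accepted": hits_accepted,
        "judgments_from_hits": judgments_from_hits,
        "rejection_rate": rejection_rate,
    }


if __name__ == "__main__":
    stats = basic_stats()
    assert stats["judgments_from_hits"] == stats["total_judgments"]
    print(stats)
```

Both routes give 2,500 judgments; the derived rejection rate is 86/186, about 46% of collected HITs.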
Stats on Collecting Data: EVALUTRON 6000 (E6K) vs. MTurk
• Average time spent on each music clip: E6K 21.54 seconds; MTurk 17.46 seconds
• Total time for collecting all judgments: E6K 38 days (+ additional in-house assessment); MTurk 19 days
• Cost for collecting all judgments: E6K $0; MTurk $60.50
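Per-unit figures make the comparison more concrete. This sketch uses the judgment totals from the distribution table (2,468 for E6K, 2,500 for MTurk); the `Collection` class is a hypothetical helper, and because E6K's 38 days exclude the additional in-house assessment, the two rows are not strictly comparable:

```python
from dataclasses import dataclass


@dataclass
class Collection:
    """Summary figures for one data-collection effort, as reported on the slides."""
    name: str
    judgments: int    # number of judgments collected
    days: int         # total calendar days for collection
    cost_usd: float   # total cost in US dollars

    def judgments_per_day(self) -> float:
        # Average throughput over the whole collection period.
        return self.judgments / self.days

    def cost_per_judgment(self) -> float:
        return self.cost_usd / self.judgments


# E6K's 38 days do not include the additional in-house assessment.
E6K = Collection("E6K", judgments=2468, days=38, cost_usd=0.0)
MTURK = Collection("MTurk", judgments=2500, days=19, cost_usd=60.50)

if __name__ == "__main__":
    for c in (E6K, MTURK):
        print(f"{c.name}: {c.judgments_per_day():.1f} judgments/day, "
              f"${c.cost_per_judgment():.4f} per judgment")
```

MTurk works out to about 131.6 judgments per day at roughly 2.4 cents per judgment, versus about 64.9 judgments per day for E6K.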
Comparison of E6K and MTurk data
Number of Judgments and Distribution across Clusters
Cluster     E6K           MTurk         Diff. in % (E6K - MTurk)
Cluster 1   405 (16.4%)   450 (18.0%)   -1.6%
Cluster 2   472 (19.1%)   536 (21.4%)   -2.3%
Cluster 3   542 (22.0%)   622 (24.9%)   -2.9%
Cluster 4   412 (16.7%)   367 (14.7%)    2.0%
Cluster 5   400 (16.2%)   403 (16.1%)    0.1%
Other       237 (9.6%)    122 (4.9%)     4.7%
Total       2468          2500          -
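The percentages and the difference column can be recomputed from the raw counts. A minimal Python sketch (list and function names are ours): each percentage is taken over that source's own total, and rounding to one decimal at the end reproduces the figures in the table.

```python
# Judgment counts per cluster, copied from the table above.
CLUSTERS = ["Cluster 1", "Cluster 2", "Cluster 3", "Cluster 4", "Cluster 5", "Other"]
E6K = [405, 472, 542, 412, 400, 237]
MTURK = [450, 536, 622, 367, 403, 122]


def percentages(counts):
    """Share of each category, in percent of that source's total."""
    total = sum(counts)
    return [100 * c / total for c in counts]


def compare(a_counts, b_counts):
    """Rows of (cluster, a %, b %, a % - b %), rounded to one decimal at the end."""
    rows = []
    for name, pa, pb in zip(CLUSTERS, percentages(a_counts), percentages(b_counts)):
        rows.append((name, round(pa, 1), round(pb, 1), round(pa - pb, 1)))
    return rows


if __name__ == "__main__":
    print(f"Totals: E6K={sum(E6K)}, MTurk={sum(MTURK)}")
    for name, pa, pb, diff in compare(E6K, MTURK):
        print(f"{name:<10} {pa:5.1f}% {pb:5.1f}% {diff:+.1f}%")
```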
Distribution of Agreement
Cluster     E6K   MTurk   Both
Cluster 1   121   89      29
Cluster 2   130   131     44
Cluster 3   163   216     91
Cluster 4   121   85      42
Cluster 5   126   121     64
Total       661   642     270
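The slides do not spell out how agreement is counted. Assuming a song counts toward a cluster when both of its judgments within one source name that cluster, and toward "Both" when E6K and MTurk each agree on the same cluster, the tally could be sketched as follows (the input format and function names are hypothetical; agreement on "Other" is skipped because the table lists only the five clusters):

```python
from collections import Counter
from typing import Dict, Optional, Tuple

# Only the five mood clusters appear in the agreement table (assumption).
CLUSTERS = {f"Cluster {i}" for i in range(1, 6)}

# Hypothetical input format: song id -> the two judgments it received.
Judgments = Dict[str, Tuple[str, str]]


def agreed_cluster(pair: Tuple[str, str]) -> Optional[str]:
    """Return the cluster both judges chose, or None if they differ."""
    first, second = pair
    if first == second and first in CLUSTERS:
        return first
    return None


def agreement_counts(e6k: Judgments, mturk: Judgments):
    """Count agreed songs per cluster in E6K, in MTurk, and in both sources."""
    e6k_counts, mturk_counts, both_counts = Counter(), Counter(), Counter()
    for song in e6k.keys() | mturk.keys():
        e = agreed_cluster(e6k[song]) if song in e6k else None
        m = agreed_cluster(mturk[song]) if song in mturk else None
        if e is not None:
            e6k_counts[e] += 1
        if m is not None:
            mturk_counts[m] += 1
        if e is not None and e == m:
            both_counts[e] += 1
    return e6k_counts, mturk_counts, both_counts


if __name__ == "__main__":
    e6k = {"s1": ("Cluster 3", "Cluster 3"), "s2": ("Cluster 1", "Cluster 2")}
    mturk = {"s1": ("Cluster 3", "Cluster 3"), "s2": ("Cluster 2", "Cluster 2")}
    print(agreement_counts(e6k, mturk))
```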
Confusion among the Clusters
Clusters                Disagreed in E6K   Disagreed in MTurk
Cluster 1 & Cluster 2   20                 95
Cluster 2 & Cluster 4   31                 86
Cluster 1 & Cluster 5   13                 74
⁞                       ⁞                  ⁞
Cluster 3 & Cluster 4   6                  27
Cluster 2 & Cluster 5   1                  22
Cluster 3 & Cluster 5   1                  20
Total                   253                595
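Assuming each row counts songs whose two judgments within one source fell into that pair of clusters, in either order, the tally can be sketched with an unordered-pair counter (the input format and the name `confusion_pairs` are hypothetical):

```python
from collections import Counter
from typing import Iterable, Tuple


def confusion_pairs(judgment_pairs: Iterable[Tuple[str, str]]) -> Counter:
    """Count disagreements per unordered cluster pair.

    Each element is the two judgments one song received within a single
    source; identical judgments are agreements and are skipped.
    """
    counts: Counter = Counter()
    for a, b in judgment_pairs:
        if a != b:
            # Sort so that (Cluster 2, Cluster 1) and (Cluster 1, Cluster 2) share a key.
            counts[tuple(sorted((a, b)))] += 1
    return counts


if __name__ == "__main__":
    pairs = [("Cluster 2", "Cluster 1"), ("Cluster 1", "Cluster 2"),
             ("Cluster 3", "Cluster 3"), ("Cluster 5", "Cluster 1")]
    for pair, n in confusion_pairs(pairs).most_common():
        print(" & ".join(pair), n)
```

Sorting each pair before counting is what makes the table's "Cluster 1 & Cluster 2" row cover both orders of disagreement.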
[Figure: Clusters 1-5 positioned on Russell's model]
System Performance (average accuracy; each column sorted from highest to lowest)
E6K          MTurk
CL   0.65    GT   0.66
GT   0.64    CL   0.63
TL   0.64    TL   0.63
ME1  0.61    ME1  0.57
ME2  0.61    ME2  0.57
IM2  0.57    IM2  0.57
KL1  0.56    KL1  0.55
IM1  0.53    IM1  0.54
KL2  0.29    KL2  0.29
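The slides compare system rankings with TK-HSD (Tukey-Kramer Honestly Significant Difference). As a simpler check, which is our substitution and not the authors' analysis, Spearman's rank correlation between the two accuracy columns can be computed, with tied systems sharing the average of their rank positions:

```python
from math import sqrt
from typing import Dict

# Average accuracy per system under each ground truth, from the table above.
E6K = {"CL": 0.65, "GT": 0.64, "TL": 0.64, "ME1": 0.61, "ME2": 0.61,
       "IM2": 0.57, "KL1": 0.56, "IM1": 0.53, "KL2": 0.29}
MTURK = {"GT": 0.66, "CL": 0.63, "TL": 0.63, "ME1": 0.57, "ME2": 0.57,
         "IM2": 0.57, "KL1": 0.55, "IM1": 0.54, "KL2": 0.29}


def average_ranks(scores: Dict[str, float]) -> Dict[str, float]:
    """Rank systems by descending accuracy (1 = best); ties share their mean rank."""
    ordered = sorted(scores, key=lambda s: -scores[s])
    ranks: Dict[str, float] = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over every system tied with ordered[i].
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # convert 0-based positions i..j to a 1-based mean
        for k in range(i, j + 1):
            ranks[ordered[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    systems = sorted(a)  # both dicts contain the same system names
    ra, rb = average_ranks(a), average_ranks(b)
    xs = [ra[s] for s in systems]
    ys = [rb[s] for s in systems]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    print(f"Spearman rho = {spearman(E6K, MTURK):.3f}")
```

With these accuracies rho comes out at roughly 0.95, in line with the conclusion that the rankings are comparable. Unlike TK-HSD, it says nothing about which pairwise differences are statistically significant.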
[Figure: TK-HSD (Tukey-Kramer HSD) rank comparison of systems, MTurk vs. E6K]
Conclusion
• Overall, the human judgments from E6K and MTurk showed similar patterns in:
– Judgment distribution across the five mood clusters
– Agreement distribution across clusters
– Confusion among clusters
• System performance rankings from E6K and MTurk were also comparable
Conclusion (Cont’d.)
• However, the combined ground truth from E6K and MTurk is only about 60% of the size of the original E6K ground truth
• Mood is a highly subjective feature for describing and organizing music
• Other ways of judging mood should be explored (e.g., ranking)
Future work
• In-depth interviews with users to investigate factors affecting people's judgments of music mood
• More controlled studies with different user groups
Questions?
