Generating Ground Truth for
Music Mood Classification
Using Mechanical Turk
Jin Ha Lee & Xiao Hu
JCDL 2012
Mood: a relatively long-lasting
and stable emotional state (Meyer, 1956)
Emotion?
Affect?
Music mood
• Recently received a lot of attention in
MIR (Music Information Retrieval) domain
• “Audio Music Mood Classification” task in
MIREX, starting in 2007
• Critical for developing MDL
MIREX: Music Information Retrieval Evaluation eXchange
• Evaluation is based
on ground truth
Passionate Bittersweet Bittersweet
Bittersweet
More is better!
However, generating ground truth
based on human input is expensive
and time-consuming
How is it done in MIREX?
• A web-based survey system called E6K
• Invitations posted to MIREX and music-ir
mailing lists in order to recruit
volunteers
Can we use the
CROWD
instead of
MUSIC
EXPERTS?
Is there a
better way?
1. How do music mood classification results
obtained from Mechanical Turk
compare to those collected from music
experts in MIREX?
2. How different or similar are the
evaluation outcomes for the MIREX
AMC task when based on ground truth
collected from Mechanical Turk vs. E6K?
Workers (Turkers)
Task Requester
Amazon Mechanical Turk (MTurk)
Cluster1 passionate, rousing, confident, boisterous, rowdy
Cluster2 cheerful, fun, rollicking, sweet, amiable/good natured
Cluster3 bittersweet, poignant, wistful, literate, autumnal, brooding
Cluster4 humorous, silly, campy, quirky, whimsical, witty, wry
Cluster5 aggressive, intense, fiery, tense/anxious, volatile, visceral
TASK:
Listen to 30-second music clips →
Select one of the five mood clusters
Qualification test
Consistency check
Review process
Basic Stats
1250 songs x 2 judgments = 2500 unique mood judgments
186 HITs collected - 86 HITs rejected = 100 HITs accepted
1 HIT = 25 songs
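The counts above are mutually consistent, which a few lines of Python can confirm. This is only an arithmetic check on the figures from the slide; the variable names are illustrative and the rejection rate is derived here, not reported by the authors.

```python
# Figures taken from the Basic Stats slide; variable names are illustrative.
songs = 1250
judgments_per_song = 2
hits_collected = 186
hits_rejected = 86
songs_per_hit = 25

hits_accepted = hits_collected - hits_rejected
total_judgments = songs * judgments_per_song

# The accepted HITs should supply exactly the judgments that are needed.
judgments_from_hits = hits_accepted * songs_per_hit

# Derived here for illustration; not a number stated on the slide.
rejection_rate = hits_rejected / hits_collected

print(hits_accepted, total_judgments, judgments_from_hits)
print(f"rejection rate: {rejection_rate:.1%}")
```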
Stats on Collecting Data
                                          E6K (Evalutron 6000)       MTurk
Average Time Spent on Each Music Clip     21.54 seconds              17.46 seconds
Total Time for Collecting All Judgments   38 days (+ additional      19 days
                                          in-house assessment)
Cost for Collecting All Judgments         $0                         $60.50
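To put the MTurk figures in per-unit terms, the sketch below divides the reported cost by the 2500 judgments and 100 accepted HITs from the Basic Stats slide. It assumes the $60.50 total covers exactly those accepted judgments, which the slide does not state, so the per-judgment cost is a rough estimate rather than a reported result.

```python
# Reported totals from the slides; the per-unit values are derived here.
mturk_cost_usd = 60.50
mturk_judgments = 2500      # from the Basic Stats slide
mturk_hits_accepted = 100   # from the Basic Stats slide

# Assumption: the total cost is spread over the accepted judgments only.
cost_per_judgment = mturk_cost_usd / mturk_judgments
cost_per_hit = mturk_cost_usd / mturk_hits_accepted

e6k_days, mturk_days = 38, 19
e6k_seconds_per_clip, mturk_seconds_per_clip = 21.54, 17.46

# Ratios of MTurk to E6K for collection time and listening time per clip.
days_ratio = mturk_days / e6k_days
seconds_ratio = mturk_seconds_per_clip / e6k_seconds_per_clip

print(f"cost per judgment: ${cost_per_judgment:.4f}")
print(f"cost per accepted HIT: ${cost_per_hit:.3f}")
print(f"MTurk/E6K collection time: {days_ratio:.2f}")
print(f"MTurk/E6K time per clip: {seconds_ratio:.2f}")
```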
Comparison of E6K and MTurk data

Number of Judgments and Distribution across Clusters
Cluster    E6K            MTurk          Diff. in % (E6K - MTurk)
Cluster1   405 (16.4%)    450 (18.0%)    -1.6%
Cluster2   472 (19.1%)    536 (21.4%)    -2.3%
Cluster3   542 (22.0%)    622 (24.9%)    -2.9%
Cluster4   412 (16.7%)    367 (14.7%)     2.0%
Cluster5   400 (16.2%)    403 (16.1%)     0.1%
Other      237 (9.6%)     122 (4.9%)      4.7%
Total      2468           2500            -
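The percentages and differences in this table can be recomputed from the raw counts. The sketch below does so in plain Python; the dictionary layout and names are illustrative, and the difference is taken between the rounded percentages, which is an assumption about how the slide's last column was produced.

```python
# Raw judgment counts per cluster as (E6K, MTurk), copied from the table.
counts = {
    "Cluster1": (405, 450),
    "Cluster2": (472, 536),
    "Cluster3": (542, 622),
    "Cluster4": (412, 367),
    "Cluster5": (400, 403),
    "Other": (237, 122),
}

e6k_total = sum(e6k for e6k, _ in counts.values())
mturk_total = sum(mturk for _, mturk in counts.values())

diffs = {}
for name, (e6k, mturk) in counts.items():
    e6k_pct = round(100 * e6k / e6k_total, 1)
    mturk_pct = round(100 * mturk / mturk_total, 1)
    # Difference of the rounded percentages (E6K minus MTurk).
    diffs[name] = round(e6k_pct - mturk_pct, 1)
    print(f"{name:9s} {e6k_pct:5.1f}% {mturk_pct:5.1f}% {diffs[name]:+.1f}")

print("totals:", e6k_total, mturk_total)
```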
Distribution of Agreement
Cluster    E6K    MTurk    Both
Cluster1   121     89       29
Cluster2   130    131       44
Cluster3   163    216       91
Cluster4   121     85       42
Cluster5   126    121       64
Total      661    642      270
Confusion among the Clusters
Clusters                Disagreed in E6K    Disagreed in MTurk
Cluster 1 & Cluster 2    20                  95
Cluster 2 & Cluster 4    31                  86
Cluster 1 & Cluster 5    13                  74
⁞                        ⁞                   ⁞
Cluster 3 & Cluster 4     6                  27
Cluster 2 & Cluster 5     1                  22
Cluster 3 & Cluster 5     1                  20
Total                   253                 595
[Figure: Clusters 1 to 5 placed on Russell's model]
System Performance
E6K                          MTurk
System   Average accuracy    System   Average accuracy
CL       0.65                GT       0.66
GT       0.64                CL       0.63
TL       0.64                TL       0.63
ME1      0.61                ME1      0.57
ME2      0.61                ME2      0.57
IM2      0.57                IM2      0.57
KL1      0.56                KL1      0.55
IM1      0.53                IM1      0.54
KL2      0.29                KL2      0.29
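One way to quantify how similar the two system rankings are is a rank correlation over the accuracies listed above. The sketch below computes Spearman's rho in plain Python, with tied scores sharing their mean rank. This is an illustrative check only, not the analysis used in the presentation, and the function names are hypothetical.

```python
from math import sqrt

# Average accuracies per system, copied from the System Performance slide.
e6k = {"CL": 0.65, "GT": 0.64, "TL": 0.64, "ME1": 0.61, "ME2": 0.61,
       "IM2": 0.57, "KL1": 0.56, "IM1": 0.53, "KL2": 0.29}
mturk = {"GT": 0.66, "CL": 0.63, "TL": 0.63, "ME1": 0.57, "ME2": 0.57,
         "IM2": 0.57, "KL1": 0.55, "IM1": 0.54, "KL2": 0.29}


def average_ranks(scores):
    """Rank from highest score (rank 1) down; tied scores share the mean rank."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are ranks i+1..j+1
        for name in ordered[i:j + 1]:
            ranks[name] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(a, b):
    """Spearman's rho as the Pearson correlation of the tie-averaged ranks."""
    ranks_a, ranks_b = average_ranks(a), average_ranks(b)
    names = list(a)
    mean_a = sum(ranks_a[n] for n in names) / len(names)
    mean_b = sum(ranks_b[n] for n in names) / len(names)
    cov = sum((ranks_a[n] - mean_a) * (ranks_b[n] - mean_b) for n in names)
    var_a = sum((ranks_a[n] - mean_a) ** 2 for n in names)
    var_b = sum((ranks_b[n] - mean_b) ** 2 for n in names)
    return cov / sqrt(var_a * var_b)


rho = spearman_rho(e6k, mturk)
print(f"Spearman rho between E6K and MTurk rankings: {rho:.3f}")
```

A value close to 1 means the two ground-truth sets order the systems almost identically; the only rank changes in the lists above are among the top three systems and among the systems tied at 0.57 under MTurk.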
TK-HSD Rank Comparison
[Figure: TK-HSD rank comparison panels for MTurk and E6K]
Conclusion
• Overall the human judgments from E6K and
MTurk showed similar patterns:
– Judgment distribution across five mood clusters
– Agreement distribution across clusters
– Confusion among clusters
• System performance rankings from E6K and
MTurk were also comparable
Conclusion (Cont’d.)
• However, the combined ground truth from E6K
and MTurk is only about 60% of the size of the
original E6K ground truth
• Mood is a highly subjective feature for
describing and organizing music
• Other means for judging the moods should be
explored (e.g., ranking)
Future work
• In-depth interview with users to investigate
factors affecting people’s judgments on music
mood
• More controlled study with different user
groups
Questions?
