Competence Center Information Retrieval & Machine Learning
11th International Workshop on Content-Based Multimedia Indexing (CBMI), Veszprem, Hungary, 2013
Detecting Violent Content in Hollywood Movies by Mid-level Audio Representations
Esra Acar
Esra Acar, Frank Hopfgartner, Sahin Albayrak
Outline
17 June 2013, CBMI‘2013
► Motivation
► The Violence Detection Method
  ▪ Audio Representation of Videos
  ▪ Learning Violence Detection Model
► Performance Evaluation
► Conclusions & Future Work
Motivation
► Goal: detect the most violent scenes in Hollywood movies.
► Use case: parents select or reject a movie by previewing the parts that contain its most violent moments.
► We investigate the discriminative power of mid-level audio features:
  ▪ Bag-of-Audio-Words (BoAW) representations based on Mel-Frequency Cepstral Coefficients (MFCCs)
  ▪ Two different BoAW construction methods:
    - Vector quantization-based (VQ-based) method
    - Sparse coding-based (SC-based) method
The Violence Detection Method
► Definition of violence: “physical violence or accident resulting in human injury or pain”, as defined in the MediaEval Violent Scenes Detection (VSD) task.
► Two main components of the method:
  ▪ The representation of video shots
  ▪ The learning of a violence model
Audio Representation of Videos (1)
► Mel-Frequency Cepstral Coefficients (MFCCs)
  ▪ are commonly used in speech recognition and music information retrieval (e.g., genre classification).
  ▪ relate well to human auditory perception.
  ▪ work well for detecting excitement/non-excitement (i.e., they indicate the excitement level of video segments).
► An MFCC-based audio representation is used to describe the audio content of Hollywood movies.
► Mid-level representations may help model video segments one step closer to human perception. Examples are:
  ▪ bags of features,
  ▪ the upper units of convolutional networks or deep belief networks.
Audio Representation of Videos (2)
► We use mid-level audio features based on MFCCs (i.e., the BoAW approach).
► The BoAW approach is applied with two different coding schemes:
  ▪ Vector quantization (by k-means clustering): divides feature vectors into groups, where each group is represented by its centroid.
  ▪ Sparse coding (by the LARS algorithm): represents a feature vector as a sparse linear combination of basis vectors drawn from an over-complete set.
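The sparse coding scheme can be written as an optimization problem. The slides do not give the exact objective, so the following is a commonly used formulation and an assumption here: for an MFCC vector $x \in \mathbb{R}^{d}$ and a dictionary $D \in \mathbb{R}^{d \times K}$ with $K > d$ (over-complete), the code $\hat{\alpha}$ is the solution of an L1-regularized least-squares problem, which the lasso variant of LARS can solve. The symbols $D$, $K$, $\alpha$ and the weight $\lambda$ are illustrative names, not taken from the slides.

```latex
\hat{\alpha} \;=\; \arg\min_{\alpha \in \mathbb{R}^{K}}
  \; \tfrac{1}{2}\,\lVert x - D\alpha \rVert_2^{2} \;+\; \lambda\,\lVert \alpha \rVert_1 ,
  \qquad \lambda > 0
```

A larger $\lambda$ drives more entries of $\alpha$ to zero, so each feature vector is explained by fewer basis vectors. Vector quantization can be seen as the extreme case in which exactly one entry of $\alpha$ is non-zero and equal to one.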
Audio Representation of Videos (3)
[Figure: Dictionary Generation Phase]
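The figure for this slide is not recoverable, so the following is only a minimal sketch of how a VQ dictionary could be generated, assuming (as the previous slide states) that k-means clustering is run over MFCC frame vectors. The function names, the toy 2-D frames, and the evenly spaced centroid initialization are illustrative assumptions, not the authors' implementation.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_codebook(frames, k, iterations=20):
    """Learn k 'audio words' (centroids) from a list of feature vectors.

    Assumption: centroids are initialized from evenly spaced frames,
    which keeps this sketch deterministic. Requires len(frames) >= k.
    """
    step = len(frames) // k
    centroids = [list(frames[i * step]) for i in range(k)]
    for _ in range(iterations):
        # Assignment step: attach every frame to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for frame in frames:
            nearest = min(range(k), key=lambda j: squared_distance(frame, centroids[j]))
            clusters[nearest].append(frame)
        # Update step: move each centroid to the mean of its members.
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


# Toy stand-in for MFCC frames: two well-separated groups of 2-D vectors.
frames = [[0.0, 0.1], [0.1, 0.0], [0.0, 0.0],
          [5.0, 5.1], [5.1, 5.0], [5.0, 5.0]]
codebook = kmeans_codebook(frames, k=2)
```

Each returned centroid plays the role of one audio word. Real MFCC frames would have more dimensions and the dictionary would be far larger than two entries.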
Audio Representation of Videos (4)
[Figure: Representation Construction Phase]
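The figure for this slide is not recoverable. For the vector quantization scheme, a shot-level BoAW representation could be built roughly as sketched below: each frame of a shot is mapped to its nearest audio word and the counts are collected in a histogram. The codebook values, the function name `vq_boaw`, and the normalization to relative frequencies are assumptions made for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vq_boaw(shot_frames, codebook):
    """Return a normalized histogram of nearest audio words for one shot."""
    counts = [0] * len(codebook)
    for frame in shot_frames:
        nearest = min(range(len(codebook)),
                      key=lambda j: squared_distance(frame, codebook[j]))
        counts[nearest] += 1
    total = sum(counts)
    # Assumption: normalize so shots of different length are comparable.
    return [c / total for c in counts] if total else [0.0] * len(codebook)


# Hypothetical 3-word codebook and the frames of a single shot.
codebook = [[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]]
shot = [[0.2, -0.1], [4.8, 5.3], [5.1, 4.9], [0.1, 0.1]]
histogram = vq_boaw(shot, codebook)  # [0.5, 0.5, 0.0]
```

The resulting fixed-length vector is what a classifier would consume, regardless of how many frames the shot contains.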
Learning Violence Detection Model
[Figure: Learning a Violence Model]
Performance Evaluation
► Dataset:
  ▪ 32,708 video shots from 18 Hollywood movies of different genres (ranging from extremely violent movies to movies without violence).
    - Training set: 26,138 video shots from 15 movies.
    - Test set: 6,570 video shots from 3 movies.
► Ground truth:
  ▪ Generated by 7 human assessors; violent movie segments are annotated at the frame level.
  ▪ Each video shot is labeled as violent or non-violent.
[Table: The characteristics of training and test datasets]
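The slides do not say how frame-level annotations are turned into shot-level labels. One plausible rule, shown here purely as an assumption, is to mark a shot as violent when it overlaps any annotated violent segment. The function name and the frame ranges are hypothetical.

```python
def label_shots(shots, violent_segments):
    """Label each shot 1 (violent) or 0 (non-violent).

    shots, violent_segments: lists of (start_frame, end_frame), inclusive.
    Assumption: any overlap with an annotated segment makes a shot violent.
    """
    labels = []
    for shot_start, shot_end in shots:
        overlaps = any(seg_start <= shot_end and shot_start <= seg_end
                       for seg_start, seg_end in violent_segments)
        labels.append(1 if overlaps else 0)
    return labels


# Hypothetical shot boundaries and annotated violent segments.
shots = [(0, 99), (100, 249), (250, 400)]
violent_segments = [(120, 180), (240, 260)]
labels = label_shots(shots, violent_segments)  # [0, 1, 1]
```

A stricter rule, such as requiring a minimum overlap ratio, would change which borderline shots count as violent.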
Evaluation Metrics
► The ranking of violent shots is what matters most for the use case.
► Metrics other than plain precision and recall are therefore required to compare performance.
► Average precision at 20 and at 100 is used (official metrics in the MediaEval VSD task).
► R-precision is also reported; it can be seen as an alternative to precision at k.
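The two ranking metrics can be sketched as follows. The input is a list of ground-truth labels (1 = violent, 0 = non-violent) sorted by descending classifier score. Average precision at k has several variants in the literature; this sketch normalizes by the number of violent shots found in the top k, which is an assumption and may differ from the official MediaEval scoring tool.

```python
def average_precision_at_k(ranked_labels, k):
    """Mean of the precision values at each violent shot within the top k."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels[:k], start=1):
        if label:
            hits += 1
            precision_sum += hits / rank
    # Assumption: normalize by the hits found in the top k.
    return precision_sum / hits if hits else 0.0


def r_precision(ranked_labels):
    """Precision at rank R, where R is the total number of violent shots."""
    r = sum(ranked_labels)
    return sum(ranked_labels[:r]) / r if r else 0.0


# Toy ranking of eight shots, best-scored first.
ranked = [1, 0, 1, 1, 0, 0, 1, 0]
ap5 = average_precision_at_k(ranked, k=5)  # (1/1 + 2/3 + 3/4) / 3
rprec = r_precision(ranked)                # 3 of the top 4 are violent: 0.75
```

Unlike precision at a fixed k, R-precision adapts the cutoff to the number of violent shots in each movie, which matters when movies differ widely in how much violence they contain.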
Results & Discussions (1)
[Figure: Average Precision at 100 for the Baseline and Our Methods]
[Figure: Average Precision at 20 & 100 and R-precision for the VQ- and SC-based methods]
Results & Discussions (2)
[Figure: Average Precision at 20 & 100 and R-precision on Independence Day]
[Figure: Average Precision at 20 & 100 and R-precision on Dead Poets Society]
[Figure: Average Precision at 20 & 100 and R-precision on Fight Club]
Results & Discussions (3)
Team                Features                                                    Modality      AP at 100*
ARF                 Color, texture, audio and concepts                          audio-visual  0.651
Shanghai-Hong Kong  Trajectory-based features, SIFT, STIP, MFCCs                audio-visual  0.624
TEC                 Color, motion, acoustic features                            audio-visual  0.618
TUM                 Acoustic energy and spectral, color, texture, optical flow  audio-visual  0.484
SC-based (ours)     BoAW with sparse coding                                     audio         0.444
VQ-based (ours)     BoAW with vector quantization                               audio         0.387
LIG-MIRM            Color, texture, bag of SIFT and MFCCs                       audio-visual  0.314
NII                 Visual concepts learned from color and texture              visual        0.308
DYNI-LSIS           Multi-scale local binary pattern                            visual        0.125
* Average Precision at 100 (the official evaluation metric of the MediaEval VSD task)
Sample Video Shots (Correctly Classified)
Sample Video Shots (Wrongly Classified)
Conclusions
► An approach for detecting violent content in movies at the video-shot level is presented.
► Mid-level audio features based on the BoAW approach with two different coding schemes are employed.
► Promising results are obtained:
  ▪ the SC-based BoAW outperforms all uni-modal submissions in the MediaEval VSD task except one vision-based method.
► One significant point is that the average precision of the proposed method varies strongly across movies with different violence levels.
Future Work
► Construct more sophisticated mid-level representations for video content analysis.
► Augment the feature set with visual features (both low-level and mid-level) to further improve classification.
► Extend our approach to user-generated videos.
  ▪ Unlike Hollywood movies, these videos are not professionally edited, e.g., to enhance dramatic scenes.
THANKS!
QUESTIONS?
