Noboru Babaguchi
Osaka University
Joint Work with Prof. N. Nitta
ICME2013 Co-located WS
MMIX13, Keynote
San Jose, July 18, 2013
 Definitions, Background, Related Work
 Multimedia Remixing Support System
 Video Clip Sequence Creation
 Music Clip Selection
 Shot Extraction
 Conclusion and Future Work
 From wikipedia…
A remix is a song that has been edited to sound different from the original version.
The person who remixed it might have changed the pitch of the singers' voice,
changed the tempo and speed and has made the song shorter or longer, or
instead of hearing just one person singing they might have duplicated the voice to
make it sound like two people are singing, or make the voice echo.
Remixes should not be confused with edits, which usually involve shortening a
final stereo master for marketing or broadcasting purposes. … A remix song
recombines audio pieces from a recording to create an altered version of the song.
In recent years the concept of the remix has been applied analogously to
other media. …. Scary Movie series is famous for its comic remix of various well-
known horror movies such as Ring, Scream, and Saw.
Video Remix: a video clip made by recombining various media
components to create an altered version of the original videos.
Video transition effects
(Cut, fade-in/out,
dissolve, etc.)
Audio clips
(music, sound effects,
voices, etc.)
Original video clips
Video remixes (e.g. movie trailers)
Video clip
selection & arrangement
Multimedia stream
Combination
How can we create video remixes of good quality?
from “The School
of Rock” (2003)
 Semantic Aspect:
 What should we present? (Semantic Content)
 Highlights of Sports Games, etc.
 Affective Aspect:
 How should we present the video content?
(Aesthetic Compatibility, Film Syntax)
 Commercial Films,Movie Trailers, etc.
 How to arrange video clips or what music clip to augment
to enhance the expressive quality
Two aspects in video remixing
Video Summarization
Video Remix
Scene-Music Relation
Shot-Scene Relation A sequence of L video shots
A sequence of D music clips
A video scene
Problem of Video Remixing
A music clip
= A sequence of D video scenes
An excerpt from a video clip
To maintain the feeling of continuity in a scene
 Hitchcock[Girgensohn2001]
 Template-based Editing[Davis2003]
 Lazycut[Hua2005]
 Emotion-based[Canini2010]
 Video-Music Mixing
[Mulhem2003][Hua2004][Wang2005][Yoon2009][Cristani2010]
Video clip selection and arrangement
Focused on
how various types of video clips are arranged in sequence.
For example…
• A scene has to have at least three video clips[Sundaram01].
• Two video shots of extremely different shot sizes
should not be connected[Kumano02].
• The duration of a shot recorded with the camera fixed
is up to 15 seconds[Kumano02].
Film Syntax
[Sundaram01] H. Sundaram, et al., “Condensing computable scenes using visual complexity and film syntax analysis,” Proc. ICME,
pp.389-392, 2001.
[Kumano02] M. Kumano, et al., “Video editing support system based on video content analysis,” Proc. ACCV, pp.628-633, 2002.
[Canini10] L. Canini, et al., “Interactive video mashup based on emotional identity,” Proc. European Signal Processing Conf., pp.1499-1503, 2010.
Aesthetic Compatibility
•Shots with similar emotional impact
should be connected[Canini10].
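The film-syntax rules listed above lend themselves to a simple automatic check. The following is a minimal sketch, not the method of the cited papers: the `Shot` record, the integer shot-size scale, and the threshold for "extremely different" shot sizes are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    duration_sec: float
    shot_size: int      # hypothetical scale: 0 = extreme close-up ... 4 = extreme long shot
    camera_fixed: bool


MIN_CLIPS_PER_SCENE = 3     # rule attributed to [Sundaram01]
MAX_FIXED_SHOT_SEC = 15.0   # rule attributed to [Kumano02]
MAX_SHOT_SIZE_JUMP = 2      # assumed threshold for "extremely different" shot sizes


def film_syntax_violations(scene):
    """Return a list of human-readable rule violations for a list of Shot objects."""
    problems = []
    if len(scene) < MIN_CLIPS_PER_SCENE:
        problems.append("scene has fewer than 3 video clips")
    for i, shot in enumerate(scene):
        if shot.camera_fixed and shot.duration_sec > MAX_FIXED_SHOT_SEC:
            problems.append(f"shot {i}: fixed-camera shot longer than 15 s")
    for i in range(len(scene) - 1):
        if abs(scene[i].shot_size - scene[i + 1].shot_size) > MAX_SHOT_SIZE_JUMP:
            problems.append(f"shots {i}-{i + 1}: shot sizes differ too much")
    return problems
```

An empty result means the scene passes all three checks under these assumed thresholds.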
Music clip selection
Focused on which types of music clips are mixed with video shots.
For example…
• dynamic, motion, and pitch of image and audio streams
coincide with each other[Mulhem03].
• novelty, velocity, and brightness of image and audio streams
coincide with each other[Yoon09].
Aesthetic Compatibility
[Mulhem03] P. Mulhem, et al., “Pivot vector space approach for audio-video mixing,” IEEE Multimedia, 10(2), pp.28-40, 2003
[Yoon09] J.-C. Yoon, et al., “Automated music video generation using multi-level feature-based segmentation,” MTAP, 41(2), pp.197-214, 2009
[Cristani10] M. Cristani, et al., “Toward an automatically generated soundtrack from low-level cross-modal correlations for automotive scenarios,”
Proc. ACM Multimedia, pp.551-559, 2010
Determined heuristically
• brightness of image and audio streams
and rhythm of audio stream and optical flow in image stream
coincide with each other[Cristani10]
Determined statistically
Multimedia Remixing Support
System
 It is difficult to explicitly define the rules and
know-how about how the video and music clips
should be arranged, considering the aesthetic
compatibility.
 The rules and structures commonly used in
professionally created examples can be modeled
by standard machine learning techniques.
 Non-professional users can be supported on their
interface based on the models which implicitly
describe shot-scene and scene-music relations
considering aesthetic compatibility.
A Set of Video Remix Examples
Professionally Created Video Remixes
A Set of Video Remix Examples
Target: Remixing original video clips based on Examples
A Set of Music Clips
A Set of Original Video Clips
video remix
video remix
I) Video Clip Sequence Creation
Interface
II) Music Clip Selection
III) Shot Extraction
(Video and Music
Synchronization)
User
A set of video clips:
A set of music clips:
A Set of
Video Remix Examples
Video Remix Template
Shot
Scene
Video Clip Suggestions
N. Nitta and N. Babaguchi, “Example-based video remixing,” Multimedia Tools and Applications,
51(2), pp.649-673, 2011
N. Nitta and N. Babaguchi, “Example-based home video remixing,” Proc. ICME, 2011
Video Remix Examples
Symbol Sequence
Home (Personal) Videos
Video Clips
Segmentation
Suitability[Nitta2011]
To Template
Perceived Quality[Mei2007]
B AB CGE
Template
Interface
Overview of Procedure I)
Template
Generation
T. Mei, et al., "Home Video Visual Quality Assessment With Spatiotemporal Factors," IEEE Trans.
Circuits and Systems for Video Technology, vol.17, no.6, pp.699-706, 2007.
Video Remix Examples
Slow
Scene
Active
Scene
HMM
Example-based Template Generation
Shot Length
Brightness
Motion Intensity
w/wo Camera Work
w/wo Human Objects
Low-level Features
Feature
Extraction
・・・
Sequences of video shots
Shot
ihg
fed
cba
Symbolization
Symbol Sequence
Video Remix Template
(New Symbol Sequence
& State Sequence)
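The symbolization step in this figure turns each shot's low-level features into one symbol, so that a remix example becomes a symbol sequence an HMM can model. A minimal sketch of one way this could work is shown below. It assumes, purely for illustration, that two features (shot length and motion intensity) are each quantized into three levels, giving nine symbols `a` to `i`; the actual features combined and the quantization used in the original system are not stated on the slide.

```python
import string


def quantize(value, low, high, levels=3):
    """Map a value in [low, high] to an integer level 0..levels-1 (clamped)."""
    if high <= low:
        raise ValueError("high must be greater than low")
    ratio = (value - low) / (high - low)
    return min(levels - 1, max(0, int(ratio * levels)))


def symbolize(shots, length_range, motion_range, levels=3):
    """Convert (shot_length, motion_intensity) pairs into a symbol string."""
    symbols = []
    for length, motion in shots:
        row = quantize(length, *length_range, levels)
        col = quantize(motion, *motion_range, levels)
        symbols.append(string.ascii_lowercase[row * levels + col])
    return "".join(symbols)


# Hypothetical shots: (length in seconds, normalized motion intensity).
print(symbolize([(0.5, 0.1), (3.0, 0.5), (5.9, 0.95)], (0.0, 6.0), (0.0, 1.0)))
```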
GA
A Sequence of L Shots
A Sequence of D Scenes
Video Clip 1 Video Clip 2 Video Clip 3
A Home Video
Suitability to Template 0.3 0.2 0.7
Perceived Quality 0.7 0.5 0.6
From Shot to Video Clip
Shots in target video are divided into
video clips based on the camerawork
Video clip selection
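Each candidate video clip carries two scores here: suitability to the template and perceived quality. The slide does not say how the two are combined when suggesting clips, so the sketch below simply assumes a weighted sum; the `weight` parameter and the `rank_clips` helper are hypothetical. The example values are the ones shown in the figure, read left to right as Video Clips 1 to 3.

```python
def rank_clips(clips, weight=0.5):
    """Sort (name, suitability, quality) tuples by a weighted sum, best first.

    weight = 1.0 uses only suitability to the template,
    weight = 0.0 uses only perceived quality.
    """
    def score(clip):
        _, suitability, quality = clip
        return weight * suitability + (1.0 - weight) * quality

    return sorted(clips, key=score, reverse=True)


clips = [
    ("Video Clip 1", 0.3, 0.7),
    ("Video Clip 2", 0.2, 0.5),
    ("Video Clip 3", 0.7, 0.6),
]
for name, suitability, quality in rank_clips(clips):
    print(name, suitability, quality)
```

With equal weights Video Clip 3 ranks first, whereas ranking by perceived quality alone (`weight=0.0`) would put Video Clip 1 first.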
Video Remix Template
Interface
3D book-style video clip presentation
Timeline Presentation
Suitability
To Template
Perceived Quality
◎
× △
▲
spine
Fore edge
Interface
 Video remix examples: 61 action movie trailers
 Video clips: 265 home (personal) video clips recording a sports
field day held by a kindergarten
 Subjective evaluation by 8 subjects
 Compare with video clip sequence created by considering only
the perceived quality of video clips
With Template*: Subjective Score 3
Without Template: Subjective Score 3.5
* Selected video clips are shortened according to the template
Created Video Clip Sequence
Using action movie trailers as examples
resulted in creating a sequence of many short video clips
N. Nitta and N. Babaguchi, “Example-based video remixing support system,”
Proc. ACM Multimedia, pp.563-572, 2011
Video Clip Sequence (Scene)
Overview of Procedure II)
A Set of Video Remix Examples (Scenes)
A set of Music Clips
visually similar
video remix examples
similar music clips
 Evaluate the compatibility among video scenes and music clips by their
distances in the video scene and music feature spaces
 Learn non-linear mapping of music feature space so that the distances
among video scenes and the mixed music clips would be correlated
[Suzuki07]
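The selection idea above (find example scenes that look like the new scene, then pick music clips that sound like the music mixed with those examples) can be sketched as a two-stage nearest-neighbour search. This is an illustrative sketch only: it uses plain Euclidean distance on already-extracted feature vectors and omits the learned non-linear mapping of the music feature space. The function names and the choice of three neighbours are assumptions.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def select_music(query_scene, example_scenes, example_music, candidates, n_scenes=3):
    """Suggest one candidate music clip per visually similar example scene.

    query_scene:    visual feature vector of the new video scene
    example_scenes: visual feature vectors of the example scenes
    example_music:  audio feature vectors; example_music[i] was mixed with example_scenes[i]
    candidates:     dict mapping music clip name -> audio feature vector
    """
    order = sorted(range(len(example_scenes)),
                   key=lambda i: euclidean(query_scene, example_scenes[i]))
    picks = []
    for i in order[:n_scenes]:
        best = min(candidates,
                   key=lambda name: euclidean(candidates[name], example_music[i]))
        picks.append(best)
    return picks
```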
Music Clip Feature Space
(Music Clips
Mixed to Example Video Scenes)
Video scene feature space
(Example Video Scenes)
Expected Music clip feature space
(Music Clips
Mixed to Example Video Scenes)
[Suzuki07] K. Suzuki, et al., “A similarity-based neural network for facial expression analysis,” Pattern Recognition Letters, 28(9), pp.1104-1111, 2007
 Music Clip Selection
 Video Scenes・・・Visual Features
 Music Clips・・・Audio Features
 [Zettl99]
 Emotion-based Music Classification
[Zettl99] H. Zettl, “Sight Sound Motion: Applied Media Aesthetics,” Wadsworth Publishing, 1999
 Consists of 2 Neural Networks
 Input: Audio Features x_i^A and x_i^B of Music Clips A and B
 Output: Transformed Audio Features y_j^A and y_j^B of Music Clips A and B
 Learn the weights w_{l,m} of the Neural Networks so that the differences between the distances of y_j^A and y_j^B and the distances of the video scenes mixed with music clips A and B are minimized.
 w_{l,m}: Weight for the edge between nodes l and m.
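The training objective described in these bullets can be written compactly. The formulation below is a sketch under two assumptions that the slide does not state: that the distance between transformed audio features is Euclidean, and that the mismatch is penalized with a squared error summed over pairs of music clips. Here y^A and y^B are the transformed audio features of music clips A and B, T_{AB} is the distance between the video scenes mixed with A and B, and w collects the weights w_{l,m}.

```latex
d_{AB} = \left\lVert y^{A} - y^{B} \right\rVert_2 ,
\qquad
E(w) = \sum_{(A,B)} \left( d_{AB} - T_{AB} \right)^2 ,
\qquad
w^{*} = \arg\min_{w} E(w)
```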
[Figure: Input A (x_i^A) and Input B (x_i^B) are fed to Neural Network A and Neural Network B, which output y_j^A and y_j^B. A distance calculation yields d_AB, which is compared with the teacher signal T_AB (distances of video scenes mixed with music clips A and B).]
Interface
 Video Remix Examples: 61 Action Movie Trailers
 Video Scene Examples: 45 Scenes
 Music Clips: 180 Music Clips of Various Genres
(Movie Soundtracks, Classical Music, Japanese-pop, Western-pop, etc.)
 Video Clips:
 Shots extracted from Original Movies
 265 Home Video Clips recording a sports field day held by a kindergarten
 Video Clip Sequence:
 Made by Procedure I)
 Input:10 Video Scenes randomly extracted from movie trailers (without Audio Stream )
 10 subjects rated (1: very bad – 10: very good) 10 video scenes mixed with
 Video1) 3 Music Clips Selected by Proposed Approach
 Video2) Music Clips most similar to the music excerpts mixed with the 3 least
similar video scenes
 Video3) Music Clip mixed with the video scenes in movie trailers (baseline:
professional)
 Video4) 3 Music Clips selected in the same way as for Video 1) without music
feature space transformation
 Video5) 3 Music Clips selected in the same way as for Video 2) without music
feature space transformation
Video 1 - Video 2 = 1.72±0.34 (95% confidence interval)
⇒ indicates the effectiveness of similarity-based music clip selection
Video 1 - Video 4 = 1.11±0.35
⇒ indicates the effectiveness of music feature space transformation
Video 1 → closest to Video 3
⇒ selected music clips are subjectively closest to professionally selected ones
Average Subjective Scores
        Video1  Video2  Video3  Video4  Video5
Score   6.1     4.4     7.2     5.0     4.5

        Video1  Video2  Video3  Video4  Video5
Score   6.8     2.7     8.3     6.4     3.0

        Video1  Video2  Video3  Video4  Video5
Score   8.5     2.1     5.5     5.5     2.3
With Template*: Subjective Score 5.3
Without Template: Subjective Score 3.8
Video Clip Sequence after Music Mixing
Subjective score improved largely after music mixing
Created video clip sequence and selected music clips
are synergetic in improving the expressive quality.
* Selected video clips are shortened according to the template
Y. Kurihara, N. Nitta, and N. Babaguchi, “Automatic appropriate segment extraction from shots
based on learning from example videos,” Proc. PSIVT, pp.1082-1093, 2009
Y. Kurihara, N. Nitta, and N. Babaguchi, “Appropriate segment extraction from shots based on
temporal patterns of example videos,” Proc. MMM, pp.253-264, 2008
Video Clip Sequence: Video Clip 1, Video Clip 2, Video Clip 3
Video Remix: Shot 1, Shot 2, Shot 3
 A video clip needs to be shortened.
 A video clip contains redundant parts.
Which part of a video clip should be extracted as a shot?
Shot Extraction from Selected Video Clip
k frames
Discarded part
(Non-shot)
Selected Part
(Shot)
Video Clip Example Video Clip
Shot
Extraction
Feature Extraction
Pattern
Scan for the k frames
which best matches the shot HMM
Feature Extraction
Shot
Symbolization
Symbol
Sequence
Shot HMM
Non-shot HMM
Overview of Procedure III)
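The scan described in this overview can be sketched as a sliding window: every run of k consecutive frames is scored by how much better the shot HMM explains it than the non-shot HMM, and the best-scoring window is extracted. The code below is a minimal sketch assuming discrete symbols and small hand-specified HMMs given as `(start, transition, emission)` probability tables; how the original system trains its models and handles ties is not shown on the slide.

```python
import math


def _log(p):
    return math.log(p) if p > 0 else -math.inf


def _logsumexp(values):
    peak = max(values)
    if peak == -math.inf:
        return -math.inf
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def log_likelihood(symbols, start, trans, emit):
    """Forward algorithm in the log domain for a discrete HMM."""
    n = len(start)
    alpha = [_log(start[s]) + _log(emit[s][symbols[0]]) for s in range(n)]
    for sym in symbols[1:]:
        alpha = [
            _logsumexp([alpha[p] + _log(trans[p][s]) for p in range(n)])
            + _log(emit[s][sym])
            for s in range(n)
        ]
    return _logsumexp(alpha)


def extract_shot(symbols, k, shot_hmm, non_shot_hmm):
    """Return (start_frame, score) of the k-frame window that best fits the shot HMM."""
    best_start, best_score = 0, -math.inf
    for start in range(len(symbols) - k + 1):
        window = symbols[start:start + k]
        score = log_likelihood(window, *shot_hmm) - log_likelihood(window, *non_shot_hmm)
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score
```

Scoring by the log-likelihood difference rather than the shot likelihood alone mirrors the f(n)-g(n) curve plotted in the evaluation slides, although the exact quantity plotted there is not defined in the text.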
•Shot Classification
action and conversation
•Feature extraction
Shot
Action Conversation Scenery ・・・
※ VSTD: Volume Standard Deviation, LVFR: Low Volume Signal Ratio,
ERSB: Energy Ratio of Frequency SubBand, ZCR: Zero Crossing Ratio
Each type of shot is characterized
by different features
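Two of the audio features named above are easy to illustrate. The sketch below uses common textbook-style definitions, which are assumptions here because the slide gives only the feature names: the zero crossing ratio as the fraction of adjacent sample pairs whose sign changes, and the volume standard deviation as the standard deviation of per-frame RMS volume. The function names and the `frame_size` parameter are hypothetical.

```python
import math


def zero_crossing_ratio(samples):
    """Fraction of adjacent sample pairs whose sign differs."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(samples) - 1)


def volume_std(samples, frame_size):
    """Standard deviation of RMS volume over non-overlapping frames."""
    volumes = []
    for i in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[i:i + frame_size]
        volumes.append(math.sqrt(sum(x * x for x in frame) / frame_size))
    mean = sum(volumes) / len(volumes)
    return math.sqrt(sum((v - mean) ** 2 for v in volumes) / len(volumes))
```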
 Examples:Movies+Trailers
Video Clips:Shots in Movies
Shots:Shots in Trailers
 Shot extraction from 69 video clips (shots in movies)
 Shot Length (k) = Length of corresponding shots in trailers
(32.3% of the video clip length on average)
            Action  Conversation
Training    10      12
Test        47      22
Experiments
Objective Evaluation
Video Clip (Action)
Ground Truth
(Shot in Trailer)
Extracted Shot
82 frames
k= 9 frames
Difference:3 frames(0.3sec)
•Compare Extracted Shot
with Ground Truth
•1 frame=0.1 sec
107 frames
Extracted Shot
Ground Truth
k= 17 frames
Difference:3 frames (0.3sec)
Video Clip (Conversation)
Correctly Extracted Shot
Video Clip (Action): 47 frames, k = 5 frames, Difference: 3 frames
[Plot: log P versus frame # for f(n), the shot (edited segment) model, g(n), the non-shot (non-edited segment) model, and their difference f(n)-g(n), with the Extracted Shot and the Ground Truth marked]
Incorrectly Extracted Shot
Video Clip: 35 frames, k = 7 frames, Difference: 26 frames
[Plot: log P versus frame # for f(n), the shot (edited segment) model, g(n), the non-shot (non-edited segment) model, and their difference f(n)-g(n), with the Extracted Shot and the Ground Truth marked]
Objective Evaluation
accuracy = (# of correct extractions) / (# of video clips)
※ Correct Extraction: Shot was extracted within T-frame Difference (1 frame = 0.1 sec)
Correct shots were extracted
from 72.5% (50/69) of video clips when T=5
        Action         Conversation
T=5     72% (34/47)    73% (16/22)
T=3     60% (28/47)    64% (14/22)
T=2     53% (25/47)    50% (11/22)
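The accuracy measure used in this evaluation counts an extraction as correct when the extracted shot lies within T frames of the ground truth. A minimal sketch follows; treating "within T frames" as an inclusive bound is an assumption, and `extraction_accuracy` is a hypothetical helper name.

```python
def extraction_accuracy(frame_differences, tolerance):
    """Share of clips whose extracted shot is within `tolerance` frames of the ground truth."""
    if not frame_differences:
        raise ValueError("at least one video clip is required")
    correct = sum(1 for d in frame_differences if abs(d) <= tolerance)
    return correct / len(frame_differences)


# Frame differences of the four example clips shown in the evaluation slides.
print(extraction_accuracy([3, 3, 3, 26], tolerance=5))

# The reported overall figure for T=5: 50 correct extractions out of 69 clips.
print(round(100 * 50 / 69, 1))
```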
 14 subjects watched the original long video clips, and then
three kinds of short extracted shots:
①Ground Truth
②Extracted Shot
③Random Shot
in random order, and ranked them.
(Ties are allowed.)
or
or
Ground Truth:③
Extracted Shot:②
Random Shot:①
Video Clip (36 frames)
①
②
③ k = 15 frames
Ground Truth
Extracted Shot Random Shot
・・・Rank 1
・・・Rank 2
・・・Rank 3
69.1%
26.9%
4.0%
53.9% 38.9%
7.2%
7.1%
12.9%
80.0%
Action:18 video clips
Conversation:13 video clips
Subjective Evaluation
Extracted Shot ≒Ground Truth >> Random Shot
With Template: Subjective Score 6.2
Without Template: Subjective Score 3.9
Created Video Remix
          Proposed             Comparative
          I      II     III    I      II     III
Score     3      5.3    6.2    3.5    3.8    3.9
Length (min:sec): Proposed 0:36, 0:43; Comparative 10:56, 10:59
 Introduced an example-based approach for video
remixing
 Video Clip Sequence Creation
 Music Clip Selection
 Shot Extraction
 Interface
 Experiments using movie trailers as remix examples
and movies and home videos as video clips
 Verified the effectiveness of using remix examples
 With Support(6.2), Without Support(3.9)
Conclusion
 Improvement of Interface
 More investigations using various types/genres
of video remix examples
 How many examples do we need?
 Good examples can reduce the number of
examples.
Example-Based Remixing of Multimedia Contents

Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 

Example-Based Remixing of Multimedia Contents

  • 1. Noboru Babaguchi Osaka University Joint Work with Prof. N. Nitta ICME2013 Co-located WS MMIX13, Keynote San Jose, July 18, 2013
  • 2.  Definitions, Background, Related Work  Multimedia Remixing Support System  Video Clip Sequence Creation  Music Clip Selection  Shot Extraction  Conclusion and Future Work
  • 3.  From Wikipedia… A remix is a song that has been edited to sound different from the original version. The person who remixed it might have changed the pitch of the singers' voice, changed the tempo and speed and has made the song shorter or longer, or instead of hearing just one person singing they might have duplicated the voice to make it sound like two people are singing, or make the voice echo. Remixes should not be confused with edits, which usually involve shortening a final stereo master for marketing or broadcasting purposes. … A remix song recombines audio pieces from a recording to create an altered version of the song. In recent years the concept of the remix has been applied analogously to other media. … Scary Movie series is famous for its comic remix of various well-known horror movies such as Ring, Scream, and Saw.
  • 4. Video Remix: a video clip made by recombining various media components to create an altered version of the original videos. Video transition effects (Cut, fade-in/out, dissolve, etc.) Audio clips (music, sound effects, voices, etc.) Original video clips Video remixes (e.g. movie trailers) Video clip selection & arrangement Multimedia stream Combination How can we create video remixes of good quality? from “The School of Rock” (2003)
  • 5.  Semantic Aspect:  What should we present? (Semantic Content)  Highlights of Sports Games, etc.  Affective Aspect:  How should we present the video content? (Aesthetic Compatibility, Film Syntax)  Commercial Films,Movie Trailers, etc.  How to arrange video clips or what music clip to augment to enhance the expressive quality Two aspects in video remixing Video Summarization
  • 6. Video Remix Scene-Music Relation Shot-Scene Relation A sequence of L video shots A sequence of D music clips A video scene Problem of Video Remixing A music clip = A sequence of D video scenes An excerpt from a video clip To maintain the feeling of continuity in a scene
  • 7.  Hitchcock [Girgensohn2001]  Template-based Editing [Davis2003]  Lazycut [Hua2005]  Emotion-based [Canini2010]  Video-Music Mixing [Mulhem2003][Hua2004][Wang2005][Yoon2009][Cristani2010]
  • 8. Video clip selection and arrangement Focused on how various types of video clips are arranged in sequence. For example… • A scene has to have at least three video clips[Sundaram01]. • Two video shots of extremely different shot sizes should not be connected[Kumano02]. • The duration of a shot recorded with the camera fixed is up to 15 seconds[Kumano02]. Film Syntax [Sundaram01] H. Sundaram, et al., “Condensing computable scenes using visual complexity and film syntax analysis,” Proc. ICME, pp.389-392, 2001. [Kumano02] M. Kumano, et al., “Video editing support system based on video content analysis,” Proc. ACCV, pp.628-633, 2002. [Canini10] L. Canini, et al., “Interactive video mashup based on emotional identity,” Proc. European Signal Processing Conf., pp.1499-1503, 2010. Aesthetic Compatibility •Shots with similar emotional impact should be connected[Canini10].
  • 9. Music clip selection Focused on which types of music clips are mixed with video shots. For example… • dynamic, motion, and pitch of image and audio streams coincide with each other[Mulhem03]. • novelty, velocity, and brightness of image and audio streams coincide with each other[Yoon09]. Aesthetic Compatibility [Mulhem03] P. Mulhem, et al., “Pivot vector space approach for audio-video mixing,” IEEE Multimedia, 10(2), pp.28-40, 2003 [Yoon09] J.-C. Yoon, et al., “Automated music video generation using multi-level feature-based segmentation,” MTAP, 41(2), pp.197-214, 2009 [Cristani10] M. Cristani, et al., “Toward an automatically generated soundtrack from low-level cross-modal correlations for automotive scenarios,” Proc. ACM Multimedia, pp.551-559, 2010 Determined heuristically • brightness of image and audio streams and rhythm of audio stream and optical flow in image stream coincide with each other[Cristani10] Determined statistically
  • 11.  It is difficult to explicitly define the rules and know-how for arranging video and music clips with aesthetic compatibility in mind.  The rules and structures commonly used in professionally created examples can be modeled by standard machine learning techniques.  Non-professional users can be supported through an interface based on models that implicitly describe shot-scene and scene-music relations, taking aesthetic compatibility into account.
  • 12. A Set of Video Remix Examples Professionally Created Video Remixes
  • 13. A Set of Video Remix Examples Target: Remixing original video clips based on Examples A Set of Music Clips A Set of Original Video Clips video remix
  • 14. video remix I) Video Clip Sequence Creation Interface II) Music Clip Selection III) Shot Extraction (Video and Music Synchronization) User ・・・ ・・・ A set of video clips: A set of music clips: A Set of Video Remix Examples ・・・ ・ ・ ・・・ Video Remix Template Shot Scene Video Clip Suggestions
  • 15. N. Nitta and N. Babaguchi, “Example-based video remixing,” Multimedia Tools and Applications, 51(2), pp.649-673, 2011 N. Nitta and N. Babaguchi, “Example-based home video remixing,” Proc. ICME, 2011
  • 16. Video Remix Examples Symbol Sequence Home (Personal) Videos Video Clips Segmentation Suitability[Nitta2011] To Template Perceived Quality[Tao2007] B AB CGE Template Interface Overview of Procedure I) Template Generation T. Mei, et al., "Home Video Visual Quality Assessment With Spatiotemporal Factors," IEEE Trans. Circuits and Systems for Video Technology, vol.17, no.6, pp.699-706, 2007.
  • 17. Video Remix Examples Slow Scene Active Scene HMM Example-based Template Generation Shot Length Brightness Motion Intensity w/wo Camera Work w/wo Human Objects Low-level Features Feature Extraction ・・・ Sequences of video shots Shot ihg fed cba Symbolization Symbol Sequence Video Remix Template (New Symbol Sequence & State Sequence) GA A Sequence of L Shots A Sequence of D Scenes
  • 18. Video Clip 1 Video Clip 2 Video Clip 3 A Home Video Suitability to Template 0.3 0.20.7 Perceived Quality 0.7 0.5 0.6 From Shot to Video Clip Shots in target video are divided into video clips based on the camerawork
  • 19. Video clip selection Video Remix Template Interface 3D book-style video clip presentation (spine, fore edge) Timeline Presentation Suitability To Template Perceived Quality ◎ × △ ▲
  • 21.  Video remix examples: 61 action movie trailers  Video clips: 265 home (personal) video clips recording a sports field day held by a kindergarten  Subjective evaluation by 8 subjects  Compare with video clip sequence created by considering only the perceived quality of video clips
  • 22. Created Video Clip Sequence. With Template*: Subjective Score 3; Without Template: Subjective Score 3.5. * Selected video clips are shortened according to the template. Using action movie trailers as examples resulted in creating a sequence of many short video clips
  • 23. N. Nitta and N. Babaguchi, “Example-based video remixing support system,” Proc. ACM Multimedia, pp.563-572, 2011
  • 24. Video Clip Sequence (Scene) Overview of Procedure II) A Set of Video Remix Examples (Scenes) A set of Music Clips visually similar video remix examples similar music clips
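The two-step lookup on this slide (find visually similar example scenes, then find music clips similar to the music mixed with those examples) can be sketched as below. This is only a minimal illustration, not the authors' implementation: the Euclidean distance, the plain list-of-floats feature vectors, and the names `euclidean` and `select_music_clips` are all assumptions made for the example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_music_clips(scene, example_scenes, example_music, candidates, n_examples=3):
    """Pick one candidate music clip per nearest example scene.

    scene          -- visual features of the new video scene
    example_scenes -- visual features of the example scenes
    example_music  -- audio features of the music mixed with each example scene
    candidates     -- audio features of the music clips available to the user
    Returns a list of candidate indices, ordered from nearest example to farthest.
    """
    # Step 1: rank example scenes by visual similarity to the new scene.
    ranked = sorted(range(len(example_scenes)),
                    key=lambda i: euclidean(scene, example_scenes[i]))
    picks = []
    for i in ranked[:n_examples]:
        # Step 2: find the candidate clip closest to the example's music.
        best = min(range(len(candidates)),
                   key=lambda j: euclidean(example_music[i], candidates[j]))
        picks.append(best)
    return picks
```

For instance, with example scenes `[[0, 0], [10, 10], [1, 1]]`, example music `[[0.0], [5.0], [1.0]]` and candidates `[[4.9], [0.2], [1.1]]`, a new scene at `[0.1, 0.1]` with `n_examples=2` selects candidates 1 and 2.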
  • 25.  Evaluate the compatibility among video scenes and music clips by their distances in the video scene and music feature spaces  Learn non-linear mapping of music feature space so that the distances among video scenes and the mixed music clips would be correlated [Suzuki07] Music Clip Feature Space (Music Clips Mixed to Example Video Scenes) Video scene feature space (Example Video Scenes) Expected Music clip feature space (Music Clips Mixed to Example Video Scenes) [Suzuki07] K. Suzuki, et al., “A similarity-based neural network for facial expression analysis,” Pattern Recognition Letters, 28(9), pp.1104-1111, 2007
  • 26.  Music Clip Selection  Video Scenes・・・Visual Features  Music Clips・・・Audio Features  [Zettl99]  Emotion-based Music Classification [Zettl99] H. Zettl, “Sight Sound Motion: Applied Media Aesthetics,” Wadsworth Publishing, 1999
  • 27.  Consists of 2 Neural Networks  Input: Audio Features $x^A_i$ and $x^B_i$ of Music Clips A and B  Output: Transformed Audio Features $y^A_j$ and $y^B_j$ of Music Clips A and B  Learn the weights $w_{l,m}$ of the Neural Network so that the difference between the distance $d_{AB}$ of $y^A_j$ and $y^B_j$ and the distance $T_{AB}$ of the video scenes mixed with music clips A and B is minimized.  $w_{l,m}$: Weight for the edge between nodes $l$ and $m$. Figure: Input A ($x^A_i$) → Neural Network A → $y^A_j$; Input B ($x^B_i$) → Neural Network B → $y^B_j$; Distance calculation → $d_{AB}$, compared with the Teacher $T_{AB}$ (Distances of Video Scenes Mixed with Music Clips A and B)
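The training objective described on this slide can be illustrated with a small sketch. Several choices here are assumptions, not taken from the slides: both networks share one weight matrix, each network is a single tanh layer, the distance is Euclidean, and the mismatch is penalized with a squared error. The names `transform`, `distance` and `pair_loss` are hypothetical.

```python
import math


def transform(x, weights):
    """Map audio features x to transformed features y.

    Assumed form: one tanh layer, y_j = tanh(sum_i weights[j][i] * x_i).
    """
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]


def distance(a, b):
    """Euclidean distance between two transformed feature vectors (assumed metric)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def pair_loss(x_a, x_b, t_ab, weights):
    """Squared gap between d_AB and the teacher distance T_AB for one clip pair.

    d_AB is the distance between the transformed audio features of clips A and B;
    t_ab is the distance between the video scenes those clips were mixed with.
    """
    d_ab = distance(transform(x_a, weights), transform(x_b, weights))
    return (d_ab - t_ab) ** 2
```

Summing `pair_loss` over all pairs of example music clips and lowering it with any gradient-based optimizer would pull the transformed music distances toward the video scene distances, which is the correlation the slide asks for.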
  • 29.  Video Remix Examples: 61 Action Movie Trailers  Video Scene Examples :45 Scenes  Music Clips:180 Music Clips of Various Genres (Movie Soundtracks, Classical Music, Japanese-pop, Western-pop, etc.)  Video Clips:  Shots extracted from Original Movies  265 Home Video Clips recording a sports field day held by a kindergarten  Video Clip Sequence:  Made by Procedure I)
  • 30.  Input: 10 Video Scenes randomly extracted from movie trailers (without Audio Stream)  10 subjects rated (1: very bad – 10: very good) 10 video scenes mixed with  Video1) 3 Music Clips Selected by Proposed Approach  Video2) Music Clips most similar to the music excerpts mixed with the 3 least similar video scenes  Video3) Music Clip mixed with the video scenes in movie trailers (baseline: professional)  Video4) 3 Music Clips selected in the same way as for Video1) without music feature space transformation  Video5) 3 Music Clips selected in the same way as for Video2) without music feature space transformation Average Subjective Scores: Video1 6.1, Video2 4.4, Video3 7.2, Video4 5.0, Video5 4.5. Video1 – Video2 = 1.72±0.34 (95% confidence interval) ⇒ indicates the effectiveness of similarity-based music clip selection. Video1 – Video4 = 1.11±0.35 ⇒ indicates the effectiveness of music feature space transformation. Video1 → closest to Video3 ⇒ selected music clips are subjectively closest to professionally selected ones
  • 33. Video Clip Sequence after Music Mixing. With Template*: Subjective Score 5.3; Without Template: Subjective Score 3.8. * Selected video clips are shortened according to the template. The subjective score improved considerably after music mixing. The created video clip sequence and the selected music clips are synergetic in improving the expressive quality.
  • 34. Y. Kurihara, N. Nitta, and N. Babaguchi, “Automatic appropriate segment extraction from shots based on learning from example videos,” Proc. PSIVT, pp.1082-1093, 2009 Y. Kurihara, N. Nitta, and N. Babaguchi, “Appropriate segment extraction from shots based on temporal patterns of example videos,” Proc. MMM, pp.253-264, 2008
  • 35. Shot Extraction from Selected Video Clip. Video Clip Sequence: Video Clip 1, Video Clip 2, Video Clip 3 → Video Remix: Shot 1, Shot 2, Shot 3  A video clip needs to be shortened.  A video clip contains redundant parts. Which part of a video clip should be extracted as a shot?
  • 36. k frames Discarded part (Non-shot) Selected Part (Shot) Video Clip Example Video Clip Shot Extraction Feature Extraction Pattern Scan for the k frames which best matches the shot HMM Feature Extraction ShotSymbolization Symbol Sequence Shot HMM Non-shot HMM Overview of Procedure III)
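The scan for the k frames that best match the shot HMM can be sketched as a sliding window over the symbol sequence of a video clip. The following is a minimal sketch under stated assumptions, not the authors' code: each HMM is a discrete-output model given as a `(start, trans, emit)` tuple of plain lists, each window is scored by the difference between its log-likelihood under the shot HMM and under the non-shot HMM, and the function names are hypothetical.

```python
import math


def log_likelihood(symbols, start, trans, emit):
    """Scaled forward algorithm for a discrete HMM; returns log P(symbols | model).

    start[s]    -- initial probability of state s
    trans[r][s] -- probability of moving from state r to state s
    emit[s][o]  -- probability of state s emitting symbol o
    """
    n = len(start)
    alpha = [start[s] * emit[s][symbols[0]] for s in range(n)]
    log_p = 0.0
    for t in range(len(symbols)):
        if t > 0:
            alpha = [sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][symbols[t]]
                     for s in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model cannot generate this sequence
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_p


def best_window(symbols, k, shot_hmm, non_shot_hmm):
    """Start index of the k-symbol window with the largest log-likelihood margin."""
    best_start, best_score = 0, float("-inf")
    for s in range(len(symbols) - k + 1):
        window = symbols[s:s + k]
        score = (log_likelihood(window, *shot_hmm)
                 - log_likelihood(window, *non_shot_hmm))
        if score > best_score:
            best_start, best_score = s, score
    return best_start
```

With a one-state shot model that mostly emits symbol 1 and a one-state non-shot model that mostly emits symbol 0, `best_window([0, 0, 1, 1, 1, 0], 3, shot, non_shot)` returns 2, the start of the run of 1s.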
  • 37. •Shot Classification: action and conversation •Feature extraction Shot Action Conversation Scenery ・・・ ※VSTD : Volume Standard Deviation, LVFR : Low Volume Signal Ratio, ERSB : Energy Ratio of Frequency SubBand, ZCR : Zero Crossing Ratio Each type of shot is characterized by different features
  • 38. Experiments  Examples: Movies + Trailers; Video Clips: Shots in Movies; Shots: Shots in Trailers  Shot extraction from 69 video clips (shots in movies)  Shot Length (k) = Length of corresponding shots in trailers (32.3% ×video clips on average) Number of video clips: Training: Action 10, Conversation 12; Test: Action 47, Conversation 22
  • 39. Objective Evaluation Video Clip (Action) Ground Truth (Shot in Trailer) Extracted Shot 82 frames k= 9 frames Difference:3 frames(0.3sec) •Compare Extracted Shot with Ground Truth •1 frame=0.1 sec
  • 40. 107 frames Extracted Shot Ground Truth k= 17 frames Difference:3 frames (0.3sec) Video Clip (Conversation)
  • 41. Correctly Extracted Shot. Video Clip (Action): 47 frames, k = 5 frames, Difference: 3 frames. Plot of log P against frame #: f(n), shot (edited-segment) HMM; g(n), non-shot (non-edited-segment) HMM; and f(n)-g(n), labeled hk(f)-gk(f), with the Extracted Shot and Ground Truth positions marked.
  • 42. Incorrectly Extracted Shot. Video Clip: 35 frames, k = 7 frames, Difference: 26 frames. Plot of log P against frame #: f(n), shot (edited-segment) HMM; g(n), non-shot (non-edited-segment) HMM; and f(n)-g(n), labeled hk(f)-gk(f), with the Extracted Shot and Ground Truth positions marked.
  • 43. Objective Evaluation: $\text{accuracy} = \frac{\#\text{ of correct extractions}}{\#\text{ of video clips}}$ ※Correct Extraction : Shot was extracted within T-frame Difference; 1 frame = 0.1 sec. T=2: Action 53% (25/47), Conversation 50% (11/22); T=3: Action 60% (28/47), Conversation 64% (14/22); T=5: Action 72% (34/47), Conversation 73% (16/22). Correct shots were extracted from 72.5% (50/69) of video clips when T=5
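The accuracy measure on this slide can be computed as in the sketch below. Two details are assumptions, because the slide does not spell them out: the difference is measured between the start frames of the extracted shot and of the ground truth, and "within T frames" is read as a difference of at most T. The function name `extraction_accuracy` is hypothetical.

```python
def extraction_accuracy(extracted_starts, truth_starts, tolerance):
    """Fraction of video clips whose extracted shot lies within `tolerance` frames.

    extracted_starts -- start frame of the extracted shot, one entry per clip
    truth_starts     -- start frame of the ground-truth shot, one entry per clip
    tolerance        -- T, the largest frame difference still counted as correct
    """
    correct = sum(1 for e, g in zip(extracted_starts, truth_starts)
                  if abs(e - g) <= tolerance)
    return correct / len(truth_starts)
```

For example, extracted starts `[10, 20, 30, 40]` against ground truth `[12, 26, 30, 45]` with `tolerance=5` give 0.75, since three of the four differences (2, 6, 0, 5) are at most 5.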
  • 44.  14 subjects watch the original long video clips and then three kinds of short extracted shots, ① Ground Truth, ② Extracted Shot, and ③ Random Shot, in random order, and rank them. (Ties are allowed.)
  • 45. Ground Truth:③ Extracted Shot:② Random Shot:① Video Clip (36 frames) ① ② ③ k = 15 frames
  • 46. Subjective Evaluation. Action: 18 video clips; Conversation: 13 video clips. Share of Rank 1 / Rank 2 / Rank 3: Ground Truth 69.1% / 26.9% / 4.0%; Extracted Shot 53.9% / 38.9% / 7.2%; Random Shot 7.1% / 12.9% / 80.0%. Extracted Shot ≒ Ground Truth >> Random Shot
  • 47. Created Video Remix. With Template (Proposed): Subjective Score 6.2; Without Template (Comparative): Subjective Score 3.9. Scores after Procedures I, II, III: Proposed 3, 5.3, 6.2; Comparative 3.5, 3.8, 3.9. Length (min:sec) 0:36 0:43 10:56 10:59
  • 48.  Introduced an example-based approach for video remixing  Video Clip Sequence Creation  Music Clip Selection  Shot Extraction  Interface  Experiments using movie trailers as remix examples and movies and home videos as video clips  Verified the effectiveness of using remix examples  With Support(6.2), Without Support(3.9) Conclusion
  • 49.  Improvement of Interface  More investigations using various types/genres of video remix examples  How many examples do we need?  Good examples can reduce the number of examples.