A Computational Framework for
Sound Segregation in Music
Signals
Luís Gustavo Martins
CITAR / Escola das Artes da UCP
lmartins@porto.ucp.pt
Porto, Portugal
Auditory Modeling Workshop
Google, Mountain View, CA, USA
19.11.2010
Acknowledgments
}  This work is the result of a collaboration with:
}  University of Victoria, BC, Canada
}  George Tzanetakis, Mathieu Lagrange, Jennifer Murdoch
}  All the Marsyas team
}  INESC Porto
}  Luis Filipe Teixeira
}  Jaime Cardoso
}  Fabien Gouyon
}  Technical University of Berlin, Germany
}  Juan José Burred
}  FEUP PhD Advisor
}  Professor Aníbal Ferreira
}  Supporting entities
}  Fundação para a Ciência e a Tecnologia - FCT
}  Fundação Calouste Gulbenkian
}  VISNET II, NoE European Project
Research Project
}  FCT R&D Project (APPROVED FOR FUNDING)
}  A Computational Auditory Scene Analysis Framework for Sound Segregation in Music Signals 
}  3-year project (starting Jan. 2011)
}  Partners:
}  CITAR (Porto, Portugal)
  Luís Gustavo Martins (PI), Álvaro Barbosa, Daniela Coimbra
}  INESC Porto (Porto, Portugal)
  Fabien Gouyon
}  UVic (Victoria, BC, Canada)
  George Tzanetakis
}  IRCAM (Paris, France)
  Mathieu Lagrange
}  Consultants
}  FEUP (Porto, Portugal)
  Prof. Aníbal Ferreira, Prof. Jaime Cardoso
}  McGill University / CIRMMT (Montreal, QC, Canada)
  Prof. Stephen McAdams
Summary
}  Problem Statement
}  The Main Challenges
}  Current State
}  Related Research Areas
}  Main Contributions
}  Proposed Approach
}  Results
}  Software Implementation
}  Conclusions and Future Work
Problem Statement
}  Propose a computational sound segregation framework
}  Focused on music signals
}  But not necessarily limited to music signals
}  Perceptually inspired
}  So it can build upon the current knowledge of how listeners perceive sound
events in music signals
}  Causal
}  So it mimics the human auditory system and allows online processing of sounds
}  Flexible
}  So it can accommodate different perceptually inspired grouping cues
}  Generic
}  So it can be used in different audio and MIR application scenarios
}  Effective
}  So it can improve the extraction of perceptually relevant information from musical
mixtures
}  Efficient
}  So it can find practical use in audio processing and MIR tasks
MUSIC LISTENING
[Figure: the main types of auditory processing and their interactions, adapted from [McAdams and Bigand, 1993]. Transduction of the sound environment feeds auditory grouping processes and the extraction of attributes, which interact with event structure processing, abstract knowledge structures and attentional processes to form a mental representation of the sound environment; from these processes it is possible to extract perceptual attributes which provide a representation of each element in the auditory system.]
}  Human listeners are able to perceive individual sound
events in complex mixtures
}  Even if listening to:
}  Monaural music recordings
}  Unknown sounds, timbres or instruments
}  Perception is influenced by several complex factors
}  Listener’s prior knowledge, context, attention, …
}  Based on both low-level and high-level cues
}  Difficult to replicate computationally…
The Main Challenges
}  Why Music Signals?
}  Musical sound is, in some senses, more challenging to analyse
than non-musical sound
}  High time-frequency overlap of sources and sound events
  Music composition and orchestration
  Sources often play simultaneously → polyphony
  Favoring of consonant pitch intervals
  Sound sources are highly correlated
}  High variety of spectral and temporal characteristics
  Musical instruments present a wide range of sound production
mechanisms
}  Techniques traditionally used for monophonic, non-musical
or speech signals perform poorly
}  Yet, music signals are usually well organized and structured
Current State
}  Typical systems in MIR
}  Represent statistically the entire sound mixture
}  Analysis and retrieval performance reached a “glass ceiling”
[Aucouturier and Pachet, 2004]
}  New Paradigm
}  Attempt to individually characterize the different sound
events in a sound mixture
}  Performance still quite limited when compared to human auditory
system
}  But already provides alternative and improved approaches to common
sound analysis and MIR tasks
Applications
}  “Holy grail” applications
}  “The Listening Machine”
}  “The Robotic Ear”
}  “Down to earth” applications
}  Sound and Music Description
}  Sound Manipulation
}  Robust Speech and Speaker Recognition
}  Object-based Audio Coding
}  Automatic Music Transcription
}  Audio and Music Information Retrieval
}  Auditory Scene Reconstruction
}  Hearing Prostheses
}  Up-mixing
}  …
Related Research Areas
}  Sound and Music Computing (SMC) [Serra et al., 2007]
}  Computational Auditory Scene Analysis (CASA)
[Wang and Brown, 2006]
}  Perception Research
}  Psychoacoustics [Stevens, 1957]
}  Auditory Scene Analysis (ASA) [Bregman, 1990]
}  Digital Signal Processing [Oppenheim and Schafer, 1975]
}  Music Information Retrieval (MIR) [Downie, 2003]
}  Machine Learning [Duda et al., 2000]
}  Computer Vision [Marr, 1982]
Related Areas
}  Auditory Scene Analysis (ASA) [Bregman, 1990]
}  How do humans “understand” sound mixtures?
}  Find packages of acoustic evidence such that each package has
arisen from a single sound source
}  Grouping Cues
}  Integration
  Simultaneous vs. Sequential
  Primitive vs. schema-based
}  Cues
  Common fate (amplitude and frequency modulation)
  Harmonicity
  Time continuity
  …
Related Areas
}  Computational Auditory Scene Analysis (CASA)
[Wang and Brown, 2006]
}  “Field of computational study that aims to achieve human
performance in ASA by using one or two microphone recordings of
the acoustic scene.” [Wang and Brown, 2006]
[Figure: system architecture of a typical CASA system. The acoustic mixture passes through an analysis front-end into a mid-level representation; scene organization, guided by grouping cues and source models, produces the segregated signals via stream resynthesis.]
Main Contributions
}  Proposal and experimental validation of a flexible and efficient
framework for sound segregation
}  Focused on “real-world” polyphonic music
}  Inspired by ideas from CASA
}  Causal and data-driven
}  Definition of a novel harmonicity cue
}  Termed Harmonically Wrapped Peak Similarity (HWPS)
}  Experimentally shown to be a good grouping criterion
}  Software implementation of the proposed sound segregation
framework
}  Modular, extensible and efficient
}  Made available as free and open source software (FOSS)
}  Based on the MARSYAS framework
Proposed Approach
}  Assumptions
}  Perception primarily depends on the use of low-level sensory
information
}  Does not necessarily require prior knowledge (i.e. training)
}  Still able to perform primitive identification and segregation of sound
events in a sound mixture
}  Prior knowledge and high-level information can still be used
}  To assign additional meaning to the primitive observations
}  To consolidate primitive observations as relevant sound events
}  To modify the listener’s focus of attention
Proposed Approach
}  System overview
[Diagram: system overview. Sinusoidal analysis extracts spectral peaks from each 46 ms frame; the peaks are accumulated over a texture window (e.g. 150 ms); similarity computation and the Normalized Cut cluster the peaks; cluster selection then feeds sinusoidal synthesis.]
Analysis Front-end
}  Sinusoidal Modeling
}  Sum of the highest-amplitude sinusoids at each frame → peaks
}  Maximum of 20 peaks/frame
}  Window = 46 ms; hop = 11 ms
}  Parametric model: estimate the amplitude, frequency and phase of each peak (see the sketch below)
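A rough, illustrative sketch of this front-end in Python/NumPy (not the Marsyas implementation; the function and parameter names are hypothetical): it picks up to 20 local maxima of the magnitude spectrum per 46 ms frame and estimates each peak's frequency, amplitude and phase.

```python
# Illustrative sketch only: peak picking over STFT frames, assuming a mono
# signal `x` at sample rate `sr`; names and defaults are hypothetical.
import numpy as np

def spectral_peaks(x, sr, win_ms=46, hop_ms=11, max_peaks=20):
    n = int(sr * win_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    win = np.hanning(n)
    frames = []
    for start in range(0, len(x) - n, hop):
        spec = np.fft.rfft(win * x[start:start + n])
        mag = np.abs(spec)
        # candidate peaks: local maxima of the magnitude spectrum
        cand = np.flatnonzero((mag[1:-1] > mag[:-2]) & (mag[1:-1] > mag[2:])) + 1
        # keep only the `max_peaks` highest-amplitude candidates
        top = cand[np.argsort(mag[cand])[::-1][:max_peaks]]
        frames.append((top * sr / n,          # frequencies (Hz)
                       mag[top],              # amplitudes
                       np.angle(spec[top])))  # phases
    return frames
```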
Time Segmentation
}  Texture Windows
}  Construct a graph over a texture window of the sound
mixture
}  Provides time integration
  Approaches partial tracking and source separation jointly
  Traditionally two separate, consecutive stages
[Diagram: spectral peaks from successive sinusoidal-analysis frames (frequency vs. time) are assembled into a texture window.]
}  Fixed-length texture windows
}  E.g. 150 ms
}  Dynamically adjusted texture windows
}  Onset detector (see the sketch below)
}  Perceptually more relevant
}  50 ms to 300 ms
[Figure: spectral flux of a signal over time (0 to 1.6 s); detected onsets delimit texture windows of varying length.]
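A rough sketch of how onsets detected on a spectral-flux curve could drive dynamically adjusted texture windows; only the 50 to 300 ms bounds come from the slide, the thresholding rule is an assumption.

```python
# Illustrative sketch: onset-driven texture windows from spectral flux,
# assuming `mag` is a (frames x bins) magnitude spectrogram and `hop_s`
# the frame hop in seconds; the threshold rule is an assumption.
import numpy as np

def texture_window_bounds(mag, hop_s, min_len=0.050, max_len=0.300):
    # half-wave rectified spectral flux between consecutive frames
    flux = np.maximum(np.diff(mag, axis=0), 0.0).sum(axis=1)
    thr = flux.mean() + flux.std()
    onsets = [i for i in range(1, len(flux) - 1)
              if flux[i] > thr and flux[i] >= flux[i - 1] and flux[i] >= flux[i + 1]]
    bounds, last = [0.0], 0.0
    for i in onsets:
        t = i * hop_s
        if t - last < min_len:        # too close to the previous boundary
            continue
        while t - last > max_len:     # split overly long stretches
            last += max_len
            bounds.append(last)
        bounds.append(t)
        last = t
    return bounds
```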
Perceptual Cues as Similarity Functions
[Diagram: similarity computation. The spectral peaks over a texture window (150 ms) feed parallel cue modules (amplitude similarity, frequency similarity, harmonic similarity/HWPS, and potentially azimuth proximity, common onset/offset and source models); a combiner merges them into the overall similarity matrix passed to the Normalized Cut.]
Perceptual Cues as Similarity Functions
}  Grouping Cues (inspired from ASA)
}  Similarity between time-frequency components in a texture window
}  Frequency proximity
}  Amplitude proximity
}  Harmonicity proximity (HWPS)
}  …
}  Encode topological knowledge into a similarity graph/matrix
}  Simultaneous integration (peaks within the same frame)
}  Sequential integration over the texture window
[Figure: the similarity matrix over the peaks of the Tones A+B example (A0-A3, B3/A4, B0-B2, B4), and the corresponding fully connected graph with symmetric edge weights w_ij = w_ji.]
Perceptual Cues as Similarity Functions
}  Defining a Generic Similarity Function
}  Fully connected graphs
}  Gaussian similarity function
  How to define the neighborhood width (σ)?
  Local statistics from the data in a texture window
  Use prior knowledge (e.g. JNDs)
   Use σ as weights (after normalizing the similarity function to [0,1])
$$w_{ij} = w_{ji} = e^{-\left(\frac{d(x_i, x_j)}{\sigma}\right)^2}$$

[Plot: the Gaussian similarity w_ij as a function of the distance d(x_i, x_j), for σ = 0.4, 1.0 and 1.2.]
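A minimal sketch of this generic similarity function, assuming σ is taken from a local statistic (here the standard deviation) of the distances within the texture window:

```python
# Minimal sketch: Gaussian similarity with sigma from local statistics.
import numpy as np

def gaussian_similarity(features):
    """features: 1-D NumPy array with one feature value (e.g. frequency) per peak."""
    d = np.abs(features[:, None] - features[None, :])  # pairwise distances
    sigma = float(d.std()) or 1.0                      # local statistic, guard zero
    return np.exp(-(d / sigma) ** 2)                   # symmetric, in [0, 1]
```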
Perceptual Cues as Similarity Functions
}  Amplitude and Frequency Similarity
}  Amplitude
}  Gaussian function of the Euclidean distances
  In dB → more perceptually relevant
}  Frequency
}  Gaussian function of the Euclidean distances
  In Bark → more perceptually relevant
}  Not sufficient to segregate harmonic events
}  Nevertheless, they are important to group peaks from:
  Inharmonic or noisy frequency components in harmonic sounds
  Non-harmonic sounds (unpitched sounds)
Two of the most basic similarities explored by the auditory system are the frequency and amplitude features of the sound components in a sound mixture (Section 2.3.1). Accordingly, the edge weight connecting two peaks $p^k_l$ and $p^{k+n}_m$ will depend on their frequency and amplitude proximities. Following the generic considerations for the definition of a similarity function for spectral clustering, the amplitude and frequency similarities, $W_a$ and $W_f$ respectively, are defined as follows:

$$W_a(p^k_l, p^{k+n}_m) = e^{-\left(\frac{a^k_l - a^{k+n}_m}{\sigma_a}\right)^2}$$

$$W_f(p^k_l, p^{k+n}_m) = e^{-\left(\frac{f^k_l - f^{k+n}_m}{\sigma_f}\right)^2}$$

where the Euclidean distances are modeled as two Gaussian functions, as defined in Equation 8. The amplitudes are measured in Decibels (dB) and the frequencies are measured in Barks (a frequency scale approximately linear below 500 Hz and logarithmic above), since these scales have been shown to better model the sensibility of the human ear [Hartmann, 1998].
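A sketch of the two cues under the stated scales; the Bark conversion below uses Zwicker's approximation (an assumption, the text only says "Barks"), and the σ values are placeholders rather than the system's actual settings.

```python
# Illustrative sketch of W_a (dB) and W_f (Bark); sigma values are placeholders.
import numpy as np

def bark(f_hz):
    # Zwicker-style approximation of the Bark scale (assumption)
    return 13.0 * np.arctan(0.00076 * f_hz) + 3.5 * np.arctan((f_hz / 7500.0) ** 2)

def amp_freq_similarities(freqs_hz, amps_lin, sigma_a=10.0, sigma_f=1.0):
    a_db = 20.0 * np.log10(np.maximum(np.asarray(amps_lin, float), 1e-12))
    f_bk = bark(np.asarray(freqs_hz, float))
    Wa = np.exp(-((a_db[:, None] - a_db[None, :]) / sigma_a) ** 2)
    Wf = np.exp(-((f_bk[:, None] - f_bk[None, :]) / sigma_f) ** 2)
    return Wa, Wf
```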
Perceptual Cues as Similarity Functions
}  Harmonically Wrapped Peak Similarity (HWPS)
}  Harmonicity is one of the most powerful ASA cues [Wang and Brown, 2006]
}  Proposal of a novel harmonicity similarity function
}  Does not rely on the prior knowledge of f0 in the signal
}  Takes into account spectral information in a global manner (spectral patterns)
  For peaks in the same frame or in different frames within a texture window
  Takes into consideration the amplitudes of the spectral peaks
}  3-step algorithm
  Shifted spectral pattern
  Wrapped frequency space → histogram computation
  Discrete cosine similarity → [0,1]
STEP 3 – Discrete Cosine Similarity

The last step is now to correlate the two shifted and harmonically wrapped spectral patterns ($\hat{F}^k_l$ and $\hat{F}^{k+n}_m$) to obtain the HWPS measure between the two corresponding peaks. This correlation can be done using an algorithmic approach as proposed in [Lagrange and Marchand, 2006], but this was found not to be reliable or robust in practice. Alternatively, the proposal is to discretize each shifted and harmonically wrapped spectral pattern into an amplitude-weighted histogram, $H^k_l$, corresponding to each spectral pattern $\hat{F}^k_l$. The contribution of each peak to the histogram is equal to its amplitude, and the range between 0 and 1 of the harmonically wrapped frequency is divided into 20 equal-size bins (a 12- or a 24-bin histogram would provide a more musically meaningful chroma-based representation, but preliminary and empirical tests have shown better results when using 20-bin histograms).

In addition, the harmonically wrapped spectral patterns are also folded into an octave to form a pitch-invariant "chroma" profile. For example, in Figure 19, the energy of the spectral pattern in wrapped frequency 1 (all integer multiples of the wrapping frequency) is mapped to histogram bin 0.

The HWPS similarity between the peaks $p^k_l$ and $p^{k+n}_m$ is then defined based on the cosine distance between the two corresponding discretized histograms as follows:

$$W_h(p^k_l, p^{k+n}_m) = \mathrm{HWPS}(p^k_l, p^{k+n}_m) = e^{-\left(1 - \frac{c(H^k_l, H^{k+n}_m)}{\sqrt{c(H^k_l, H^k_l) \cdot c(H^{k+n}_m, H^{k+n}_m)}}\right)^2} \qquad (28)$$

where

$$c(H^b_a, H^d_c) = \sum_i H^b_a(i) \times H^d_c(i). \qquad (29)$$

One may notice that due to the wrapping operation of Equation 25, the size of the histograms can be relatively small (e.g. 20 bins), thus being computationally efficient. A Gaussian function is also used for controlling the neighborhood width of the harmonicity cue, where $\sigma_h = 1$ is implicitly used in the current system implementation.
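Putting the three steps together, a compact illustrative sketch of HWPS between two peaks, each carrying its frame's spectral pattern (peak frequencies and amplitudes). The shifting convention is an assumption on my reading of the figures, and the wrapping frequency uses the conservative h' of Equation 27; this is not the thesis implementation.

```python
# Illustrative sketch of HWPS (not the thesis implementation).
import numpy as np

def hwps(f_l, pattern_l, f_m, pattern_m, n_bins=20):
    """f_*: peak frequency; pattern_*: (freqs, amps) of that peak's frame."""
    h = min(f_l, f_m)                       # conservative f0 estimate, Eq. (27)

    def histogram(freqs, amps, f_peak):
        freqs = np.asarray(freqs, float)
        shifted = freqs - f_peak + h        # STEP 1: shifted spectral pattern
        wrapped = (shifted / h) % 1.0       # STEP 2: wrap; multiples of h -> bin 0
        hist = np.zeros(n_bins)
        np.add.at(hist, (wrapped * n_bins).astype(int) % n_bins, amps)
        return hist                         # amplitude-weighted histogram

    Hl = histogram(*pattern_l, f_l)
    Hm = histogram(*pattern_m, f_m)
    # STEP 3: cosine similarity of the histograms through a Gaussian, Eq. (28)
    cos = (Hl @ Hm) / np.sqrt((Hl @ Hl) * (Hm @ Hm) + 1e-12)
    return np.exp(-(1.0 - cos) ** 2)        # sigma_h = 1
```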
Perceptual Cues as Similarity Functions
}  HWPS
}  Between peaks of the same harmonic "source"
}  In the same frame → high similarity (~1.0)
[Figure: HWPS between peaks A1 and A0 of the same harmonic source in frame k. After shifting (by f_A1 = 2 f0A and f_A0 = f0A) and harmonic wrapping with h = f0A, the two spectral patterns yield nearly identical histograms, hence a high HWPS(A1, A0).]
Perceptual Cues as Similarity Functions
}  HWPS
}  Between peaks of different harmonic "sources"
}  In the same frame → low similarity (~0.0)
[Figure: HWPS between peaks A1 and B0 of different harmonic sources in frame k. With h = f0A, the shifted and wrapped spectral patterns yield dissimilar histograms, hence a low HWPS(A1, B0).]
Perceptual Cues as Similarity Functions
}  HWPS
}  Between peaks of the same harmonic "source"
}  In different frames → mid-to-high similarity
  Interfering spectral content may be different
  Degrades HWPS…
  Only consider bin 0?
[Figure: HWPS between peaks A1 (frame k) and A0 (frame k+n) of the same harmonic source. The interfering content differs between the frames (source B in frame k, source C in frame k+n), so the wrapped histograms match only partially, hence a mid-to-high HWPS(A1, A0).]
Perceptual Cues as Similarity Functions
}  HWPS
}  Impact of f0 estimates (h’)
}  Ideal
}  Min peak frequency
}  Highest amplitude peak
}  Histogram-based f0 estimates → pitch estimates == number of sources?
The wrapping operation would be perfect with prior knowledge of the fundamental frequency. With this knowledge it would be possible to parametrize the wrapping operation h as:

$$h = \min(f0^k_l, f0^{k+n}_m) \qquad (26)$$

where $f0^k_l$ is the fundamental frequency of the source of the peak $p^k_l$. Without such a prior, a conservative approach $h'$ is considered instead, although it will tend to overestimate the fundamental frequency:

$$h' = \min(f^k_l, f^{k+n}_m) \qquad (27)$$

Notice that the value of the wrapping frequency function h is the same for both patterns corresponding to the peaks under consideration. Therefore the resulting shifted and wrapped frequency patterns will be more similar if the peaks belong to the same harmonic "source". The resulting shifted and wrapped patterns are pitch invariant and can be seen in the middle plot of Figures 19 and 20.

Different approaches could have been taken for the definition of the fundamental frequency estimation function h. One possibility would be to select the highest amplitude peak in the union of the two spectral patterns under consideration as the f0 estimate (i.e. $h = \{f_i \mid i = \operatorname{argmax}_i(A_i)\}$, $\forall i \in [1, \#A]$, where $A = A^k_l \cup A^{k+n}_m$, $\#A$ is its number of elements and $A^k_l$ is the set of amplitudes corresponding to the spectral pattern $F^k_l$). The motivation for this approach is the fact that the highest amplitude partial in musical signals often corresponds to the fundamental frequency of the most prominent harmonic "source" active in that frame, although this assumption will not always hold.

A more robust approach, though more computationally expensive, would be to calculate all the frequency differences between all peaks in each spectral pattern and compute a histogram. The peaks in these histograms would be good candidates for the fundamental frequencies in each frame (in order to avoid octave ambiguities, a second histogram with the differences between all the candidate f0 values could again be computed, where the highest peaks would be selected as the final f0 candidates). The HWPS could then be iteratively calculated using each f0 candidate in this short list, selecting the one with the best value as the final choice. In fact, this technique could prove an interesting way to robustly estimate the number of harmonic "sources" in each frame, including their pitches, but experimental evaluations are still required to validate these approaches.
[Figure: amplitude vs. frequency (0 to 3000 Hz) for the Tones A+B example, with the peaks A0-A4 and B0-B4 labeled; A4 and B3 overlap.]
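The pairwise-difference histogram idea above can be sketched as follows; the bin width, search range and amplitude weighting are illustrative assumptions, not values from the text.

```python
# Illustrative sketch of histogram-based f0 candidate estimation: pairwise
# frequency differences between peaks vote into a histogram whose strongest
# bins are taken as f0 candidates.
import numpy as np

def f0_candidates(freqs_hz, amps, fmin=50.0, fmax=1000.0, bin_hz=5.0, top=3):
    diffs, weights = [], []
    for i in range(len(freqs_hz)):
        for j in range(i + 1, len(freqs_hz)):
            d = abs(freqs_hz[i] - freqs_hz[j])
            if fmin <= d <= fmax:
                diffs.append(d)
                weights.append(amps[i] * amps[j])   # amplitude-weighted vote
    if not diffs:
        return []
    edges = np.arange(fmin, fmax + bin_hz, bin_hz)
    hist, edges = np.histogram(diffs, bins=edges, weights=weights)
    order = np.argsort(hist)[::-1][:top]            # strongest difference bins
    return [0.5 * (edges[k] + edges[k + 1]) for k in order]
```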
Similarity Combination
[Diagram: the similarity-computation pipeline again; the individual cue similarity matrices are merged by the combiner into the overall similarity matrix passed to the Normalized Cut.]
Similarity Combination
}  Combining cues
}  Product operator [Shi and Malik, 2000]
  High overall similarity only if all cues are high…
}  More expressive operators?
to represent the different sound events in a complex mixture. Therefore, the combination of different similarity cues could make the best use of their isolated grouping abilities towards a more meaningful segregation of a sound mixture.

Following the work of Shi and Malik [Shi and Malik, 2000], who proposed to compute the overall similarity function as the product of the individual similarity cues used for image segmentation, the current system combines the amplitude, frequency and HWPS grouping cues presented in the previous sections into a combined similarity function W as follows:

$$W(p_l, p_m) = W_{afh}(p_l, p_m) = W_a(p_l, p_m) \times W_f(p_l, p_m) \times W_h(p_l, p_m) \qquad (30)$$

Plots g in Figures 15 and 16 show the histogram of the values resulting from the combined similarity functions for the two sound examples, Tones A+B and Jazz1, respectively. (Audio clips of the signals plotted in Figures 17 and 18 are available at http://www.inescporto.pt/~lmartins/Research/Phd/Phd.htm.)

A more expressive combination could mix AND/OR-style operators, e.g.:

$$W_{afh} = [(W_f \wedge W_a) \vee W_h] \wedge W_s$$
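Both combination strategies are easy to express over the cue matrices; in the sketch below, min and max stand in for the fuzzy AND/OR of the Wafh expression above (that particular mapping is an assumption).

```python
# Sketch of cue combination: the product rule of Eq. (30) and a fuzzy-style
# alternative; using min/max as AND/OR is an assumption.
import numpy as np

def combine_product(Wa, Wf, Wh):
    # high overall similarity only if every cue is high
    return Wa * Wf * Wh

def combine_fuzzy(Wa, Wf, Wh, Ws=None):
    # [(Wf AND Wa) OR Wh] AND Ws, with min as AND and max as OR
    W = np.maximum(np.minimum(Wf, Wa), Wh)
    return W if Ws is None else np.minimum(W, Ws)
```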
Segregating Sound Events
}  Segregation task
}  Carried out by clustering components that are close in the similarity space
}  Novel method based on Spectral Clustering
}  Normalized Cut (Ncut) criterion
  Originally proposed for Computer Vision
  Takes cues as pair-wise similarities
  Clusters the peaks into groups, taking all cues into account simultaneously
Segregating Sound Events
}  Segregation Task
}  Normalized Cut criterion
}  Achieves a balanced clustering of elements
}  Relies on the eigenstructure of a similarity matrix to partition points
into disjoint clusters
  Points in the same cluster → high similarity
  Points in different clusters → low similarity
[Figure: a weighted similarity graph partitioned into two clusters; the normalized cut favors a balanced "better cut" over the minimum cut.]
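A minimal sketch of one two-way Normalized Cut step on a similarity matrix W, solving the Shi and Malik generalized eigenproblem via the symmetric normalized Laplacian and splitting at the median of the second eigenvector (Shi and Malik search over thresholds; the median split is a simplification). The recursive two-way cut discussed two slides below would apply this function repeatedly to each resulting subset.

```python
# Minimal sketch of one two-way Normalized Cut step on a symmetric
# similarity matrix W (NumPy array).
import numpy as np

def ncut_bipartition(W):
    d = W.sum(axis=1)
    d_isqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    # symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}
    L = np.eye(len(W)) - d_isqrt[:, None] * W * d_isqrt[None, :]
    vals, vecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    fiedler = d_isqrt * vecs[:, 1]          # x = D^{-1/2} y, 2nd generalized eigenvector
    return fiedler > np.median(fiedler)     # boolean two-way partition
```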
Segregating Sound Events
}  Spectral Clustering
}  An alternative to the traditional EM and k-means algorithms:
}  Does not assume a convex shaped data representation
}  Does not assume Gaussian distribution of data
}  Does not present multiple minima in log-likelihood
  Avoids multiple restarts of the iterative process
}  Correctly handles complex and unknown shapes
}  Already used for audio signals [Bach and Jordan, 2004]
Segregating Sound Events
}  Divisive clustering approach
}  Recursive two-way cut
}  Hierarchical partition of the data
  Recursively partitions the data into two sets
  Until a pre-defined number of clusters is reached (requires prior knowledge!)
  Until a stopping criterion is met
}  Current implementation
  Requires the definition of the number of clusters [Martins et al., 2007]
  Or, alternatively, partitions the data into 5 clusters and selects the 2 "denser" ones
   → Segregation of the dominant clusters in the mixture [Lagrange et al., 2008a]
Segregation Results
[Figures: segregation results for the Jazz1 and Tones A+B examples. Panels show (a) the input, and the two clusters obtained with (b, c) amplitude similarity, (d, e) frequency similarity, (f, g) HWPS similarity and (h, i) the combined similarities; axes are frequency (Hz) vs. time (secs). A companion plot shows amplitude vs. frequency (0 to 3000 Hz) for Tones A+B with the peaks A0-A4 and B0-B4 labeled (A4 and B3 overlap).]
Results
}  Predominant Melodic Source Segregation
}  Dataset of real-world polyphonic music recordings
}  Availability of the original isolated tracks (ground truth)
}  Results (the higher the better)
  HWPS improves results
  When combined with other similarity features
  When compared with other state-of-the-art harmonicity features [Srinivasan and Kankanhalli, 2003] [Virtanen and Klapuri, 2000]
[Bar chart: mean SDR (dB) for a 10-song dataset, on a 0 to 7 dB scale, comparing the A+F+HWPS, A+F+rHWPS, A+F+HV, A+F+HS and A+F cue combinations.]
Results
}  Predominant Melodic Source Segregation
}  On the use of Dynamic Texture Windows
}  Results (the higher the better)
  Smaller improvement (0.15 dB) than expected
  Probably due to the cluster selection approach being used…
  More computationally intensive (for longer texture windows)
Results
}  Main Melody Pitch Estimation
}  Resynthesize the segregated main voice clusters
}  Perform pitch estimation using a well-known monophonic pitch estimation technique
(Praat)
}  Comparison with two techniques:
}  Monophonic pitch estimation applied to mixture audio (from Praat)
}  State-of-the-Art multi-pitch and main melody estimation algorithm applied to mixture
audio [Klapuri, 2006]
}  Results (the lower the better)
Results
}  Voicing Detection
}  Identifying portions of a music file containing vocals
}  Evaluated three feature sets:
  MFCC features extracted from the polyphonic signal
  MFCC features extracted from the segregated main voice
  Cluster Peak Ratio (CPR) feature
  extracted from the segregated main voice clusters
Results
}  Timbre Identification in polyphonic music signals [Martins et al., 2007]
}  Polyphonic, multi-instrumental audio signals
}  Artificial 2-, 3- and 4-note mixtures from real instruments
}  Automatic separation of the sound sources
}  Sound sources and events are reasonably captured, corresponding in
most cases to played notes
}  Matching of the separated events to a collection of 6 timbre models
[Diagram: timbre identification pipeline. Sinusoidal analysis and peak picking feed sound source formation; each resulting note cluster is matched against a collection of timbre models, yielding note/instrument labels.]
Results
}  Timbre Identification in polyphonic music signals [Martins et al., 2007]
}  Sound sources and events are reasonably captured,
corresponding in most cases to played notes
Results
}  Timbre Identification in polyphonic music signals [Martins et al., 2007]
}  6 instruments modeled [Burred et al., 2006]:
}  Piano, violin, oboe, clarinet, trumpet and alto sax
}  Modeled as a set of time-frequency templates
  Describe the typical evolution in time of the spectral envelope of a note
  Matches the salient peaks of the spectrum
[Figures: time-frequency templates for PIANO and OBOE; amplitude (dB) over frequency (0 to 10000 Hz) and normalized time, describing the typical evolution of the spectral envelope of a note.]
Results
}  Timbre Identification in polyphonic music signals [Martins et al., 2007]
}  Instrument presence detection in mixtures of notes
}  56% of instruments occurrences correctly detected, with a precision of
64% [Martins et al., 2007]
Weak matching: alto sax cluster → piano prototype
Strong matching: piano cluster → piano prototype
Software Implementation
}  Modular, flexible and efficient software implementation
}  Based on Marsyas
}  Free and Open Source framework for audio analysis and processing
http://marsyas.sourceforge.net
$ peakClustering myAudio.wav
Software Implementation
}  Marsyas
}  peakClustering Overview
[Diagram: the peakClustering dataflow network. A Series/mainNet chains an Accumulator/textWinNet (1), a FlowThru/clustNet (2) and a Shredder/synthNet (3), followed by PeakConvert/conv, PeakLabeler/labeler and PeakViewSink/peSink; controls such as frameMaxNumPeaks, totalNumPeaks, nTimes and peakLabels are linked across the network.]
Software Implementation
}  Marsyas
}  Sinusoidal analysis front-end
[Diagram: the sinusoidal analysis front-end. The Accumulator/textWinNet wraps a Series/analysisNet containing a FanOutIn/mixer of the input sources (SoundFileSource with gains, plus an optional delayed noise branch), a ShiftInput/si, a stereo branch (Series/stereoSpkNet with per-channel Windowing and Spectrum, EnhADRess and EnhADRessStereoSpectrum), the main spectrum chain (Stereo2Mono/s2m, Shifter/sh, Windowing/wi and parallel Spectrum blocks) and a FlowThru/onsetdetector (1a), whose onsetDetected control flushes the accumulator.]
Software Implementation
}  Marsyas
}  Onset detection
[Diagram: onset detection. The FlowThru/onsetdetector computes Windowing → Spectrum → PowerSpectrum → Flux, smooths the flux with forward-backward filtering (ShiftInput/sif, Filter/filt1, Reverse/rev1, Filter/filt2, Reverse/rev2) and peak-picks it with PeakerOnset/peaker, which raises the onsetDetected control.]
Software Implementation
}  Marsyas
}  Similarity matrix computation and Clustering
[Diagram: similarity matrix computation and clustering. A FanOutIn/simNet multiplies parallel cue branches (Series/freqSim, Series/ampSim, Series/HWPSim, Series/panSim), each a PeakFeatureSelect followed by a SimilarityMatrix embedding its Metric (or HWPS) and an RBF; the overall matrix feeds a Series/NCutNet, where a Fanout stacks NormCut/NCut with an identity Gain, and PeakClusterSelect/clusterSelect outputs the labels consumed by PeakLabeler/labeler.]
Software Implementation
}  Marsyas
}  More flexible Similarity expression
[Diagram: a more flexible similarity expression. The cue branches are composed with element-wise operators: a FanOutIn/ANDnet (.*) over the frequency and amplitude branches, a FanOutIn/ORnet (max) with the HWPS branch, and a final product (.*) with the pan branch, i.e. [(Wf AND Wa) OR Wh] AND Ws.]
Software Implementation
}  Marsyas
}  Cluster Resynthesis
[Diagram: cluster resynthesis. The Shredder/synthNet (3) contains a Series/postNet chaining PeakSynthOsc/pso, Windowing/wiSyn, OverlapAdd/ov, Gain/outGain and SoundFileSink/dest.]
Software Implementation
}  Marsyas
}  Data structures
[Diagram: main data structures. (S) the audio samples and the shifted analysis window (N samples); (I) the stereo input frame (N+1 samples per channel); (A) per-texture-window frames holding two complex spectra (N points each) and a stereo/pan spectrum (N/2+1 points); (B) the per-frame peak matrices (frequency, amplitude, phase, track and group, up to frameMaxNumPeaks peaks per frame); (C1) the peaks' frequencies over the texture window; (D) the similarity matrix over the total number of peaks in the texture window.]
Software Implementation
}  Marsyas
}  Data structures
[Diagram: main data structures (continued). (C1, C2) the peaks' frequencies and amplitudes over the texture window; (C3) the per-frame spectral patterns (peak counts, frequencies and amplitudes); (D) the similarity matrix over the total number of peaks in the texture window; (E) the similarity matrix with the NCUT cluster indicator; (F) the cluster selection indicator marking the retained clusters.]
Conclusions
}  Proposal of a framework for sound source segregation
}  Inspired by ideas of CASA
}  Focused on “real-world” music signals
}  Designed to be causal and efficient
}  Data-driven
}  Does not require any training or prior knowledge about audio signals under analysis
}  Approaches partial tracking and source separation jointly
}  Flexible enough to include new perceptually motivated auditory cues
}  Based on a Spectral Clustering technique
}  Shows good potential for applications
}  Source segregation/separation,
}  Monophonic or polyphonic instrument classification,
}  Main melody estimation
}  Pre-processing for polyphonic transcription, ...
Conclusions
}  Definition of a novel harmonicity cue
}  Termed Harmonically Wrapped Peak Similarity (HWPS)
}  Experimentally shown to be:
}  A good grouping criterion for sound segregation in polyphonic music signals
}  Favorable when compared to other state-of-the-art harmonicity cues
}  Software development of the sound segregation framework
}  Used for validation and evaluation
}  Made available as Free and Open Source Software (FOSS)
}  Based on Marsyas
}  Free for everyone to try, evaluate, modify and improve
Future Work
}  Analysis front-end
}  Evaluate alternative analysis front-ends
}  Perceptually informed filterbanks
}  Sinusoid+transient representations
}  A different auditory front-end (as long as it is invertible)…
}  Evaluate alternative frequency estimation methods for spectral peaks
}  Parabolic interpolation
}  Subspace methods
}  …
}  Use of a beat-synchronous approach
}  Based on the use of onset detectors and beat estimators for dynamic
adjustment of texture windows
}  Perceptually motivated
Future Work
}  Grouping Cues
}  Improve HWPS
}  Better f0 candidate estimation
}  Reduce negative impact of sound events in different audio frames
}  Inclusion of new perceptually motivated auditory cues
}  Time and frequency masking
}  Stereo placement of spectral components (for stereo signals)
}  Timbre models as a priori information
}  Peak tracking as a pre- and post-processing
}  Common fate (onsets, offsets, modulation)
Future Work
}  Implement Sequential integration
}  between texture windows
}  Cluster segregated clusters?
}  Timbre similarity [Martins et al. 2007]
Future Work
}  Clustering
}  Definition of the neighborhood width (σ) in similarity
functions
}  JNDs?
}  Define and evaluate more expressive combinations of similarity
functions
}  Automatic estimation of the number of clusters in each
texture window
}  Extraction of new descriptors directly from segregated
cluster parameters (e.g., CPR):
}  Pitch, spectral features, frequency tracks, timing information
Future Work
}  Creation of a sound/music evaluation dataset
}  Simple and synthetic sound examples
}  For preliminary testing, fine tuning, validation
}  “real-world” polyphonic recordings
}  More complex signals, for final stress-test evaluations
}  To be made publicly available
}  Software Framework
}  Analysis and processing framework based on Marsyas
}  FOSS, C++, multi-platform, real-time
}  Feature-rich visualization and sonification tools
Related Publications
}  PhD Thesis:
}  Martins, L. G. (2009). A Computational
Framework for Sound Segregation in Music
Signals. PhD thesis, FEUP.
}  Book:
}  Martins, L. G. (2009). A Computational
Framework for Sound Segregation in Music
Signals – An Auditory Scene Analysis Approach
for Modeling Perceptual Grouping in Music
Listening. Lambert Academic Publishing.
}  Book Chapter:
}  Martins, L. G., Lagrange, M., and Tzanetakis, G.
(2010). Modeling grouping cues for auditory
scene analysis using a spectral clustering
formulation. Machine Audition: Principles,
Algorithms and Systems. IGI Global.
Related Publications
}  Lagrange, M., Martins, L. G., Murdoch, J., and Tzanetakis, G. (2008). Normalized cuts for
predominant melodic source separation. IEEE Transactions on Audio, Speech, and
Language Processing, 16(2). Special Issue on MIR.
}  Martins, L. G., Burred, J. J., Tzanetakis, G., and Lagrange, M. (2007). Polyphonic instrument
recognition using spectral clustering. In Proc. International Conference on Music
Information Retrieval (ISMIR), Vienna, Austria.
}  Lagrange, M., Martins, L. G., and Tzanetakis, G. (2008). A computationally efficient scheme
for dominant harmonic source separation. In Proc. IEEE International Conference on
Acoustics, Speech, and Signal Processing (ICASSP), Las Vegas, Nevada, USA.
}  Tzanetakis, G., Martins, L. G., Teixeira, L. F., Castillo, C., Jones, R., and Lagrange, M. (2008).
Interoperability and the Marsyas 0.2 runtime. In Proc. International Computer Music
Conference (ICMC), Belfast, Northern Ireland.
}  Lagrange, M., Martins, L. G., and Tzanetakis, G. (2007). Semi-automatic mono to stereo
up-mixing using sound source formation. In Proc. 122nd Convention of the Audio
Engineering Society, Vienna, Austria.
Thank you
Questions?
lmartins@porto.ucp.pt
http://www.artes.ucp.pt/citar/

Contenu connexe

En vedette

Extended case studies
Extended case studiesExtended case studies
Extended case studiesSapna2410
 
Justifying price rise
Justifying price riseJustifying price rise
Justifying price riseSapna2410
 
Episode 33 : Project Execution Part (4)
Episode 33 :  Project Execution Part (4)Episode 33 :  Project Execution Part (4)
Episode 33 : Project Execution Part (4)SAJJAD KHUDHUR ABBAS
 
For vals reading
For vals readingFor vals reading
For vals readingSapna2410
 
Episode 54 : CAPE Problem Formulations
Episode 54 : CAPE Problem FormulationsEpisode 54 : CAPE Problem Formulations
Episode 54 : CAPE Problem FormulationsSAJJAD KHUDHUR ABBAS
 
Episode 45 : 4 Stages Of Solid Liquid Separations
Episode 45 :  4 Stages Of Solid Liquid SeparationsEpisode 45 :  4 Stages Of Solid Liquid Separations
Episode 45 : 4 Stages Of Solid Liquid SeparationsSAJJAD KHUDHUR ABBAS
 
Methodology - Statistic
Methodology - StatisticMethodology - Statistic
Methodology - Statistichassilah
 
Episode 51 : Integrated Process Simulation
Episode 51 : Integrated Process Simulation Episode 51 : Integrated Process Simulation
Episode 51 : Integrated Process Simulation SAJJAD KHUDHUR ABBAS
 
Episode 36 : What is Powder Technology?
Episode 36 :  What is Powder Technology?Episode 36 :  What is Powder Technology?
Episode 36 : What is Powder Technology?SAJJAD KHUDHUR ABBAS
 
Episode 44 : 4 Stages Of Solid Liquid Separations
Episode 44 :  4 Stages Of Solid Liquid SeparationsEpisode 44 :  4 Stages Of Solid Liquid Separations
Episode 44 : 4 Stages Of Solid Liquid SeparationsSAJJAD KHUDHUR ABBAS
 
Technology Trends in Creativity and Business
Technology Trends in Creativity and BusinessTechnology Trends in Creativity and Business
Technology Trends in Creativity and BusinessLuís Gustavo Martins
 
Episode 35 : Design Approach to Dilute Phase Pneumatic Conveying
Episode 35 :  Design Approach to Dilute Phase Pneumatic ConveyingEpisode 35 :  Design Approach to Dilute Phase Pneumatic Conveying
Episode 35 : Design Approach to Dilute Phase Pneumatic ConveyingSAJJAD KHUDHUR ABBAS
 
Episode 52 : Flow sheeting Case Study
Episode 52 :  Flow sheeting Case StudyEpisode 52 :  Flow sheeting Case Study
Episode 52 : Flow sheeting Case StudySAJJAD KHUDHUR ABBAS
 
Episode 48 : Computer Aided Process Engineering Simulation Problem
Episode 48 :  Computer Aided Process Engineering Simulation Problem Episode 48 :  Computer Aided Process Engineering Simulation Problem
Episode 48 : Computer Aided Process Engineering Simulation Problem SAJJAD KHUDHUR ABBAS
 
Episode 47 : CONCEPTUAL DESIGN OF CHEMICAL PROCESSES
Episode 47 :  CONCEPTUAL DESIGN OF CHEMICAL PROCESSESEpisode 47 :  CONCEPTUAL DESIGN OF CHEMICAL PROCESSES
Episode 47 : CONCEPTUAL DESIGN OF CHEMICAL PROCESSESSAJJAD KHUDHUR ABBAS
 
Episode 40 : DESIGN EXAMPLE – DILUTE PHASE PNEUMATIC CONVEYING (Part 2)
Episode 40 : DESIGN EXAMPLE – DILUTE PHASE PNEUMATIC CONVEYING (Part 2)Episode 40 : DESIGN EXAMPLE – DILUTE PHASE PNEUMATIC CONVEYING (Part 2)
Episode 40 : DESIGN EXAMPLE – DILUTE PHASE PNEUMATIC CONVEYING (Part 2)SAJJAD KHUDHUR ABBAS
 
Episode 53 : Computer Aided Process Engineering
Episode 53 :  Computer Aided Process EngineeringEpisode 53 :  Computer Aided Process Engineering
Episode 53 : Computer Aided Process EngineeringSAJJAD KHUDHUR ABBAS
 
The t Test for Two Related Samples
The t Test for Two Related SamplesThe t Test for Two Related Samples
The t Test for Two Related Samplesjasondroesch
 

En vedette (20)

Extended case studies
Extended case studiesExtended case studies
Extended case studies
 
Justifying price rise
Justifying price riseJustifying price rise
Justifying price rise
 
Episode 33 : Project Execution Part (4)
Episode 33 :  Project Execution Part (4)Episode 33 :  Project Execution Part (4)
Episode 33 : Project Execution Part (4)
 
For vals reading
For vals readingFor vals reading
For vals reading
 
Episode 54 : CAPE Problem Formulations
Episode 54 : CAPE Problem FormulationsEpisode 54 : CAPE Problem Formulations
Episode 54 : CAPE Problem Formulations
 
Episode 45 : 4 Stages Of Solid Liquid Separations
Episode 45 :  4 Stages Of Solid Liquid SeparationsEpisode 45 :  4 Stages Of Solid Liquid Separations
Episode 45 : 4 Stages Of Solid Liquid Separations
 
Marsyas
MarsyasMarsyas
Marsyas
 
Methodology - Statistic
Methodology - StatisticMethodology - Statistic
Methodology - Statistic
 
Speaker Segmentation (2006)
Speaker Segmentation (2006)Speaker Segmentation (2006)
Speaker Segmentation (2006)
 
Episode 51 : Integrated Process Simulation
Episode 51 : Integrated Process Simulation Episode 51 : Integrated Process Simulation
Episode 51 : Integrated Process Simulation
 
Episode 36 : What is Powder Technology?
Episode 36 :  What is Powder Technology?Episode 36 :  What is Powder Technology?
Episode 36 : What is Powder Technology?
 
Episode 44 : 4 Stages Of Solid Liquid Separations
Episode 44 :  4 Stages Of Solid Liquid SeparationsEpisode 44 :  4 Stages Of Solid Liquid Separations
Episode 44 : 4 Stages Of Solid Liquid Separations
 
Technology Trends in Creativity and Business
Technology Trends in Creativity and BusinessTechnology Trends in Creativity and Business
Technology Trends in Creativity and Business
 
Episode 35 : Design Approach to Dilute Phase Pneumatic Conveying
Episode 35 :  Design Approach to Dilute Phase Pneumatic ConveyingEpisode 35 :  Design Approach to Dilute Phase Pneumatic Conveying
Episode 35 : Design Approach to Dilute Phase Pneumatic Conveying
 
Episode 52 : Flow sheeting Case Study
Episode 52 :  Flow sheeting Case StudyEpisode 52 :  Flow sheeting Case Study
Episode 52 : Flow sheeting Case Study
 
Episode 48 : Computer Aided Process Engineering Simulation Problem
Episode 48 :  Computer Aided Process Engineering Simulation Problem Episode 48 :  Computer Aided Process Engineering Simulation Problem
Episode 48 : Computer Aided Process Engineering Simulation Problem
 
Episode 47 : CONCEPTUAL DESIGN OF CHEMICAL PROCESSES
Episode 47 :  CONCEPTUAL DESIGN OF CHEMICAL PROCESSESEpisode 47 :  CONCEPTUAL DESIGN OF CHEMICAL PROCESSES
Episode 47 : CONCEPTUAL DESIGN OF CHEMICAL PROCESSES
 
Episode 40 : DESIGN EXAMPLE – DILUTE PHASE PNEUMATIC CONVEYING (Part 2)
Episode 40 : DESIGN EXAMPLE – DILUTE PHASE PNEUMATIC CONVEYING (Part 2)Episode 40 : DESIGN EXAMPLE – DILUTE PHASE PNEUMATIC CONVEYING (Part 2)
Episode 40 : DESIGN EXAMPLE – DILUTE PHASE PNEUMATIC CONVEYING (Part 2)
 
Episode 53 : Computer Aided Process Engineering
Episode 53 :  Computer Aided Process EngineeringEpisode 53 :  Computer Aided Process Engineering
Episode 53 : Computer Aided Process Engineering
 
The t Test for Two Related Samples
The t Test for Two Related SamplesThe t Test for Two Related Samples
The t Test for Two Related Samples
 

Similaire à A Computational Framework for Sound Segregation in Music Signals using Marsyas

Extraction and Conversion of Vocals
Extraction and Conversion of VocalsExtraction and Conversion of Vocals
Extraction and Conversion of VocalsIRJET Journal
 
AI THROUGH THE EYES OF ORGANISE SOUND
AI THROUGH THE EYES OF ORGANISE SOUNDAI THROUGH THE EYES OF ORGANISE SOUND
AI THROUGH THE EYES OF ORGANISE SOUNDJaideep Ghosh
 
Performance Comparison of Musical Instrument Family Classification Using Soft...
Performance Comparison of Musical Instrument Family Classification Using Soft...Performance Comparison of Musical Instrument Family Classification Using Soft...
Performance Comparison of Musical Instrument Family Classification Using Soft...Waqas Tariq
 
Nithin Xavier research_proposal
Nithin Xavier research_proposalNithin Xavier research_proposal
Nithin Xavier research_proposalNithin Xavier
 
Annotating Soundscapes.pdf
Annotating Soundscapes.pdfAnnotating Soundscapes.pdf
Annotating Soundscapes.pdfMichelle Shaw
 
IRJET- Machine Learning and Noise Reduction Techniques for Music Genre Classi...
IRJET- Machine Learning and Noise Reduction Techniques for Music Genre Classi...IRJET- Machine Learning and Noise Reduction Techniques for Music Genre Classi...
IRJET- Machine Learning and Noise Reduction Techniques for Music Genre Classi...IRJET Journal
 
Koyama ASA ASJ joint meeting 2016
Koyama ASA ASJ joint meeting 2016Koyama ASA ASJ joint meeting 2016
Koyama ASA ASJ joint meeting 2016SaruwatariLabUTokyo
 
MLConf2013: Teaching Computer to Listen to Music
MLConf2013: Teaching Computer to Listen to MusicMLConf2013: Teaching Computer to Listen to Music
MLConf2013: Teaching Computer to Listen to MusicEric Battenberg
 
Ml conf2013 teaching_computers_share
Ml conf2013 teaching_computers_shareMl conf2013 teaching_computers_share
A Computational Framework for Sound Segregation in Music Signals using Marsyas

  • 7. The Main Challenges A Computational Framework for Sound Segregation in Music Signals9 }  Why Music Signals? }  Music sound is, in some senses, more challenging to analyse than non-musical sounds }  High time-frequency overlap of sources and sound events   Music composition and orchestration   Sources that often play simultaneously → polyphony   Favor consonant pitch intervals   Sound sources are highly correlated }  High variety of spectral and temporal characteristics   Musical instruments present a wide range of sound production mechanisms }  Techniques traditionally used for monophonic, non-musical or speech signals perform poorly }  Yet, music signals are usually well organized and structured
  • 8. Current State A Computational Framework for Sound Segregation in Music Signals10 }  Typical systems in MIR }  Statistically represent the entire sound mixture }  Analysis and retrieval performance reached a “glass ceiling” [Aucouturier and Pachet, 2004] }  New Paradigm }  Attempt to individually characterize the different sound events in a sound mixture }  Performance still quite limited when compared to the human auditory system }  But already provides alternative and improved approaches to common sound analysis and MIR tasks
  • 9. Applications A Computational Framework for Sound Segregation in Music Signals11 }  “Holy grail” applications }  “The Listening Machine” }  “The Robotic Ear” }  “Down to earth” applications }  Sound and Music Description }  Sound Manipulation }  Robust Speech and Speaker Recognition }  Object-based Audio Coding }  Automatic Music Transcription }  Audio and Music Information Retrieval }  Auditory Scene Reconstruction }  Hearing Prostheses }  Up-mixing }  …
  • 10. Related Research Areas A Computational Framework for Sound Segregation in Music Signals12 }  Sound and Music Computing (SMC) [Serra et al., 2007] }  Computational Auditory Scene Analysis (CASA) [Wang and Brown, 2006] }  Perception Research }  Psychoacoustics [Stevens, 1957] }  Auditory Scene Analysis (ASA) [Bregman, 1990] }  Digital Signal Processing [Oppenheim and Schafer, 1975] }  Music Information Retrieval (MIR) [Downie, 2003] }  Machine Learning [Duda et al., 2000] }  Computer Vision [Marr, 1982]
  • 11. Related Areas A Computational Framework for Sound Segregation in Music Signals13 }  Auditory Scene Analysis (ASA) [Bregman, 1990] }  How do humans “understand” sound mixtures? }  Find packages of acoustic evidence such that each package has arisen from a single sound source }  Grouping Cues }  Integration   Simultaneous vs. Sequential   Primitive vs. schema-based }  Cues   Common amplitude, frequency, fate   Harmonicity   Time continuity   …
  • 12. Related Areas A Computational Framework for Sound Segregation in Music Signals14 }  Computational Auditory Scene Analysis (CASA) [Wang and Brown, 2006] }  “Field of computational study that aims to achieve human performance in ASA by using one or two microphone recordings of the acoustic scene.” [Wang and Brown, 2006] [Figure 3: System architecture of a typical CASA system: acoustic mixture → analysis front-end → mid-level representation → scene organization (driven by grouping cues and source models) → stream resynthesis → segregated signals.]
  • 13. Main Contributions A Computational Framework for Sound Segregation in Music Signals15 }  Proposal and experimental validation of a flexible and efficient framework for sound segregation }  Focused on “real-world” polyphonic music }  Inspired by ideas from CASA }  Causal and data-driven }  Definition of a novel harmonicity cue }  Termed Harmonically Wrapped Peak Similarity (HWPS) }  Experimentally shown to be a good grouping criterion }  Software implementation of the proposed sound segregation framework }  Modular, extensible and efficient }  Made available as free and open source software (FOSS) }  Based on the MARSYAS framework
  • 14. Proposed Approach A Computational Framework for Sound Segregation in Music Signals16 }  Assumptions }  Perception primarily depends on the use of low-level sensory information }  Does not necessarily require prior knowledge (i.e. training) }  Still able to perform primitive identification and segregation of sound events in a sound mixture }  Prior knowledge and high-level information can still be used }  To award additional meaning to the primitive observations }  To consolidate primitive observations as relevant sound events }  To modify the listener’s focus of attention
  • 15. Proposed Approach A Computational Framework for Sound Segregation in Music Signals19 }  System overview [Diagram: audio → Sinusoidal Analysis → Spectral Peaks (46 ms frames) → Texture Window (150 ms) → Similarity Computation → Normalized Cut → Cluster Selection → Sinusoidal Synthesis.]
  • 16. Analysis Front-end A Computational Framework for Sound Segregation in Music Signals22 }  Sinusoidal Modeling }  Sum of highest amplitude sinusoids at each frame → peaks }  Maximum of 20 peaks/frame }  Window = 46ms ; hop = 11ms }  Parametric model: estimate Amplitude, Frequency, Phase of each peak [Diagram: audio → Sinusoidal Analysis → Spectral Peaks (46 ms frames).]
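As a concrete illustration of this front-end, here is a minimal sketch of per-frame spectral peak picking with the parameters quoted above (46 ms window, 11 ms hop, at most 20 peaks per frame). It is not the Marsyas implementation; the use of scipy and the function name are assumptions for illustration.

```python
import numpy as np
from scipy.signal import stft, find_peaks

def sinusoidal_peaks(x, sr, win_ms=46, hop_ms=11, max_peaks=20):
    """Pick at most max_peaks highest-amplitude spectral peaks per frame
    (a sketch of the front-end described above, not the Marsyas code)."""
    nwin = int(sr * win_ms / 1000)
    nhop = int(sr * hop_ms / 1000)
    f, t, Z = stft(x, fs=sr, window='hann', nperseg=nwin,
                   noverlap=nwin - nhop)
    frames = []
    for k in range(Z.shape[1]):
        mag = np.abs(Z[:, k])
        idx, _ = find_peaks(mag)                           # local spectral maxima
        idx = idx[np.argsort(mag[idx])[::-1]][:max_peaks]  # keep the loudest
        frames.append([(f[i], mag[i], np.angle(Z[i, k])) for i in idx])
    return t, frames  # per-frame lists of (frequency Hz, amplitude, phase)
```

On a 44.1 kHz signal this yields roughly 90 frames per second, each carrying up to 20 (frequency, amplitude, phase) triplets.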
  • 17. Time Segmentation A Computational Framework for Sound Segregation in Music Signals23 }  Texture Windows }  Construct a graph over a texture window of the sound mixture }  Provides time integration   Approaches partial tracking and source separation jointly   Traditionally two separate, consecutive stages [Diagram: per-frame spectral peaks from the sinusoidal analysis accumulated over a texture window in the time-frequency plane.]
  • 18. Time Segmentation A Computational Framework for Sound Segregation in Music Signals24 }  Fixed length texture windows }  E.g. 150 ms }  Dynamically adjusted texture windows }  Onset detector }  Perceptually more relevant }  50ms ~ 300ms [Figure: waveform and spectral flux over time, with texture-window boundaries placed at the detected onsets.]
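A minimal sketch of a spectral-flux onset detector of the kind used to place dynamic texture-window boundaries; the threshold and parameter values here are illustrative assumptions, not the settings used in the system.

```python
import numpy as np
from scipy.signal import stft, find_peaks

def onset_frames(x, sr, nwin=2048, nhop=512, min_gap_ms=50):
    """Half-wave-rectified spectral flux with simple peak picking
    (a sketch of the dynamic texture-window segmentation idea)."""
    _, _, Z = stft(x, fs=sr, nperseg=nwin, noverlap=nwin - nhop)
    mag = np.abs(Z)
    flux = np.maximum(mag[:, 1:] - mag[:, :-1], 0).sum(axis=0)  # HWR flux
    flux /= flux.max() + 1e-12
    min_gap = int(min_gap_ms * sr / (1000 * nhop)) or 1  # enforce minimum spacing
    peaks, _ = find_peaks(flux, height=flux.mean() + flux.std(),
                          distance=min_gap)
    return peaks + 1  # frame indices where new texture windows start
```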
  • 19. Perceptual Cues as Similarity Functions A Computational Framework for Sound Segregation in Music Signals25 [Diagram: spectral peaks over a 150 ms texture window feed amplitude, frequency, harmonicity (HWPS), azimuth-proximity and common onset/offset similarity cues, plus source models; a combiner produces the overall similarity matrix passed to the Normalized Cut.]
  • 20. Perceptual Cues as Similarity Functions A Computational Framework for Sound Segregation in Music Signals26 }  Grouping Cues (inspired from ASA) }  Similarity between time-frequency components in a texture window }  Frequency proximity }  Amplitude proximity }  Harmonicity proximity (HWPS) }  … }  Encode topological knowledge into a similarity graph/matrix }  Simultaneous integration (peaks within the same frame) }  Sequential integration over the texture window [Figure: similarity matrix over the peaks of two tones A and B, and the corresponding weighted graph with edge weights $w_{ij} = w_{ji}$.]
  • 21. Perceptual Cues as Similarity Functions A Computational Framework for Sound Segregation in Music Signals27 }  Defining a Generic Similarity Function }  Fully connected graphs }  Gaussian similarity function $w_{ij} = e^{-\left(\frac{d(x_i, x_j)}{\sigma}\right)^2}$   How to define the neighborhood width (σ)?   Local statistics from data in a Texture Window   Use prior knowledge (e.g. JNDs)   Use σ as weights (after normalizing the similarity function to [0,1]) [Plot: $w_{ij}$ as a function of $d(x_i, x_j)$ for σ = 0.4, 1.0 and 1.2.]
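The generic similarity function above translates directly into code; a minimal sketch (the choice of sigma, as the slide notes, would come from local texture-window statistics or JND-style prior knowledge):

```python
import numpy as np

def gaussian_similarity(values, sigma):
    """w_ij = exp(-(d(x_i, x_j)/sigma)^2) over one peak feature, for a
    fully connected graph (a sketch; sigma is left to the caller)."""
    v = np.asarray(values, dtype=float)
    d = np.abs(v[:, None] - v[None, :])  # pairwise Euclidean distances
    return np.exp(-(d / sigma) ** 2)
```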
  • 22. Perceptual Cues as Similarity Functions A Computational Framework for Sound Segregation in Music Signals28 }  Amplitude and Frequency Similarity }  Amplitude }  Gaussian function of the Euclidean distances   In dB → more perceptually relevant }  Frequency }  Gaussian function of the Euclidean distances   In Bark → more perceptually relevant }  Not sufficient to segregate harmonic events }  Nevertheless important to group peaks from:   Inharmonic or noisy frequency components in harmonic sounds   Non-harmonic (unpitched) sounds. The edge weight connecting two peaks $p_l^k$ and $p_m^{k+n}$ encodes their amplitude and frequency proximities: $W_a(p_l^k, p_m^{k+n}) = e^{-\left(\frac{a_l^k - a_m^{k+n}}{\sigma_a}\right)^2}$ and $W_f(p_l^k, p_m^{k+n}) = e^{-\left(\frac{f_l^k - f_m^{k+n}}{\sigma_f}\right)^2}$, where amplitudes are measured in decibels (dB) and frequencies in Barks (a frequency scale approximately linear below 500 Hz and logarithmic above), since these scales have been shown to better model the sensibility of the human ear [Hartmann, 1998].
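A minimal sketch of the amplitude and frequency cues, converting frequencies to Bark and amplitudes to dB before applying the Gaussian similarity; the Hz-to-Bark approximation (Traunmüller's) and the sigma defaults are assumptions for illustration, not the thesis settings.

```python
import numpy as np

def hz_to_bark(f):
    # Traunmüller-style Hz-to-Bark approximation (one of several in use)
    return 26.81 * f / (1960.0 + f) - 0.53

def amp_freq_similarity(freqs_hz, amps_lin, sigma_f=1.0, sigma_a=10.0):
    """W_f and W_a between all peak pairs in a texture window, with
    frequencies in Bark and amplitudes in dB as described above."""
    f = hz_to_bark(np.asarray(freqs_hz, dtype=float))
    a = 20.0 * np.log10(np.asarray(amps_lin, dtype=float) + 1e-12)
    Wf = np.exp(-((f[:, None] - f[None, :]) / sigma_f) ** 2)
    Wa = np.exp(-((a[:, None] - a[None, :]) / sigma_a) ** 2)
    return Wf, Wa
```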
  • 23. Perceptual Cues as Similarity Functions A Computational Framework for Sound Segregation in Music Signals29 }  Harmonically Wrapped Peak Similarity (HWPS) }  Harmonicity is one of the most powerful ASA cues [Wang and Brown, 2006] }  Proposal of a novel harmonicity similarity function }  Does not rely on prior knowledge of the f0s in the signal }  Takes spectral information into account in a global manner (spectral patterns)   For peaks in the same frame or in different frames of a Texture Window   Takes into consideration the amplitudes of the spectral peaks }  3-step algorithm (see the sketch after this slide)   Shifted Spectral Pattern   Wrapped Frequency Space → Histogram computation   Discrete Cosine Similarity → [0,1]. In the last step, each shifted and harmonically wrapped spectral pattern $\hat{F}_l^k$ is discretized into an amplitude-weighted histogram $H_l^k$ with 20 equal-size bins over the wrapped-frequency range [0,1] (12 or 24 bins would give a more musically meaningful chroma-based representation, but preliminary tests showed better results with 20 bins); the wrapped patterns are also folded into an octave to form a pitch-invariant “chroma” profile. The HWPS between peaks $p_l^k$ and $p_m^{k+n}$ is then a Gaussian function of the cosine distance between the two histograms: $W_h(p_l^k, p_m^{k+n}) = \mathrm{HWPS}(p_l^k, p_m^{k+n}) = e^{-\left(1 - \frac{c(H_l^k, H_m^{k+n})}{\sqrt{c(H_l^k, H_l^k)\, c(H_m^{k+n}, H_m^{k+n})}}\right)^2}$ where $c(H_a^b, H_c^d) = \sum_i H_a^b(i) \times H_c^d(i)$. The wrapping keeps the histograms small (e.g. 20 bins), so the cue is computationally efficient; a neighborhood width $\sigma_h = 1$ is implicitly used in the current implementation.
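A simplified sketch of the three HWPS steps. The function names are hypothetical and the exact shifting convention is an assumption based on the description above; a full implementation would follow the thesis in detail.

```python
import numpy as np

def hwps(peak_l, peak_m, pattern_l, pattern_m, nbins=20):
    """Harmonically Wrapped Peak Similarity between two peaks (Hz).
    pattern_l / pattern_m are (freqs_hz, amps) arrays for each peak's
    frame; a sketch of the shift / wrap / cosine-similarity steps."""
    h = min(peak_l, peak_m)              # conservative wrapping frequency
    def wrapped_hist(peak_f, freqs, amps):
        shifted = np.asarray(freqs) - (peak_f - h)   # align the peak to h
        wrapped = np.mod(shifted / h, 1.0)           # wrap to [0, 1)
        hist, _ = np.histogram(wrapped, bins=nbins, range=(0.0, 1.0),
                               weights=np.asarray(amps))  # amplitude-weighted
        return hist
    Hl = wrapped_hist(peak_l, *pattern_l)
    Hm = wrapped_hist(peak_m, *pattern_m)
    cos = (Hl @ Hm) / (np.sqrt((Hl @ Hl) * (Hm @ Hm)) + 1e-12)
    return np.exp(-(1.0 - cos) ** 2)                 # W_h in (0, 1]
```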
  • 24. Perceptual Cues as Similarity Functions A Computational Framework for Sound Segregation in Music Signals30 }  HWPS }  Between peaks of a same harmonic “source” }  In a same frame → High similarity (~1.0) [Figure: peaks A1 (at 2f0A) and A0 (at f0A) of tone A in frame k; after shifting and harmonic wrapping, their spectral patterns produce nearly identical histograms, so HWPS(A1, A0)|h=f0A is high.]
  • 25. Perceptual Cues as Similarity Functions A Computational Framework for Sound Segregation in Music Signals31 }  HWPS }  Between peaks of different harmonic “sources” }  In a same frame → Low similarity (~0.0) [Figure: peaks A1 (at 2f0A, tone A) and B0 (at f0B, tone B) in frame k; their shifted and wrapped patterns disagree, so HWPS(A1, B0)|h=f0A is low.]
  • 26. Perceptual Cues as Similarity Functions A Computational Framework for Sound Segregation in Music Signals32 }  HWPS }  Between peaks of a same harmonic “source” }  In different frames → Mid-High similarity   Interfering spectral content may be different   Degrades HWPS…   Only consider bin 0? [Figure: peaks A1 (frame k) and A0 (frame k+n) of tone A, with different interfering tones in each frame; HWPS(A1, A0)|h=f0A is mid-high.]
  • 27. Perceptual Cues as Similarity Functions A Computational Framework for Sound Segregation in Music Signals33 }  HWPS }  Impact of f0 estimates (h) }  Ideal }  Min peak frequency }  Highest amplitude peak }  Histogram-based f0 estimates → pitch estimates == nr. of sources? With prior knowledge of the fundamental frequencies, the wrapping operation would be perfect, parametrized as $h = \min(f0_l^k, f0_m^{k+n})$, where $f0_l^k$ is the fundamental frequency of the source of peak $p_l^k$. Without such a prior, a conservative estimate $h = \min(f_l^k, f_m^{k+n})$ is used instead, although it tends to overestimate the fundamental frequency. Since h is the same for both patterns, the resulting shifted and wrapped frequency patterns are pitch invariant and will be more similar if the peaks belong to the same harmonic “source”. An alternative would be to select the highest-amplitude peak in the union of the two spectral patterns as the f0 estimate, since the strongest partial in musical signals often corresponds to the fundamental of the most prominent harmonic “source” in that frame, although this assumption does not always hold. A more robust (but more computationally expensive) approach is to histogram all pairwise frequency differences between the peaks of each spectral pattern: the histogram peaks are good f0 candidates (to avoid octave ambiguities, a second histogram over the differences between candidate f0 values can be computed and its highest peaks selected as final candidates), and the HWPS can be iteratively calculated for each candidate, keeping the best value. This technique could also prove an interesting way to robustly estimate the number of harmonic “sources” per frame, including their pitches, but experimental evaluation is still required. A small sketch of the histogram-based candidate estimation follows. [Figure: example spectrum with peaks A0–A4 and B0–B4 labeled in frequency and amplitude.]
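A minimal sketch of the histogram-based f0 candidate estimation mentioned above, built on pairwise peak-frequency differences; the bin resolution and limits are illustrative assumptions.

```python
import numpy as np

def f0_candidates(freqs_hz, max_f0=1000.0, resolution=5.0, top=3):
    """Histogram of pairwise peak-frequency differences; the highest bins
    are f0 candidates (a sketch of the 'more robust' approach above)."""
    f = np.asarray(freqs_hz, dtype=float)
    diffs = np.abs(f[:, None] - f[None, :]).ravel()
    diffs = diffs[(diffs > 0) & (diffs < max_f0)]        # drop self and outliers
    hist, edges = np.histogram(diffs, bins=int(max_f0 / resolution),
                               range=(0.0, max_f0))
    best = np.argsort(hist)[::-1][:top]
    return (edges[best] + edges[best + 1]) / 2           # candidate f0s in Hz
```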
  • 28. Similarity Combination A Computational Framework for Sound Segregation in Music Signals36 [Diagram: the individual similarity cues (amplitude, frequency, HWPS, azimuth proximity, common onset/offset, source models) are combined into the overall similarity matrix fed to the Normalized Cut.]
  • 29. Similarity Combination A Computational Framework for Sound Segregation in Music Signals38 }  Combining cues }  Product operator [Shi and Malik, 2000]   High overall similarity only if all cues are high… }  More expressive operators? Following the work of Shi and Malik, who compute the overall similarity as the product of the individual cues, the current system combines the amplitude, frequency and HWPS grouping cues into a combined similarity function $W(p_l, p_m) = W_{afh}(p_l, p_m) = W_a(p_l, p_m) \times W_f(p_l, p_m) \times W_h(p_l, p_m)$. More expressive combinations are also possible, e.g. $W_{afh} = [(W_f \wedge W_a) \vee W_h] \wedge W_s$.
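Both combination schemes are one-liners over the cue matrices; in the sketch below, reading the AND/OR operators of the second expression as elementwise min and max is an assumption, not the thesis definition.

```python
import numpy as np

def combine_product(Wa, Wf, Wh):
    # Shi & Malik-style product: high overall similarity only if all cues are high
    return Wa * Wf * Wh

def combine_fuzzy(Wa, Wf, Wh, Ws):
    # One reading of W_afh = [(Wf AND Wa) OR Wh] AND Ws, with AND as
    # elementwise min and OR as elementwise max (an assumption)
    return np.minimum(np.maximum(np.minimum(Wf, Wa), Wh), Ws)
```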
  • 30. Segregating Sound Events A Computational Framework for Sound Segregation in Music Signals39 }  Segregation task }  Carried out by clustering components that are close in the similarity space }  Novel method based on Spectral Clustering }  Normalized Cut (Ncut) criterion   Originally proposed for Computer Vision   Takes cues as pair-wise similarities   Clusters the peaks into groups taking all cues into account simultaneously [Diagram: the overall similarity matrix from the combined cues feeds the Normalized Cut.]
  • 31. Segregating Sound Events A Computational Framework for Sound Segregation in Music Signals40 }  Segregation Task }  Normalized Cut criterion }  Achieves a balanced clustering of elements }  Relies on the eigenstructure of a similarity matrix to partition points into disjoint clusters   Points in the same cluster → high similarity   Points in different clusters → low similarity [Figure: weighted graph with edge weights $w_{ij} = w_{ji}$, contrasting the minimum cut with the better, balanced Normalized Cut.]
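A textbook sketch of a two-way Normalized Cut over a precomputed similarity matrix, thresholding the second generalized eigenvector of $(D - W)x = \lambda D x$; real implementations sparsify W and search for a better splitting point than the median.

```python
import numpy as np

def ncut_bipartition(W):
    """Two-way Normalized Cut (textbook sketch, after Shi & Malik):
    threshold the second generalized eigenvector of (D - W) x = t D x."""
    d = W.sum(axis=1)
    D_isqrt = np.diag(1.0 / np.sqrt(d + 1e-12))
    L_sym = D_isqrt @ (np.diag(d) - W) @ D_isqrt  # normalized Laplacian
    _, vecs = np.linalg.eigh(L_sym)               # eigenvalues ascending
    fiedler = D_isqrt @ vecs[:, 1]                # recover x = D^{-1/2} y
    return fiedler >= np.median(fiedler)          # boolean cluster labels
```

Recursively applying ncut_bipartition to the W submatrix of each resulting cluster yields the divisive, recursive two-way-cut scheme described two slides below.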
  • 32. Segregating Sound Events A Computational Framework for Sound Segregation in Music Signals41 }  Spectral Clustering }  Alternative to the traditional EM and k-means algorithms: }  Does not assume a convex-shaped data representation }  Does not assume a Gaussian distribution of the data }  Does not suffer from multiple minima in the log-likelihood   Avoids multiple restarts of the iterative process }  Correctly handles complex and unknown cluster shapes }  Previously applied to audio signals [Bach and Jordan, 2004]
  • 33. Segregating Sound Events A Computational Framework for Sound Segregation in Music Signals42 }  Divisive clustering approach }  Recursive two-way cut }  Hierarchical partition of the data   Recursively partitions the data into two sets   Until a pre-defined number of clusters is reached (requires prior knowledge!)   Until a stopping criterion is met }  Current implementation   Requires definition of the number of clusters [Martins et al., 2007]   Or alternatively partitions the data into 5 clusters and selects the 2 “denser” ones → segregation of the dominant clusters in the mixture [Lagrange et al., 2008a]
  • 34. Segregation Results A Computational Framework for Sound Segregation in Music Signals43 [Figure: time-frequency plots of the two clusters obtained for the Tones A+B and Jazz1 examples using amplitude similarity, frequency similarity, HWPS similarity, and the combined similarities.]
  • 35. Results A Computational Framework for Sound Segregation in Music Signals45 }  Predominant Melodic Source Segregation }  Dataset of real-world polyphonic music recordings }  Availability of the original isolated tracks (ground truth) }  Results (the higher the better)   HWPS improves results   When combined with other similarity features   When compared with other state-of-the-art harmonicity features [Srinivasan and Kankanhalli, 2003] [Virtanen and Klapuri, 2000] [Bar chart: mean SDR (dB) over a 10-song dataset for the A+F+HWPS, A+F+rHWPS, A+F+HV, A+F+HS, and A+F similarity combinations.]
  • 36. Results A Computational Framework for Sound Segregation in Music Signals47 }  Predominant Melodic Source Segregation }  On the use of Dynamic Texture Windows }  Results (the higher the better)   Smaller improvement (0.15 dB) than expected   Probably due to the cluster selection approach being used…   More computationally intensive (for longer texture windows)
  • 37. Results A Computational Framework for Sound Segregation in Music Signals51 }  Main Melody Pitch Estimation }  Resynthesize the segregated main voice clusters }  Perform pitch estimation using a well-known monophonic pitch estimation technique (Praat) }  Comparison with two techniques: }  Monophonic pitch estimation applied to the mixture audio (from Praat) }  State-of-the-art multi-pitch and main melody estimation algorithm applied to the mixture audio [Klapuri, 2006] }  Results (the lower the better)
  • 38. Results A Computational Framework for Sound Segregation in Music Signals56 }  Voicing Detection }  Identifying portions of a music file containing vocals }  Evaluated three feature sets:   MFCC features extracted from the polyphonic signal   MFCC features extracted from the segregated main voice   Cluster Peak Ratio (CPR) feature   extracted from the segregated main voice clusters
  • 39. Results A Computational Framework for Sound Segregation in Music Signals57 }  Timbre Identification in polyphonic music signals [Martins et al., 2007] }  Polyphonic, multi-instrumental audio signals }  Artificial mixtures of 2-, 3- and 4-notes from real instruments }  Automatic separation of the sound sources }  Sound sources and events are reasonably captured, corresponding in most cases to played notes }  Matching of the separated events to a collection of 6 timbre models [Diagram: sinusoidal analysis and peak picking → sound source formation into notes → matching of each note against the timbre models (note/instrument labels).]
  • 40. Results A Computational Framework for Sound Segregation in Music Signals58 }  Timbre Identification in polyphonic music signals [Martins et al., 2007] }  Sound sources and events are reasonably captured, corresponding in most cases to played notes
  • 41. Results A Computational Framework for Sound Segregation in Music Signals59 }  Timbre Identification in polyphonic music signals [Martins et al., 2007] }  6 instruments modeled [Burred et al., 2006]: }  Piano, violin, oboe, clarinet, trumpet and alto sax }  Modeled as a set of time-frequency templates   Describe the typical evolution in time of the spectral envelope of a note   Matched against the salient peaks of the spectrum [Figure: time-frequency spectral envelope templates for piano and oboe.]
  • 42. Results A Computational Framework for Sound Segregation in Music Signals60 }  Timbre Identification in polyphonic music signals [Martins et al., 2007] }  Instrument presence detection in mixtures of notes }  56% of instrument occurrences correctly detected, with a precision of 64% [Martins et al., 2007] [Figure: weak matching of an alto sax cluster to the piano prototype vs. strong matching of a piano cluster to the piano prototype.]
  • 43. Software Implementation A Computational Framework for Sound Segregation in Music Signals62 }  Modular, flexible and efficient software implementation }  Based on Marsyas }  Free and Open Source framework for audio analysis and processing http://marsyas.sourceforge.net }  Command-line tool: peakClustering myAudio.wav
  • 44. Software Implementation A Computational Framework for Sound Segregation in Music Signals63 }  Marsyas }  peakClustering Overview [Diagram: the peakClustering MarSystem network: a Series/mainNet containing (1) an Accumulator/textWinNet for the analysis front-end, (2) a FlowThru/clustNet for similarity computation and clustering, and (3) a Shredder/synthNet for resynthesis, together with PeakConvert/conv, PeakLabeler/labeler and PeakViewSink/peSink.]
  • 45. Software Implementation A Computational Framework for Sound Segregation in Music Signals64 }  Marsyas }  Sinusoidal analysis front-end [Diagram: Accumulator/textWinNet: a Series/analysisNet with a FanOutIn/mixer (SoundFileSource plus an optional delayed noise branch), a FlowThru/onsetdetector, ShiftInput, Windowing and Spectrum stages, and a stereo spectrum branch (EnhADRess / EnhADRessStereoSpectrum).]
  • 46. Software Implementation A Computational Framework for Sound Segregation in Music Signals65 }  Marsyas }  Onset detection [Diagram: FlowThru/onsetdetector: Windowing → Spectrum → PowerSpectrum → Flux → ShiftInput → forward/backward filtering (Filter and Reverse pairs) → PeakerOnset, emitting the onsetDetected flag that flushes the texture window.]
  • 47. Software Implementation A Computational Framework for Sound Segregation in Music Signals66 }  Marsyas }  Similarity matrix computation and Clustering [Diagram: FlowThru/clustNet: a FanOutIn/simNet with one branch per cue (frequency, amplitude, HWPS, panning), each a PeakFeatureSelect → SimilarityMatrix (Metric or HWPS) → RBF chain, combined by product; followed by a Series/NCutNet with NormCut/NCut and PeakClusterSelect/clusterSelect.]
  • 48. Software Implementation A Computational Framework for Sound Segregation in Music Signals67 }  Marsyas }  More flexible Similarity expression [Diagram: the cue branches composed with elementwise operators: frequency and amplitude combined via .* in a FanOutIn/ANDnet, combined with HWPS via max in a FanOutIn/ORnet, then with the panning cue via .*.]
  • 49. Software Implementation A Computational Framework for Sound Segregation in Music Signals68 }  Marsyas }  Cluster Resynthesis [Diagram: Shredder/synthNet: a Series/postNet chaining PeakSynthOsc/pso, Windowing/wiSyn, OverlapAdd/ov, Gain/outGain and SoundFileSink/dest.]
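A minimal sketch of what the resynthesis stage computes: an oscillator bank over the peaks of the selected clusters, Hann-windowed and overlap-added. Marsyas' PeakSynthOsc/OverlapAdd chain differs in detail (e.g. phase propagation between frames), and the data layout assumed here matches the analysis sketch earlier, not the Marsyas realvec format.

```python
import numpy as np

def resynthesize(frames, labels, keep, sr, hop_ms=11):
    """Oscillator-bank resynthesis of the peaks whose cluster label is in
    `keep`, with 50%-overlap Hann-windowed overlap-add (a sketch).
    frames[k] is a list of (freq Hz, amplitude, phase); labels[k][j] is
    the cluster label of peak j in frame k."""
    nhop = int(sr * hop_ms / 1000)
    nwin = 2 * nhop
    win = np.hanning(nwin)
    y = np.zeros(len(frames) * nhop + nwin)
    n = np.arange(nwin)
    for k, peaks in enumerate(frames):
        seg = np.zeros(nwin)
        for j, (f, a, ph) in enumerate(peaks):
            if labels[k][j] in keep:               # selected clusters only
                seg += a * np.cos(2 * np.pi * f * n / sr + ph)
        y[k * nhop:k * nhop + nwin] += win * seg   # overlap-add
    return y
```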
  • 50. Software Implementation A Computational Framework for Sound Segregation in Music Signals69 }  Marsyas }  Data structures D totalnumbe intextureSIMILARITY C1 f2 f5f4f1 f3 f6 peaks' frequency total number of peaks A Re(0) Re(N/2) Re(1) Im(1) Im(N/2-1) Re(N/2-1) ... ... ... ... ... ... ... Re(0) Re(N/2) Re(1) Im(1) Im(N/2-1) Re(N/2-1) ... ... ... ... ... ... ... complexspectrum1 (Npoints) Pan(0) Pan(1) Pan(N/2) ... ... ... ... ... ... ... stereo spectrum (N/2+1points) texture window frames complexspectrum2 (Npoints) B peaks FREQUENCY peaks AMPLITUDE peaks PHASE peaks GROUP ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... frameMaxNumPeaks texture window frames peaks TRACK ... ... ... ... ... ... ... ... ... ... ... ... ... audio frame (N+1 samples) I 31 42 50 1 430 2 5 Ch1 samples Ch2 samples analysis window (N samples) S 1 30 2 5 Audio Samples 430 2 5 Shifted Audio Samples 1 4
• 51. Software Implementation A Computational Framework for Sound Segregation in Music Signals70 }  Marsyas }  Data structures (cont.) [Figure: the texture-window similarity matrix (total number of peaks × total number of peaks), the Ncut cluster-indicator vector, the cluster-selection indicator, and the per-peak frequency (C1), amplitude (C2) and spectral-pattern (C3) structures used by HWPS]
• 52. Conclusions A Computational Framework for Sound Segregation in Music Signals71 }  Proposal of a framework for sound source segregation }  Inspired by ideas from CASA }  Focused on “real-world” music signals }  Designed to be causal and efficient }  Data-driven }  Requires no training or prior knowledge about the audio signals under analysis }  Approaches partial tracking and source separation jointly }  Flexible enough to include new perceptually motivated auditory cues }  Based on a spectral clustering technique }  Shows good potential for applications such as: }  Source segregation/separation }  Monophonic or polyphonic instrument classification }  Main melody estimation }  Pre-processing for polyphonic transcription, ...
• 53. Conclusions A Computational Framework for Sound Segregation in Music Signals72 }  Definition of a novel harmonicity cue }  Termed Harmonically Wrapped Peak Similarity (HWPS) (a simplified sketch follows) }  Experimentally shown to: }  Be a good grouping criterion for sound segregation in polyphonic music signals }  Compare favorably to other state-of-the-art harmonicity cues }  Software implementation of the sound segregation framework }  Used for validation and evaluation }  Made available as Free and Open Source Software (FOSS) }  Based on Marsyas }  Free for everyone to try, evaluate, modify and improve
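For intuition only, a much-simplified sketch of the HWPS idea: shift each peak's spectral pattern so the peak sits at the origin, wrap the shifted frequencies modulo a hypothesized fundamental (here simply the lower of the two peak frequencies), build amplitude-weighted histograms, and correlate them. The f0 hypothesis, bin count and normalization are simplifying assumptions; the version in the thesis refines these.

  #include <algorithm>
  #include <cmath>
  #include <cstddef>
  #include <numeric>
  #include <vector>

  // Simplified HWPS between two peaks (frequencies fA, fB) given the peak
  // frequencies and amplitudes of their respective spectral patterns.
  double hwps(const std::vector<double>& freqA, const std::vector<double>& ampA,
              double fA,
              const std::vector<double>& freqB, const std::vector<double>& ampB,
              double fB, int bins = 20) {
    double f0 = std::min(fA, fB);      // crude f0 hypothesis
    auto histo = [&](const std::vector<double>& fs,
                     const std::vector<double>& as, double fc) {
      std::vector<double> h(bins, 0.0);
      for (std::size_t k = 0; k < fs.size(); ++k) {
        double wrapped = std::fmod((fs[k] - fc) / f0, 1.0); // harmonic wrap
        if (wrapped < 0.0) wrapped += 1.0;
        h[std::min(bins - 1, static_cast<int>(wrapped * bins))] += as[k];
      }
      return h;
    };
    std::vector<double> hA = histo(freqA, ampA, fA);
    std::vector<double> hB = histo(freqB, ampB, fB);
    double dot = std::inner_product(hA.begin(), hA.end(), hB.begin(), 0.0);
    double nA  = std::sqrt(std::inner_product(hA.begin(), hA.end(), hA.begin(), 0.0));
    double nB  = std::sqrt(std::inner_product(hB.begin(), hB.end(), hB.begin(), 0.0));
    return (nA > 0.0 && nB > 0.0) ? dot / (nA * nB) : 0.0;  // cosine similarity
  }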
• 54. Future Work A Computational Framework for Sound Segregation in Music Signals73 }  Analysis front-end }  Evaluate alternative analysis front-ends }  Perceptually informed filterbanks }  Sinusoid + transient representations }  Any other auditory front-end, as long as it is invertible… }  Evaluate alternative frequency estimation methods for spectral peaks (a parabolic-interpolation sketch follows) }  Parabolic interpolation }  Subspace methods }  … }  Use of a beat-synchronous approach }  Based on onset detectors and beat estimators for the dynamic adjustment of texture windows }  Perceptually motivated
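Of the listed frequency estimators, parabolic interpolation is the simplest: fit a parabola through the (log-)magnitudes of the peak bin and its two neighbors and read off the fractional-bin offset. A standard sketch:

  // Parabolic interpolation of a spectral peak at bin k. Assumes beta is a
  // strict local maximum, i.e., the denominator is nonzero. The refined
  // frequency is f = (k + p) * fs / N.
  double parabolicOffset(double alpha, double beta, double gamma) {
    // alpha, beta, gamma: magnitudes (ideally in dB) at bins k-1, k, k+1
    return 0.5 * (alpha - gamma) / (alpha - 2.0 * beta + gamma); // p in [-0.5, 0.5]
  }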
• 55. Future Work A Computational Framework for Sound Segregation in Music Signals74 }  Grouping cues }  Improve HWPS }  Better f0 candidate estimation }  Reduce the negative impact of sound events in different audio frames }  Inclusion of new perceptually motivated auditory cues }  Time and frequency masking }  Stereo placement of spectral components (for stereo signals) }  Timbre models as a priori information }  Peak tracking as pre- and post-processing }  Common fate (onsets, offsets, modulation)
• 56. Future Work A Computational Framework for Sound Segregation in Music Signals75 }  Implement sequential integration between texture windows }  Cluster the segregated clusters? }  Timbre similarity [Martins et al. 2007] (a matching sketch follows) [Figure: two clusters (Cluster 1, Cluster 2) tracked across consecutive texture windows]
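One plausible realization of the proposed step, sketched under the assumption that each cluster is summarized by a fixed-length timbre feature vector: link each cluster to its nearest neighbor (squared Euclidean distance) in the next texture window. The feature choice and matching rule are illustrative, not the method of [Martins et al. 2007].

  #include <cstddef>
  #include <limits>
  #include <vector>

  using Feats = std::vector<double>;  // e.g., centroid, rolloff, MFCC means

  // Return the index of the most timbre-similar cluster in the next window,
  // or -1 if that window has no clusters.
  int matchCluster(const Feats& prev, const std::vector<Feats>& current) {
    int best = -1;
    double bestDist = std::numeric_limits<double>::max();
    for (std::size_t c = 0; c < current.size(); ++c) {
      double d = 0.0;
      for (std::size_t i = 0; i < prev.size(); ++i) {
        double diff = prev[i] - current[c][i];
        d += diff * diff;
      }
      if (d < bestDist) { bestDist = d; best = static_cast<int>(c); }
    }
    return best;
  }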
• 57. Future Work A Computational Framework for Sound Segregation in Music Signals76 }  Clustering }  Definition of the neighborhood width (σ) in the similarity functions }  Just-noticeable differences (JNDs)? }  Define and evaluate more expressive combinations of similarity functions }  Automatic estimation of the number of clusters in each texture window (one common heuristic is sketched below) }  Extraction of new descriptors directly from segregated cluster parameters (e.g., CPR): }  Pitch, spectral features, frequency tracks, timing information
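For the open question of estimating the number of clusters, one common heuristic from the spectral clustering literature (not the method adopted in the thesis) is the eigengap: pick k at the largest gap in the sorted eigenvalues of the normalized graph Laplacian. A sketch:

  #include <cstddef>
  #include <vector>

  // Eigengap heuristic: given the Laplacian eigenvalues in ascending order,
  // return the cluster count suggested by the largest consecutive gap.
  std::size_t eigengapK(const std::vector<double>& eigs /* ascending */) {
    std::size_t k = 1;
    double bestGap = 0.0;
    for (std::size_t i = 1; i < eigs.size(); ++i) {
      double gap = eigs[i] - eigs[i - 1];
      if (gap > bestGap) { bestGap = gap; k = i; }
    }
    return k;
  }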
• 58. Future Work A Computational Framework for Sound Segregation in Music Signals77 }  Creation of a sound/music evaluation dataset }  Simple, synthetic sound examples }  For preliminary testing, fine-tuning and validation }  “Real-world” polyphonic recordings }  More complex signals, for final stress-test evaluations }  To be made publicly available }  Software framework }  Analysis and processing framework based on Marsyas }  FOSS, C++, multi-platform, real-time }  Feature-rich visualization and sonification tools
• 59. Related Publications A Computational Framework for Sound Segregation in Music Signals78 }  PhD Thesis: }  Martins, L. G. (2009). A Computational Framework for Sound Segregation in Music Signals. PhD thesis, FEUP. }  Book: }  Martins, L. G. (2009). A Computational Framework for Sound Segregation in Music Signals – An Auditory Scene Analysis Approach for Modeling Perceptual Grouping in Music Listening. Lambert Academic Publishing. }  Book Chapter: }  Martins, L. G., Lagrange, M., and Tzanetakis, G. (2010). Modeling grouping cues for auditory scene analysis using a spectral clustering formulation. In Machine Audition: Principles, Algorithms and Systems. IGI Global.
• 60. Related Publications A Computational Framework for Sound Segregation in Music Signals79 }  Lagrange, M., Martins, L. G., Murdoch, J., and Tzanetakis, G. (2008). Normalized cuts for predominant melodic source separation. IEEE Transactions on Audio, Speech, and Language Processing, 16(2). Special Issue on MIR. }  Martins, L. G., Burred, J. J., Tzanetakis, G., and Lagrange, M. (2007). Polyphonic instrument recognition using spectral clustering. In Proc. International Conference on Music Information Retrieval (ISMIR), Vienna, Austria. }  Lagrange, M., Martins, L. G., and Tzanetakis, G. (2008). A computationally efficient scheme for dominant harmonic source separation. In Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Las Vegas, Nevada, USA. }  Tzanetakis, G., Martins, L. G., Teixeira, L. F., Castillo, C., Jones, R., and Lagrange, M. (2008). Interoperability and the Marsyas 0.2 runtime. In Proc. International Computer Music Conference (ICMC), Belfast, Northern Ireland. }  Lagrange, M., Martins, L. G., and Tzanetakis, G. (2007). Semi-automatic mono to stereo up-mixing using sound source formation. In Proc. 122nd Convention of the Audio Engineering Society, Vienna, Austria.
  • 61. Thank you A Computational Framework for Sound Segregation in Music Signals80 Questions? lmartins@porto.ucp.pt http://www.artes.ucp.pt/citar/