Computational Approaches to Melodic
Analysis of Indian Art Music
Indian Institute of Science, Bengaluru, India, 2016
Sankalp Gulati
Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain
Tonic
Melody
Intonation
Raga
Motifs
Similarity
Melodic
description
Tonic Identification
[Figure: spectrogram (time in s vs. frequency in Hz) and multi-pitch histogram (normalized salience vs. frequency bins, 1 bin = 10 cents, ref. = 55 Hz) with peaks f2 to f6 and the tonic marked]
Signal processing Learning
• Tanpura / drone background sound
• Extent of gamakas on Sa and Pa svaras
• Vadi and samvadi svaras of the rāga
S. Gulati, A. Bellur, J. Salamon, H. Ranjani, V. Ishwar, H.A. Murthy, and X. Serra. Automatic tonic identification in Indian art music: approaches
and evaluation. Journal of New Music Research, 43(01):55–73, 2014.
Salamon, J., Gulati, S., & Serra, X. (2012). A multipitch approach to tonic identification in Indian classical music. In Proc. of Int. Conf. on Music
Information Retrieval (ISMIR) (pp. 499–504), Porto, Portugal.
Bellur, A., Ishwar, V., Serra, X., & Murthy, H. (2012). A knowledge based signal processing approach to tonic identification in Indian classical music. In 2nd
CompMusic Workshop (pp. 113–118) Istanbul, Turkey.
Ranjani, H. G., Arthi, S., & Sreenivas, T. V. (2011). Carnatic music analysis: Shadja, swara identification and raga verification in Alapana using stochastic
models. Applications of Signal Processing to Audio and Acoustics (WASPAA), IEEE Workshop , 29–32, New Paltz, NY.
Accuracy: ~90%
Tonic Identification: Multipitch Approach
• Audio example:
• Utilizing drone sound
• Multi-pitch analysis
[Figure labels: Vocals, Drone]
J. Salamon, E. Gómez, and J. Bonada. Sinusoid extraction and salience function design for predominant melody
estimation. In Proc. 14th Int. Conf. on Digital Audio Effects (DAFX-11), pages 73–80, Paris, France, Sep. 2011.
Tonic Identification: Block Diagram
Audio
→ Sinusoid extraction: STFT, spectral peak picking, frequency/amplitude correction
→ Sinusoids
→ Salience function computation: bin salience mapping, harmonic summation
→ Time-frequency salience
→ Tonic candidate generation: salience peak picking, multi-pitch histogram, histogram peak picking
→ Tonic candidates
Tonic Identification: Signal Processing
• STFT
  - Hop size: 11 ms
  - Window length: 46 ms
  - Window type: Hamming
  - FFT size: 8192 points
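The STFT settings listed on this slide can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the 44.1 kHz sample rate is an assumption (the slide gives durations only), and `stft_frames` is a hypothetical helper name.

```python
import numpy as np

SAMPLE_RATE = 44100          # assumed; the slide states durations only
HOP_S, WIN_S = 0.011, 0.046  # hop size and window length from the slide
N_FFT = 8192                 # FFT size from the slide (frames are zero-padded)


def stft_frames(x, sr=SAMPLE_RATE):
    """Return the magnitude spectrogram, shape (n_frames, N_FFT // 2 + 1)."""
    hop = int(round(HOP_S * sr))
    win = int(round(WIN_S * sr))
    if len(x) < win:
        raise ValueError("signal is shorter than one analysis window")
    window = np.hamming(win)
    n_frames = 1 + (len(x) - win) // hop
    # Slice overlapping frames and apply the Hamming window to each one.
    frames = np.stack([x[i * hop:i * hop + win] * window
                       for i in range(n_frames)])
    # rfft zero-pads every frame to N_FFT points before transforming.
    return np.abs(np.fft.rfft(frames, n=N_FFT, axis=1))
```

Zero-padding a 46 ms window to 8192 points does not add frequency resolution; it only samples the spectrum more densely, which helps the peak interpolation step that follows.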
Tonic Identification: Signal Processing
• Spectral peak picking
  - Absolute threshold: -60 dB
Tonic Identification: Signal Processing
• Frequency/amplitude correction
  - Parabolic interpolation
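Peak picking with a -60 dB threshold followed by parabolic interpolation can be sketched as below. The function name `spectral_peaks` and the `ref` parameter are hypothetical; in particular, the slide does not say what 0 dB refers to, so measuring dB relative to `ref` (default 1.0) is an assumption. The interpolation is the standard three-point parabola fitted to log-magnitudes.

```python
import numpy as np


def spectral_peaks(mag, sr, n_fft, threshold_db=-60.0, ref=1.0):
    """Return (frequencies in Hz, amplitudes in dB) of refined spectral peaks.

    mag is one magnitude spectrum as produced by np.fft.rfft.
    """
    db = 20.0 * np.log10(np.maximum(mag, 1e-12) / ref)
    # Local maxima above the threshold (first and last bins are excluded).
    k = np.flatnonzero((db[1:-1] > db[:-2]) &
                       (db[1:-1] >= db[2:]) &
                       (db[1:-1] > threshold_db)) + 1
    a, b, c = db[k - 1], db[k], db[k + 1]
    # Vertex of the parabola through the three points around each maximum;
    # the denominator is strictly negative at a local maximum.
    p = 0.5 * (a - c) / (a - 2.0 * b + c)
    freqs = (k + p) * sr / n_fft       # corrected frequency
    amps_db = b - 0.25 * (a - c) * p   # corrected amplitude
    return freqs, amps_db
```

The offset `p` always lies between -0.5 and 0.5 bins, so the correction never moves a peak past its neighbouring bin.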
Tonic Identification: Signal Processing
• Harmonic summation
  - Spectrum considered: 55-7200 Hz
  - Frequency range: 55-1760 Hz
  - Base frequency: 55 Hz
  - Bin resolution: 10 cents per bin (120 per octave)
  - Number of octaves: 5
  - Maximum harmonics: 20
  - Squared cosine window across 50 cents
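A simplified harmonic summation over the 600-bin grid (5 octaves × 120 bins, 55 Hz to 1760 Hz) might look like the sketch below. Two values are assumptions not stated on the slide: the per-harmonic decay `ALPHA = 0.8`, and reading "across 50 cents" as a total window width of 50 cents (±2.5 bins). The name `salience_function` is illustrative.

```python
import numpy as np

F_REF = 55.0             # base frequency (slide)
CENTS_PER_BIN = 10       # bin resolution (slide)
N_BINS = 5 * 120         # 5 octaves, 120 bins per octave (slide)
N_HARMONICS = 20         # maximum harmonics (slide)
F_MAX_SPECTRUM = 7200.0  # upper limit of the spectrum considered (slide)
ALPHA = 0.8              # assumed harmonic weighting decay
HALF_WIDTH = 2.5         # assumed: 50 cents total width, in bins


def hz_to_bin(f_hz):
    """Fractional bin index of a frequency on the 10-cent grid."""
    return 1200.0 * np.log2(f_hz / F_REF) / CENTS_PER_BIN


def salience_function(peak_freqs, peak_mags):
    """Sum weighted peak energy at every candidate f0 bin for one frame."""
    salience = np.zeros(N_BINS)
    bins = np.arange(N_BINS)
    for f, a in zip(peak_freqs, peak_mags):
        if not (F_REF <= f <= F_MAX_SPECTRUM):
            continue
        for h in range(1, N_HARMONICS + 1):
            f0 = f / h  # candidate fundamental if f is its h-th harmonic
            if f0 < F_REF:
                break
            d = np.abs(bins - hz_to_bin(f0))
            # Squared cosine window centred on the candidate bin.
            w = np.where(d <= HALF_WIDTH,
                         np.cos(0.5 * np.pi * d / HALF_WIDTH) ** 2, 0.0)
            salience += a * ALPHA ** (h - 1) * w
    return salience
```

With three equal-magnitude peaks at 220, 440 and 660 Hz, the largest salience falls on bin 240, which corresponds to 220 Hz (two octaves above 55 Hz).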
Tonic Identification: Signal Processing
• Tonic candidate generation
  - Number of salience peaks per frame: 5
  - Frequency range: 110-550 Hz
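The candidate generation step can be sketched as a histogram over the strongest salience peaks of every frame. This is a minimal illustration: counting each selected peak once (rather than weighting it by its salience) is an assumption, and `multipitch_histogram` is a hypothetical name.

```python
import numpy as np

F_REF = 55.0        # reference frequency of the bin grid (slide)
CENTS_PER_BIN = 10  # bin resolution (slide)


def hz_to_bin(f_hz):
    """Nearest bin index of a frequency on the 10-cent grid."""
    return int(round(1200.0 * np.log2(f_hz / F_REF) / CENTS_PER_BIN))


def multipitch_histogram(salience_frames, n_peaks=5, f_lo=110.0, f_hi=550.0):
    """Accumulate the top salience peaks of each frame into a histogram.

    salience_frames has shape (n_frames, n_bins).
    """
    lo, hi = hz_to_bin(f_lo), hz_to_bin(f_hi)
    hist = np.zeros(salience_frames.shape[1])
    for s in salience_frames:
        # Local maxima of the frame, restricted to the 110-550 Hz range.
        k = np.flatnonzero((s[1:-1] > s[:-2]) & (s[1:-1] >= s[2:])) + 1
        k = k[(k >= lo) & (k <= hi)]
        # Keep the n_peaks most salient maxima and count them.
        top = k[np.argsort(s[k])[::-1][:n_peaks]]
        hist[top] += 1.0
    peak = hist.max()
    return hist / peak if peak > 0 else hist
```

Because the drone sounds throughout a recording, its pitches show up in almost every frame and dominate this histogram, which is what makes the histogram peaks usable as tonic candidates.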
Tonic Identification: Feature Extraction
• Identifying the tonic in the correct octave using the multi-pitch histogram
• Classification-based template learning
• The class of an instance is the rank of the tonic
[Figure: multi-pitch histogram, normalized salience vs. frequency bins (1 bin = 10 cents, ref. 55 Hz), with peak features f2 to f5 marked]
• Decision tree:
[Figure: decision tree over the features f2, f3 and f5, with split thresholds 5, -7, -11, 5 and -6 and leaves labelled 1st to 5th; side panels sketch salience vs. frequency with Sa and Pa peaks]
Tonic Identification: Classification
Tonic Identification: Results
S. Gulati, A. Bellur, J. Salamon, H. Ranjani, V. Ishwar, H.A. Murthy, and X. Serra. Automatic tonic
identification in Indian art music: approaches and evaluation. Journal of New Music Research, 43(01):
55–73, 2014.
Predominant Pitch Estimation
Pitch Estimation Algorithms
• Time-domain approaches
  - ACF-based (Rabiner, 1977)
  - AMDF-based (YIN) (de Cheveigné and Kawahara, 2002)
• Frequency-domain approaches
  - Two-way mismatch (Maher and Beauchamp, 1994)
  - Subharmonic summation (Hermes, 1988)
• Multi-pitch approaches
  - Source separation-based (Klapuri, 2003)
  - Harmonic summation (Melodia) (Salamon and Gómez, 2012)
Rabiner, L. (1977, February). On the use of autocorrelation analysis for pitch detection. IEEE Transactions on Acoustics, Speech, and Signal Processing, 25(1), 24–33.
De Cheveigné, A., & Kawahara, H. (2002). YIN, a fundamental frequency estimator for speech and music. The Journal of the Acoustical Society of America, 111(4), 1917–1930.
Medan, Y., & Yair, E. (1991). Super resolution pitch determination of speech signals. IEEE Transactions on Signal Processing, 39(1), 40–48.
Maher, R., & Beauchamp, J. W. (1994). Fundamental frequency estimation of musical signals using a two-way mismatch procedure. The Journal of the Acoustical Society of America, 95(4), 2254–2263.
Hermes, D. (1988). Measurement of pitch by subharmonic summation. The Journal of the Acoustical Society of America, 83, 257–264.
Klapuri, A. (2003, November). Multiple fundamental frequency estimation based on harmonicity and spectral smoothness. IEEE Transactions on Speech and Audio Processing, 11(6), 804–816.
Salamon, J., & Gómez, E. (2012, August). Melody Extraction From Polyphonic Music Signals Using Pitch Contour Characteristics. IEEE Transactions on Audio, Speech, and Language Processing, 20(6), 1759–1770.
Pitch Estimation Algorithms
q  Time-domain approaches
§  ACF-based (Rabiner 1977)
§  AMDF-based (YIN) Cheveigné et al.
q  Frequency-domain approaches
§  Two-way mismatch (Maher and
Beauchamp 1994)
§  Subharmonic summation (Hermes 1988)
Rabiner, L. (1977, February). On the use of autocorrelation analysis for pitch detection. IEEE Transactions on Acoustics, Speech, and Signal
Processing, 25(1), 24–33
De Cheveigné, A., and Kawahara, H., "YIN, a fundamental frequency estimator for speech and music." The Journal of the Acoustical
Society of America 111, no. 4 (2002): 1917-1930.
§  Multi-pitch approaches
§  Source separation-based (Klapuri, 2003)
§  Harmonic summation (Melodia) (Salamon
and Gómez, 2012)
Medan, Y., & Yair, E. (1991). Super resolution pitch determination of speech signals. IEEE transactions on signal processing, 39(1), 40–48.
Maher, R., & Beauchamp, J. W. (1994). Fundamental frequency estimation of musical signals using a two-way mismatch procedure. The
Journal of the Acoustical Society of , 95 (April), 2254–2263.
Hermes, D. (1988, 1988). Measurement of pitch by subharmonic summation. Journal of the Acoustical Society of America, 83, 257 - 264.
Klapuri, A. (2003b, November). Multiple fundamental frequency estimation based on harmonicity and spectral smoothness. IEEE
Transactions on Speech and Audio Processing, 11(6), 804–816.
Salamon, J., & Gómez, E. (2012, August). Melody Extraction From Polyphonic Music Signals Using Pitch Contour Characteristics.
IEEE Transactions on Audio, Speech, and Language Processing, 20(6), 1759–1770.
Predominant Pitch Estimation: YIN
[Figure panels: signal, autocorrelation, difference function, cumulative difference function; x-axis: lag (samples). The slide shows the following excerpts from the YIN paper.]

$$r_t(\tau) = \sum_{j=t+1}^{t+W} x_j x_{j+\tau}, \qquad (1)$$

where $r_t(\tau)$ is the autocorrelation function of lag $\tau$ calculated at time index $t$, and $W$ is the integration window size. This function is illustrated in Fig. 1(b) for the signal plotted in Fig. 1(a). It is common in signal processing to use a slightly different definition:

$$r'_t(\tau) = \sum_{j=t+1}^{t+W-\tau} x_j x_{j+\tau}. \qquad (2)$$

Here the integration window size shrinks with increasing values of $\tau$, with the result that the envelope of the function decreases as a function of lag as illustrated in Fig. 1(c).

FIG. 1. (a) Example of a speech waveform. (b) Autocorrelation function (ACF) calculated from the waveform in (a) according to Eq. (1). (c) Same, calculated according to Eq. (2). The envelope of this function is tapered to zero because of the smaller number of terms in the summation at larger $\tau$. The horizontal arrows symbolize the search range for the period.

FIG. 2. F0 estimation error rates as a function of the slope of the envelope of the ACF, quantified by its intercept with the abscissa. The dotted line represents errors for which the F0 estimate was too high, the dashed line those for which it was too low, and the full line their sum. Triangles at the right represent error rates for ACF calculated as in Eq. (1) ($\tau_{max} = \infty$). These rates were measured over a subset of the database used in Sec. III.

The present article introduces a method for F0 estimation that produces fewer errors than other well-known methods. The name YIN (from "yin" and "yang" of oriental philosophy) alludes to the interplay between autocorrelation and cancellation that it involves.

The parameter $\tau_{max}$ allows the algorithm to be biased to favor one form of error at the expense of the other, with a minimum of total error for intermediate values. Using Eq. (2) rather than Eq. (1) introduces a natural bias that can be tuned by adjusting $W$. However, changing the window size has other effects, and one can argue that a bias of this sort, if useful, should be applied explicitly rather than implicitly. This is one reason to prefer the definition of Eq. (1).

The autocorrelation method compares the signal to its shifted self. In that sense it is related to the AMDF method (average magnitude difference function, Ross et al., 1974; Ney, 1982) that performs its comparison using differences rather than products, and more generally to time-domain methods that measure intervals between events in time (Hess, 1983). The ACF is the Fourier transform of the power spectrum, and can be seen as measuring the regular spacing of harmonics within that spectrum. The cepstrum method (Noll, 1967) replaces the power spectrum by the log magnitude spectrum and thus puts less weight on high-amplitude parts of the spectrum (particularly near the first formant that often dominates the ACF). Similar "spectral whitening" effects can be obtained by linear predictive inverse filtering or center-clipping (Rabiner and Schafer, 1978), or by splitting the signal over a bank of filters, calculating ACFs within each channel, and adding the results after amplitude normalization (de Cheveigné, 1991). Auditory models based on autocorrelation are currently one of the more popular ways to [...]

The same is true after taking the square and averaging over a window:

$$\sum_{j=t+1}^{t+W} (x_j - x_{j+T})^2 = 0. \qquad (5)$$

Conversely, an unknown period may be found by forming the difference function:

$$d_t(\tau) = \sum_{j=1}^{W} (x_j - x_{j+\tau})^2, \qquad (6)$$

and searching for the values of $\tau$ for which the function is zero. There is an infinite set of such values, all multiples of the period. The difference function calculated from the signal in Fig. 1(a) is illustrated in Fig. 3(a).

FIG. 3. (a) Difference function calculated for the speech signal of Fig. 1(a). (b) Cumulative mean normalized difference function. Note that the function starts at 1 rather than 0 and remains high until the dip at the period.

TABLE I. Gross error rates for the simple unbiased autocorrelation method (step 1), and for the cumulated steps described in the text. These rates were measured over a subset of the database used in Sec. III. Integration window size was 25 ms, window shift was one sample, search range was 40 to 800 Hz, and threshold (step 4) was 0.1.

Version    Gross error (%)
Step 1     10.0
Step 2     1.95
Step 3     1.69
Step 4     0.78
Step 5     0.77
Step 6     0.50
De Cheveigné, A., and Kawahara, H., "YIN, a fundamental frequency estimator for speech and music." The Journal of the
Acoustical Society of America 111, no. 4 (2002): 1917-1930.
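The core of YIN quoted on this slide (the difference function of Eq. (6), the cumulative mean normalized difference function of Fig. 3(b), and the 0.1 threshold from Table I) can be sketched as follows. This is a simplified illustration that omits the later refinement steps of the paper (parabolic interpolation and best local estimate); the function names are hypothetical.

```python
import numpy as np


def difference_function(x, W, max_lag):
    """d(tau) = sum over j of (x[j] - x[j + tau])**2, for tau = 0..max_lag."""
    d = np.zeros(max_lag + 1)
    for tau in range(1, max_lag + 1):
        diff = x[:W] - x[tau:tau + W]
        d[tau] = np.dot(diff, diff)
    return d


def cumulative_mean_normalized(d):
    """d'(0) = 1 and d'(tau) = d(tau) / mean(d[1..tau]) for tau > 0."""
    out = np.ones_like(d)
    taus = np.arange(1, len(d))
    running_sum = np.cumsum(d[1:])
    out[1:] = d[1:] * taus / np.maximum(running_sum, 1e-12)
    return out


def yin_period(x, W, max_lag, threshold=0.1):
    """Return the estimated period in samples; x needs W + max_lag samples."""
    dn = cumulative_mean_normalized(difference_function(x, W, max_lag))
    below = np.flatnonzero(dn[1:] < threshold) + 1
    if below.size == 0:
        # No dip under the threshold: fall back to the global minimum.
        return int(np.argmin(dn[1:]) + 1)
    tau = int(below[0])
    # Walk down to the bottom of the first dip under the threshold.
    while tau + 1 <= max_lag and dn[tau + 1] < dn[tau]:
        tau += 1
    return tau
```

For a 100 Hz sine sampled at 8 kHz with `W = 200` (25 ms) and `max_lag = 200` (40 Hz), `yin_period` returns 80 samples, that is, 8000 / 80 = 100 Hz. Taking the first dip below the threshold rather than the global minimum is what keeps the estimate from jumping to a multiple of the period.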
Predominant Pitch Estimation: Melodia
Processing stages: audio → spectrogram → spectral peaks → time-frequency salience → salience peaks → contours → predominant melody contours
Salamon, J., & Gómez, E. (2012, August). Melody Extraction From Polyphonic Music Signals Using Pitch Contour Characteristics. IEEE
Transactions on Audio, Speech, and Language Processing, 20(6), 1759–1770.
Essentia implementation of Melodia
Stages shown: audio → spectrogram → spectral peaks → time-frequency salience → salience peaks → all contours → predominant melody contours
What about loudness and timbre?
Loudness features in Essentia
Loudness of predominant voice
[Figure: time vs. frequency plot with the F0 track of the predominant voice]
Loudness of predominant voice: example
Spectral centroid of predominant voice
CompMusic: Dunya
[Diagram labels: API, Internet]
CompMusic: Dunya Web
CompMusic: Dunya API
https://github.com/MTG/pycompmusic
Dunya API Examples
• Metadata
• Features

 
Basic Intentional Injuries Health Education
Basic Intentional Injuries Health EducationBasic Intentional Injuries Health Education
Basic Intentional Injuries Health EducationNeilDeclaro1
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...pradhanghanshyam7136
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structuredhanjurrannsibayan2
 
How to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxHow to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxCeline George
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Pooja Bhuva
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxDr. Sarita Anand
 
latest AZ-104 Exam Questions and Answers
latest AZ-104 Exam Questions and Answerslatest AZ-104 Exam Questions and Answers
latest AZ-104 Exam Questions and Answersdalebeck957
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentationcamerronhm
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxDr. Ravikiran H M Gowda
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxmarlenawright1
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - Englishneillewis46
 

Dernier (20)

Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
Philosophy of china and it's charactistics
Philosophy of china and it's charactisticsPhilosophy of china and it's charactistics
Philosophy of china and it's charactistics
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the Classroom
 
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptxCOMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
 
21st_Century_Skills_Framework_Final_Presentation_2.pptx
21st_Century_Skills_Framework_Final_Presentation_2.pptx21st_Century_Skills_Framework_Final_Presentation_2.pptx
21st_Century_Skills_Framework_Final_Presentation_2.pptx
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Basic Intentional Injuries Health Education
Basic Intentional Injuries Health EducationBasic Intentional Injuries Health Education
Basic Intentional Injuries Health Education
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structure
 
How to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxHow to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptx
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
latest AZ-104 Exam Questions and Answers
latest AZ-104 Exam Questions and Answerslatest AZ-104 Exam Questions and Answers
latest AZ-104 Exam Questions and Answers
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptx
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - English
 

[Tutorial] Computational Approaches to Melodic Analysis of Indian Art Music

  • 1. Computational Approaches to Melodic Analysis of Indian Art Music Indian Institute of Sciences, Bengaluru, India 2016 Sankalp Gulati Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain
  • 4. Tonic Identification
      [Figure: spectrogram (time in s vs. frequency in Hz) and multi-pitch histogram (frequency bins, 1 bin = 10 cents, ref = 55 Hz, vs. normalized salience) with peaks f2 to f6 and the tonic marked.]
      Labels: Signal processing, Learning
      - Tanpura / drone background sound
      - Extent of gamakas on Sa and Pa svara
      - Vadi, sam-vadi svara of the rāga
      Accuracy: ~90%
      References:
      - Gulati, S., Bellur, A., Salamon, J., Ranjani, H., Ishwar, V., Murthy, H. A., & Serra, X. (2014). Automatic tonic identification in Indian art music: approaches and evaluation. Journal of New Music Research, 43(1), 55–73.
      - Salamon, J., Gulati, S., & Serra, X. (2012). A multipitch approach to tonic identification in Indian classical music. In Proc. of Int. Conf. on Music Information Retrieval (ISMIR) (pp. 499–504), Porto, Portugal.
      - Bellur, A., Ishwar, V., Serra, X., & Murthy, H. (2012). A knowledge based signal processing approach to tonic identification in Indian classical music. In 2nd CompMusic Workshop (pp. 113–118), Istanbul, Turkey.
      - Ranjani, H. G., Arthi, S., & Sreenivas, T. V. (2011). Carnatic music analysis: Shadja, swara identification and raga verification in Alapana using stochastic models. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) (pp. 29–32), New Paltz, NY.
  • 5. Tonic Identification: Multipitch Approach
      - Audio example: vocals, drone
      - Utilizing drone sound
      - Multi-pitch analysis
      Reference: Salamon, J., Gómez, E., & Bonada, J. (2011). Sinusoid extraction and salience function design for predominant melody estimation. In Proc. 14th Int. Conf. on Digital Audio Effects (DAFx-11) (pp. 73–80), Paris, France.
  • 6. Tonic Identification: Block Diagram
      - Sinusoid extraction (audio -> sinusoids): STFT, spectral peak picking, frequency/amplitude correction
      - Salience function computation (sinusoids -> time-frequency salience): bin salience mapping, harmonic summation
      - Tonic candidate generation (time-frequency salience -> tonic candidates): salience peak picking, multi-pitch histogram, histogram peak picking
  • 7. Tonic Identification: Signal Processing (STFT)
      - Hop size: 11 ms
      - Window length: 46 ms
      - Window type: Hamming
      - FFT size: 8192 points
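The STFT settings on this slide can be sketched with NumPy. This is a minimal illustration, not the tutorial's own code: the 44.1 kHz sample rate, the function name `stft_magnitude`, and the rounding of millisecond values to samples are assumptions.

```python
import numpy as np

def stft_magnitude(x, sr=44100, win_ms=46.0, hop_ms=11.0, n_fft=8192):
    """Magnitude STFT with a Hamming window (sample rate is an assumption)."""
    win_len = int(round(sr * win_ms / 1000.0))  # about 46 ms in samples
    hop = int(round(sr * hop_ms / 1000.0))      # about 11 ms in samples
    if len(x) < win_len:                        # pad very short inputs
        x = np.pad(x, (0, win_len - len(x)))
    window = np.hamming(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop:i * hop + win_len] * window
                       for i in range(n_frames)])
    # rfft zero-pads each windowed frame to n_fft points
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1))

# Usage: a one-second 220 Hz sine
sr = 44100
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 220.0 * t)
mag = stft_magnitude(x, sr)
peak_hz = mag[0].argmax() * sr / 8192
```

Zero-padding the 46 ms window to 8192 points gives a bin spacing of about 5.4 Hz at this sample rate, which is what makes the later peak refinement worthwhile at low frequencies.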
  • 8. Tonic Identification: Signal Processing (spectral peak picking)
      - Absolute threshold: -60 dB
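A rough sketch of peak picking with an absolute threshold is given below. The slide does not say what the -60 dB is measured against, so the reference level `ref=1.0` is an assumption, as is the function name `pick_spectral_peaks`.

```python
import numpy as np

def pick_spectral_peaks(mag, threshold_db=-60.0, ref=1.0):
    """Indices of local maxima whose level exceeds threshold_db.

    The dB reference (ref) is an assumption; the slide only gives -60 dB.
    """
    mag = np.asarray(mag, dtype=float)
    level_db = 20.0 * np.log10(np.maximum(mag, 1e-12) / ref)
    # a bin is a peak if it is higher than its left neighbour
    # and at least as high as its right neighbour
    is_max = (mag[1:-1] > mag[:-2]) & (mag[1:-1] >= mag[2:])
    above = level_db[1:-1] > threshold_db
    return np.flatnonzero(is_max & above) + 1

# Usage: one strong peak (0 dB) and one weak peak (about -66 dB)
mag = np.array([0.0, 0.2, 1.0, 0.2, 0.0, 1e-4, 5e-4, 1e-4, 0.0])
peaks = pick_spectral_peaks(mag)
```

Only the strong peak at index 2 survives; the weak one falls below the threshold.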
  • 9. Tonic Identification: Signal Processing (frequency/amplitude correction)
      - Parabolic interpolation
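Parabolic interpolation fits a parabola through a peak bin and its two neighbours and reads off the vertex. The sketch below uses the standard three-point formula; applying it to dB magnitudes and the name `parabolic_refine` are assumptions, since the slide gives no further detail.

```python
def parabolic_refine(mag_db, k):
    """Refine peak bin k using its two neighbours.

    Returns (fractional bin position, interpolated magnitude).
    """
    a, b, c = mag_db[k - 1], mag_db[k], mag_db[k + 1]
    denom = a - 2.0 * b + c
    if denom == 0.0:               # flat neighbourhood, nothing to refine
        return float(k), float(b)
    p = 0.5 * (a - c) / denom      # vertex offset from bin k
    return k + p, b - 0.25 * (a - c) * p

# Usage: samples of the parabola y = 10 - (x - 3.3)^2 at x = 0..5
ys = [10.0 - (x - 3.3) ** 2 for x in range(6)]
pos, amp = parabolic_refine(ys, 3)
```

The refined bin position is converted to Hz by multiplying by the sample rate divided by the FFT size.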
  • 10. Tonic Identification: Signal Processing (bin salience mapping, harmonic summation)
      - Spectrum considered: 55-7200 Hz
      - Frequency range: 55-1760 Hz
      - Base frequency: 55 Hz
      - Bin resolution: 10 cents per bin (120 per octave)
      - N octaves: 5
      - Maximum harmonics: 20
      - Square cosine window across 50 cents
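The harmonic summation step can be sketched as follows: each spectral peak at frequency f adds salience to the candidate fundamentals f/h for h = 1..20, spread over neighbouring bins with a squared-cosine window. Several details here are assumptions not stated on the slide: the harmonic weight `alpha = 0.8`, reading "across 50 cents" as a window half-width of 50 cents, and all function names.

```python
import numpy as np

F_REF = 55.0          # base frequency (Hz)
CENTS_PER_BIN = 10.0  # 120 bins per octave
N_BINS = 600          # 5 octaves, 55-1760 Hz

def hz_to_bin(f):
    """Fractional salience bin for a frequency in Hz."""
    return 1200.0 * np.log2(f / F_REF) / CENTS_PER_BIN

def harmonic_salience(freqs, mags, n_harm=20, alpha=0.8, width_cents=50.0):
    """Salience of one frame from its spectral peaks (alpha is an assumption)."""
    salience = np.zeros(N_BINS)
    bins = np.arange(N_BINS)
    for f, m in zip(freqs, mags):
        if not (55.0 <= f <= 7200.0):   # spectrum considered
            continue
        for h in range(1, n_harm + 1):
            f0 = f / h                  # candidate fundamental
            if f0 < F_REF:
                break
            centre = hz_to_bin(f0)
            if centre >= N_BINS:
                continue
            d = np.abs(bins - centre) * CENTS_PER_BIN
            w = np.where(d <= width_cents,
                         np.cos(0.5 * np.pi * d / width_cents) ** 2, 0.0)
            salience += (alpha ** (h - 1)) * m * w
    return salience

# Usage: four equal-amplitude harmonics of 110 Hz
sal = harmonic_salience([110.0, 220.0, 330.0, 440.0], [1.0, 1.0, 1.0, 1.0])
best_bin = int(sal.argmax())
```

Because every partial votes for 110 Hz, the salience maximum lands on bin 120, one octave above the 55 Hz reference.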
  • 11. Tonic Identification: Signal Processing (tonic candidate generation, multi-pitch histogram)
      - Number of salience peaks per frame: 5
      - Frequency range: 110-550 Hz
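Candidate generation can be sketched as: keep the five strongest salience peaks per frame between 110 and 550 Hz and accumulate them over all frames into a histogram. Counting occurrences rather than summing salience values is an assumption of this sketch, and `multipitch_histogram` is a hypothetical name.

```python
import numpy as np

F_REF, CENTS_PER_BIN = 55.0, 10.0

def hz_to_bin(f):
    """Nearest salience bin (10 cents per bin, reference 55 Hz)."""
    return int(round(1200.0 * np.log2(f / F_REF) / CENTS_PER_BIN))

def multipitch_histogram(salience_frames, n_peaks=5, fmin=110.0, fmax=550.0):
    """Normalized count of per-frame salience peaks (counting is an assumption)."""
    lo, hi = hz_to_bin(fmin), hz_to_bin(fmax)
    hist = np.zeros(salience_frames.shape[1])
    for s in salience_frames:
        # local maxima of the frame's salience function
        idx = np.flatnonzero((s[1:-1] > s[:-2]) & (s[1:-1] >= s[2:])) + 1
        idx = idx[(idx >= lo) & (idx <= hi)]
        # keep the n_peaks most salient peaks in range
        top = idx[np.argsort(s[idx])[::-1][:n_peaks]]
        hist[top] += 1.0
    return hist / hist.max() if hist.max() > 0 else hist

# Usage: a steady drone at bins 120 and 190 plus a moving melody peak
frames = np.zeros((10, 600))
for i in range(10):
    frames[i, 50] = 0.9           # below 110 Hz, ignored
    frames[i, 120] = 0.5          # drone, 110 Hz
    frames[i, 190] = 0.4          # drone, 700 cents higher
    frames[i, 200 + 3 * i] = 1.0  # melody, changes every frame
hist = multipitch_histogram(frames)
```

The steady drone pitches dominate the histogram even though the melody is louder in every frame, which is why the peaks of the histogram serve as tonic candidates.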
  • 12. Tonic Identification: Feature Extraction
      - Identifying the tonic in the correct octave using the multi-pitch histogram
      - Classification-based template learning
      - The class of an instance is the rank of the tonic
      [Figure: multipitch histogram; x-axis: frequency bins (1 bin = 10 cents, ref: 55 Hz); y-axis: normalized salience; peaks f2 to f5 marked.]
  • 13. Tonic Identification: Classification
      - Decision tree
      [Figure: decision tree over the features f2, f3 and f5 with split thresholds 5, -7, -11, 5 and -6; the leaves are tonic ranks 1st to 5th. Two sketches of salience vs. frequency show peaks labelled Sa, Sa and Pa.]
  • 14. Tonic Identification: Results S. Gulati, A. Bellur, J. Salamon, H. Ranjani, V. Ishwar, H.A. Murthy, and X. Serra. Automatic tonic identification in Indian art music: approaches and evaluation. Journal of New Music Research, 43(01): 55–73, 2014.
  • 16. Pitch Estimation Algorithms
      - Time-domain approaches
          - ACF-based (Rabiner, 1977)
          - AMDF-based (YIN) (de Cheveigné and Kawahara, 2002)
      - Frequency-domain approaches
          - Two-way mismatch (Maher and Beauchamp, 1994)
          - Subharmonic summation (Hermes, 1988)
      - Multi-pitch approaches
          - Source separation-based (Klapuri, 2003)
          - Harmonic summation (Melodia) (Salamon and Gómez, 2012)
      References:
      - Rabiner, L. (1977). On the use of autocorrelation analysis for pitch detection. IEEE Transactions on Acoustics, Speech, and Signal Processing, 25(1), 24–33.
      - De Cheveigné, A., & Kawahara, H. (2002). YIN, a fundamental frequency estimator for speech and music. The Journal of the Acoustical Society of America, 111(4), 1917–1930.
      - Medan, Y., & Yair, E. (1991). Super resolution pitch determination of speech signals. IEEE Transactions on Signal Processing, 39(1), 40–48.
      - Maher, R., & Beauchamp, J. W. (1994). Fundamental frequency estimation of musical signals using a two-way mismatch procedure. The Journal of the Acoustical Society of America, 95, 2254–2263.
      - Hermes, D. (1988). Measurement of pitch by subharmonic summation. Journal of the Acoustical Society of America, 83, 257–264.
      - Klapuri, A. (2003). Multiple fundamental frequency estimation based on harmonicity and spectral smoothness. IEEE Transactions on Speech and Audio Processing, 11(6), 804–816.
      - Salamon, J., & Gómez, E. (2012). Melody extraction from polyphonic music signals using pitch contour characteristics. IEEE Transactions on Audio, Speech, and Language Processing, 20(6), 1759–1770.
  • 18. Predominant Pitch Estimation: YIN
      Panels: signal, auto-correlation, difference function, cumulative difference function (x-axis: lag in samples). The slide shows excerpts from the YIN paper:
      - The paper introduces a method for F0 estimation that produces fewer errors than other well-known methods. The name YIN (from "yin" and "yang" of oriental philosophy) alludes to the interplay between autocorrelation and cancellation that it involves.
      - Autocorrelation function (ACF), Eq. (1): $r_t(\tau) = \sum_{j=t+1}^{t+W} x_j x_{j+\tau}$, where $r_t(\tau)$ is the autocorrelation function of lag $\tau$ calculated at time index $t$, and $W$ is the integration window size.
      - A slightly different definition is common in signal processing, Eq. (2): $r'_t(\tau) = \sum_{j=t+1}^{t+W-\tau} x_j x_{j+\tau}$. Here the integration window size shrinks with increasing values of $\tau$, so the envelope of the function decreases as a function of lag.
      - Fig. 1: (a) example of a speech waveform; (b) ACF calculated from the waveform in (a) according to Eq. (1); (c) the same, calculated according to Eq. (2). The envelope of this function is tapered to zero because of the smaller number of terms in the summation at larger $\tau$. The horizontal arrows symbolize the search range for the period.
      - Fig. 2: F0 estimation error rates as a function of the slope of the envelope of the ACF, quantified by its intercept with the abscissa. The dotted line represents errors for which the F0 estimate was too high, the dashed line those for which it was too low, and the full line their sum. Triangles at the right represent error rates for the ACF calculated as in Eq. (1) ($\tau_{max} = \infty$). These rates were measured over a subset of the database used in Sec. III.
      - The parameter $\tau_{max}$ allows the algorithm to be biased to favor one form of error at the expense of the other, with a minimum of total error for intermediate values. Using Eq. (2) rather than Eq. (1) introduces a natural bias that can be tuned by adjusting $W$. However, changing the window size has other effects, and one can argue that a bias of this sort, if useful, should be applied explicitly rather than implicitly. This is one reason to prefer the definition of Eq. (1).
      - The autocorrelation method compares the signal to its shifted self. In that sense it is related to the AMDF method (average magnitude difference function, Ross et al., 1974; Ney, 1982), which performs its comparison using differences rather than products, and more generally to time-domain methods that measure intervals between events in time (Hess, 1983).
      - The ACF is the Fourier transform of the power spectrum, and can be seen as measuring the regular spacing of harmonics within that spectrum. The cepstrum method (Noll, 1967) replaces the power spectrum by the log magnitude spectrum and thus puts less weight on high-amplitude parts of the spectrum (particularly near the first formant that often dominates the ACF). Similar "spectral whitening" effects can be obtained by linear predictive inverse filtering or center-clipping (Rabiner and Schafer, 1978), or by splitting the signal over a bank of filters, calculating ACFs within each channel, and adding the results after amplitude normalization (de Cheveigné, 1991).
      - For a signal with period $T$, Eq. (5): $\sum_{j=t+1}^{t+W} (x_j - x_{j+T})^2 = 0$. Conversely, an unknown period may be found by forming the difference function, Eq. (6): $d_t(\tau) = \sum_{j=1}^{W} (x_j - x_{j+\tau})^2$, and searching for the values of $\tau$ for which the function is zero. There is an infinite set of such values, all multiples of the period.
      - Fig. 3: (a) difference function calculated for the speech signal of Fig. 1(a); (b) cumulative mean normalized difference function. Note that the function starts at 1 rather than 0 and remains high until the dip at the period.
      - Table I: gross error rates for the simple unbiased autocorrelation method (step 1), and for the cumulated steps described in the paper. These rates were measured over a subset of the database used in Sec. III. Integration window size was 25 ms, window shift was one sample, search range was 40 to 800 Hz, and threshold (step 4) was 0.1.
            Version   Gross error (%)
            Step 1    10.0
            Step 2    1.95
            Step 3    1.69
            Step 4    0.78
            Step 5    0.77
            Step 6    0.50
      Reference: De Cheveigné, A., & Kawahara, H. (2002). YIN, a fundamental frequency estimator for speech and music. The Journal of the Acoustical Society of America, 111(4), 1917–1930.
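The core of YIN can be sketched in a few lines of NumPy: the difference function of Eq. (6), its cumulative mean normalization, and an absolute threshold of 0.1 as in Table I. This is a simplified illustration that omits the later refinement steps of the paper (such as parabolic interpolation); the function names and the test signal are assumptions.

```python
import numpy as np

def difference_function(x, W, tau_max):
    """d(tau) = sum_{j=1..W} (x_j - x_{j+tau})^2 for tau < tau_max."""
    d = np.zeros(tau_max)
    for tau in range(1, tau_max):
        diff = x[:W] - x[tau:tau + W]
        d[tau] = np.dot(diff, diff)
    return d

def cumulative_mean_normalized(d):
    """d'(0) = 1, d'(tau) = d(tau) / ((1/tau) * sum_{j=1..tau} d(j))."""
    out = np.ones_like(d)
    taus = np.arange(1, len(d))
    out[1:] = d[1:] * taus / np.maximum(np.cumsum(d[1:]), 1e-12)
    return out

def yin_period(x, W, tau_max, threshold=0.1):
    """Smallest lag whose normalized difference dips below the threshold."""
    dn = cumulative_mean_normalized(difference_function(x, W, tau_max))
    below = np.flatnonzero(dn[1:] < threshold) + 1
    if below.size == 0:            # no dip found, fall back to global minimum
        return int(np.argmin(dn[1:]) + 1)
    tau = below[0]
    while tau + 1 < len(dn) and dn[tau + 1] < dn[tau]:
        tau += 1                   # walk down to the bottom of the dip
    return int(tau)

# Usage: 100 Hz sine at 8 kHz, 25 ms window, lags up to 200 samples (40 Hz)
sr = 8000
x = np.sin(2 * np.pi * 100.0 * np.arange(800) / sr)
period = yin_period(x, W=200, tau_max=200)
f0 = sr / period
```

The input must hold at least `W + tau_max` samples. Normalizing by the running mean removes the spurious zero at lag 0, so the first dip below the threshold corresponds to the period rather than to zero lag.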
  • 20. Predominant Pitch Estimation: Melodia. Reference for slides 20 to 25: Salamon, J., & Gómez, E. (2012). Melody extraction from polyphonic music signals using pitch contour characteristics. IEEE Transactions on Audio, Speech, and Language Processing, 20(6), 1759–1770.
  • 21. Predominant Pitch Estimation: Melodia
  • 22. Predominant Pitch Estimation: Melodia: audio -> spectrogram -> spectral peaks
  • 23. Predominant Pitch Estimation: Melodia: spectral peaks -> time-frequency salience
  • 24. Predominant Pitch Estimation: Melodia: time-frequency salience -> salience peaks -> contours
  • 25. Predominant Pitch Estimation: Melodia: contours -> predominant melody contours
  • 34. Essentia implementation of Melodia: audio -> spectrogram
  • 36. Essentia implementation of Melodia: spectrogram -> spectral peaks
  • 38. Essentia implementation of Melodia: spectral peaks -> time-frequency salience
  • 40. Essentia implementation of Melodia: time-frequency salience -> salience peaks
  • 42. Essentia implementation of Melodia: salience peaks -> all contours
  • 44. Essentia implementation of Melodia: all contours -> predominant melody contours
  • 50. What about loudness and timbre?
  • 51. What about loudness and timbre?
  • 54. Loudness of predominant voice (figure axes: frequency vs. time)
  • 55. Loudness of predominant voice (figure axes: frequency vs. time)
  • 56. Loudness of predominant voice (figure axes: frequency vs. time; F0 marked)
  • 57. Loudness of predominant voice (figure axes: frequency vs. time; F0 marked)
  • 58. Loudness of predominant voice (figure axes: frequency vs. time; F0 marked)
  • 59. Loudness of predominant voice (figure axes: frequency vs. time; F0 marked)
  • 60. Loudness of predominant voice: example
  • 61. Spectral centroid of predominant voice
  • 66. Dunya API Examples
      - Metadata
      - Features