ISSCS2011

Environmental Sound Recognition with
CELP-based Features
EnShuo Tsau, Seung-Hwan Kim and C.-C. Jay Kuo
Dept. of Electrical Engineering
University of Southern California
Los Angeles, CA 90089-2564
http://viola.usc.edu/

Outline
▪ Environmental Sound Recognition (ESR) and Its Challenges
▪ Conventional Audio Features
▪ Motivation and Proposed Solution
▪ Experimental Results
▪ Conclusion and Future Work

Environmental Sound Recognition (ESR)
▪ Environmental sound
  • Restaurants, streets, parks, airports and train stations, hallways, etc.
▪ Environmental Sound Recognition (ESR)
  • Uses audio information to assist activities
  • Audio is easy to store and process
  • Robotic navigation and human-computer interaction
  • Free of the lighting and camera-angle problems that affect cameras
  • Other applications: surveillance, search and rescue
▪ Challenges of ESR
  • Similar sounds
  • Multiple generating sources
  • Noise
  • Unlike speech and music, environmental sounds are unstructured, which makes them difficult to model

Conventional Audio Features
▪ Conventional features
  • MFCCs, MFCC derivatives, sub-band energy, fundamental frequency, LPCCs, energy, zero-crossing rate, spectral centroid, spectral bandwidth, and matching pursuit (MP)
▪ Problems with conventional features
  • MFCCs
      - Describe the shape of the overall spectrum
      - Work well only for structured sounds such as speech and music
      - Performance degrades in the presence of noise
  • MP
      - Works relatively well for both structured and unstructured sounds
      - Requires significant computation
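
Several of the conventional frame-level features listed above are simple to compute. The sketch below assumes 8 kHz audio split into 240-sample frames (the G.723.1 framing used later) and the magnitude-weighted definitions of spectral centroid and bandwidth, which are one common convention among several. It computes energy, zero-crossing rate, spectral centroid and bandwidth for one frame with NumPy; the helper name `frame_features` is hypothetical.

```python
import numpy as np


def frame_features(frame, sr=8000):
    """Energy, zero-crossing rate, spectral centroid and bandwidth of one frame.

    Hypothetical helper; assumes a mono frame sampled at `sr` Hz.
    """
    frame = np.asarray(frame, dtype=float)
    # Mean energy per sample
    energy = float(np.mean(frame ** 2))
    # Zero-crossing rate: fraction of adjacent sample pairs whose sign differs
    signs = np.sign(frame)
    zcr = float(np.mean(signs[1:] != signs[:-1]))
    # Magnitude spectrum and the frequency (Hz) of each rfft bin
    mag = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    total = mag.sum() + 1e-12  # guards against an all-zero frame
    # Spectral centroid: magnitude-weighted mean frequency
    centroid = float(np.sum(freqs * mag) / total)
    # Spectral bandwidth: magnitude-weighted spread around the centroid
    bandwidth = float(np.sqrt(np.sum((freqs - centroid) ** 2 * mag) / total))
    return {"energy": energy, "zcr": zcr, "centroid": centroid, "bandwidth": bandwidth}


if __name__ == "__main__":
    sr = 8000
    n = np.arange(240)
    tone = np.sin(2 * np.pi * 1000 * n / sr + 0.1)  # 1 kHz tone, 30 full cycles per frame
    print(frame_features(tone, sr))
```

For the 1 kHz test tone the centroid lands on 1000 Hz and the bandwidth is near zero, since all energy falls in a single FFT bin; broadband sounds such as rain would show a much larger bandwidth.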

Motivation for CELP-based Features
▪ Comparison with MFCCs and MP (feature sets: CELP, MFCCs, MP)
  • Preserves data (reversible)
  • Easy implementation: ITU-T G.723.1
  • Low complexity: real time
  • Compact features
  • Classification rate (ESR)
  • Different applications: speech, music
  • Potential side benefits: mobile applications (5.3/6.3 kbps), fixed-point arithmetic
▪ Bit streams → information → features
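
The bit rates quoted above translate directly into a per-frame budget. Assuming G.723.1's 8 kHz sampling rate together with the 240-sample frame and 4 subframes given on the CELP-features slide, a worked calculation:

$$
T_{\mathrm{frame}} = \frac{240}{8000\ \mathrm{Hz}} = 30\ \mathrm{ms}, \qquad T_{\mathrm{sub}} = \frac{T_{\mathrm{frame}}}{4} = 7.5\ \mathrm{ms}
$$

$$
6.3\ \mathrm{kbit/s} \times 30\ \mathrm{ms} = 189\ \mathrm{bits}, \qquad 5.3\ \mathrm{kbit/s} \times 30\ \mathrm{ms} = 159\ \mathrm{bits}
$$

So each 30 ms of audio is summarized by fewer than 200 bits, which is why features read from the bit stream are compact.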

Code Excited Linear Prediction (CELP)
▪ Analysis-by-synthesis linear prediction
  • Short-term prediction (STP): linear prediction coefficients
  • Long-term prediction (LTP): pitch lag T
  • Residual description
Reference: M. R. Schroeder and B. S. Atal, "Code-excited linear prediction (CELP): high-quality speech at very low bit rates," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 10, pp. 937–940, 1985.
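
Short-term prediction fits an all-pole model A(z) to each frame, and the residual left after inverse filtering is what the excitation (pitch and codebook) then describes. A minimal sketch, assuming the plain autocorrelation method solved by the Levinson-Durbin recursion; G.723.1 itself adds windowing, lag windowing and quantization, all omitted here. The function names are hypothetical.

```python
import numpy as np


def autocorr(x, order):
    """Autocorrelation r[0..order] of a frame (biased, no window)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)])


def levinson_durbin(r, order):
    """Return a[0..order] (a[0] = 1) of A(z) = sum_k a[k] z^-k and the final prediction error."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def lpc_residual(x, a):
    """Short-term prediction residual e[n] = sum_k a[k] x[n-k] (zero initial state)."""
    return np.convolve(x, a)[: len(x)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic AR(2) signal: x[n] = 1.5 x[n-1] - 0.7 x[n-2] + w[n]
    w = rng.standard_normal(20000)
    x = np.zeros_like(w)
    for n in range(2, len(w)):
        x[n] = 1.5 * x[n - 1] - 0.7 * x[n - 2] + w[n]
    a, _ = levinson_durbin(autocorr(x, 2), 2)
    print(np.round(a, 2))  # should be close to [1, -1.5, 0.7]
```

On a real 240-sample frame one would call `levinson_durbin(autocorr(frame, 10), 10)` to obtain the 10th-order LPC vector listed among the CELP features.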

Proposed CELP Features
▪ 240 samples/frame; 4 subframes/frame
▪ CELP features available from the bit stream
  • LPC (linear prediction coefficients), order 10
      - or, equivalently, LSF (line spectral frequencies)
  • Pitch lag
      - Open loop
      - Closed loop, 20 ≤ p ≤ 147
  • Gains of the two excitations
      - Pitch filter (5-tap)
      - Fixed-codebook pulses
  • POS: location and sign of the fixed-codebook pulses
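
The LSF representation listed above can be derived from the LPC vector. A sketch, assuming the usual sum and difference polynomials P(z) = A(z) + z^-(p+1) A(1/z) and Q(z) = A(z) - z^-(p+1) A(1/z), whose roots lie on the unit circle when A(z) is minimum phase. Codecs usually locate these roots with a cheaper grid search; NumPy's general root finder `np.roots` is used here only for clarity. The name `lpc_to_lsf` is hypothetical.

```python
import numpy as np


def lpc_to_lsf(a):
    """Line spectral frequencies (radians, ascending) of A(z) with a[0] = 1.

    Assumes A(z) is minimum phase, as the autocorrelation method guarantees.
    """
    a = np.asarray(a, dtype=float)
    a_ext = np.append(a, 0.0)          # A(z), padded to degree p+1 in z^-1
    a_rev = np.append(0.0, a[::-1])    # z^-(p+1) A(1/z)
    lsf = []
    for poly in (a_ext + a_rev, a_ext - a_rev):   # P(z) and Q(z)
        w = np.angle(np.roots(poly))
        # Keep one root of each conjugate pair; drop the trivial roots at z = +1 and z = -1
        lsf.extend(w[(w > 1e-6) & (w < np.pi - 1e-6)])
    return np.sort(np.array(lsf))


if __name__ == "__main__":
    # A flat spectrum (all predictor coefficients zero, order 10)
    # gives equally spaced LSFs k*pi/11, k = 1..10.
    flat = np.r_[1.0, np.zeros(10)]
    print(np.round(lpc_to_lsf(flat) / np.pi * 11, 6))
```

Because LSFs are ordered and bounded in (0, π), they are easier to quantize and compare than raw LPC coefficients, which is one reason codecs transmit them.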

Proposed Solution
▪ Data preprocessing: normalization, cleaning
▪ Feature extraction: CELP (11 dim), MFCC (21 dim, full bank)
▪ Classification: Bayesian network classifier
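
For the MFCC baseline the slide lists 21 dimensions from the full filter bank. One reading, assumed here, is a 21-filter mel bank whose log energies are all kept after the DCT. A NumPy sketch for one 240-sample frame at 8 kHz; the 256-point FFT, Hamming window and 2595·log10(1 + f/700) mel formula are assumptions, not details from the slides, and the function names are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters equally spaced on the mel scale between 0 Hz and sr/2."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising edge
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):     # falling edge
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb


def mfcc(frame, sr=8000, n_filters=21, n_ceps=21, n_fft=256):
    """MFCCs of one frame; n_ceps = n_filters keeps the full bank (assumed reading)."""
    windowed = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed, n=n_fft)) ** 2 / n_fft
    log_energy = np.log(mel_filterbank(n_filters, n_fft, sr) @ power + 1e-10)
    # Orthonormal DCT-II of the log filter-bank energies
    n = np.arange(n_filters)
    k = np.arange(n_ceps)[:, None]
    dct = np.sqrt(2.0 / n_filters) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))
    dct[0] /= np.sqrt(2.0)
    return dct @ log_energy


if __name__ == "__main__":
    frame = np.random.default_rng(0).standard_normal(240)  # one 30 ms frame at 8 kHz
    print(mfcc(frame).shape)  # (21,)
```

Keeping all 21 coefficients preserves the whole filter-bank envelope, which matches the slides' point that MFCCs describe the overall spectral shape rather than predictive structure.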

Experimental Setup
▪ 10 classes
  • Transportation (3): airplane, motorcycle, train
  • Weather (4): rain, thunder, wind, stream
  • Rural areas (2): bird, insect
  • Indoor (1): restaurant
▪ Feature extraction
  • Modified ITU-T G.723.1 standard code
▪ Classifier
  • Bayesian network
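
The slides specify a Bayesian network classifier but not its structure. As a stand-in, the sketch below uses Gaussian naive Bayes, the simplest Bayesian network (the class node is the only parent of every feature), on z-score-normalized features formed by concatenating an 11-dim CELP vector and a 21-dim MFCC vector per frame. The Gaussian per-feature likelihoods, the class and function names, and the synthetic data are assumptions for illustration only.

```python
import numpy as np


def zscore_fit(X):
    """Per-feature mean and standard deviation computed on training data."""
    return X.mean(axis=0), X.std(axis=0) + 1e-12


class GaussianNaiveBayes:
    """One Gaussian per (class, feature); a hypothetical stand-in for the Bayesian network."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.mean_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.var_ = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes_])
        self.log_prior_ = np.log(np.array([np.mean(y == c) for c in self.classes_]))
        return self

    def predict(self, X):
        diff = X[:, None, :] - self.mean_[None, :, :]
        # Sum of per-feature Gaussian log-likelihoods, shape (n_samples, n_classes)
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * self.var_) + diff ** 2 / self.var_, axis=2)
        return self.classes_[np.argmax(log_lik + self.log_prior_, axis=1)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for per-frame CELP (11-dim) and MFCC (21-dim) features, two classes
    celp = np.vstack([rng.normal(0, 1, (200, 11)), rng.normal(1, 1, (200, 11))])
    mfcc = np.vstack([rng.normal(0, 1, (200, 21)), rng.normal(1, 1, (200, 21))])
    y = np.repeat([0, 1], 200)
    X = np.hstack([celp, mfcc])                 # CELP+MFCC: 32 dimensions
    mu, sd = zscore_fit(X)
    clf = GaussianNaiveBayes().fit((X - mu) / sd, y)
    print("training accuracy:", np.mean(clf.predict((X - mu) / sd) == y))
```

The normalization statistics are fitted on training data only and reused at test time, so that features with large numeric ranges (e.g. gains) do not dominate the likelihoods.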

Comparison of Features
Classification rate (%)
Feature        Airplane  Bird  Insect  Motor  Rain  Rest.  Stream  Thunder  Train  Wind  Overall
PITCH              77.8  28.8     1.1   27.1   1.2   62.6    10.5        0   29.1  21.2     26.8
GAIN               66.3   8.5      44   18.5    32    8.3     8.3      2.4   15.9  11.5     22.2
LPC                85.4  96.3    99.6   89.8  99.1   63.7      98       77   74.1  98.5     88.5
CELP+GAIN          88.7  96.8    99.6   90.4    99   77.8    97.6     79.5   81.6  98.7       91
CELP+GAIN+POS      92.6  99.5    98.7   73.7  96.3   55.9      96       30     61    93     81.3
MFCC               87.8    90    95.8   86.2  76.8   69.4      77     43.2   86.9   100     82.5
CELP               88.4  96.8    99.6   90.4    99   77.9    97.7     78.8   81.3  98.7     91.2
CELP+MFCC          92.3  97.7    99.5   95.5    99   87.5    98.7     85.4   93.4  99.9     95.2

[Figure: Comparison of Features. Bar chart of classification rate (%) per class and overall for MFCC, CELP and CELP+MFCC, annotated "Short term and long term prediction" and "Speech like".]

Confusion Matrix of CELP Features
Classification rate (%); rows: actual class, columns: predicted class
Actual    Airplane  Bird  Insect  Motor  Rain  Rest.  Stream  Thunder  Train  Wind
Airplane      88.4     –       –      –     –    1.9       –      0.2    5.1   4.4
Bird             –  96.8       –    0.1     –    1.6     0.3      0.2    1.1     –
Insect           –     –    99.6      –     –    0.4       –        –      –     –
Motor          0.1     –       –   90.4     –    5.7       –      0.3    3.5     –
Rain             –     –       –      –    99    0.3     0.4      0.1      –     –
Rest.            1   2.2       –    8.1   0.1   77.9     1.4      2.6    6.8   0.1
Stream           –   0.2       –      –   0.3      1    97.7      0.2    0.5     –
Thunder        1.9   0.6     0.1      3   0.3    7.5     3.8     78.8    3.4   0.7
Train          5.1   0.7       –      5   0.1    7.1     0.1      0.7   81.3     –
Wind             –     –       –      –     –      –       –      1.3      –  98.7
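
Each row of the matrix above sums to about 100%, which suggests rows are actual classes and columns predicted ones (an inference from the numbers). Under that reading, the sketch below builds such a row-normalized matrix from per-frame integer labels and finds the most frequent confusion; applied to the CELP matrix above it picks out restaurant sounds labelled as motorcycle (8.1%). The function names are hypothetical, and "–" entries are entered as 0.

```python
import numpy as np

LABELS = ["Airplane", "Bird", "Insect", "Motor", "Rain",
          "Rest.", "Stream", "Thunder", "Train", "Wind"]

# CELP confusion matrix from the slide (%, rows = actual, columns = predicted; "–" as 0)
CELP_CM = np.array([
    [88.4, 0, 0, 0, 0, 1.9, 0, 0.2, 5.1, 4.4],
    [0, 96.8, 0, 0.1, 0, 1.6, 0.3, 0.2, 1.1, 0],
    [0, 0, 99.6, 0, 0, 0.4, 0, 0, 0, 0],
    [0.1, 0, 0, 90.4, 0, 5.7, 0, 0.3, 3.5, 0],
    [0, 0, 0, 0, 99, 0.3, 0.4, 0.1, 0, 0],
    [1, 2.2, 0, 8.1, 0.1, 77.9, 1.4, 2.6, 6.8, 0.1],
    [0, 0.2, 0, 0, 0.3, 1, 97.7, 0.2, 0.5, 0],
    [1.9, 0.6, 0.1, 3, 0.3, 7.5, 3.8, 78.8, 3.4, 0.7],
    [5.1, 0.7, 0, 5, 0.1, 7.1, 0.1, 0.7, 81.3, 0],
    [0, 0, 0, 0, 0, 0, 0, 1.3, 0, 98.7],
])


def confusion_percent(y_true, y_pred, n_classes):
    """Row-normalized confusion matrix in percent from integer class labels."""
    counts = np.zeros((n_classes, n_classes))
    np.add.at(counts, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return 100.0 * counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)


def top_confusion(cm, labels):
    """(actual, predicted, rate) of the largest off-diagonal entry."""
    off = np.array(cm, dtype=float)
    np.fill_diagonal(off, 0.0)
    i, j = np.unravel_index(np.argmax(off), off.shape)
    return labels[i], labels[j], float(off[i, j])


if __name__ == "__main__":
    print(confusion_percent([0, 0, 1, 1], [0, 1, 1, 1], 2))
    print(top_confusion(CELP_CM, LABELS))  # ('Rest.', 'Motor', 8.1)
```

The diagonal of `CELP_CM` reproduces the CELP row of the feature-comparison table, i.e. the per-class classification rates.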

Principal Component Analysis
[Figure: Principal Component Analysis. Classification rate (%) versus number of dimensions (1 to 31) for CELP, MFCC and MFCC+CELP features.]
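
The PCA experiment varies how many principal components of the feature vector are kept before classification. A NumPy sketch of the projection step, assuming PCA is fitted via SVD on centered training features (whether the features were also standardized first is not stated); the classifier would then be retrained for each k, for example k = 1 to 31 as on the plot. The function names are hypothetical.

```python
import numpy as np


def pca_fit(X):
    """Mean, principal directions (rows of vt, by decreasing variance) and explained-variance ratios."""
    mean = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    return mean, vt, explained


def pca_transform(X, mean, vt, k):
    """Project X onto the first k principal directions."""
    return (X - mean) @ vt[:k].T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 32)) * np.linspace(5.0, 0.1, 32)  # 32 dims, decreasing spread
    mean, vt, explained = pca_fit(X)
    for k in (1, 5, 11, 21, 31):
        Z = pca_transform(X, mean, vt, k)
        print(k, Z.shape, round(float(explained[:k].sum()), 3))
```

Fitting the mean and directions on training data only, then reusing them on test data, keeps the dimensionality sweep free of test-set leakage.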

Speed and Complexity
▪ Speed
  • Feature extraction: real time
  • Classification
      - Training: depends on the classifier/kernel
      - Testing: fast and negligible

Average run time
Feature    Training (s)  Testing (s)
CELP                659            8
MFCC                672            9
CELP+MFCC           912           10

Summary of ESR topic
▪ Conclusion
  • A novel set of CELP-based features is proposed by exploiting the information in the CELP bit stream
  • MFCCs, which represent filter-bank energies, are not well suited to ESR
  • With a Bayesian network classifier, CELP (91.2%) and CELP+MFCC (95.2%) outperform MFCC (82.5%) on the ESR task by roughly 9 and 13 percentage points
      - Long-term and short-term prediction
      - More robust to background noise
  • CELP features offer low complexity, easy implementation and extensibility
  • Recognition based on CELP features is attractive because the additional effort required for feature extraction is almost negligible

Future Work
▪ Explore more features
▪ Speaker recognition and identification
▪ Capture longer-term signatures
