ISSCS2011
1. Environmental Sound Recognition with
CELP-based Features
EnShuo Tsau, Seung-Hwan Kim and C.-C. Jay Kuo
Dept. of Electrical Engineering
University of Southern California
Los Angeles, CA 90089-2564
http://viola.usc.edu/
2. Outline
Environmental Sound Recognition (ESR) and Challenge
Conventional Audio Features
Motivation and Proposed Solution
Experimental Results
Conclusion and Future Work
3. Environmental Sound Recognition (ESR)
Environmental Sound
• Restaurants, streets, parks, airports, train stations, hallways, etc.
Environmental Sound Recognition (ESR)
• Uses audio information to assist activities
• Audio is easy to store and process
• Robotic navigation and human-computer interaction
• Free of the lighting and camera-angle problems of camera-based approaches
• Other applications: surveillance, search and rescue
Challenges of ESR
• Similar sounds
• Multiple generating sources
• Noise
• Unlike speech and music, environmental sounds are unstructured, which makes them difficult to model
4. Conventional Audio Features
Conventional features
• MFCCs, MFCC derivatives, sub-band energy, fundamental frequency, LPCCs,
energy, zero-crossing rate, spectral centroid, bandwidth, and matching pursuit (MP)
Problems with conventional features
• MFCCs
• Describe the shape of the overall spectrum
• Work well only for structured sounds such as speech and music
• Performance degrades in the presence of noise
• MP
• Works relatively well for both structured and unstructured sounds
• Requires high computational complexity
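Three of the simpler conventional features listed above (short-time energy, zero-crossing rate and spectral centroid) can be sketched per frame as follows. This is an illustrative sketch only: the function names are hypothetical, the spectral centroid uses a naive O(N²) DFT for brevity, and the 8 kHz sample rate and 240-sample frame are assumptions chosen to match the G.723.1 framing used later, not settings stated for these features. MFCCs and MP are not shown.

```python
import cmath
import math


def short_time_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (0 counted as positive)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency in Hz over DFT bins 0..N/2."""
    n = len(frame)
    weighted, total = 0.0, 0.0
    for k in range(n // 2 + 1):
        # Naive DFT bin k: acceptable for a short illustrative frame; use an FFT in practice.
        coeff = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        magnitude = abs(coeff)
        weighted += (k * sample_rate / n) * magnitude
        total += magnitude
    return weighted / total if total > 0 else 0.0


if __name__ == "__main__":
    rate = 8000
    # Hypothetical test frame: 240 samples of a 1 kHz sine sampled at 8 kHz.
    frame = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(240)]
    print(short_time_energy(frame), zero_crossing_rate(frame), spectral_centroid(frame, rate))
```

For the 1 kHz test tone the centroid lands at 1000 Hz, because 1000 Hz falls exactly on DFT bin 30 of a 240-point frame at 8 kHz.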
5. Motivation for CELP-based Features
Comparison with MFCCs and MP (feature sets compared: CELP, MFCCs, MP)
• Preserves data (reversible): features can be converted back to data
• Easy implementation: ITU-T G.723.1
• Low complexity: real time
• Compact feature
• Classification rate: ESR
• Different applications: speech, music
• Potential side benefits: mobile applications (5.3/6.3 kbps), fixed-point implementation
Bit streams → information → features
6. Code Excited Linear Prediction (CELP)
• M. R. Schroeder and B. S. Atal, "Code-excited linear prediction (CELP): high-quality speech at
very low bit rates," in Proceedings of the IEEE International Conference on Acoustics, Speech,
and Signal Processing (ICASSP), vol. 10, pp. 937–940, 1985
• Analysis-by-synthesis linear prediction
• Short-term prediction (STP): linear prediction coefficients
• Long-term prediction (LTP): pitch lag T
• Residual description
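The short-term prediction (STP) stage can be illustrated with the textbook autocorrelation method: estimate 10th-order prediction coefficients with the Levinson-Durbin recursion, then subtract the prediction to obtain the residual that the long-term predictor and the excitation go on to describe. This is a minimal sketch, not the G.723.1 procedure (which adds windowing, bandwidth expansion and LSF quantization); the function names and the predictor convention x̂[t] = Σ a_k x[t−k] are assumptions.

```python
def autocorrelation(x, max_lag):
    """r[k] = sum_t x[t] * x[t-k] for k = 0..max_lag (autocorrelation method)."""
    n = len(x)
    return [sum(x[t] * x[t - k] for t in range(k, n)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for a[1..order] in x_hat[t] = sum_k a[k] * x[t-k].

    Returns (coefficients a[1..order], final prediction-error energy).
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or perfectly predicted frame: keep the lower-order solution
        # Reflection coefficient for stage i.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
    return a[1:], err


def stp_residual(x, coeffs):
    """e[t] = x[t] - sum_k a[k] * x[t-k]; samples before the frame are taken as zero."""
    return [
        x[t] - sum(c * x[t - k] for k, c in enumerate(coeffs, start=1) if t - k >= 0)
        for t in range(len(x))
    ]


if __name__ == "__main__":
    # Hypothetical 240-sample frame that a first-order predictor fits exactly.
    frame = [0.9 ** t for t in range(240)]
    coeffs, err = levinson_durbin(autocorrelation(frame, 10), 10)
    residual = stp_residual(frame, coeffs)
    print(coeffs[:3], err, max(abs(e) for e in residual[1:]))
```

On the geometric test frame x[t] = 0.9^t, the first coefficient comes out at 0.9, the higher-order coefficients are essentially zero, and the residual vanishes after the first sample.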
7. Proposed CELP Features
240 samples/frame; 4 subframes/frame (60 samples/subframe)
Available CELP features from the bit stream
• LPC (linear prediction coefficients), 10th order
• or LSFs (line spectral frequencies)
• Pitch lag
• Open loop
• Closed loop: 20 ≤ p ≤ 147
• Gains of the two excitations
• Pitch filter (5-tap)
• Fixed-codebook pulses
• POS: location and sign of each fixed-codebook pulse
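The framing on this slide (240 samples per frame, 4 subframes) and a pitch-lag search over the stated range 20 ≤ p ≤ 147 can be sketched as below. The pitch search is a simplified normalized-correlation search chosen for illustration, not the open-loop or closed-loop procedure defined in G.723.1; the function names, the 120-sample analysis window and the sawtooth test signal are hypothetical.

```python
import math


def split_frames(signal, frame_len=240, subframes=4):
    """Cut a signal into non-overlapping frames, each split into equal subframes."""
    sub_len = frame_len // subframes
    frames = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        frames.append([frame[i * sub_len:(i + 1) * sub_len] for i in range(subframes)])
    return frames


def pitch_lag_search(x, start, length, min_lag=20, max_lag=147):
    """Return the lag p in [min_lag, max_lag] maximizing the normalized correlation
    between x[start:start+length] and the segment p samples earlier (None if no history)."""
    segment = x[start:start + length]
    best_lag, best_score = None, -math.inf
    for p in range(min_lag, max_lag + 1):
        if start - p < 0:
            break  # not enough history for this or any larger lag
        past = x[start - p:start - p + length]
        energy = sum(v * v for v in past)
        if energy == 0.0:
            continue
        score = sum(a * b for a, b in zip(segment, past)) / math.sqrt(energy)
        if score > best_score:  # strict '>' keeps the smallest lag on ties (avoids pitch multiples)
            best_lag, best_score = p, score
    return best_lag


if __name__ == "__main__":
    # Hypothetical periodic test signal with a period of 50 samples.
    signal = [(t % 50) / 50 - 0.5 for t in range(480)]
    frames = split_frames(signal)
    print(len(frames), [len(s) for s in frames[0]], pitch_lag_search(signal, 240, 120))
```

By the Cauchy-Schwarz inequality the score peaks only at multiples of the true period, and the strict comparison resolves the tie between 50 and 100 in favour of 50.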
8. Proposed Solution
• Data preprocessing: normalization, cleaning
• Feature extraction: CELP (11 dim), MFCC (21 dim, full bank)
• Classification: Bayesian network classifier
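The pipeline on this slide (normalization, CELP + MFCC feature extraction, classification) can be sketched end to end once per-clip feature vectors exist. Two assumptions to flag: the cleaning step is not shown, and Gaussian naive Bayes is swapped in as the simplest Bayesian-network structure (the class node as the sole parent of every feature), since the slides do not say which network the authors used. All function names are hypothetical, and the toy feature values and labels are invented for illustration; real inputs would be 11-dimensional CELP and 21-dimensional MFCC vectors.

```python
import math
from collections import defaultdict


def zscore_fit(rows):
    """Per-dimension mean and population standard deviation (a zero std becomes 1)."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    stds = [math.sqrt(sum((r[d] - means[d]) ** 2 for r in rows) / n) or 1.0 for d in range(dims)]
    return means, stds


def zscore_apply(rows, means, stds):
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def concat_features(celp_rows, mfcc_rows):
    """Stack per-clip CELP and MFCC vectors (e.g. 11 + 21 = 32 dimensions)."""
    return [list(c) + list(m) for c, m in zip(celp_rows, mfcc_rows)]


class GaussianNaiveBayes:
    """Class node -> conditionally independent Gaussian feature nodes."""

    def fit(self, X, y, var_floor=1e-6):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        self.log_priors, self.params = {}, {}
        for label, rows in groups.items():
            n = len(rows)
            means = [sum(col) / n for col in zip(*rows)]
            variances = [
                sum((v - m) ** 2 for v in col) / n + var_floor  # floor avoids zero variance
                for col, m in zip(zip(*rows), means)
            ]
            self.log_priors[label] = math.log(n / len(X))
            self.params[label] = (means, variances)
        return self

    def _log_joint(self, row, label):
        """log p(label) + sum_d log N(x_d; mean_d, var_d)."""
        means, variances = self.params[label]
        total = self.log_priors[label]
        for v, m, s2 in zip(row, means, variances):
            total += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        return total

    def predict(self, X):
        return [max(self.params, key=lambda lab: self._log_joint(row, lab)) for row in X]


if __name__ == "__main__":
    # Toy 2-dim "CELP" and 1-dim "MFCC" vectors for two hypothetical classes.
    celp = [[0.0, 1.0], [0.2, 0.9], [5.0, 4.0], [5.2, 4.1]]
    mfcc = [[10.0], [10.5], [-3.0], [-2.5]]
    X = concat_features(celp, mfcc)
    means, stds = zscore_fit(X)
    model = GaussianNaiveBayes().fit(zscore_apply(X, means, stds), ["rain", "rain", "train", "train"])
    query = zscore_apply(concat_features([[0.1, 1.0]], [[10.2]]), means, stds)
    print(model.predict(query))
```

The normalization statistics are fitted on the training rows only and reused for test rows, so the classifier never sees test-set statistics.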
9. Experimental Setup and Results
10 classes:
• Transportation (3): airplane, motorcycle and train
• Weather (4): rain, thunder, wind and stream
• Rural areas (2): bird and insect
• Indoor (1): restaurant
Feature Extraction
• Modified ITU-T G.723.1 standard code
Classifier
• Bayesian Network
13. Speed and Complexity
Speed
• Feature extraction:
• Real time
• Classification
• Training:
• Depends on the classifier/kernel
• Testing:
• Fast and negligible
Average run time
Feature set    Training (s)    Testing (s)
CELP           659             8
MFCC           672             9
CELP+MFCC      912             10
14. Summary of ESR topic
Conclusion
• A novel set of CELP-based features is proposed, derived from the information in the CELP bit stream
• MFCCs, which represent filter-bank energies, are not well suited to ESR
• CELP and CELP+MFCC outperform MFCC by a 10% margin on the ESR task (Bayesian network classifier)
• Long- and short-term prediction
• More robust to background noise
• CELP offers low complexity, easy implementation and extensibility
• Recognition based on CELP features is desirable because feature extraction requires almost negligible additional effort
16. Future Work
Explore more features
Speaker recognition and identification
Longer-term signature capture