Environmental Sound Recognition with
CELP-based Features
EnShuo Tsau, Seung-Hwan Kim and C.-C. Jay Kuo
Dept. of Electrical Engineering
University of Southern California
Los Angeles, CA 90089-2564
http://viola.usc.edu/
Outline
• Environmental Sound Recognition (ESR) and Challenge
• Conventional Audio Features
• Motivation and Proposed Solution
• Experimental Results
• Conclusion and Future Work
Environmental Sound Recognition (ESR)
• Environmental sound
  • Restaurants, streets, parks, airports and train stations, hallways, etc.
• Environmental Sound Recognition (ESR)
  • Uses audio information to assist activities
  • Audio is easy to store and process
  • Robotic navigation and human-computer interaction
  • Free of the lighting and camera-angle problems that affect vision
  • Other applications: surveillance, search and rescue
• Challenges of ESR
  • Similar sounds
  • Multiple generating sources
  • Noise
  • Unlike speech and music, environmental sound is unstructured, so it is difficult to build a model
Conventional Audio Features
• Conventional features
  • MFCCs, MFCC derivatives, sub-band energy, fundamental frequency, LPCCs, energy, zero-crossing rate, spectral centroid and bandwidth, matching pursuit (MP)
• Problems with conventional features
  • MFCCs
    • Describe the shape of the overall spectrum
    • Work well only for structured sounds such as speech and music
    • Performance degrades in the presence of noise
  • MP
    • Works relatively well for both structured and unstructured sounds
    • Requires significant computation
Motivation for CELP-based Features
• Comparison with MFCCs and MP (feature sets: CELP, MFCCs, MP)
  • Preserve data: features ↔ data (reversible)
  • Easy implementation: ITU-T G.723.1
  • Low complexity: real time
  • Compact feature
  • Classification rate: ESR
  • Different applications: speech, music
  • Potential side benefits: mobile applications (5.3/6.3 kbps), fixed point
• Bit streams → Information → Features
Code Excited Linear Prediction (CELP)
• Analysis-by-synthesis linear prediction
  • Short Term Prediction (STP): linear prediction coefficients
  • Long Term Prediction (LTP): pitch T
  • Residual description
• M. R. Schroeder and B. S. Atal, "Code-excited linear prediction (CELP): high-quality speech at very low bit rates," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 10, pp. 937–940, 1985
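The short-term prediction step can be illustrated with a minimal sketch. It assumes the textbook autocorrelation method with the Levinson-Durbin recursion; the function names are hypothetical, and this is not the G.723.1 reference code, which adds windowing, bandwidth expansion, and quantization.

```python
def autocorrelation(x, max_lag):
    """Return r[0..max_lag], where r[k] = sum over n of x[n] * x[n - k]."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation values r.

    The predictor is x_hat[n] = sum over k of a[k] * x[n - k].
    Returns ([a[1], ..., a[order]], final prediction-error energy).
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or perfectly predictable signal: stop early
        # Reflection coefficient for order i
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


# Illustrative signal (an assumption, not data from the slides):
# x[n] = 0.9 ** n is predicted almost exactly by x_hat[n] = 0.9 * x[n - 1].
signal = [0.9 ** n for n in range(200)]
coeffs, error = levinson_durbin(autocorrelation(signal, 2), 2)
```

For this decaying exponential the first coefficient comes out close to 0.9 and the second close to 0. A 10th-order analysis of a 240-sample frame would call the same two functions with `order=10`.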
Proposed CELP Features
• 240 samples/frame; 4 subframes/frame
• CELP features available from the bit stream
  • LPC (Linear Prediction Coefficients), 10th order, or LSF (Line Spectral Frequencies)
  • Pitch lag
    • Open loop
    • Closed loop, 20 ≤ p ≤ 147
  • GAIN of the two excitations
    • Pitch filter (5 tap)
    • Fixed codebook pulse
  • POS
    • Location and sign of the fixed codebook pulses
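The frame layout and the pitch-lag range above can be sketched as follows. This is a simplified illustration, not the G.723.1 search: the function names are hypothetical, and the lag is chosen by a plain normalized-correlation search over the range quoted on the slide.

```python
import math

FRAME_LEN = 240                         # samples per frame (from the slide)
SUBFRAMES = 4                           # subframes per frame (from the slide)
SUBFRAME_LEN = FRAME_LEN // SUBFRAMES   # 60 samples
MIN_LAG, MAX_LAG = 20, 147              # lag range quoted on the slide


def split_frames(signal):
    """Cut a signal into complete frames, each a list of 4 subframes."""
    frames = []
    for start in range(0, len(signal) - FRAME_LEN + 1, FRAME_LEN):
        frame = signal[start:start + FRAME_LEN]
        frames.append([frame[i:i + SUBFRAME_LEN]
                       for i in range(0, FRAME_LEN, SUBFRAME_LEN)])
    return frames


def estimate_pitch_lag(history, frame):
    """Return the lag in [MIN_LAG, MAX_LAG] with the highest normalized
    correlation between the frame and the signal delayed by that lag.
    `history` must hold at least MAX_LAG samples preceding `frame`."""
    if len(history) < MAX_LAG:
        raise ValueError("history must contain at least MAX_LAG samples")
    buf = list(history[-MAX_LAG:]) + list(frame)
    best_lag, best_score = MIN_LAG, float("-inf")
    for lag in range(MIN_LAG, MAX_LAG + 1):
        num = den = 0.0
        for n in range(len(frame)):
            cur = buf[MAX_LAG + n]
            past = buf[MAX_LAG + n - lag]
            num += cur * past
            den += past * past
        if den > 0.0:
            score = num / math.sqrt(den)
            if score > best_score:
                best_lag, best_score = lag, score
    return best_lag


# Illustrative input (an assumption): a sine wave with a 100-sample period.
tone = [math.sin(2 * math.pi * n / 100) for n in range(2 * FRAME_LEN)]
lag = estimate_pitch_lag(tone[:FRAME_LEN], tone[FRAME_LEN:])
```

A real coder would refine this open-loop estimate per subframe in a closed-loop search; the sketch stops at the open-loop step.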
Proposed Solution
• Data preprocessing: normalization, cleaning
• Feature extraction
  • CELP: 11 dim
  • MFCC: 21 dim (full bank)
• Classification: Bayesian network classifier
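The preprocessing and classification stages can be sketched in a few lines. The slides do not give the structure of the Bayesian network, so this sketch assumes the simplest special case, a Gaussian naive Bayes model (one class node that is the parent of every feature node). All names and the toy data are hypothetical.

```python
import math


def fit_normalizer(rows):
    """Per-dimension mean and standard deviation for z-score normalization."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    # A zero standard deviation is replaced by 1.0 to avoid division by zero.
    stds = [math.sqrt(sum((r[d] - means[d]) ** 2 for r in rows) / n) or 1.0
            for d in range(dims)]
    return means, stds


def normalize(row, means, stds):
    """Apply z-score normalization to one feature vector."""
    return [(v - m) / s for v, m, s in zip(row, means, stds)]


def train(rows, labels):
    """Estimate a prior and per-dimension Gaussian parameters for each class."""
    model = {}
    for label in sorted(set(labels)):
        members = [r for r, l in zip(rows, labels) if l == label]
        dims = len(members[0])
        means = [sum(r[d] for r in members) / len(members) for d in range(dims)]
        # Variances are floored so that constant features stay usable.
        variances = [max(sum((r[d] - means[d]) ** 2 for r in members)
                         / len(members), 1e-6) for d in range(dims)]
        model[label] = (len(members) / len(rows), means, variances)
    return model


def classify(model, row):
    """Return the class with the highest log-posterior for one feature vector."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for v, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


# Toy two-dimensional features (an assumption; real vectors would be the
# 11-dim CELP and/or 21-dim MFCC features).
features = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
classes = ["rain", "rain", "rain", "wind", "wind", "wind"]
mu, sigma = fit_normalizer(features)
model = train([normalize(f, mu, sigma) for f in features], classes)
prediction = classify(model, normalize([0.1, 0.0], mu, sigma))
```

Concatenating the CELP and MFCC vectors before normalization would correspond to the combined CELP+MFCC feature set.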
Experimental Setup and Result
• 10 classes
  • Transportation (3): airplane, motorcycle, and train
  • Weather (4): rain, thunder, wind, and stream
  • Rural areas (2): bird and insect
  • Indoor (1): restaurant
• Feature extraction
  • By modifying the standard ITU-T G.723.1 code
• Classifier
  • Bayesian network
Comparison of Features
Classification rate (%)
Features        Airplane  Bird   Insect  Motor  Rain   Rest.  Stream  Thunder  Train  Wind   Overall
PITCH           77.8      28.8   1.1     27.1   1.2    62.6   10.5    0        29.1   21.2   26.8
GAIN            66.3      8.5    44      18.5   32     8.3    8.3     2.4      15.9   11.5   22.2
LPC             85.4      96.3   99.6    89.8   99.1   63.7   98      77       74.1   98.5   88.5
CELP+GAIN       88.7      96.8   99.6    90.4   99     77.8   97.6    79.5     81.6   98.7   91
CELP+GAIN+POS   92.6      99.5   98.7    73.7   96.3   55.9   96      30       61     93     81.3
MFCC            87.8      90     95.8    86.2   76.8   69.4   77      43.2     86.9   100    82.5
CELP            88.4      96.8   99.6    90.4   99     77.9   97.7    78.8     81.3   98.7   91.2
CELP+MFCC       92.3      97.7   99.5    95.5   99     87.5   98.7    85.4     93.4   99.9   95.2
[Figure: Comparison of Features. Classification rate (%) per class (Airplane, Bird, Insect, Motor, Rain, Rest., Stream, Thunder, Train, Wind, Overall) for MFCC, CELP, and CELP+MFCC. Annotations: "Short Term and Long Term Prediction", "Speech like".]
Confusion Matrix of CELP Features
Classification rate (%)
Class     Airplane  Bird   Insect  Motor  Rain   Rest.  Stream  Thunder  Train  Wind
Airplane  88.4      –      –       –      –      1.9    –       0.2      5.1    4.4
Bird      –         96.8   –       0.1    –      1.6    0.3     0.2      1.1    –
Insect    –         –      99.6    –      –      0.4    –       –        –      –
Motor     0.1       –      –       90.4   –      5.7    –       0.3      3.5    –
Rain      –         –      –       –      99     0.3    0.4     0.1      –      –
Rest.     1         2.2    –       8.1    0.1    77.9   1.4     2.6      6.8    0.1
Stream    –         0.2    –       –      0.3    1      97.7    0.2      0.5    –
Thunder   1.9       0.6    0.1     3      0.3    7.5    3.8     78.8     3.4    0.7
Train     5.1       0.7    –       5      0.1    7.1    0.1     0.7      81.3   –
Wind      –         –      –       –      –      –      –       1.3      –      98.7
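Each row of the confusion matrix sums to roughly 100, which suggests that rows were normalized per class. A minimal sketch of that computation follows, assuming raw counts with one row per true class; the function names and counts are hypothetical.

```python
def confusion_to_percent(counts):
    """Row-normalize a count matrix so that each non-empty row sums to 100."""
    percent = []
    for row in counts:
        total = sum(row)
        percent.append([100.0 * c / total if total else 0.0 for c in row])
    return percent


def per_class_rate(percent):
    """The diagonal of the row-normalized matrix is the per-class rate."""
    return [percent[i][i] for i in range(len(percent))]


def overall_rate(counts):
    """Share of all test samples that were classified correctly, in percent."""
    correct = sum(counts[i][i] for i in range(len(counts)))
    total = sum(sum(row) for row in counts)
    return 100.0 * correct / total


# Hypothetical counts for two classes with different numbers of test samples.
counts = [[8, 2],
          [3, 27]]
rates = per_class_rate(confusion_to_percent(counts))
overall = overall_rate(counts)
```

When classes have different numbers of test samples, the overall rate is a sample-weighted figure and need not equal the plain average of the diagonal entries.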
Principal Component Analysis
[Figure: Principal Component Analysis. Classification rate (%) versus number of dimensions (1 to 31) for CELP, MFCC, and MFCC+CELP.]
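The dimension sweep in this figure relies on projecting feature vectors onto their leading principal components. The slides do not say how the PCA was computed, so the sketch below is an assumption: it uses power iteration with deflation on the sample covariance matrix, and all names are hypothetical.

```python
import math


def covariance(rows):
    """Return (per-dimension means, sample covariance matrix)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return means, cov


def top_components(cov, k, iterations=200):
    """Find up to k leading eigenvectors by power iteration with deflation.

    Caveat: the uniform start vector must not be orthogonal to the leading
    eigenvector; a random start would be more robust.
    """
    d = len(cov)
    mat = [row[:] for row in cov]
    components = []
    for _ in range(k):
        v = [1.0 / math.sqrt(d)] * d
        for _ in range(iterations):
            w = [sum(mat[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        eigval = sum(v[i] * sum(mat[i][j] * v[j] for j in range(d))
                     for i in range(d))
        if eigval < 1e-12:
            break  # no variance left to explain
        components.append(v)
        # Deflation: remove the component just found from the matrix.
        for i in range(d):
            for j in range(d):
                mat[i][j] -= eigval * v[i] * v[j]
    return components


def project(row, means, components):
    """Project one feature vector onto the retained components."""
    centered = [x - m for x, m in zip(row, means)]
    return [sum(c * x for c, x in zip(comp, centered)) for comp in components]


# Toy data on the line y = 2x (an assumption, not data from the slides).
data = [[float(x), 2.0 * x] for x in range(-5, 6)]
mu, cov = covariance(data)
axes = top_components(cov, 2)
reduced = [project(row, mu, axes[:1]) for row in data]
```

Sweeping the number of retained components from 1 upward and re-running the classifier on the projected vectors would produce a curve of the kind the figure shows.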
Speed and Complexity
• Speed
  • Feature extraction: real time
  • Classification
    • Training: depends on the classifier/kernel
    • Testing: fast, with negligible cost
• Average run time
Feature set   Training (sec)  Testing (sec)
CELP          659             8
MFCC          672             9
CELP+MFCC     912             10
Summary of ESR topic
• Conclusion
  • A novel set of CELP-based features is proposed by exploiting the information in the CELP bit stream
  • MFCCs, which represent filter-bank energy, are not well suited to ESR
  • CELP (91.2%) and CELP+MFCC (95.2%) outperform MFCC (82.5%) in overall classification rate with the Bayesian network classifier, a margin of roughly 10 percentage points
    • They use long- and short-term prediction
    • They are more robust to background noise
  • CELP offers low complexity, easy implementation, and extensibility
  • Recognition based on CELP features is desirable since the additional effort required by feature extraction is almost negligible
Future Work
• Explore more features
• Speaker recognition and identification
• Longer term signature capture
Q&A
Thanks

ISSCS2011
