SlideShare une entreprise Scribd logo
1  sur  7
Télécharger pour lire hors ligne
Subrata Mondal, Shiladitya Pujari, Tapas Sangiri / International Journal of Engineering
             Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com
                    Vol. 2, Issue 6, November- December 2012, pp.123-129
Environmental Natural Sound Detection And Classification Using
         Content-Based Retrieval (CBR) And MFCC
            Subrata Mondal1, Shiladitya Pujari2, Tapas Sangiri3
                             1
                              Technical Assistant, Department of Computer Science,
                   Bankura Unnayani Institute of Engineering, Bankura, West Bengal, India
                                      2
                                        Assistant Professor, Department of IT,
             University Institute of Technology, Burdwan University, Burdwan, West Bengal, India
                                      3
                                        Assistant Professor, Department of IT,
                   Bankura Unnayani Institute of Engineering, Bankura, West Bengal, India


ABSTRACT
         This paper deals with the extraction and         important aspects of our surroundings environment.
classification of environmental natural sounds            We have presented the different phases of the entire
with the help of content-based retrieval method.          process such as capturing the environmental audio,
Environmental        sounds     are     extremely         pre- processing of audio data, feature extraction,
unpredictable and are very much hard to classify          training and lastly the testing phase.
and store in cluster forms according to its feature       The major steps involved in the entire method are as
contents. Environmental sounds provide many               follows:
contextual clues that enable us to recognize                       Extraction of features for classifying highly
important     aspects of      our   surroundings          diversified natural sounds.
environment. This paper presented the techniques                   Making clusters according to their feature
that allow computer system to extract and classify        similarity.
features from predefined classes of sounds in the                  Finding a match for a particular sound
environment.                                              query from the cluster.
                                                                    The audio files used for testing purpose are
Keywords-content-based       retrieval, feature           of various categories in sample rate, bit rate and no
extraction,     environmental sound,       Mel            of channels, but files has been digitized to similar
frequency Cepstral coefficiant.                           type, i.e. 22k Hz, 16 bit, mono channel. Then all the
                                                          sounds normalized to -6 db and DC offset corrected
I. INTRODUCTION                                           to 0%. This reprocessing of sound data helps
          The development of high speed computer          considering low level descriptor in the sound files
has developed a tremendous opportunity to handle          are similar, hence we can proceed without any
large amount of data from different media. The            drawback in primary classification methods.
proliferation of world-wide- web has farther
accelerated its growth and this trend is expected to      II. RELATED WORKS
continue at an even faster pace in the coming years.                A brief survey of previous works relevant
Even in the past decade textual data was the major        to our work is presented here. Parekh R [1] proposed
form of data that was handled by the computer. But        a statistical method of classification based on Zero
above advancement in technology has led to uses of        crossing rate (ZCR) and Root means square (RMS)
non textual data such as video, audio and image           value of audio file. For this purpose sounds has been
which are known as multimedia data and the                categorized in three semantic classes. 1. Aircraft
database deals with this type of data is called           sound, 2.Bird sound, 3.Car sound. First and third are
multimedia database. This database has much more          inorganic in nature whereas second is organic in
capability than traditional database. One of the best     nature. The author tried to extract samples,
features of multimedia database is content-based          amplitudes and frequencies of those sounds. To
retrieval (CBR). In content base retrieval process the    generate the feature vector, ZCR and RMS methods
major advantage is that it can search data by its         are used. Statistical measure is applied to these
content rather textual indexing. This paper deals         features σZ, μZ, μR, σR, T', R' and values are
with the classification of environmental sounds           calculated.
classification with the help of content-based retrieval
and Mel frequency cepstral coefficient (MFCC).                     Goldhor R. S. et al [2] presented a method
Environmental sounds are extremely unpredictable          of classification technique based on dimensional
so it's really hard to classify and store in cluster      Cepstral Co-efficient. They segmented the sound
forms according to its feature contents.                  wave into tokens of sound according to signal power
Environmental sounds provide many contextual              Histogram. From the histogram statistics, a threshold
clues or characteristics that enable us to recognize      –power level was calculated to separate portions of



                                                                                                 123 | P a g e
Subrata Mondal, Shiladitya Pujari, Tapas Sangiri / International Journal of Engineering
             Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com
                    Vol. 2, Issue 6, November- December 2012, pp.123-129
the signal stream containing the sound from portions       human perception. They developed an algorithm
containing only background noise. Result is derived        based on matching approximation of repeating parts
from a high SNR recording. As feature vector they          of the sequence unit. For separating pattern they use
used two-dimensional cepstral-coefficients. They           the fact that silent segments affect human perception
calculated cepstral       Coefficient for each token by    and are recognized as the separators of perceived
using Hamming window technique. They also used             patterns, determined the length of the silent portion
two types of cepstral coefficients; one is linear          by dynamic process of histogram of length of silent
cepstral coefficient and another is the Mel frequency      segments using threshold method. Finally distances
cepstral coefficient.                                      between every pair of the units are computed by
           In their paper, Toyoda Y. et al [3] presented   DTW which minimizes the total distance between
the uses of neural network to classify certain types       units by aligning their time series.
sound such as crashing sound, clapping sound, bell
sound, switch typing sound etc All these sounds                     Janku L. [6] used MPEG7 low level
were classified into three categories; burst sound,        descriptor for sound recognition with the help of
repeated sound and continuous sound. For this              hidden markov model. Author used four low level
purpose they fetched the environmental sounds from         descriptor defined in mpeg 7. Those are Audio
a sound database RWCP-OB. For further                      spectrum envelop, audio spectrum centroid, audio
improvement, they developed multistage neural              spectrum spread and audio spectrum flatness. The
network. In this work, a three layer perception neural     testing method described is a three-step procedure:
network is used. In the first step of recognition they     (a) Receiving audio wave file and taking 512 points
used long time power pattern. This classifies the          short time Fourier transform.
sound as single impact. Second stage of                    (b) According to the presented spectrogram, we take
classification was done by combination of short time       the associated parameters.
power pattern and frequency at power peak.                 (c) Taking the SVD of flatness and running the
           Hattori Y. et al [4] proposed an                Viterbi algorithm to get the sound class candidate.
environmental sounds recognition system using LPC
- Cepstral coefficients for characterization and                    Chen Z. and Maher R. C. [7] dealt with
artificial neural network as verification method. This     Automatic off-line classification and recognition of
system was evaluated using a database containing           bird vocalizations. They focused on syllable level
files of different sound-sources under a variety of        discrimination of bird sound. For this purpose they
recording conditions. Two neural networks were             developed a set of parameters those are spectral
trained with the magnitude of the discrete Fourier         frequencies, frequency differences, track shape,
transform of the LPC Cepstral matrices. The global         spectral power, and track duration. The parameters
percentage of verification was of 96.66%. First they       include the frequencies, the frequency differences,
collected a common sound database. A segmentation          the relative the shape, and the duration of the
algorithm is applied to each file. For this purpose        spectral peak tracks. These parameters were selected
they uses 240 point Hamming window with frame              based on several key insights. For the feature
overlap of 50%. LPC cepstral coefficient is                extraction they had two choices: MPEG-7 audio
computed and DFT is measured. For LPC cepstral             features and Mel-scale Frequency Cepstral
calculating they used Levinson-Durbin recursion.           Coefficients (MFCC), For the classification they had
                                                           two choices: Maximum Likelihood Hidden Markov
         Wang J. et al [5] focused on repeat               Models (ML-HMM) and Entropic Prior HMM (EP-
recognition in environmental sounds. For this              HMM. They have achieved accuracy of 90% with
purpose they designed a three module system; 1)            the best and the second best being MPEG-7 features
Unit of the repetitions; 2) Distance between units; 3)     with EP- HMM and MFCC with ML-HMM.
Algorithm to detect repeating part.
                                                                    In their paper, Constantine G. et al [8]
For unit extraction they used the concept of a peak in     presented a comparison of three audio taxonomy
power envelop. They extracted peak power with the          methods for MPEG-7 sound classification. For the
help of following:                                         low-level descriptors they used, low-dimensional
1.The power is below the threshold T L                     features based on spectral basis descriptors are
2. The interval between the maximum and the                produced in three stages:
neighbor maximum is than the threshold T2.                         Normalized audio spectrum envelope.
3. The ratio of the power to the power of the                      Principal component analysis.
neighbor local minimum is than the threshold T3.                   Independent component analysis.
Distance of every pair of unit was calculated using
dynamic time warping MFCC.                                           High-level description schemes they used to
                                                           describe the modeling of audio features, the
Mel frequency cepstral coefficient (MFCC) was              procedure of audio classification, and retrieval. For
developed by Stevens and was designed to adapt             classification they tested three approaches:


                                                                                                 124 | P a g e
Subrata Mondal, Shiladitya Pujari, Tapas Sangiri / International Journal of Engineering
             Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com
                    Vol. 2, Issue 6, November- December 2012, pp.123-129
                                                                     Mel frequency cepstral coefficients (music
        The direct approach.                                and speech).
        The hierarchical approach without hints.                    Linear     prediction     cepstral    (LPC)
        Hierarchical approach with hints.                   coefficients.
                                                                     Mel frequency LPC coefficients.
          They suggested that best approach is                       Bark frequency cepstral coefficients.
hierarchical approach with hints, which results in                   Bark frequency LPC coefficients.
classification accuracy of around 99%. The direct                    Perceptual linear prediction (PLP) features.
approach produces the second best results, and the
hierarchical approach without hints the third best                    Cepstral     frequency    based      feature
results.                                                     extractions are pseudo stationary feature extraction
                                                             techniques because they split the signal into many
III. SOUND ANALYSIS TECHNIQUE                                small part in time domain. Techniques based on LPC
          Undoubtedly the most important part of the         coefficients were based on the idea of a decoder,
entire process is to extract the features from the audio     which is a simulation of the human vocal tract. Since
signal. The audio signal feature extraction in a             the human vocal tract does not produce
categorization problem. It is about reducing the             environmental sounds, these techniques typically
dimensionality of the input-vector while maintaining         seem to highlight non-unique features in the sound
the discriminating power of the signal. It is clear          and are therefore not appropriate for recognition of
from the above discussion of environmental sound             non- speech sounds.
identification and verification systems, that the
training and test vector needed for the classification       The main non         stationary   feature   extraction
problem grows dimension of the given input vector            techniques are:
[2-3], feature extraction is necessary. More variety in
sound needs more variety in features for a particular                Short-time Fourier transforms (STFT).
genre of sound.                                                      Fast (discrete) wavelet transform (FWT).
          There are various techniques suitable for                  Continuous wavelet transforms (CWT).
environmental sound recognition. Recognition of                      Wigner-Ville distribution (WVD).
sound (whether it is speech or environmental) is
generally consists of two phases:                                      These techniques use different algorithms
         The Feature Extraction phase, followed by          to represent time frequency based representation of
         The      Classification     (using    artificial   signals. Short-time Fourier transforms use standard
intelligence techniques or any other) phase.                 Fourier transforms of the signal where as wavelet
          In feature extraction phase, sound is              based techniques apply a mother wavelet to a
manipulated in order to produce a set of                     waveform to surmount the resolution issues inherent
characteristics. Classification phase is to recognize        in STFT. WVD is a bilinear time-frequency
the sound by cataloging the features of existing             distribution.
sounds during training session and then comparing
the test sound to this database of features (which is                Moreover there are some other features
called the testing session). These two phases has been       which are suitable for classification of
described in the following text.                             environmental sounds. These features are described
                                                             below:
A. Feature Extraction
        Feature extraction mainly of two types. (a)          Zero-crossing rate:
Frequency based feature extraction and (b) Time-                      The zero-crossing rate is a measure of the
frequency based feature extraction.                          number of time the signal value crosses the zero
                                                             axis. Periodic sounds tend to have small value of it,
         Frequency based feature extraction method           while noisy sounds tend to have a high value of it. It
produces an overall result detailing the contents of         is computed in each time frame on the signal.
the entire signal. According to Cowling [11]
Frequency extraction is stationary feature extraction        Spectral Features:
and Time-frequency extraction is non-stationary                       Spectral features are single valued features,
feature extraction,                                          calculated using the frequency spectrum of a signal.
                                                             Thus for the time domain signal x (t):
Eight popular stationary feature extraction methods          A (f) = 𝐹[𝑥 𝑡 ]
are there:
        Frequency extraction (music and speech).            Spectral Centroid:
        Homomorphic cepstral coefficients.                          The Spectral Centroid (μ) is bar center of
                                                             the spectrum. It is a weighted average of the



                                                                                                    125 | P a g e
Subrata Mondal, Shiladitya Pujari, Tapas Sangiri / International Journal of Engineering
             Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com
                    Vol. 2, Issue 6, November- December 2012, pp.123-129
probability to observe the normalized amplitude.           B. Sampling :
Given A (f) from the equation mentioned above                       It is the process of converting a continuous
                                                           signal into a discrete signal. Sampling can be done
                   𝜇=      𝑓. 𝑝 𝑓 𝛿𝑓
                                                           for signals varying in space, time, or any other
                                                           dimension, and similar results are obtained in two or
Where, p (f) = A (f) / ∑A (f)                              more dimensions. For signals that vary with time, let
Spectral spread:                                           x(t) be a continuous signal to be sampled, and let
The spectral spread (σ) is a the spectrum around the       sampling be performed by measuring the value of
mean value calculated in following equation                the continuous signal every T seconds, which is
                                                           called the sampling interval. Thus, the sampled
                                                           signal x[n] given by:
              σ2 =       𝑓 − . 2𝑝 𝑓 δ 𝑓
                                                           x[n] = x(nT), with n = 0, 1, 2, 3, ...
Spectral Roll-off:
The spectral roll-off point (fc) is 95% of the signal                The sampling frequency or sampling rate fs
energy below this frequency.                               is defined as the number of samples obtained in one
                                                           second, i.e. fs = 1/T. The sampling rate is measured
IV. DESIGN ARCHITECTURE                                    in hertz or in samples per second.
          In context of multimedia database system
major challenge is to classify the environmental           C. Pre-emphasis :
sound because they are so unpredictable in nature                   In processing of electronic audio signals,
that they are harder to categorize into smaller number     pre-emphasis refers to a system process designed to
of classes. Classification problem concerns with           increase (within a frequency band) the magnitude of
determining two samples match each other or not            some (usually higher) frequencies with respect to the
under some similarity criterion and to different           magnitude of other (usually lower) frequencies in
classes keeping similar type of samples in a class. In     order to improve the overall signal-to-noise ratio
case of Content Based Retrieval (CBR) system we            (SNR) by minimizing the adverse effects.
must consider one thing that how might people
access the sound from database. That may be similar        D. Windowing :
one sound or a group of sounds in terms of some                      In signal processing, a window function
characteristics. For example, class of bird sound,         (also known as tapering function) is a mathematical
class of sound of car, where system has been               function that is zero-valued outside of some chosen
previously train on other sound in similar class. To       interval. For instance, a function that is constant
achieve the desired result, we have divided the entire     inside the interval and zero elsewhere is called a
procedure into two sessions - training and testing         rectangular window, which describes the shape of its
session.                                                   graphical representation.
          In order to make an automated                    Window Terminology:
classification of environmental sound, main part is to              N represents the width, in samples, of a
find out the right features of the sound. In this paper,   discrete-time, symmetrical window function.
we have selected Mel frequency Cepstral Coefficient                 n is an integer, with values 0 ≤ n ≤ N-1.
(MFCC) to extract features from the sound signal
and compare the unknown signal with the existing           1) Hamming window:
signals in the multimedia database.                                  Hamming window is also called the raised
                                                           cosine window. The equation and plot for the
                                                           Hamming window shown below. In a window
                                                           function there is a zero outside of some chosen
                                                           interval. For example, a function that is stable inside
                                                           the interval and zero elsewhere is called a
                                                           rectangular window that illustrates the shape of its
                                                           graphical representation, When signal or any other
                                                           function is multiplied by a window function, the
                                                           product is also zero valued outside the interval. The
                                                           windowing is done to avoid problems due to
                                                           truncation of the signal. Window function has some
                                                           other applications such as spectral analysis, filter
                                                           design, audio data compression.



Figure 1: Mel Frequency Cepstral Coefficient
pipeline



                                                                                                  126 | P a g e
Subrata Mondal, Shiladitya Pujari, Tapas Sangiri / International Journal of Engineering
            Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com
                   Vol. 2, Issue 6, November- December 2012, pp.123-129
                                                         labeled to 2000 mel. The process is repeated for half
                                                         of the frequency, then this frequency as 500 and so
                                                         on. On this basis the normal frequency is mapped
                                                         into the mel frequency. The mel is normally a linear
                                                         mapping below 1000 Hz and logarithmically above
                                                         1000 Hz.
                                                                  Transformation of normal frequency into
                                                         mel frequency and vice versa can be obtained by
                                                         using the following formulas shown here:
                                                                                            𝑓
                                                          𝑚 = 1127.01048 log⁡ +
                                                                            (1                  )……….... (1)
Figure 2: Example of hamming window                                                       700


          The raised cosine with these particular        Where m is the Mel frequency and f is the normal
coefficients was proposed by Richard W. Hamming.         frequency.
The window is optimized to minimize the maximum
                                                                       𝑒 𝑚
(nearest) side lobe, giving it a height of about one-    𝑓 = 700(1127 .01048 − 1)………………… (2)
fifth that of the Hann window, a raised cosine with
simpler coefficients.                                       Here mel frequency is transformed into normal
Function plotted on the graph bellow :                   frequency. Figure 4 showing the mapping of normal
                                                         frequencies to the mel frequencies.




Figure 3: Function is plotted
                                                         Figure 4: Mapping of normal frequency into Mel
E. Cepstrum :                                            frequency
         Cepstrum name was derived, from the
spectrum by reversing the first four spectrums. We       F. Distance measure:
can say cepstrum is the Fourier Transformer of the                 In sound recognition phase, an unknown
log with phase of the Fourier Transformer frequency      sound is represented by a sequence of feature vectors
bands are not positioned logarithmically. As the         (x1, x2 ... xi), and then it is compared with the other
frequency bands are positioned logarithmically in        sound from the database. In order to identify the
the Fourier Transform the MFCC approximates the          unknown speaker, this can be done by measuring the
human system response better than any other system.      distortion distance of two vector sets based on
These coefficients allow better processing of data. In   minimizing the Euclidean distance. The Euclidean
the Mel Frequency Cepstral Coefficients the              distance is the scalar distance between the two points
calculation of the mel is as the real Cepstrum except    that one would measure with a ruler, which can be
the Mel cepstrum's frequency is to up a                  proven by repeated application of the Pythagorean
correspondence to the Mel scale. The Mel scale was       Theorem, The formula used to calculate the
projected by Stevens, Volkmann and Newman In             Euclidean distance is as follows:
1937.The Mel scale is mainly based on the study of           The Euclidean distance between two points P =
observing the pitch or frequency of human. The           (x1, x2...xn) and Q = (y1, y2… yn)
scale is divided into the units mel. In this test the                              𝒏
person started out hearing a frequency of 1000 Hz,
and labeled it 1000 mel. Listeners were asked to                             𝒅=         (𝒙 𝒊 − 𝒚 𝒊 ) 𝟐
change the frequency till it reaches to the frequency                             𝒊=𝟏

of the reference frequency. Then this frequency is



                                                                                                         127 | P a g e
Subrata Mondal, Shiladitya Pujari, Tapas Sangiri / International Journal of Engineering
             Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com
                    Vol. 2, Issue 6, November- December 2012, pp.123-129
The sound signal with the lowest distortion distance      that both look for linear combinations of variables
is chosen to be identified as the unknown signal.         which best explains the data. LDA explicitly
                                                          attempts to model the difference between the classes
G. Fast Fourier Transformation :                          of data. PCA on the other hand does not take into
          FFTs are of great importance to a wide          account any difference in class, and factor analysis
variety of applications, from digital signal              builds the feature combinations based on differences
processing and solving partial differential equations     rather than similarities. Discriminate analysis is also
to algorithms for quick multiplication of large           different from factor analysis in that it is not an
integers.                                                 interdependence technique: a distinction between
                                                          independent variables and dependent variables (also
H. Absolute value:                                        called criterion variables) must be made.
         In mathematics, the absolute value (or
modulus) |a| of a real number a is the numerical          V. TRAINING AND TESTING SESSIONS
value of a without its sign. The absolute value of a                As stated earlier, due to achieve the best
number may be thought of as its distance from zero.       result, the entire sound detection and classification
    Generalizations of the absolute value for real        process is divided into two sessions. One is called
numbers occur in a wide variety of mathematical           the training session, during which the system is
settings. For example an absolute value is also           being trained with known sound signals and features
defined for the complex numbers, the quaternion,          of that known signals are extracted. These extracted
ordered rings, fields and vector spaces. The absolute     features are then stored into the multimedia database
value is closely related to the notions of magnitude,     as the reference model for future reference. Figure 5
distance, and norm in various mathematical and            shows the processes during the training session.
physical contexts.                                        Form this figure, we can see that the known sound
                                                          signal is first re-sampled, then re-referenced and
I. DCT :                                                  normalized. The normalized signal is then passed
         In particular, a DCT is a Fourier-related        through the process to extract the features. System is
transform similar to the discrete Fourier transform       then trained with the help of these features and
(DFT), but uses only real numbers. DCTs are               reference model is built.
equivalent to DFTs of roughly twice the length,
operating on real data with even symmetry (since the
Fourier transform of a real and even function is real
and even), where in some variants the input and/or
output data are shifted by half a sample. There are
eight standard DCT variants, of which four are
commonly used.

J. Linear Discriminate Analysis (LDA):
          Linear discriminate analysis (LDA) and
the related Fisher's linear discriminate are methods
used in statistics, pattern recognition and machine
learning to find a linear combination of features
which characterizes or separates two or more classes
of objects or events. The resulting combination may
be used as a linear classifier or, more commonly, for
dimensionality reduction before later classification.
          LDA is closely related to ANOVA
(analysis of variance) and regression analysis, which
also attempts to express one dependent variable as a
linear combination of other features or
measurements. In the other two methods, however,
the dependent variable is a numerical quantity, while
for LDA it is a categorical variable (i.e. the class      Figure 5: Flow chart of Training Session
label). Logistic regression is more similar to LDA,
as it also explains a categorical variable. These other
                                                                    Through the training session, the
methods are preferable in applications where it is not    multimedia database is populated by feature vectors
reasonable to assume that the independent variables       of known sound signals so that those known
are normally distributed, which is a fundamental          environmental sound can be detected and classified
assumption of the LDA method.                             later. In case of unknown signals, signals are newly
          LDA is also closely related to principal        classified during training session and detection can
component analysis (PCA) and factor analysis in           be done with that classification.


                                                                                                 128 | P a g e
Subrata Mondal, Shiladitya Pujari, Tapas Sangiri / International Journal of Engineering
             Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com
                    Vol. 2, Issue 6, November- December 2012, pp.123-129
                                                          power. Using more than one features of a sound may
          Another session of the detection and            obviously improve the performance of the method.
classification process is the testing session. During     Applying clustering technique, accuracy can be
this session, depicted in figure 6, the input signal is   boosted. Another good feature available today is
sampled, DC offset is determined and then                 Audio spectrum projection provided by MPEG7
normalization is done. After normalization, features      specification. Inclusion of this feature may increase
are extracted for the input signal. Then for a            the performance measure of the method.
particular sound, feature vectors are retrieved from
the multimedia database using content-based               REFERENCES
retrieval (CBR) method. The retrieved features are          [1]  Ranjan      Parekh,     "Classification     Of
then matched with the feature vector of the input                Environmental Sounds",ICEMC2
sound signal.                                               2007, Second International Conference on
                                                                 Embedded Mobile Commu- nication and
                                                                 Computing, ICEMC2 2007.
                                                            [2]  R.     S.     Goldhor,    "Recognition      of
                                                                 Environmental Sounds/' Proceedings of
                                                                 ICASSP, vol.1, pp.149-152, April 1993.
                                                            [3]  Yoshiyuki Toyoda, Jie Huang, Shuxue
                                                                 Ding,Yong Liu, "Environmental Sound
                                                                 Recognition by Multi layered Neural
                                                                 Networks" ,The Fourth International
                                                                 Conference on Computer School of
                                                                 Education        Technology,         Jadavpur
                                                                 University, Kolkata 32 and Information
                                                                 Technology (CIT'04) pp. 123-127,2004.
                                                            [4]  Yuya Hattori, Kazushi Ishihara, Kazunori
                                                                 Komatani,      Tetsuya     Ogata,      Hiroshi
                                                                 G.Okuno/Repeat          Recognition        for
                                                                 Environmental Sounds", Proceedings of the
                                                                 2004 IEEE International Workshop on
                                                                 Robot        and      Human        Interactive
                                                                 Communication Kurashiki, Okayama Japan
                                                                 September 20-22, pp. 83-88, 2004.
                                                            [5]  Jia-Ching Wang, Jhing-Fa Wang, Kuok
Figure 6: Flowchart of Testing Session                           Wai          He,       and        Cheng-Shu
                                                                 Hsu/'Environmental Sound Classification
         After feature extraction, the matching                  Using Hybrid SVM/KNN Classifier and
process is carried on and feature vector of input                MPEG-7 Audio Conference on Low-Level
signal is matched with every reference model of                  Descriptor", 2006 International Joint Neural
sound signals stored in the database. Whenever a                 Networks Sheraton Vancouver Wall Centre
maxim um matching is found, the input signal is said             Hotel, Vancouver, BC, Canada July 16-21,
to be detected. This is the entire process of detection          2006.
and classification of environmental sounds. In both         [6]  Ladislava Janku, "Artificial Perception:
sessions, feature extraction process is done using the           Auditory Decomposition of Mixtures of
MFCC pipeline.                                                   Environmental Sounds - Combining
                                                                 Information Theoretical and Supervised
VI. CONCLUSION                                                   Pattern Recognition Approaches", P. Van
         This method of environmental sound                      Emde Boas et al. (Eds.): SOFSEM 2004,
detection and classification is developed using                  LNCS 2932, pp. 229-240,2004
MFCC pipeline and CBR for extraction of features            [7]  Z, Chen, R. C, Maher, "'Semi-automatic
of a particular sound and retrieval of sound features            classification of bird vocalizations1 Accost
from the multimedia database respectively. This                  Soc Am., Vol. 120, No. 5, pp. 2974-2984,
method can be implemented in the domain of                       November 2006.
robotics where sound detection and recognition may          [8]  G. Constantine, A, Rizzi, and D, Casali,
be possible up to a satisfactory level. If the method            '"Recognition of musical instruments by
will be properly implemented with computer vision,               generalized min-max classifiers/' in IEEE
then human-computer interaction process can be                   Workshop on Neural Networks for Signal
developed much. MFCC is undoubtedly more                         Processing, pp. 555-564, 2003.
efficient feature extraction method because it is
designed by giving emphasis on human perception



                                                                                                129 | P a g e

Contenu connexe

Tendances

Performance analysis of bio-Signal processing in ocean Environment using soft...
Performance analysis of bio-Signal processing in ocean Environment using soft...Performance analysis of bio-Signal processing in ocean Environment using soft...
Performance analysis of bio-Signal processing in ocean Environment using soft...IJECEIAES
 
Deep learning for music classification, 2016-05-24
Deep learning for music classification, 2016-05-24Deep learning for music classification, 2016-05-24
Deep learning for music classification, 2016-05-24Keunwoo Choi
 
IRJET- Implementing Musical Instrument Recognition using CNN and SVM
IRJET- Implementing Musical Instrument Recognition using CNN and SVMIRJET- Implementing Musical Instrument Recognition using CNN and SVM
IRJET- Implementing Musical Instrument Recognition using CNN and SVMIRJET Journal
 
Audio Steganography Coding Using the Discreet Wavelet Transforms
Audio Steganography Coding Using the Discreet Wavelet TransformsAudio Steganography Coding Using the Discreet Wavelet Transforms
Audio Steganography Coding Using the Discreet Wavelet TransformsCSCJournals
 
Human Emotion Recognition From Speech
Human Emotion Recognition From SpeechHuman Emotion Recognition From Speech
Human Emotion Recognition From SpeechIJERA Editor
 
11.design and implementation of distributed space frequency to achieve cooper...
11.design and implementation of distributed space frequency to achieve cooper...11.design and implementation of distributed space frequency to achieve cooper...
11.design and implementation of distributed space frequency to achieve cooper...Alexander Decker
 
IRJET- Wavelet Transform based Steganography
IRJET- Wavelet Transform based SteganographyIRJET- Wavelet Transform based Steganography
IRJET- Wavelet Transform based SteganographyIRJET Journal
 
IRJET- Music Genre Classification using Machine Learning Algorithms: A Compar...
IRJET- Music Genre Classification using Machine Learning Algorithms: A Compar...IRJET- Music Genre Classification using Machine Learning Algorithms: A Compar...
IRJET- Music Genre Classification using Machine Learning Algorithms: A Compar...IRJET Journal
 
ENERGY EFFICIENT ANIMAL SOUND RECOGNITION SCHEME IN WIRELESS ACOUSTIC SENSORS...
ENERGY EFFICIENT ANIMAL SOUND RECOGNITION SCHEME IN WIRELESS ACOUSTIC SENSORS...ENERGY EFFICIENT ANIMAL SOUND RECOGNITION SCHEME IN WIRELESS ACOUSTIC SENSORS...
ENERGY EFFICIENT ANIMAL SOUND RECOGNITION SCHEME IN WIRELESS ACOUSTIC SENSORS...ijwmn
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...ijceronline
 
IRJET- A Review for Reduction of Noise by Wavelet Transform in Audio Signals
IRJET- A Review for Reduction of Noise by Wavelet Transform in Audio SignalsIRJET- A Review for Reduction of Noise by Wavelet Transform in Audio Signals
IRJET- A Review for Reduction of Noise by Wavelet Transform in Audio SignalsIRJET Journal
 
Higher Order Low Pass FIR Filter Design using IPSO Algorithm
Higher Order Low Pass FIR Filter Design using IPSO AlgorithmHigher Order Low Pass FIR Filter Design using IPSO Algorithm
Higher Order Low Pass FIR Filter Design using IPSO Algorithmijtsrd
 
Noise removal techniques for microwave remote sensing radar data and its eval...
Noise removal techniques for microwave remote sensing radar data and its eval...Noise removal techniques for microwave remote sensing radar data and its eval...
Noise removal techniques for microwave remote sensing radar data and its eval...csandit
 
NOISE REMOVAL TECHNIQUES FOR MICROWAVE REMOTE SENSING RADAR DATA AND ITS EVAL...
NOISE REMOVAL TECHNIQUES FOR MICROWAVE REMOTE SENSING RADAR DATA AND ITS EVAL...NOISE REMOVAL TECHNIQUES FOR MICROWAVE REMOTE SENSING RADAR DATA AND ITS EVAL...
NOISE REMOVAL TECHNIQUES FOR MICROWAVE REMOTE SENSING RADAR DATA AND ITS EVAL...cscpconf
 

Tendances (18)

Performance analysis of bio-Signal processing in ocean Environment using soft...
Performance analysis of bio-Signal processing in ocean Environment using soft...Performance analysis of bio-Signal processing in ocean Environment using soft...
Performance analysis of bio-Signal processing in ocean Environment using soft...
 
Deep learning for music classification, 2016-05-24
Deep learning for music classification, 2016-05-24Deep learning for music classification, 2016-05-24
Deep learning for music classification, 2016-05-24
 
IRJET- Implementing Musical Instrument Recognition using CNN and SVM
IRJET- Implementing Musical Instrument Recognition using CNN and SVMIRJET- Implementing Musical Instrument Recognition using CNN and SVM
IRJET- Implementing Musical Instrument Recognition using CNN and SVM
 
1801 1805
1801 18051801 1805
1801 1805
 
Audio Steganography Coding Using the Discreet Wavelet Transforms
Audio Steganography Coding Using the Discreet Wavelet TransformsAudio Steganography Coding Using the Discreet Wavelet Transforms
Audio Steganography Coding Using the Discreet Wavelet Transforms
 
Human Emotion Recognition From Speech
Human Emotion Recognition From SpeechHuman Emotion Recognition From Speech
Human Emotion Recognition From Speech
 
11.design and implementation of distributed space frequency to achieve cooper...
11.design and implementation of distributed space frequency to achieve cooper...11.design and implementation of distributed space frequency to achieve cooper...
11.design and implementation of distributed space frequency to achieve cooper...
 
IRJET- Wavelet Transform based Steganography
IRJET- Wavelet Transform based SteganographyIRJET- Wavelet Transform based Steganography
IRJET- Wavelet Transform based Steganography
 
IRJET- Music Genre Classification using Machine Learning Algorithms: A Compar...
IRJET- Music Genre Classification using Machine Learning Algorithms: A Compar...IRJET- Music Genre Classification using Machine Learning Algorithms: A Compar...
IRJET- Music Genre Classification using Machine Learning Algorithms: A Compar...
 
G017513945
G017513945G017513945
G017513945
 
ENERGY EFFICIENT ANIMAL SOUND RECOGNITION SCHEME IN WIRELESS ACOUSTIC SENSORS...
ENERGY EFFICIENT ANIMAL SOUND RECOGNITION SCHEME IN WIRELESS ACOUSTIC SENSORS...ENERGY EFFICIENT ANIMAL SOUND RECOGNITION SCHEME IN WIRELESS ACOUSTIC SENSORS...
ENERGY EFFICIENT ANIMAL SOUND RECOGNITION SCHEME IN WIRELESS ACOUSTIC SENSORS...
 
I43014752
I43014752I43014752
I43014752
 
H010234144
H010234144H010234144
H010234144
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
 
IRJET- A Review for Reduction of Noise by Wavelet Transform in Audio Signals
IRJET- A Review for Reduction of Noise by Wavelet Transform in Audio SignalsIRJET- A Review for Reduction of Noise by Wavelet Transform in Audio Signals
IRJET- A Review for Reduction of Noise by Wavelet Transform in Audio Signals
 
Higher Order Low Pass FIR Filter Design using IPSO Algorithm
Higher Order Low Pass FIR Filter Design using IPSO AlgorithmHigher Order Low Pass FIR Filter Design using IPSO Algorithm
Higher Order Low Pass FIR Filter Design using IPSO Algorithm
 
Noise removal techniques for microwave remote sensing radar data and its eval...
Noise removal techniques for microwave remote sensing radar data and its eval...Noise removal techniques for microwave remote sensing radar data and its eval...
Noise removal techniques for microwave remote sensing radar data and its eval...
 
NOISE REMOVAL TECHNIQUES FOR MICROWAVE REMOTE SENSING RADAR DATA AND ITS EVAL...
NOISE REMOVAL TECHNIQUES FOR MICROWAVE REMOTE SENSING RADAR DATA AND ITS EVAL...NOISE REMOVAL TECHNIQUES FOR MICROWAVE REMOTE SENSING RADAR DATA AND ITS EVAL...
NOISE REMOVAL TECHNIQUES FOR MICROWAVE REMOTE SENSING RADAR DATA AND ITS EVAL...
 

En vedette

20 Ideas for your Website Homepage Content
20 Ideas for your Website Homepage Content20 Ideas for your Website Homepage Content
20 Ideas for your Website Homepage ContentBarry Feldman
 
Prototyping is an attitude
Prototyping is an attitudePrototyping is an attitude
Prototyping is an attitudeWith Company
 
10 Insightful Quotes On Designing A Better Customer Experience
10 Insightful Quotes On Designing A Better Customer Experience10 Insightful Quotes On Designing A Better Customer Experience
10 Insightful Quotes On Designing A Better Customer ExperienceYuan Wang
 
How to Build a Dynamic Social Media Plan
How to Build a Dynamic Social Media PlanHow to Build a Dynamic Social Media Plan
How to Build a Dynamic Social Media PlanPost Planner
 
Learn BEM: CSS Naming Convention
Learn BEM: CSS Naming ConventionLearn BEM: CSS Naming Convention
Learn BEM: CSS Naming ConventionIn a Rocket
 
SEO: Getting Personal
SEO: Getting PersonalSEO: Getting Personal
SEO: Getting PersonalKirsty Hulse
 

En vedette (6)

20 Ideas for your Website Homepage Content
20 Ideas for your Website Homepage Content20 Ideas for your Website Homepage Content
20 Ideas for your Website Homepage Content
 
Prototyping is an attitude
Prototyping is an attitudePrototyping is an attitude
Prototyping is an attitude
 
10 Insightful Quotes On Designing A Better Customer Experience
10 Insightful Quotes On Designing A Better Customer Experience10 Insightful Quotes On Designing A Better Customer Experience
10 Insightful Quotes On Designing A Better Customer Experience
 
How to Build a Dynamic Social Media Plan
How to Build a Dynamic Social Media PlanHow to Build a Dynamic Social Media Plan
How to Build a Dynamic Social Media Plan
 
Learn BEM: CSS Naming Convention
Learn BEM: CSS Naming ConventionLearn BEM: CSS Naming Convention
Learn BEM: CSS Naming Convention
 
SEO: Getting Personal
SEO: Getting PersonalSEO: Getting Personal
SEO: Getting Personal
 

Similaire à T26123129

IRJET- Music Genre Recognition using Convolution Neural Network
IRJET- Music Genre Recognition using Convolution Neural NetworkIRJET- Music Genre Recognition using Convolution Neural Network
IRJET- Music Genre Recognition using Convolution Neural NetworkIRJET Journal
 
IRJET- Machine Learning and Noise Reduction Techniques for Music Genre Classi...
IRJET- Machine Learning and Noise Reduction Techniques for Music Genre Classi...IRJET- Machine Learning and Noise Reduction Techniques for Music Genre Classi...
IRJET- Machine Learning and Noise Reduction Techniques for Music Genre Classi...IRJET Journal
 
Recognition of music genres using deep learning.
Recognition of music genres using deep learning.Recognition of music genres using deep learning.
Recognition of music genres using deep learning.IRJET Journal
 
Investigating Multi-Feature Selection and Ensembling for Audio Classification
Investigating Multi-Feature Selection and Ensembling for Audio ClassificationInvestigating Multi-Feature Selection and Ensembling for Audio Classification
Investigating Multi-Feature Selection and Ensembling for Audio Classificationijaia
 
Anvita Eusipco 2004
Anvita Eusipco 2004Anvita Eusipco 2004
Anvita Eusipco 2004guest6e7a1b1
 
Attention gated encoder-decoder for ultrasonic signal denoising
Attention gated encoder-decoder for ultrasonic signal denoisingAttention gated encoder-decoder for ultrasonic signal denoising
Attention gated encoder-decoder for ultrasonic signal denoisingIAESIJAI
 
International Journal of Biometrics and Bioinformatics(IJBB) Volume (1) Issue...
International Journal of Biometrics and Bioinformatics(IJBB) Volume (1) Issue...International Journal of Biometrics and Bioinformatics(IJBB) Volume (1) Issue...
International Journal of Biometrics and Bioinformatics(IJBB) Volume (1) Issue...CSCJournals
 
A Combined Sub-Band And Reconstructed Phase Space Approach To Phoneme Classif...
A Combined Sub-Band And Reconstructed Phase Space Approach To Phoneme Classif...A Combined Sub-Band And Reconstructed Phase Space Approach To Phoneme Classif...
A Combined Sub-Band And Reconstructed Phase Space Approach To Phoneme Classif...April Smith
 
IRJET- A Survey on Sound Recognition
IRJET- A Survey on Sound RecognitionIRJET- A Survey on Sound Recognition
IRJET- A Survey on Sound RecognitionIRJET Journal
 
A Gaussian Clustering Based Voice Activity Detector for Noisy Environments Us...
A Gaussian Clustering Based Voice Activity Detector for Noisy Environments Us...A Gaussian Clustering Based Voice Activity Detector for Noisy Environments Us...
A Gaussian Clustering Based Voice Activity Detector for Noisy Environments Us...CSCJournals
 
IRJET- Musical Instrument Recognition using CNN and SVM
IRJET-  	  Musical Instrument Recognition using CNN and SVMIRJET-  	  Musical Instrument Recognition using CNN and SVM
IRJET- Musical Instrument Recognition using CNN and SVMIRJET Journal
 
Acoustic Scene Classification by using Combination of MODWPT and Spectral Fea...
Acoustic Scene Classification by using Combination of MODWPT and Spectral Fea...Acoustic Scene Classification by using Combination of MODWPT and Spectral Fea...
Acoustic Scene Classification by using Combination of MODWPT and Spectral Fea...ijtsrd
 
Environmental Sound detection Using MFCC technique
Environmental Sound detection Using MFCC techniqueEnvironmental Sound detection Using MFCC technique
Environmental Sound detection Using MFCC techniquePankaj Kumar
 
Performance analysis of the convolutional recurrent neural network on acousti...
Performance analysis of the convolutional recurrent neural network on acousti...Performance analysis of the convolutional recurrent neural network on acousti...
Performance analysis of the convolutional recurrent neural network on acousti...journalBEEI
 
A computationally efficient learning model to classify audio signal attributes
A computationally efficient learning model to classify audio  signal attributesA computationally efficient learning model to classify audio  signal attributes
A computationally efficient learning model to classify audio signal attributesIJECEIAES
 
Automatic Music Generation Using Deep Learning
Automatic Music Generation Using Deep LearningAutomatic Music Generation Using Deep Learning
Automatic Music Generation Using Deep LearningIRJET Journal
 
Cochlear implant acoustic simulation model based on critical band filters
Cochlear implant acoustic simulation model based on critical band filtersCochlear implant acoustic simulation model based on critical band filters
Cochlear implant acoustic simulation model based on critical band filtersIAEME Publication
 
A Study of Digital Media Based Voice Activity Detection Protocols
A Study of Digital Media Based Voice Activity Detection ProtocolsA Study of Digital Media Based Voice Activity Detection Protocols
A Study of Digital Media Based Voice Activity Detection Protocolsijtsrd
 

Similaire à T26123129 (20)

IRJET- Music Genre Recognition using Convolution Neural Network
IRJET- Music Genre Recognition using Convolution Neural NetworkIRJET- Music Genre Recognition using Convolution Neural Network
IRJET- Music Genre Recognition using Convolution Neural Network
 
IRJET- Machine Learning and Noise Reduction Techniques for Music Genre Classi...
IRJET- Machine Learning and Noise Reduction Techniques for Music Genre Classi...IRJET- Machine Learning and Noise Reduction Techniques for Music Genre Classi...
IRJET- Machine Learning and Noise Reduction Techniques for Music Genre Classi...
 
Recognition of music genres using deep learning.
Recognition of music genres using deep learning.Recognition of music genres using deep learning.
Recognition of music genres using deep learning.
 
1801 1805
1801 18051801 1805
1801 1805
 
Investigating Multi-Feature Selection and Ensembling for Audio Classification
Investigating Multi-Feature Selection and Ensembling for Audio ClassificationInvestigating Multi-Feature Selection and Ensembling for Audio Classification
Investigating Multi-Feature Selection and Ensembling for Audio Classification
 
Anvita Eusipco 2004
Anvita Eusipco 2004Anvita Eusipco 2004
Anvita Eusipco 2004
 
Attention gated encoder-decoder for ultrasonic signal denoising
Attention gated encoder-decoder for ultrasonic signal denoisingAttention gated encoder-decoder for ultrasonic signal denoising
Attention gated encoder-decoder for ultrasonic signal denoising
 
International Journal of Biometrics and Bioinformatics(IJBB) Volume (1) Issue...
International Journal of Biometrics and Bioinformatics(IJBB) Volume (1) Issue...International Journal of Biometrics and Bioinformatics(IJBB) Volume (1) Issue...
International Journal of Biometrics and Bioinformatics(IJBB) Volume (1) Issue...
 
A Combined Sub-Band And Reconstructed Phase Space Approach To Phoneme Classif...
A Combined Sub-Band And Reconstructed Phase Space Approach To Phoneme Classif...A Combined Sub-Band And Reconstructed Phase Space Approach To Phoneme Classif...
A Combined Sub-Band And Reconstructed Phase Space Approach To Phoneme Classif...
 
IRJET- A Survey on Sound Recognition
IRJET- A Survey on Sound RecognitionIRJET- A Survey on Sound Recognition
IRJET- A Survey on Sound Recognition
 
A Gaussian Clustering Based Voice Activity Detector for Noisy Environments Us...
A Gaussian Clustering Based Voice Activity Detector for Noisy Environments Us...A Gaussian Clustering Based Voice Activity Detector for Noisy Environments Us...
A Gaussian Clustering Based Voice Activity Detector for Noisy Environments Us...
 
CASR-Report
CASR-ReportCASR-Report
CASR-Report
 
IRJET- Musical Instrument Recognition using CNN and SVM
IRJET-  	  Musical Instrument Recognition using CNN and SVMIRJET-  	  Musical Instrument Recognition using CNN and SVM
IRJET- Musical Instrument Recognition using CNN and SVM
 
Acoustic Scene Classification by using Combination of MODWPT and Spectral Fea...
Acoustic Scene Classification by using Combination of MODWPT and Spectral Fea...Acoustic Scene Classification by using Combination of MODWPT and Spectral Fea...
Acoustic Scene Classification by using Combination of MODWPT and Spectral Fea...
 
Environmental Sound detection Using MFCC technique
Environmental Sound detection Using MFCC techniqueEnvironmental Sound detection Using MFCC technique
Environmental Sound detection Using MFCC technique
 
Performance analysis of the convolutional recurrent neural network on acousti...
Performance analysis of the convolutional recurrent neural network on acousti...Performance analysis of the convolutional recurrent neural network on acousti...
Performance analysis of the convolutional recurrent neural network on acousti...
 
A computationally efficient learning model to classify audio signal attributes
A computationally efficient learning model to classify audio  signal attributesA computationally efficient learning model to classify audio  signal attributes
A computationally efficient learning model to classify audio signal attributes
 
Automatic Music Generation Using Deep Learning
Automatic Music Generation Using Deep LearningAutomatic Music Generation Using Deep Learning
Automatic Music Generation Using Deep Learning
 
Cochlear implant acoustic simulation model based on critical band filters
Cochlear implant acoustic simulation model based on critical band filtersCochlear implant acoustic simulation model based on critical band filters
Cochlear implant acoustic simulation model based on critical band filters
 
A Study of Digital Media Based Voice Activity Detection Protocols
A Study of Digital Media Based Voice Activity Detection ProtocolsA Study of Digital Media Based Voice Activity Detection Protocols
A Study of Digital Media Based Voice Activity Detection Protocols
 

Dernier

TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 

Dernier (20)

TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 

T26123129

  • 1. Subrata Mondal, Shiladitya Pujari, Tapas Sangiri / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 2, Issue 6, November- December 2012, pp.123-129 Environmental Natural Sound Detection And Classification Using Content-Based Retrieval (CBR) And MFCC Subrata Mondal1, Shiladitya Pujari2, Tapas Sangiri3 1 Technical Assistant, Department of Computer Science, Bankura Unnayani Institute of Engineering, Bankura, West Bengal, India 2 Assistant Professor, Department of IT, University Institute of Technology, Burdwan University, Burdwan, West Bengal, India 3 Assistant Professor, Department of IT, Bankura Unnayani Institute of Engineering, Bankura, West Bengal, India ABSTRACT This paper deals with the extraction and important aspects of our surroundings environment. classification of environmental natural sounds We have presented the different phases of the entire with the help of content-based retrieval method. process such as capturing the environmental audio, Environmental sounds are extremely pre- processing of audio data, feature extraction, unpredictable and are very much hard to classify training and lastly the testing phase. and store in cluster forms according to its feature The major steps involved in the entire method are as contents. Environmental sounds provide many follows: contextual clues that enable us to recognize  Extraction of features for classifying highly important aspects of our surroundings diversified natural sounds. environment. This paper presented the techniques  Making clusters according to their feature that allow computer system to extract and classify similarity. features from predefined classes of sounds in the  Finding a match for a particular sound environment. query from the cluster. The audio files used for testing purpose are Keywords-content-based retrieval, feature of various categories in sample rate, bit rate and no extraction, environmental sound, Mel of channels, but files has been digitized to similar frequency Cepstral coefficiant. type, i.e. 22k Hz, 16 bit, mono channel. Then all the sounds normalized to -6 db and DC offset corrected I. INTRODUCTION to 0%. This reprocessing of sound data helps The development of high speed computer considering low level descriptor in the sound files has developed a tremendous opportunity to handle are similar, hence we can proceed without any large amount of data from different media. The drawback in primary classification methods. proliferation of world-wide- web has farther accelerated its growth and this trend is expected to II. RELATED WORKS continue at an even faster pace in the coming years. A brief survey of previous works relevant Even in the past decade textual data was the major to our work is presented here. Parekh R [1] proposed form of data that was handled by the computer. But a statistical method of classification based on Zero above advancement in technology has led to uses of crossing rate (ZCR) and Root means square (RMS) non textual data such as video, audio and image value of audio file. For this purpose sounds has been which are known as multimedia data and the categorized in three semantic classes. 1. Aircraft database deals with this type of data is called sound, 2.Bird sound, 3.Car sound. First and third are multimedia database. This database has much more inorganic in nature whereas second is organic in capability than traditional database. One of the best nature. The author tried to extract samples, features of multimedia database is content-based amplitudes and frequencies of those sounds. To retrieval (CBR). In content base retrieval process the generate the feature vector, ZCR and RMS methods major advantage is that it can search data by its are used. Statistical measure is applied to these content rather textual indexing. This paper deals features σZ, μZ, μR, σR, T', R' and values are with the classification of environmental sounds calculated. classification with the help of content-based retrieval and Mel frequency cepstral coefficient (MFCC). Goldhor R. S. et al [2] presented a method Environmental sounds are extremely unpredictable of classification technique based on dimensional so it's really hard to classify and store in cluster Cepstral Co-efficient. They segmented the sound forms according to its feature contents. wave into tokens of sound according to signal power Environmental sounds provide many contextual Histogram. From the histogram statistics, a threshold clues or characteristics that enable us to recognize –power level was calculated to separate portions of 123 | P a g e
  • 2. Subrata Mondal, Shiladitya Pujari, Tapas Sangiri / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 2, Issue 6, November- December 2012, pp.123-129 the signal stream containing the sound from portions human perception. They developed an algorithm containing only background noise. Result is derived based on matching approximation of repeating parts from a high SNR recording. As feature vector they of the sequence unit. For separating pattern they use used two-dimensional cepstral-coefficients. They the fact that silent segments affect human perception calculated cepstral Coefficient for each token by and are recognized as the separators of perceived using Hamming window technique. They also used patterns, determined the length of the silent portion two types of cepstral coefficients; one is linear by dynamic process of histogram of length of silent cepstral coefficient and another is the Mel frequency segments using threshold method. Finally distances cepstral coefficient. between every pair of the units are computed by In their paper, Toyoda Y. et al [3] presented DTW which minimizes the total distance between the uses of neural network to classify certain types units by aligning their time series. sound such as crashing sound, clapping sound, bell sound, switch typing sound etc All these sounds Janku L. [6] used MPEG7 low level were classified into three categories; burst sound, descriptor for sound recognition with the help of repeated sound and continuous sound. For this hidden markov model. Author used four low level purpose they fetched the environmental sounds from descriptor defined in mpeg 7. Those are Audio a sound database RWCP-OB. For further spectrum envelop, audio spectrum centroid, audio improvement, they developed multistage neural spectrum spread and audio spectrum flatness. The network. In this work, a three layer perception neural testing method described is a three-step procedure: network is used. In the first step of recognition they (a) Receiving audio wave file and taking 512 points used long time power pattern. This classifies the short time Fourier transform. sound as single impact. Second stage of (b) According to the presented spectrogram, we take classification was done by combination of short time the associated parameters. power pattern and frequency at power peak. (c) Taking the SVD of flatness and running the Hattori Y. et al [4] proposed an Viterbi algorithm to get the sound class candidate. environmental sounds recognition system using LPC - Cepstral coefficients for characterization and Chen Z. and Maher R. C. [7] dealt with artificial neural network as verification method. This Automatic off-line classification and recognition of system was evaluated using a database containing bird vocalizations. They focused on syllable level files of different sound-sources under a variety of discrimination of bird sound. For this purpose they recording conditions. Two neural networks were developed a set of parameters those are spectral trained with the magnitude of the discrete Fourier frequencies, frequency differences, track shape, transform of the LPC Cepstral matrices. The global spectral power, and track duration. The parameters percentage of verification was of 96.66%. First they include the frequencies, the frequency differences, collected a common sound database. A segmentation the relative the shape, and the duration of the algorithm is applied to each file. For this purpose spectral peak tracks. These parameters were selected they uses 240 point Hamming window with frame based on several key insights. For the feature overlap of 50%. LPC cepstral coefficient is extraction they had two choices: MPEG-7 audio computed and DFT is measured. For LPC cepstral features and Mel-scale Frequency Cepstral calculating they used Levinson-Durbin recursion. Coefficients (MFCC), For the classification they had two choices: Maximum Likelihood Hidden Markov Wang J. et al [5] focused on repeat Models (ML-HMM) and Entropic Prior HMM (EP- recognition in environmental sounds. For this HMM. They have achieved accuracy of 90% with purpose they designed a three module system; 1) the best and the second best being MPEG-7 features Unit of the repetitions; 2) Distance between units; 3) with EP- HMM and MFCC with ML-HMM. Algorithm to detect repeating part. In their paper, Constantine G. et al [8] For unit extraction they used the concept of a peak in presented a comparison of three audio taxonomy power envelop. They extracted peak power with the methods for MPEG-7 sound classification. For the help of following: low-level descriptors they used, low-dimensional 1.The power is below the threshold T L features based on spectral basis descriptors are 2. The interval between the maximum and the produced in three stages: neighbor maximum is than the threshold T2.  Normalized audio spectrum envelope. 3. The ratio of the power to the power of the  Principal component analysis. neighbor local minimum is than the threshold T3.  Independent component analysis. Distance of every pair of unit was calculated using dynamic time warping MFCC. High-level description schemes they used to describe the modeling of audio features, the Mel frequency cepstral coefficient (MFCC) was procedure of audio classification, and retrieval. For developed by Stevens and was designed to adapt classification they tested three approaches: 124 | P a g e
  • 3. Subrata Mondal, Shiladitya Pujari, Tapas Sangiri / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 2, Issue 6, November- December 2012, pp.123-129  Mel frequency cepstral coefficients (music  The direct approach. and speech).  The hierarchical approach without hints.  Linear prediction cepstral (LPC)  Hierarchical approach with hints. coefficients.  Mel frequency LPC coefficients. They suggested that best approach is  Bark frequency cepstral coefficients. hierarchical approach with hints, which results in  Bark frequency LPC coefficients. classification accuracy of around 99%. The direct  Perceptual linear prediction (PLP) features. approach produces the second best results, and the hierarchical approach without hints the third best Cepstral frequency based feature results. extractions are pseudo stationary feature extraction techniques because they split the signal into many III. SOUND ANALYSIS TECHNIQUE small part in time domain. Techniques based on LPC Undoubtedly the most important part of the coefficients were based on the idea of a decoder, entire process is to extract the features from the audio which is a simulation of the human vocal tract. Since signal. The audio signal feature extraction in a the human vocal tract does not produce categorization problem. It is about reducing the environmental sounds, these techniques typically dimensionality of the input-vector while maintaining seem to highlight non-unique features in the sound the discriminating power of the signal. It is clear and are therefore not appropriate for recognition of from the above discussion of environmental sound non- speech sounds. identification and verification systems, that the training and test vector needed for the classification The main non stationary feature extraction problem grows dimension of the given input vector techniques are: [2-3], feature extraction is necessary. More variety in sound needs more variety in features for a particular  Short-time Fourier transforms (STFT). genre of sound.  Fast (discrete) wavelet transform (FWT). There are various techniques suitable for  Continuous wavelet transforms (CWT). environmental sound recognition. Recognition of  Wigner-Ville distribution (WVD). sound (whether it is speech or environmental) is generally consists of two phases: These techniques use different algorithms  The Feature Extraction phase, followed by to represent time frequency based representation of  The Classification (using artificial signals. Short-time Fourier transforms use standard intelligence techniques or any other) phase. Fourier transforms of the signal where as wavelet In feature extraction phase, sound is based techniques apply a mother wavelet to a manipulated in order to produce a set of waveform to surmount the resolution issues inherent characteristics. Classification phase is to recognize in STFT. WVD is a bilinear time-frequency the sound by cataloging the features of existing distribution. sounds during training session and then comparing the test sound to this database of features (which is Moreover there are some other features called the testing session). These two phases has been which are suitable for classification of described in the following text. environmental sounds. These features are described below: A. Feature Extraction Feature extraction mainly of two types. (a) Zero-crossing rate: Frequency based feature extraction and (b) Time- The zero-crossing rate is a measure of the frequency based feature extraction. number of time the signal value crosses the zero axis. Periodic sounds tend to have small value of it, Frequency based feature extraction method while noisy sounds tend to have a high value of it. It produces an overall result detailing the contents of is computed in each time frame on the signal. the entire signal. According to Cowling [11] Frequency extraction is stationary feature extraction Spectral Features: and Time-frequency extraction is non-stationary Spectral features are single valued features, feature extraction, calculated using the frequency spectrum of a signal. Thus for the time domain signal x (t): Eight popular stationary feature extraction methods A (f) = 𝐹[𝑥 𝑡 ] are there:  Frequency extraction (music and speech). Spectral Centroid:  Homomorphic cepstral coefficients. The Spectral Centroid (μ) is bar center of the spectrum. It is a weighted average of the 125 | P a g e
  • 4. Subrata Mondal, Shiladitya Pujari, Tapas Sangiri / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 2, Issue 6, November- December 2012, pp.123-129 probability to observe the normalized amplitude. B. Sampling : Given A (f) from the equation mentioned above It is the process of converting a continuous signal into a discrete signal. Sampling can be done 𝜇= 𝑓. 𝑝 𝑓 𝛿𝑓 for signals varying in space, time, or any other dimension, and similar results are obtained in two or Where, p (f) = A (f) / ∑A (f) more dimensions. For signals that vary with time, let Spectral spread: x(t) be a continuous signal to be sampled, and let The spectral spread (σ) is a the spectrum around the sampling be performed by measuring the value of mean value calculated in following equation the continuous signal every T seconds, which is called the sampling interval. Thus, the sampled signal x[n] given by: σ2 = 𝑓 − . 2𝑝 𝑓 δ 𝑓 x[n] = x(nT), with n = 0, 1, 2, 3, ... Spectral Roll-off: The spectral roll-off point (fc) is 95% of the signal The sampling frequency or sampling rate fs energy below this frequency. is defined as the number of samples obtained in one second, i.e. fs = 1/T. The sampling rate is measured IV. DESIGN ARCHITECTURE in hertz or in samples per second. In context of multimedia database system major challenge is to classify the environmental C. Pre-emphasis : sound because they are so unpredictable in nature In processing of electronic audio signals, that they are harder to categorize into smaller number pre-emphasis refers to a system process designed to of classes. Classification problem concerns with increase (within a frequency band) the magnitude of determining two samples match each other or not some (usually higher) frequencies with respect to the under some similarity criterion and to different magnitude of other (usually lower) frequencies in classes keeping similar type of samples in a class. In order to improve the overall signal-to-noise ratio case of Content Based Retrieval (CBR) system we (SNR) by minimizing the adverse effects. must consider one thing that how might people access the sound from database. That may be similar D. Windowing : one sound or a group of sounds in terms of some In signal processing, a window function characteristics. For example, class of bird sound, (also known as tapering function) is a mathematical class of sound of car, where system has been function that is zero-valued outside of some chosen previously train on other sound in similar class. To interval. For instance, a function that is constant achieve the desired result, we have divided the entire inside the interval and zero elsewhere is called a procedure into two sessions - training and testing rectangular window, which describes the shape of its session. graphical representation. In order to make an automated Window Terminology: classification of environmental sound, main part is to  N represents the width, in samples, of a find out the right features of the sound. In this paper, discrete-time, symmetrical window function. we have selected Mel frequency Cepstral Coefficient  n is an integer, with values 0 ≤ n ≤ N-1. (MFCC) to extract features from the sound signal and compare the unknown signal with the existing 1) Hamming window: signals in the multimedia database. Hamming window is also called the raised cosine window. The equation and plot for the Hamming window shown below. In a window function there is a zero outside of some chosen interval. For example, a function that is stable inside the interval and zero elsewhere is called a rectangular window that illustrates the shape of its graphical representation, When signal or any other function is multiplied by a window function, the product is also zero valued outside the interval. The windowing is done to avoid problems due to truncation of the signal. Window function has some other applications such as spectral analysis, filter design, audio data compression. Figure 1: Mel Frequency Cepstral Coefficient pipeline 126 | P a g e
  • 5. Subrata Mondal, Shiladitya Pujari, Tapas Sangiri / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 2, Issue 6, November- December 2012, pp.123-129 labeled to 2000 mel. The process is repeated for half of the frequency, then this frequency as 500 and so on. On this basis the normal frequency is mapped into the mel frequency. The mel is normally a linear mapping below 1000 Hz and logarithmically above 1000 Hz. Transformation of normal frequency into mel frequency and vice versa can be obtained by using the following formulas shown here: 𝑓 𝑚 = 1127.01048 log⁡ + (1 )……….... (1) Figure 2: Example of hamming window 700 The raised cosine with these particular Where m is the Mel frequency and f is the normal coefficients was proposed by Richard W. Hamming. frequency. The window is optimized to minimize the maximum 𝑒 𝑚 (nearest) side lobe, giving it a height of about one- 𝑓 = 700(1127 .01048 − 1)………………… (2) fifth that of the Hann window, a raised cosine with simpler coefficients. Here mel frequency is transformed into normal Function plotted on the graph bellow : frequency. Figure 4 showing the mapping of normal frequencies to the mel frequencies. Figure 3: Function is plotted Figure 4: Mapping of normal frequency into Mel E. Cepstrum : frequency Cepstrum name was derived, from the spectrum by reversing the first four spectrums. We F. Distance measure: can say cepstrum is the Fourier Transformer of the In sound recognition phase, an unknown log with phase of the Fourier Transformer frequency sound is represented by a sequence of feature vectors bands are not positioned logarithmically. As the (x1, x2 ... xi), and then it is compared with the other frequency bands are positioned logarithmically in sound from the database. In order to identify the the Fourier Transform the MFCC approximates the unknown speaker, this can be done by measuring the human system response better than any other system. distortion distance of two vector sets based on These coefficients allow better processing of data. In minimizing the Euclidean distance. The Euclidean the Mel Frequency Cepstral Coefficients the distance is the scalar distance between the two points calculation of the mel is as the real Cepstrum except that one would measure with a ruler, which can be the Mel cepstrum's frequency is to up a proven by repeated application of the Pythagorean correspondence to the Mel scale. The Mel scale was Theorem, The formula used to calculate the projected by Stevens, Volkmann and Newman In Euclidean distance is as follows: 1937.The Mel scale is mainly based on the study of The Euclidean distance between two points P = observing the pitch or frequency of human. The (x1, x2...xn) and Q = (y1, y2… yn) scale is divided into the units mel. In this test the 𝒏 person started out hearing a frequency of 1000 Hz, and labeled it 1000 mel. Listeners were asked to 𝒅= (𝒙 𝒊 − 𝒚 𝒊 ) 𝟐 change the frequency till it reaches to the frequency 𝒊=𝟏 of the reference frequency. Then this frequency is 127 | P a g e
  • 6. Subrata Mondal, Shiladitya Pujari, Tapas Sangiri / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 2, Issue 6, November- December 2012, pp.123-129 The sound signal with the lowest distortion distance that both look for linear combinations of variables is chosen to be identified as the unknown signal. which best explains the data. LDA explicitly attempts to model the difference between the classes G. Fast Fourier Transformation : of data. PCA on the other hand does not take into FFTs are of great importance to a wide account any difference in class, and factor analysis variety of applications, from digital signal builds the feature combinations based on differences processing and solving partial differential equations rather than similarities. Discriminate analysis is also to algorithms for quick multiplication of large different from factor analysis in that it is not an integers. interdependence technique: a distinction between independent variables and dependent variables (also H. Absolute value: called criterion variables) must be made. In mathematics, the absolute value (or modulus) |a| of a real number a is the numerical V. TRAINING AND TESTING SESSIONS value of a without its sign. The absolute value of a As stated earlier, due to achieve the best number may be thought of as its distance from zero. result, the entire sound detection and classification Generalizations of the absolute value for real process is divided into two sessions. One is called numbers occur in a wide variety of mathematical the training session, during which the system is settings. For example an absolute value is also being trained with known sound signals and features defined for the complex numbers, the quaternion, of that known signals are extracted. These extracted ordered rings, fields and vector spaces. The absolute features are then stored into the multimedia database value is closely related to the notions of magnitude, as the reference model for future reference. Figure 5 distance, and norm in various mathematical and shows the processes during the training session. physical contexts. Form this figure, we can see that the known sound signal is first re-sampled, then re-referenced and I. DCT : normalized. The normalized signal is then passed In particular, a DCT is a Fourier-related through the process to extract the features. System is transform similar to the discrete Fourier transform then trained with the help of these features and (DFT), but uses only real numbers. DCTs are reference model is built. equivalent to DFTs of roughly twice the length, operating on real data with even symmetry (since the Fourier transform of a real and even function is real and even), where in some variants the input and/or output data are shifted by half a sample. There are eight standard DCT variants, of which four are commonly used. J. Linear Discriminate Analysis (LDA): Linear discriminate analysis (LDA) and the related Fisher's linear discriminate are methods used in statistics, pattern recognition and machine learning to find a linear combination of features which characterizes or separates two or more classes of objects or events. The resulting combination may be used as a linear classifier or, more commonly, for dimensionality reduction before later classification. LDA is closely related to ANOVA (analysis of variance) and regression analysis, which also attempts to express one dependent variable as a linear combination of other features or measurements. In the other two methods, however, the dependent variable is a numerical quantity, while for LDA it is a categorical variable (i.e. the class Figure 5: Flow chart of Training Session label). Logistic regression is more similar to LDA, as it also explains a categorical variable. These other Through the training session, the methods are preferable in applications where it is not multimedia database is populated by feature vectors reasonable to assume that the independent variables of known sound signals so that those known are normally distributed, which is a fundamental environmental sound can be detected and classified assumption of the LDA method. later. In case of unknown signals, signals are newly LDA is also closely related to principal classified during training session and detection can component analysis (PCA) and factor analysis in be done with that classification. 128 | P a g e
  • 7. Subrata Mondal, Shiladitya Pujari, Tapas Sangiri / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 2, Issue 6, November- December 2012, pp.123-129 power. Using more than one features of a sound may Another session of the detection and obviously improve the performance of the method. classification process is the testing session. During Applying clustering technique, accuracy can be this session, depicted in figure 6, the input signal is boosted. Another good feature available today is sampled, DC offset is determined and then Audio spectrum projection provided by MPEG7 normalization is done. After normalization, features specification. Inclusion of this feature may increase are extracted for the input signal. Then for a the performance measure of the method. particular sound, feature vectors are retrieved from the multimedia database using content-based REFERENCES retrieval (CBR) method. The retrieved features are [1] Ranjan Parekh, "Classification Of then matched with the feature vector of the input Environmental Sounds",ICEMC2 sound signal. 2007, Second International Conference on Embedded Mobile Commu- nication and Computing, ICEMC2 2007. [2] R. S. Goldhor, "Recognition of Environmental Sounds/' Proceedings of ICASSP, vol.1, pp.149-152, April 1993. [3] Yoshiyuki Toyoda, Jie Huang, Shuxue Ding,Yong Liu, "Environmental Sound Recognition by Multi layered Neural Networks" ,The Fourth International Conference on Computer School of Education Technology, Jadavpur University, Kolkata 32 and Information Technology (CIT'04) pp. 123-127,2004. [4] Yuya Hattori, Kazushi Ishihara, Kazunori Komatani, Tetsuya Ogata, Hiroshi G.Okuno/Repeat Recognition for Environmental Sounds", Proceedings of the 2004 IEEE International Workshop on Robot and Human Interactive Communication Kurashiki, Okayama Japan September 20-22, pp. 83-88, 2004. [5] Jia-Ching Wang, Jhing-Fa Wang, Kuok Figure 6: Flowchart of Testing Session Wai He, and Cheng-Shu Hsu/'Environmental Sound Classification After feature extraction, the matching Using Hybrid SVM/KNN Classifier and process is carried on and feature vector of input MPEG-7 Audio Conference on Low-Level signal is matched with every reference model of Descriptor", 2006 International Joint Neural sound signals stored in the database. Whenever a Networks Sheraton Vancouver Wall Centre maxim um matching is found, the input signal is said Hotel, Vancouver, BC, Canada July 16-21, to be detected. This is the entire process of detection 2006. and classification of environmental sounds. In both [6] Ladislava Janku, "Artificial Perception: sessions, feature extraction process is done using the Auditory Decomposition of Mixtures of MFCC pipeline. Environmental Sounds - Combining Information Theoretical and Supervised VI. CONCLUSION Pattern Recognition Approaches", P. Van This method of environmental sound Emde Boas et al. (Eds.): SOFSEM 2004, detection and classification is developed using LNCS 2932, pp. 229-240,2004 MFCC pipeline and CBR for extraction of features [7] Z, Chen, R. C, Maher, "'Semi-automatic of a particular sound and retrieval of sound features classification of bird vocalizations1 Accost from the multimedia database respectively. This Soc Am., Vol. 120, No. 5, pp. 2974-2984, method can be implemented in the domain of November 2006. robotics where sound detection and recognition may [8] G. Constantine, A, Rizzi, and D, Casali, be possible up to a satisfactory level. If the method '"Recognition of musical instruments by will be properly implemented with computer vision, generalized min-max classifiers/' in IEEE then human-computer interaction process can be Workshop on Neural Networks for Signal developed much. MFCC is undoubtedly more Processing, pp. 555-564, 2003. efficient feature extraction method because it is designed by giving emphasis on human perception 129 | P a g e