International Journal of Trend in Scientific Research and Development (IJTSRD)
Volume 6 Issue 1, November-December 2021 Available Online: www.ijtsrd.com e-ISSN: 2456 – 6470
Speech Emotion Recognition Using Neural Networks
Anirban Chakraborty
Research Scholar, Department of Artificial Intelligence, Lovely Professional University, Jalandhar, Punjab, India
ABSTRACT
Speech is the most natural and easy way for people to communicate, and interpreting speech is one of the most sophisticated tasks the human brain performs. The goal of Speech Emotion Recognition (SER) is to identify human emotion from speech; this is possible because the tone and pitch of the voice frequently reflect underlying emotions. Librosa was used to analyse audio and music, SoundFile was used to read and write sampled sound file formats, and sklearn was used to build the model. The current study examined the effectiveness of Convolutional Neural Networks (CNNs) in recognising spoken emotions. The networks' input features are spectrograms of voice samples, and Mel-Frequency Cepstral Coefficients (MFCC) are used to extract characteristics from the audio. Our own voice dataset is used to train and test the models, and the emotion of each utterance (happy, sad, angry, neutral, surprised, disgusted) is determined from the evaluation.
KEYWORDS: Speech emotion, Energy, Pitch, Librosa, Sklearn, SoundFile, CNN, Spectrogram, MFCC
How to cite this paper: Anirban Chakraborty, "Speech Emotion Recognition Using Neural Networks", published in International Journal of Trend in Scientific Research and Development (IJTSRD), ISSN: 2456-6470, Volume-6, Issue-1, December 2021, pp. 922-927. URL: www.ijtsrd.com/papers/ijtsrd47958.pdf

Copyright © 2021 by author(s) and International Journal of Trend in Scientific Research and Development Journal. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0) (http://creativecommons.org/licenses/by/4.0)
I. INTRODUCTION
Speech emotion recognition (SER) is a technique that extracts emotional features from speech by analysing distinctive characteristics and the associated emotional changes. Voice emotion recognition is currently a developing cross-disciplinary field of artificial intelligence [1]. A
voice emotion processing and recognition system is
made up of three parts: speech signal acquisition,
feature extraction, and emotion recognition. In this
method, the quality of feature extraction has a direct impact on the accuracy of speech emotion identification; traditionally, the entire emotional sentence has been used as the unit for feature extraction. The neural networks of the
human brain are highly capable of learning high-level
abstract notions from low-level information acquired
by the sensory periphery. Humans communicate
through voice, and interpreting speech is one of the
most sophisticated operations that the human brain
conducts. It has been argued that children who are unable to understand the emotional states of speakers develop poor social skills and, in some cases, show psychopathological symptoms [2, 3].
This highlights the importance of recognizing the
emotional states of speech in effective
communication. Detection of emotion from facial
expressions and biological measurements such as
heart beats or skin resistance formed the preliminary
framework of research in emotion recognition [4].
More recently, emotion recognition from speech
signal has received growing attention. The traditional
approach toward this problem was based on the fact
that there are relationships between acoustic features
and emotion. In other words, the emotion is encoded
by acoustic and prosodic correlates of speech signals
such as speaking rate, intonation, energy, formant
frequencies, fundamental frequency (pitch), intensity
(loudness), duration (length), and spectral
characteristic (timbre) [5, 6]. There are a variety of
machine learning algorithms that have been examined
to classify emotions based on their acoustic correlates
in speech utterances. In the current study, we
investigated the capability of convolutional neural
networks in classifying speech emotions using our
own dataset.
The specific contribution of this study is using
wide-band spectrograms instead of narrow-band spectrograms, as well as assessing the effect of data augmentation on model accuracy. Our results revealed that wide-band spectrograms and data augmentation equipped CNNs to achieve state-of-the-art accuracy and surpass human performance.
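As a concrete illustration of waveform-level data augmentation, here is a minimal Python sketch using librosa and NumPy. The paper does not state which augmentations or parameters were used, so the noise level, pitch-shift range, and time-stretch rate below are assumptions, and `sample.wav` is a hypothetical file.

```python
import numpy as np
import librosa

def augment(y, sr):
    """Return simple augmented variants of a waveform (assumed parameters)."""
    noisy = y + 0.005 * np.random.randn(len(y))                 # additive noise
    shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=2)  # up 2 semitones
    stretched = librosa.effects.time_stretch(y, rate=1.1)       # 10% faster
    return [noisy, shifted, stretched]

y, sr = librosa.load("sample.wav", sr=None)  # hypothetical input clip
variants = augment(y, sr)                    # three extra training examples
```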
Fig.1. Speech emotion recognition block diagram
II. RELATED WORK
Most of the papers published in the last decade use spectral and prosodic features extracted from raw audio signals. The process of emotion recognition from speech involves extracting characteristics from a selected or purpose-built corpus of emotional speech; the classification of emotions is then done on the basis of the extracted characteristics. The performance of emotion classification depends strongly on good feature extraction, such as combining the MFCC acoustic feature with the energy prosodic feature [7]. Yixiong Pan et al. [8] used an SVM for three-class emotion classification on the Berlin Database of Emotional Speech [9] and achieved 95.1% accuracy.
Noroozi et al. [10] proposed a versatile emotion recognition system based on the analysis of visual and auditory signals. They used 88 features (Mel-frequency cepstral coefficients (MFCC) and filter bank energies (FBEs)) and applied Principal Component Analysis (PCA) during feature extraction to reduce the dimensionality of the previously extracted features. S. Lalitha et al. [11] used pitch and prosody features with an SVM classifier, reporting 81.1% accuracy on the 7 classes of the whole Berlin Database of Emotional Speech. Zamil et al. [12] used spectral characteristics, namely 13 MFCCs obtained from the audio data, to classify 7 emotions with the Logistic Model Tree (LMT) algorithm, achieving an accuracy of 70%. Yu Zhou et al. [13] combined prosodic and spectral features, used a Gaussian mixture model supervector-based SVM, and reported 88.35% accuracy on 5 classes of the Chinese-LDC corpus.
H. M. Fayek et al. [14] explored various DNN architectures and reported accuracy around 60% on two different databases, eNTERFACE [15] and SAVEE [16], with 6 and 7 classes respectively. Fei Wang et al. [17] used a combination of a Deep Auto Encoder, various features, and an SVM, and reported 83.5% accuracy on 6 classes of the Chinese emotion corpus CASIA. In contrast to these traditional approaches, more novel papers have recently been published employing Deep Neural Networks in their experiments, with promising results. Many authors agree that the most important audio characteristics for recognizing emotions are spectral energy distribution, the Teager Energy Operator (TEO) [18], MFCC, Zero Crossing Rate (ZCR), and the energy parameters of the filter bank energies (FBEs) [19].
III. TRADITIONAL SYSTEM
The traditional system was based on the analysis and comparison of many kinds of emotional characteristic parameters, selecting emotional characteristics with high emotional resolution for feature extraction. In general, traditional emotional feature extraction concentrates on analysing the emotional features of speech in terms of temporal structure, amplitude structure, fundamental frequency structure, and signal features [28].
IV. PROPOSED METHOD
A Convolutional Neural Network (CNN) is used to classify the emotions (happy, sad, angry, neutral, surprised, disgusted) and to predict the output along with its accuracy. The given speech is plotted as a spectrogram using the matplotlib library, and this spectrogram is used as the input from which the CNN builds the model.
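A minimal sketch of this step, assuming librosa is used for the short-time Fourier transform and matplotlib for rendering; the STFT parameters, figure size, and file names are illustrative assumptions, not values reported in the paper.

```python
import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt

y, sr = librosa.load("utterance.wav", sr=None)           # hypothetical input
S = np.abs(librosa.stft(y, n_fft=512, hop_length=128))   # magnitude spectrogram
S_db = librosa.amplitude_to_db(S, ref=np.max)            # convert to decibels

fig, ax = plt.subplots(figsize=(2.24, 2.24))             # small image for a CNN
librosa.display.specshow(S_db, sr=sr, hop_length=128, ax=ax)
ax.set_axis_off()                                        # keep pixels only
fig.savefig("utterance_spec.png", bbox_inches="tight", pad_inches=0)
```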
Fig.2. Flow diagram of proposed system
A. Data Set Collection
The first step is to create an empty dataset that will hold the training data for the model. After creating the empty dataset, the audio data have to be recorded and labeled into different classes. Once labeling is done, the data have to be preprocessed, which yields a clear pitch by removing unwanted background noise. After preprocessing, the data are split into a training set and a test set, where the training set holds 75% of the data and the test set holds 25%.
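This 75/25 split maps directly onto scikit-learn's train_test_split; a sketch, where the feature matrix and labels are placeholders standing in for the preprocessed recordings.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder features and labels; in practice these come from the
# preprocessed audio clips (e.g., MFCC vectors or flattened spectrograms).
X = np.random.rand(100, 40)             # 100 clips, 40 features (assumed shape)
y = np.random.randint(0, 6, size=100)   # six emotion classes

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)
print(X_train.shape, X_test.shape)      # 75% and 25% of the data
```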
B. Feature Extraction of Speech Emotion
Human speech contains many parameters that reflect the emotions embedded in it; as the emotion changes, these parameters change as well. Hence it is necessary to select a proper feature vector to identify the emotions. Features are categorized as excitation source features, spectral features, and prosodic features. Excitation source features are obtained by suppressing the characteristics of the vocal tract (VT). Spectral features used for emotion recognition include linear prediction coefficients (LPC), perceptual linear prediction coefficients (PLPC), Mel-frequency cepstral coefficients (MFCC), linear prediction cepstrum coefficients (LPCC), and perceptual linear prediction (PLP). Good accuracy in differentiating emotions can be achieved using MFCC and LFPC [20, 21].
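As a sketch of MFCC extraction with librosa: the choice of 13 coefficients mirrors the related work cited above [12] and is an assumption here, as is the averaging used to obtain one fixed-length vector per utterance.

```python
import numpy as np
import librosa

y, sr = librosa.load("utterance.wav", sr=None)        # hypothetical file
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)    # 13 coefficients per frame
mfcc_mean = mfcc.mean(axis=1)                         # utterance-level feature vector
print(mfcc.shape, mfcc_mean.shape)                    # (13, n_frames) and (13,)
```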
C. Mel-Frequency Cepstral Coefficients
The Mel-Frequency Cepstral Coefficients (MFCC) feature extraction method is a leading approach for speech
feature extraction. The various steps involved in MFCC feature extraction are:
Fig.3. Flow of MFCC
A/D conversion:
This converts the analog signal into a discrete-time signal.
Pre-emphasis:
This boosts the amount of energy in the high frequencies.
Windowing:
Windowing slices the audio waveform into overlapping (sliding) frames.
Discrete Fourier Transform:
The DFT is used to extract information in the frequency domain [22, 23]; a sketch of these steps follows below.
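The pre-emphasis, windowing, and DFT steps can be written out directly in NumPy; a minimal sketch, where the 0.97 pre-emphasis coefficient and the 25 ms/10 ms frame and hop lengths are conventional defaults rather than values given in the paper.

```python
import numpy as np

def mfcc_front_end(y, sr, frame_ms=25, hop_ms=10, alpha=0.97):
    """Pre-emphasis, Hamming windowing, and per-frame DFT magnitudes."""
    y = np.append(y[0], y[1:] - alpha * y[:-1])    # boost high-frequency energy
    frame_len = int(sr * frame_ms / 1000)
    hop_len = int(sr * hop_ms / 1000)
    window = np.hamming(frame_len)
    frames = [y[i:i + frame_len] * window          # sliding windowed frames
              for i in range(0, len(y) - frame_len + 1, hop_len)]
    return np.abs(np.fft.rfft(frames, axis=1))     # frequency-domain magnitudes

sr = 16000
y = np.random.randn(sr)                # one second of dummy audio
print(mfcc_front_end(y, sr).shape)     # (n_frames, frame_len // 2 + 1)
```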
D. Classifiers
After extracting speech features, it is essential to select a proper classifier, since classifiers are what assign emotions to the extracted features. In the current study, we use a Convolutional Neural Network (CNN). The term "convolutional" comes from the fact that convolution, the mathematical operation, is employed in these networks. CNNs are among the most popular deep learning models and have shown remarkable success across many research areas. A CNN is a deep learning algorithm that takes an image as input, assigns importance to various aspects of the image, and learns to differentiate one class of image from another. Generally, CNNs have three building blocks: the convolutional layer, the pooling layer, and the fully connected layer. Below, we describe these building blocks along with some basic concepts such as the softmax unit, the rectified linear unit, and dropout; a short code sketch follows Fig. 4.
Input layer: This layer holds the raw input image.
Convolution layer: This layer computes the output volume by taking dot products between each filter and the corresponding image patches.
Activation function layer: This layer applies an element-wise activation function to the output of the convolution layer.
Pooling layer: This layer is inserted periodically in a CNN; its main function is to reduce the volume size, which speeds up computation and reduces memory use. The two types are max pooling and average pooling.
Fully connected layer: This layer takes input from the previous layer, computes the class scores, and outputs a 1-D array whose size equals the number of classes [24, 25].
Fig.4. CNN Algorithm
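A minimal sketch of such a CNN in Keras, stacking the building blocks just described (convolution, ReLU activation, max pooling, dropout, and a softmax output over the six emotion classes). The layer sizes, input shape, and the use of Keras itself are assumptions: the paper names sklearn for model building, but sklearn provides no convolutional layers, so this should be read as illustrative rather than as the authors' exact architecture.

```python
from tensorflow.keras import layers, models

NUM_CLASSES = 6  # happy, sad, angry, neutral, surprised, disgusted

model = models.Sequential([
    layers.Input(shape=(64, 64, 1)),                  # spectrogram image (assumed size)
    layers.Conv2D(32, (3, 3), activation="relu"),     # convolution + ReLU activation
    layers.MaxPooling2D((2, 2)),                      # max pooling shrinks the volume
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                                 # to a 1-D vector
    layers.Dropout(0.5),                              # dropout for regularization
    layers.Dense(NUM_CLASSES, activation="softmax"),  # class scores
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```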
V. APPLICATION
Applications of speech emotion recognition systems include psychiatric diagnosis, conversation with robots, intelligent toys, mobile-based emotion recognition, emotion recognition in call centres (where a customer's emotions can be identified to improve service quality), intelligent tutoring systems, lie detection, and games [26, 27]. It is also used in healthcare, psychology, cognitive science, marketing, and voice-based virtual assistants.
VI. CONCLUSION
In this research, we suggested a technique for extracting emotional characteristic parameters from an emotional speech signal using the CNN algorithm, one of the deep learning methods. Previous research relied heavily on narrow-band spectrograms, which offer better frequency resolution than wide-band spectrograms and can resolve individual harmonics. Wide-band spectrograms, on the other hand, offer better temporal resolution than narrow-band spectrograms and reveal the distinct glottal pulses associated with fundamental frequency and pitch. CNNs perform admirably on training data, and the current study's findings demonstrated CNNs' ability to learn the fundamental emotional properties of speech signals from their low-level representation using wide-band spectrograms.
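The narrow-band/wide-band distinction comes down to the STFT window length: short windows give wide-band spectrograms (fine time resolution, glottal pulses visible), long windows give narrow-band spectrograms (fine frequency resolution, individual harmonics visible). A sketch with librosa, where the specific window lengths are illustrative assumptions:

```python
import numpy as np
import librosa

y, sr = librosa.load("utterance.wav", sr=16000)       # hypothetical file

# Wide-band: short window -> fine time resolution, coarse frequency resolution.
wide = librosa.stft(y, n_fft=128, win_length=64)      # ~4 ms window at 16 kHz
# Narrow-band: long window -> fine frequency resolution, resolves harmonics.
narrow = librosa.stft(y, n_fft=1024, win_length=512)  # ~32 ms window at 16 kHz

print(np.abs(wide).shape, np.abs(narrow).shape)       # 65 vs. 513 frequency bins
```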
VII. FUTURE SCOPE
For future work, we suggest using audio-visual or audio-visual-linguistic databases to train deep learning models, so that facial expressions and semantic information are taken into account alongside the speech signal; this should improve the recognition rate for each emotion. In the future, we may also consider other types of features, apply our system to other, larger corpora, and use other methods for feature extraction.
REFERENCES
[1] Z. Yongzhao and C. Peng, “Research and
implementation of emotional feature extraction
and recognition in speech signal,” Journal of
Jiangsu University, Vol. 26, No. 1, pp. 72-
75, 2005.
[2] Monita Chatterjee, Danielle J Zion, Mickael L
Deroche, Brooke A Burianek, Charles J Limb,
Alison P Goren, Aditya M Kulkarni, and Julie
A Christensen. Voice emotion recognition by
Cochlear-implanted children and their
normally-hearing peers. Hearing research,
322:151-162, 2015.
[3] Nancy Eisenberg, Tracy L Spinrad, and Natalie D Eggum. Emotion-related self-regulation and its relation to children's maladjustment. Annual Review of Clinical Psychology, 6:495-525, 2010.
[4] Harold Schlosberg. Three dimensions of
emotion. Psychological review, 61(2):81, 1954.
[5] Louis Ten Bosch. Emotions, speech and the asr
framework. Speech communication, 40(1-
2):213-225, 2003.
[6] Thomas S Polzin and Alex Waibel. Detecting
emotions in speech. In proceedings of the
CMC, volume 16. Citeseer, 1998.
[7] Suraj Tripathi, Abhay Kumar, Abhiram Ramesh, Chirag Singh, Promod Yenigalla, "Speech emotion recognition using kernel sparse representation based classifier," in 2016 24th European Signal Processing Conference (EUSIPCO), pp. 374-377, 2016.
[8] Pan, Y., Shen, P. and Shen, L., 2012. Speech
emotion recognition using support vector
machine. International Journal of Smart Home,
6(2), pp.101-108.
[9] Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W. F. and Weiss, B., 2005, September. A database of German emotional speech. In Interspeech (Vol. 5, pp. 1517-1520).
[10] Noroozi, F., Marjanovic, M., Njegus, A., Escalera, S., & Anbarjafari, G. Audio-visual emotion recognition in video clips. IEEE Transactions on Affective Computing, 2017.
[11] Lalitha, S., Madhavan, A., Bhushan, B., and Saketh, S., 2014, October. Speech emotion recognition. In Advances in Electronics, Computers and Communications (ICAECC), 2014 International Conference on (pp. 1-4). IEEE.
[12] Zamil, Adib Ashfaq A., et al. "Emotion Detection from Speech Signals using Voting Mechanism on Classified Frames." 2019 International Conference on Robotics, Electrical and Signal Processing Techniques (ICREST). IEEE, 2019.
[13] Zhou, Y., Sun, Y., Zhang, J., and Yan, Y.,
2009, December. Speech emotion recognition
using both spectral and prosodic features. In
2009 International Conference on Information
Engineering and Computer Science (pp. 1-4).
IEEE.
[14] Fayek, H. M., M. Lech, and L. Cavedon.
“Towards real-time speech emotion recognition
using Deep Neural Networks.” Signal
Processing and Communication Systems
(ICSPCS), 2015 9th International Conference
on IEEE, 2015.
[15] Martin, O., Kotsia, I., Macq, B., and Pitas, I., 2006, April. The eNTERFACE'05 audio-visual emotion database. In 22nd International Conference on Data Engineering Workshops (ICDEW'06) (pp. 8-8). IEEE.
[16] Sanjitha B. R., Nipunika A., Rohita Desai. "Speech Emotion Recognition using MLP", IJESC.
[17] Ray Kurzweil. The singularity is near. Gerald
Duckworth & Co, 2010.
[18] Hadhami Aouani and Yassine Ben Ayed, "Speech Emotion Recognition with Deep Learning", 24th International Conference on Knowledge-Based and Intelligent Information & Engineering Systems.
[19] Pavol Harar, Radim Burget and Malay Kishore Dutta, "Efficiency of chosen speech descriptors in relation to emotion recognition," EURASIP Journal on Audio, Speech, and Music Processing, 2017.
[20] Idris I., Salam M.S. “Improved Speech
Emotion Classification from Spectral
Coefficient Optimization”. Lecture Notes in
Electrical Engineering, vol 387. Springer, 2016.
[21] Pao TL., Chen YT., Yeh JH., Cheng YM.,
Chien C.S. “Feature Combination for Better
Differentiating Anger from Neutral in
Mandarin Emotional Speech”, LNCS: Vol.
4738 Berlin: Springer 2007.
[22] Daniel Jurafsky and James H. Martin (2008). Speech and Language Processing, Pearson Education (2nd edition).
[23] Hynek Hermansky, "Perceptual linear predictive (PLP) analysis of speech," The Journal of the Acoustical Society of America, Vol. 87, No. 4, pp. 1737-1752, 1990.
[24] A. Berg, J. Deng, and L. Fei-Fei. Large scale visual recognition challenge 2010.
Contenu connexe

Similaire à Speech Emotion Recognition Using Neural Networks

A Review Paper on Speech Based Emotion Detection Using Deep Learning
A Review Paper on Speech Based Emotion Detection Using Deep LearningA Review Paper on Speech Based Emotion Detection Using Deep Learning
A Review Paper on Speech Based Emotion Detection Using Deep LearningIRJET Journal
 
A hybrid strategy for emotion classification
A hybrid strategy for emotion classificationA hybrid strategy for emotion classification
A hybrid strategy for emotion classificationnooriasukmaningtyas
 
Utterance Based Speaker Identification Using ANN
Utterance Based Speaker Identification Using ANNUtterance Based Speaker Identification Using ANN
Utterance Based Speaker Identification Using ANNIJCSEA Journal
 
Utterance Based Speaker Identification Using ANN
Utterance Based Speaker Identification Using ANNUtterance Based Speaker Identification Using ANN
Utterance Based Speaker Identification Using ANNIJCSEA Journal
 
ASERS-CNN: ARABIC SPEECH EMOTION RECOGNITION SYSTEM BASED ON CNN MODEL
ASERS-CNN: ARABIC SPEECH EMOTION RECOGNITION SYSTEM BASED ON CNN MODELASERS-CNN: ARABIC SPEECH EMOTION RECOGNITION SYSTEM BASED ON CNN MODEL
ASERS-CNN: ARABIC SPEECH EMOTION RECOGNITION SYSTEM BASED ON CNN MODELsipij
 
Signal & Image Processing : An International Journal
Signal & Image Processing : An International JournalSignal & Image Processing : An International Journal
Signal & Image Processing : An International Journalsipij
 
ASERS-CNN: Arabic Speech Emotion Recognition System based on CNN Model
ASERS-CNN: Arabic Speech Emotion Recognition System based on CNN ModelASERS-CNN: Arabic Speech Emotion Recognition System based on CNN Model
ASERS-CNN: Arabic Speech Emotion Recognition System based on CNN Modelsipij
 
MULTILINGUAL SPEECH IDENTIFICATION USING ARTIFICIAL NEURAL NETWORK
 MULTILINGUAL SPEECH IDENTIFICATION USING ARTIFICIAL NEURAL NETWORK MULTILINGUAL SPEECH IDENTIFICATION USING ARTIFICIAL NEURAL NETWORK
MULTILINGUAL SPEECH IDENTIFICATION USING ARTIFICIAL NEURAL NETWORKijitcs
 
Recognition of emotional states using EEG signals based on time-frequency ana...
Recognition of emotional states using EEG signals based on time-frequency ana...Recognition of emotional states using EEG signals based on time-frequency ana...
Recognition of emotional states using EEG signals based on time-frequency ana...IJECEIAES
 
Sentiment analysis by deep learning approaches
Sentiment analysis by deep learning approachesSentiment analysis by deep learning approaches
Sentiment analysis by deep learning approachesTELKOMNIKA JOURNAL
 
76201926
7620192676201926
76201926IJRAT
 
Human Emotion Recognition From Speech
Human Emotion Recognition From SpeechHuman Emotion Recognition From Speech
Human Emotion Recognition From SpeechIJERA Editor
 
IRJET-speech emotion.pdf
IRJET-speech emotion.pdfIRJET-speech emotion.pdf
IRJET-speech emotion.pdfneeshukumari4
 
Utterance based speaker identification
Utterance based speaker identificationUtterance based speaker identification
Utterance based speaker identificationIJCSEA Journal
 
Emotion Recognition through Speech Analysis using various Deep Learning Algor...
Emotion Recognition through Speech Analysis using various Deep Learning Algor...Emotion Recognition through Speech Analysis using various Deep Learning Algor...
Emotion Recognition through Speech Analysis using various Deep Learning Algor...IRJET Journal
 

Similaire à Speech Emotion Recognition Using Neural Networks (20)

histogram-based-emotion
histogram-based-emotionhistogram-based-emotion
histogram-based-emotion
 
A Review Paper on Speech Based Emotion Detection Using Deep Learning
A Review Paper on Speech Based Emotion Detection Using Deep LearningA Review Paper on Speech Based Emotion Detection Using Deep Learning
A Review Paper on Speech Based Emotion Detection Using Deep Learning
 
A hybrid strategy for emotion classification
A hybrid strategy for emotion classificationA hybrid strategy for emotion classification
A hybrid strategy for emotion classification
 
Audio-
Audio-Audio-
Audio-
 
Utterance Based Speaker Identification Using ANN
Utterance Based Speaker Identification Using ANNUtterance Based Speaker Identification Using ANN
Utterance Based Speaker Identification Using ANN
 
Utterance Based Speaker Identification Using ANN
Utterance Based Speaker Identification Using ANNUtterance Based Speaker Identification Using ANN
Utterance Based Speaker Identification Using ANN
 
ASERS-CNN: ARABIC SPEECH EMOTION RECOGNITION SYSTEM BASED ON CNN MODEL
ASERS-CNN: ARABIC SPEECH EMOTION RECOGNITION SYSTEM BASED ON CNN MODELASERS-CNN: ARABIC SPEECH EMOTION RECOGNITION SYSTEM BASED ON CNN MODEL
ASERS-CNN: ARABIC SPEECH EMOTION RECOGNITION SYSTEM BASED ON CNN MODEL
 
Signal & Image Processing : An International Journal
Signal & Image Processing : An International JournalSignal & Image Processing : An International Journal
Signal & Image Processing : An International Journal
 
ASERS-CNN: Arabic Speech Emotion Recognition System based on CNN Model
ASERS-CNN: Arabic Speech Emotion Recognition System based on CNN ModelASERS-CNN: Arabic Speech Emotion Recognition System based on CNN Model
ASERS-CNN: Arabic Speech Emotion Recognition System based on CNN Model
 
A017410108
A017410108A017410108
A017410108
 
A017410108
A017410108A017410108
A017410108
 
MULTILINGUAL SPEECH IDENTIFICATION USING ARTIFICIAL NEURAL NETWORK
 MULTILINGUAL SPEECH IDENTIFICATION USING ARTIFICIAL NEURAL NETWORK MULTILINGUAL SPEECH IDENTIFICATION USING ARTIFICIAL NEURAL NETWORK
MULTILINGUAL SPEECH IDENTIFICATION USING ARTIFICIAL NEURAL NETWORK
 
Recognition of emotional states using EEG signals based on time-frequency ana...
Recognition of emotional states using EEG signals based on time-frequency ana...Recognition of emotional states using EEG signals based on time-frequency ana...
Recognition of emotional states using EEG signals based on time-frequency ana...
 
F334047
F334047F334047
F334047
 
Sentiment analysis by deep learning approaches
Sentiment analysis by deep learning approachesSentiment analysis by deep learning approaches
Sentiment analysis by deep learning approaches
 
76201926
7620192676201926
76201926
 
Human Emotion Recognition From Speech
Human Emotion Recognition From SpeechHuman Emotion Recognition From Speech
Human Emotion Recognition From Speech
 
IRJET-speech emotion.pdf
IRJET-speech emotion.pdfIRJET-speech emotion.pdf
IRJET-speech emotion.pdf
 
Utterance based speaker identification
Utterance based speaker identificationUtterance based speaker identification
Utterance based speaker identification
 
Emotion Recognition through Speech Analysis using various Deep Learning Algor...
Emotion Recognition through Speech Analysis using various Deep Learning Algor...Emotion Recognition through Speech Analysis using various Deep Learning Algor...
Emotion Recognition through Speech Analysis using various Deep Learning Algor...
 

Plus de ijtsrd

‘Six Sigma Technique’ A Journey Through its Implementation
‘Six Sigma Technique’ A Journey Through its Implementation‘Six Sigma Technique’ A Journey Through its Implementation
‘Six Sigma Technique’ A Journey Through its Implementationijtsrd
 
Edge Computing in Space Enhancing Data Processing and Communication for Space...
Edge Computing in Space Enhancing Data Processing and Communication for Space...Edge Computing in Space Enhancing Data Processing and Communication for Space...
Edge Computing in Space Enhancing Data Processing and Communication for Space...ijtsrd
 
Dynamics of Communal Politics in 21st Century India Challenges and Prospects
Dynamics of Communal Politics in 21st Century India Challenges and ProspectsDynamics of Communal Politics in 21st Century India Challenges and Prospects
Dynamics of Communal Politics in 21st Century India Challenges and Prospectsijtsrd
 
Assess Perspective and Knowledge of Healthcare Providers Towards Elehealth in...
Assess Perspective and Knowledge of Healthcare Providers Towards Elehealth in...Assess Perspective and Knowledge of Healthcare Providers Towards Elehealth in...
Assess Perspective and Knowledge of Healthcare Providers Towards Elehealth in...ijtsrd
 
The Impact of Digital Media on the Decentralization of Power and the Erosion ...
The Impact of Digital Media on the Decentralization of Power and the Erosion ...The Impact of Digital Media on the Decentralization of Power and the Erosion ...
The Impact of Digital Media on the Decentralization of Power and the Erosion ...ijtsrd
 
Online Voices, Offline Impact Ambedkars Ideals and Socio Political Inclusion ...
Online Voices, Offline Impact Ambedkars Ideals and Socio Political Inclusion ...Online Voices, Offline Impact Ambedkars Ideals and Socio Political Inclusion ...
Online Voices, Offline Impact Ambedkars Ideals and Socio Political Inclusion ...ijtsrd
 
Problems and Challenges of Agro Entreprenurship A Study
Problems and Challenges of Agro Entreprenurship A StudyProblems and Challenges of Agro Entreprenurship A Study
Problems and Challenges of Agro Entreprenurship A Studyijtsrd
 
Comparative Analysis of Total Corporate Disclosure of Selected IT Companies o...
Comparative Analysis of Total Corporate Disclosure of Selected IT Companies o...Comparative Analysis of Total Corporate Disclosure of Selected IT Companies o...
Comparative Analysis of Total Corporate Disclosure of Selected IT Companies o...ijtsrd
 
The Impact of Educational Background and Professional Training on Human Right...
The Impact of Educational Background and Professional Training on Human Right...The Impact of Educational Background and Professional Training on Human Right...
The Impact of Educational Background and Professional Training on Human Right...ijtsrd
 
A Study on the Effective Teaching Learning Process in English Curriculum at t...
A Study on the Effective Teaching Learning Process in English Curriculum at t...A Study on the Effective Teaching Learning Process in English Curriculum at t...
A Study on the Effective Teaching Learning Process in English Curriculum at t...ijtsrd
 
The Role of Mentoring and Its Influence on the Effectiveness of the Teaching ...
The Role of Mentoring and Its Influence on the Effectiveness of the Teaching ...The Role of Mentoring and Its Influence on the Effectiveness of the Teaching ...
The Role of Mentoring and Its Influence on the Effectiveness of the Teaching ...ijtsrd
 
Design Simulation and Hardware Construction of an Arduino Microcontroller Bas...
Design Simulation and Hardware Construction of an Arduino Microcontroller Bas...Design Simulation and Hardware Construction of an Arduino Microcontroller Bas...
Design Simulation and Hardware Construction of an Arduino Microcontroller Bas...ijtsrd
 
Sustainable Energy by Paul A. Adekunte | Matthew N. O. Sadiku | Janet O. Sadiku
Sustainable Energy by Paul A. Adekunte | Matthew N. O. Sadiku | Janet O. SadikuSustainable Energy by Paul A. Adekunte | Matthew N. O. Sadiku | Janet O. Sadiku
Sustainable Energy by Paul A. Adekunte | Matthew N. O. Sadiku | Janet O. Sadikuijtsrd
 
Concepts for Sudan Survey Act Implementations Executive Regulations and Stand...
Concepts for Sudan Survey Act Implementations Executive Regulations and Stand...Concepts for Sudan Survey Act Implementations Executive Regulations and Stand...
Concepts for Sudan Survey Act Implementations Executive Regulations and Stand...ijtsrd
 
Towards the Implementation of the Sudan Interpolated Geoid Model Khartoum Sta...
Towards the Implementation of the Sudan Interpolated Geoid Model Khartoum Sta...Towards the Implementation of the Sudan Interpolated Geoid Model Khartoum Sta...
Towards the Implementation of the Sudan Interpolated Geoid Model Khartoum Sta...ijtsrd
 
Activating Geospatial Information for Sudans Sustainable Investment Map
Activating Geospatial Information for Sudans Sustainable Investment MapActivating Geospatial Information for Sudans Sustainable Investment Map
Activating Geospatial Information for Sudans Sustainable Investment Mapijtsrd
 
Educational Unity Embracing Diversity for a Stronger Society
Educational Unity Embracing Diversity for a Stronger SocietyEducational Unity Embracing Diversity for a Stronger Society
Educational Unity Embracing Diversity for a Stronger Societyijtsrd
 
Integration of Indian Indigenous Knowledge System in Management Prospects and...
Integration of Indian Indigenous Knowledge System in Management Prospects and...Integration of Indian Indigenous Knowledge System in Management Prospects and...
Integration of Indian Indigenous Knowledge System in Management Prospects and...ijtsrd
 
DeepMask Transforming Face Mask Identification for Better Pandemic Control in...
DeepMask Transforming Face Mask Identification for Better Pandemic Control in...DeepMask Transforming Face Mask Identification for Better Pandemic Control in...
DeepMask Transforming Face Mask Identification for Better Pandemic Control in...ijtsrd
 
Streamlining Data Collection eCRF Design and Machine Learning
Streamlining Data Collection eCRF Design and Machine LearningStreamlining Data Collection eCRF Design and Machine Learning
Streamlining Data Collection eCRF Design and Machine Learningijtsrd
 

Plus de ijtsrd (20)

‘Six Sigma Technique’ A Journey Through its Implementation
‘Six Sigma Technique’ A Journey Through its Implementation‘Six Sigma Technique’ A Journey Through its Implementation
‘Six Sigma Technique’ A Journey Through its Implementation
 
Edge Computing in Space Enhancing Data Processing and Communication for Space...
Edge Computing in Space Enhancing Data Processing and Communication for Space...Edge Computing in Space Enhancing Data Processing and Communication for Space...
Edge Computing in Space Enhancing Data Processing and Communication for Space...
 
Dynamics of Communal Politics in 21st Century India Challenges and Prospects
Dynamics of Communal Politics in 21st Century India Challenges and ProspectsDynamics of Communal Politics in 21st Century India Challenges and Prospects
Dynamics of Communal Politics in 21st Century India Challenges and Prospects
 
Assess Perspective and Knowledge of Healthcare Providers Towards Elehealth in...
Assess Perspective and Knowledge of Healthcare Providers Towards Elehealth in...Assess Perspective and Knowledge of Healthcare Providers Towards Elehealth in...
Assess Perspective and Knowledge of Healthcare Providers Towards Elehealth in...
 
The Impact of Digital Media on the Decentralization of Power and the Erosion ...
The Impact of Digital Media on the Decentralization of Power and the Erosion ...The Impact of Digital Media on the Decentralization of Power and the Erosion ...
The Impact of Digital Media on the Decentralization of Power and the Erosion ...
 
Online Voices, Offline Impact Ambedkars Ideals and Socio Political Inclusion ...
Online Voices, Offline Impact Ambedkars Ideals and Socio Political Inclusion ...Online Voices, Offline Impact Ambedkars Ideals and Socio Political Inclusion ...
Online Voices, Offline Impact Ambedkars Ideals and Socio Political Inclusion ...
 
Problems and Challenges of Agro Entreprenurship A Study
Problems and Challenges of Agro Entreprenurship A StudyProblems and Challenges of Agro Entreprenurship A Study
Problems and Challenges of Agro Entreprenurship A Study
 
Comparative Analysis of Total Corporate Disclosure of Selected IT Companies o...
Comparative Analysis of Total Corporate Disclosure of Selected IT Companies o...Comparative Analysis of Total Corporate Disclosure of Selected IT Companies o...
Comparative Analysis of Total Corporate Disclosure of Selected IT Companies o...
 
The Impact of Educational Background and Professional Training on Human Right...
The Impact of Educational Background and Professional Training on Human Right...The Impact of Educational Background and Professional Training on Human Right...
The Impact of Educational Background and Professional Training on Human Right...
 
A Study on the Effective Teaching Learning Process in English Curriculum at t...
A Study on the Effective Teaching Learning Process in English Curriculum at t...A Study on the Effective Teaching Learning Process in English Curriculum at t...
A Study on the Effective Teaching Learning Process in English Curriculum at t...
 
The Role of Mentoring and Its Influence on the Effectiveness of the Teaching ...
The Role of Mentoring and Its Influence on the Effectiveness of the Teaching ...The Role of Mentoring and Its Influence on the Effectiveness of the Teaching ...
The Role of Mentoring and Its Influence on the Effectiveness of the Teaching ...
 
Design Simulation and Hardware Construction of an Arduino Microcontroller Bas...
Design Simulation and Hardware Construction of an Arduino Microcontroller Bas...Design Simulation and Hardware Construction of an Arduino Microcontroller Bas...
Design Simulation and Hardware Construction of an Arduino Microcontroller Bas...
 
Sustainable Energy by Paul A. Adekunte | Matthew N. O. Sadiku | Janet O. Sadiku
Sustainable Energy by Paul A. Adekunte | Matthew N. O. Sadiku | Janet O. SadikuSustainable Energy by Paul A. Adekunte | Matthew N. O. Sadiku | Janet O. Sadiku
Sustainable Energy by Paul A. Adekunte | Matthew N. O. Sadiku | Janet O. Sadiku
 
Concepts for Sudan Survey Act Implementations Executive Regulations and Stand...
Concepts for Sudan Survey Act Implementations Executive Regulations and Stand...Concepts for Sudan Survey Act Implementations Executive Regulations and Stand...
Concepts for Sudan Survey Act Implementations Executive Regulations and Stand...
 
Towards the Implementation of the Sudan Interpolated Geoid Model Khartoum Sta...
Towards the Implementation of the Sudan Interpolated Geoid Model Khartoum Sta...Towards the Implementation of the Sudan Interpolated Geoid Model Khartoum Sta...
Towards the Implementation of the Sudan Interpolated Geoid Model Khartoum Sta...
 
Activating Geospatial Information for Sudans Sustainable Investment Map
Activating Geospatial Information for Sudans Sustainable Investment MapActivating Geospatial Information for Sudans Sustainable Investment Map
Activating Geospatial Information for Sudans Sustainable Investment Map
 
Educational Unity Embracing Diversity for a Stronger Society
Educational Unity Embracing Diversity for a Stronger SocietyEducational Unity Embracing Diversity for a Stronger Society
Educational Unity Embracing Diversity for a Stronger Society
 
Integration of Indian Indigenous Knowledge System in Management Prospects and...
Integration of Indian Indigenous Knowledge System in Management Prospects and...Integration of Indian Indigenous Knowledge System in Management Prospects and...
Integration of Indian Indigenous Knowledge System in Management Prospects and...
 
DeepMask Transforming Face Mask Identification for Better Pandemic Control in...
DeepMask Transforming Face Mask Identification for Better Pandemic Control in...DeepMask Transforming Face Mask Identification for Better Pandemic Control in...
DeepMask Transforming Face Mask Identification for Better Pandemic Control in...
 
Streamlining Data Collection eCRF Design and Machine Learning
Streamlining Data Collection eCRF Design and Machine LearningStreamlining Data Collection eCRF Design and Machine Learning
Streamlining Data Collection eCRF Design and Machine Learning
 

Dernier

Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxAshokKarra1
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfErwinPantujan2
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomnelietumpap1
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfMr Bounab Samir
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptxSherlyMaeNeri
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)lakshayb543
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfTechSoup
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management systemChristalin Nelson
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Jisc
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management SystemChristalin Nelson
 

Dernier (20)

YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptxYOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptx
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choom
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptx
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 

Speech Emotion Recognition Using Neural Networks

  • 1. International Journal of Trend in Scientific Research and Development (IJTSRD) Volume 6 Issue 1, November-December 2021 Available Online: www.ijtsrd.com e-ISSN: 2456 – 6470 @ IJTSRD | Unique Paper ID – IJTSRD47958 | Volume – 6 | Issue – 1 | Nov-Dec 2021 Page 922 Speech Emotion Recognition Using Neural Networks Anirban Chakraborty Research Scholar, Department of Artificial Intelligence, Lovely Professional University, Jalandhar, Punjab, India ABSTRACT Speech is the most natural and easy method for people to communicate, and interpreting speech is one of the most sophisticated tasks that the human brain conducts. The goal of Speech Emotion Recognition (SER) is to identify human emotion from speech. This is due to the fact that tone and pitch of the voice frequently reflect underlying emotions. Librosa was used to analyse audio and music, sound file was used to read and write sampled sound file formats, and sklearn was used to create the model. The current study looked on the effectiveness of Convolutional Neural Networks (CNN) in recognising spoken emotions. The networks' input characteristics are spectrograms of voice samples. Mel- Frequency Cepstral Coefficients (MFCC) are used to extract characteristics from audio. Our own voice dataset is utilised to train and test our algorithms. The emotions of the speech (happy, sad, angry, neutral, shocked, disgusted) will be determined based on the evaluation. KEYWORDS: Speech emotion, Energy, Pitch, Librosa, Sklearn, Sound file, CNN, Spectrogram, MFCC How to cite this paper: Anirban Chakraborty "Speech Emotion Recognition Using Neural Networks" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456- 6470, Volume-6 | Issue-1, December 2021, pp.922-927, URL: www.ijtsrd.com/papers/ijtsrd47958.pdf Copyright © 2021 by author(s) and International Journal of Trend in Scientific Research and Development Journal. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0) (http://creativecommons.org/licenses/by/4.0) I. INTRODUCTION Speech emotion recognition (SER) is a technique that extracts emotional features from speech by analysing distinctive characteristics and the acquired emotional change. At the moment, voice emotion recognition is a developing artificial intelligence cross-field [1]. A voice emotion processing and recognition system is made up of three parts: speech signal acquisition, feature extraction, and emotion recognition. In this method, the extraction quality has a direct impact on the accuracy of speech emotion identification. In feature extraction, the entire emotion sentence was frequently used as a unit for feature extraction and extraction contents. The neural networks of the human brain are highly capable of learning high-level abstract notions from low-level information acquired by the sensory periphery. Humans communicate through voice, and interpreting speech is one of the most sophisticated operations that the human brain conducts. It has been argued that children who are not able to understand the emotional states of the speakers developed poor social skills and in some cases they show psychopathological symptoms [2, 3]. This highlights the importance of recognizing the emotional states of speech in effective communication. Detection of emotion from facial expressions and biological measurements such as heart beats or skin resistance formed the preliminary framework of research in emotion recognition[4]. 
More recently, emotion recognition from speech signal has received growing attention. The traditional approach toward this problem was based on the fact that there are relationships between acoustic features and emotion. In other words, the emotion is encoded by acoustic and prosodic correlates of speech signals such as speaking rate, intonation, energy, formant frequencies, fundamental frequency (pitch), intensity (loudness), duration (length), and spectral characteristic (timbre) [5, 6]. There are a variety of machine learning algorithms that have been examined to classify emotions based on their acoustic correlates in speech utterances. In the current study, we investigated the capability of convolutional neural networks in classifying speech emotions using our own dataset. There are a variety of machine learning algorithms that have been examined to classify emotions based on their acoustic correlates in speech utterances. In the current study, we investigated the capability of convolutional neural networks in classifying speech emotions using our own dataset. The specific contribution of this study is using IJTSRD47958
  • 2. International Journal of Trend in Scientific Research and Development @ www.ijtsrd.com eISSN: 2456-6470 @ IJTSRD | Unique Paper ID – IJTSRD47958 | Volume – 6 | Issue – 1 | Nov-Dec 2021 Page 923 wideband spectrograms instead of narrow-band spectrograms as well as assessing the effect of data augmentation on the accuracy of models. Our results revealed that wide-band spectrograms and data augmentation equipped CNNs to achieve the state-of- the art accuracy and surpass human performance. Fig.1. Speech emotion recognition block diagram II. RELATED WORK Most of the papers published in last decade use spectral and prosodic features extracted from raw audio signals. The process of emotion recognition from speech involves extracting the characteristics from a corpus of emotional speech selected or implemented, and after that, the classification of emotions is done on the basis of the extracted characteristics. The performance of the classification of emotions strongly depends on the good extraction of the characteristics (such as combination of MFCC acoustic feature with the energy prosodic feature [7]. Yixiong Pan in [8] used SVM for three class emotion classification on Berlin Database of Emotional Speech [9] and achieved 95.1% accuracy. Norooziet.al. Proposed a versatile emotion recognition system based on the analysis of visual and auditory signals. He used 88 features (Mel frequency cepstral coefficients (MFCC), filter bank energies (FBEs)) using the Principal Component Analysis (PCA) infeature extraction to reduce the dimension of features previously extracted revealed that wide-band spectrograms and data augmentation equipped CNNs to achieve the state-of-the art accuracy and surpass human performance. The performance of the classification of emotions strongly depends on the good extraction of the characteristics (such as combination of MFCC acoustic feature with the energy prosodic feature [7]. Yixiong Pan in [8] used SVM for three class emotion classification on Berlin Database of Emotional Speech [9] and achieved 95.1% accuracy. Norooziet.al. proposed a versatile emotion recognition system based on the analysis of visual and auditory signals. He used 88 features (Mel frequency cepstral coefficients (MFCC), filter bank energies (FBEs)) using the Principal Component Analysis (PCA) in feature extraction to reduce the dimension of features previously extracted [10]. S. Lalitha in [11] used pitch and prosody features and SVM classifier reporting 81.1% accuracy on 7 classes of the whole Berlin Database of Emotional Speech. Zamil et al also used the spectral characteristics which is the 13 MFCC obtained from the audio data in their proposed system to classify the 7 emotions with the Logistic Model Tree (LMT) algorithm with an accuracy rate 70% [12]. Yu zhou in [13] combined prosodic and spectral features and used Gaussian mixture model super vector based SVM and reported 88.35% accuracy on 5 classes of Chinese-LDC corpus. H.M Fayek in [14] explored various DNN architecture and reported accuracy around 60% on two different database eNTERFACE [15] and SAVEE [16] with 6 and 7 classes respectively. Fei Wang used combination of Deep Auto Encoder, various features and SVM in [17] and reported 83.5% accuracy on 6 classes of Chinese emotion corpus CASIA. In contrast to these traditional approaches more novel papers have been published recently employing Deep Neural Networks into their experiments with the promising results. 
Many authors agree that the most important audio characteristics to recognize emotions are spectral energydistribution, Teager Energy Operator (TEO) [18], MFCC, Zero Crossing Rate (ZCR), and the energy parameters of the filter bank energies (FBEs) [19]. III. TRADITIONAL SYSTEM The traditional system was based on the analysis and comparison of all kinds of emotional characteristic parameters, selecting emotional characteristics with high emotional resolution for feature extraction. In general, the traditional emotional feature extraction concentrates on the analysis of the emotional features in the speech from time construction, amplitude construction, and fundamental frequency construction and signal feature [28].
IV. PROPOSED METHOD
A Convolutional Neural Network (CNN) is used to classify the emotions (happy, sad, angry, neutral, surprised, disgusted) and to predict the output, reporting its accuracy. The given speech is plotted as a spectrogram using the matplotlib library, and this spectrogram serves as the input from which the CNN model is built.
Fig.2. Flow diagram of proposed system
A. Data Set Collection
The first step is to create an empty dataset that will hold the training data for the model. After creating the empty dataset, the audio data have to be recorded and labelled into the different classes. Once the labelling is done, the data have to be preprocessed, which yields a clearer pitch by removing unwanted background noise. After preprocessing, the data are divided into a training set holding 75% of the data and a test set holding the remaining 25%; a minimal sketch of this split appears after subsection C.
B. Feature Extraction of Speech Emotion
Human speech carries many parameters that convey the emotions contained in it, and as the emotion changes these parameters change as well. It is therefore necessary to select a proper feature vector to identify the emotions. Features are categorized as excitation source features, spectral features, and prosodic features. Excitation source features are obtained by suppressing the characteristics of the vocal tract (VT). Spectral features used for emotion recognition include linear prediction coefficients (LPC), perceptual linear prediction coefficients (PLPC), Mel-frequency cepstral coefficients (MFCC), linear prediction cepstrum coefficients (LPCC), and perceptual linear prediction (PLP). Good accuracy in differentiating emotions can be achieved using MFCC and LFPC [20, 21].
C. Mel-Frequency Cepstral Coefficients
The Mel-Frequency Cepstral Coefficients (MFCC) feature extraction method is a leading approach for speech feature extraction. The steps involved in MFCC feature extraction are:
Fig.3. Flow of MFCC
A/D conversion: converts the analog signal into discrete samples.
Pre-emphasis: boosts the amount of energy in the high frequencies.
Windowing: slices the audio waveform into sliding frames.
Discrete Fourier Transform: the DFT is used to extract information in the frequency domain [22, 23].
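The sketch below ties subsections A and C together: it computes an MFCC vector per recording and then performs the 75/25 train/test split, assuming librosa and scikit-learn as named earlier. The directory layout data/<emotion>/*.wav, the label encoding, and the utterance-level averaging are illustrative assumptions rather than the exact pipeline of our experiments.

```python
import glob
import os
import librosa
import numpy as np
from sklearn.model_selection import train_test_split

EMOTIONS = ["happy", "sad", "angry", "neutral", "surprised", "disgusted"]

def extract_mfcc(path, sr=16000, n_mfcc=13):
    """Load one recording and return its mean MFCC vector."""
    y, sr = librosa.load(path, sr=sr)
    # librosa's MFCC routine performs the windowing and DFT steps
    # described above (pre-emphasis is not applied by default).
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)

# Assumed layout: data/<emotion>/<recording>.wav (placeholder paths).
X, labels = [], []
for emotion in EMOTIONS:
    for path in glob.glob(os.path.join("data", emotion, "*.wav")):
        X.append(extract_mfcc(path))
        labels.append(EMOTIONS.index(emotion))

X = np.array(X)
y = np.array(labels)

# 75% training data, 25% test data, as described in subsection A.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)
print(X_train.shape, X_test.shape)
```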
D. Classifiers
After extracting the features of speech, it is essential to select a proper classifier; classifiers are used to classify the emotions. In the current study, we use a Convolutional Neural Network (CNN). The term "convolutional" comes from the fact that convolution, the mathematical operation, is employed in these networks. Convolutional Neural Networks are among the most popular deep learning models and have demonstrated remarkable success across research areas. A CNN is a deep learning algorithm that takes an image as input, assigns importance to various aspects of the image, and learns to differentiate one image from another. Generally, CNNs have three building blocks: the convolutional layer, the pooling layer, and the fully connected layer. Below, we describe these building blocks along with some basic concepts such as the softmax unit, the rectified linear unit, and dropout.
Input layer: holds the raw input image.
Convolution layer: computes the output volume by taking the dot product between each filter and the corresponding image patch.
Activation function layer: applies an element-wise activation function to the output of the convolution layer.
Pooling layer: inserted periodically in a CNN; its main function is to reduce the size of the volume, which speeds up computation and reduces memory. The two common types are max pooling and average pooling.
Fully connected layer: takes input from the previous layer, computes the class scores, and outputs a 1-D array whose size equals the number of classes [24, 25].
Fig.4. CNN Algorithm
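The layers listed above can be assembled, for example, with Keras; the paper does not fix a particular deep learning framework, so the model below is an assumed minimal architecture (the input size, filter counts, and dropout rate are placeholders), not the exact network used in our experiments.

```python
from tensorflow.keras import layers, models

NUM_CLASSES = 6  # happy, sad, angry, neutral, surprised, disgusted

# Input: a fixed-size single-channel spectrogram image
# (the 128x128 shape is a placeholder assumption).
model = models.Sequential([
    layers.Input(shape=(128, 128, 1)),
    # Convolution layer with a rectified linear unit (ReLU) activation.
    layers.Conv2D(32, kernel_size=3, activation="relu"),
    # Max pooling layer to shrink the volume.
    layers.MaxPooling2D(pool_size=2),
    layers.Conv2D(64, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),
    # Fully connected layer followed by dropout for regularization.
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    # Softmax unit: one score per emotion class.
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```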
V. APPLICATION
The applications of speech emotion recognition systems include psychiatric diagnosis, conversation with robots, intelligent toys, mobile-based emotion recognition, emotion recognition in call centres (where a customer's emotions can be identified to help improve service quality), intelligent tutoring systems, lie detection, and games [26, 27]. Such systems are also used in healthcare, psychology, cognitive science, marketing, and voice-based virtual assistants.
VI. CONCLUSION
In this research, we proposed a technique for extracting the emotional characteristic parameters of an emotional speech signal using the CNN algorithm, one of the deep learning methods. Previous research relied heavily on narrow-band spectrograms, which offer better frequency resolution than wide-band spectrograms and can resolve individual harmonics. Wide-band spectrograms, on the other hand, offer better temporal resolution than narrow-band spectrograms and reveal distinct glottal pulses, which are connected with the fundamental frequency and pitch. On training data, CNNs perform admirably. The current study's findings demonstrate CNNs' ability to learn the fundamental emotional properties of speech signals from their low-level representation utilising wide-band spectrograms.
VII. FUTURE SCOPE
For future work, we suggest using audio-visual or audio-visual-linguistic databases to train deep learning models, so that facial expressions and semantic information are taken into account alongside the speech signal, which should improve the recognition rate for each emotion. We can also consider other types of features, apply our system to other, larger databases, and use other methods for feature extraction.
REFERENCES
[1] Z. Yongzhao and C. Peng, "Research and implementation of emotional feature extraction and recognition in speech signal," Journal of Jiangsu University, vol. 26, no. 1, pp. 72-75, 2005.
[2] Monita Chatterjee, Danielle J. Zion, Mickael L. Deroche, Brooke A. Burianek, Charles J. Limb, Alison P. Goren, Aditya M. Kulkarni, and Julie A. Christensen, "Voice emotion recognition by cochlear-implanted children and their normally-hearing peers," Hearing Research, 322:151-162, 2015.
[3] Nancy Eisenberg, Tracy L. Spinrad, and Natalie D. Eggum, "Emotion-related self-regulation and its relation to children's maladjustment," Annual Review of Clinical Psychology, 6:495-525, 2010.
[4] Harold Schlosberg, "Three dimensions of emotion," Psychological Review, 61(2):81, 1954.
[5] Louis ten Bosch, "Emotions, speech and the ASR framework," Speech Communication, 40(1-2):213-225, 2003.
[6] Thomas S. Polzin and Alex Waibel, "Detecting emotions in speech," in Proceedings of the CMC, vol. 16. Citeseer, 1998.
[7] Suraj Tripathi, Abhay Kumar, Abhiram Ramesh, Chirag Singh, and Promod Yenigalla, "Speech emotion recognition using kernel sparse representation based classifier," in 2016 24th European Signal Processing Conference (EUSIPCO), pp. 374-377, 2016.
[8] Y. Pan, P. Shen, and L. Shen, "Speech emotion recognition using support vector machine," International Journal of Smart Home, 6(2), pp. 101-108, 2012.
[9] F. Burkhardt, A. Paeschke, M. Rolfes, W. F. Sendlmeier, and B. Weiss, "A database of German emotional speech," in Interspeech, vol. 5, pp. 1517-1520, September 2005.
[10] F. Noroozi, M. Marjanovic, A. Njegus, S. Escalera, and G. Anbarjafari, "Audio-visual emotion recognition in video clips," IEEE Transactions on Affective Computing, 2017.
[11] S. Lalitha, A. Madhavan, B. Bhushan, and S. Saketh, "Speech emotion recognition," in Advances in Electronics, Computers and Communications (ICAECC), 2014 International Conference on, pp. 1-4. IEEE, October 2014.
[12] Adib Ashfaq A. Zamil et al., "Emotion detection from speech signals using voting mechanism on classified frames," in 2019 International Conference on Robotics, Electrical and Signal Processing Techniques (ICREST). IEEE, 2019.
[13] Y. Zhou, Y. Sun, J. Zhang, and Y. Yan, "Speech emotion recognition using both spectral and prosodic features," in 2009 International Conference on Information Engineering and Computer Science, pp. 1-4. IEEE, December 2009.
[14] H. M. Fayek, M. Lech, and L. Cavedon, "Towards real-time speech emotion recognition using deep neural networks," in Signal Processing and Communication Systems (ICSPCS), 2015 9th International Conference on. IEEE, 2015.
[15] O. Martin, I. Kotsia, B. Macq, and I. Pitas, "The eNTERFACE'05 audio-visual emotion database," in 22nd International Conference on Data Engineering Workshops (ICDEW'06), pp. 8-8. IEEE, April 2006.
[16] B. R. Sanjitha, A. Nipunika, and Rohita Desai, "Speech emotion recognition using MLP," IJESC.
[17] Ray Kurzweil, The Singularity Is Near. Gerald Duckworth & Co, 2010.
[18] Hadhami Aouani and Yassine Ben Ayed, "Speech emotion recognition with deep learning," 24th International Conference on Knowledge-Based and Intelligent Information & Engineering Systems.
[19] Pavol Harar, Radim Burget, and Malay Kishore Dutta, "Efficiency of chosen speech descriptors in relation to emotion recognition," EURASIP Journal on Audio, Speech, and Music Processing, 2017.
[20] I. Idris and M. S. Salam, "Improved speech emotion classification from spectral coefficient optimization," Lecture Notes in Electrical Engineering, vol. 387. Springer, 2016.
[21] T. L. Pao, Y. T. Chen, J. H. Yeh, Y. M. Cheng, and C. S. Chien, "Feature combination for better differentiating anger from neutral in Mandarin emotional speech," LNCS, vol. 4738. Berlin: Springer, 2007.
[22] Daniel Jurafsky and James H. Martin, Speech and Language Processing, 2nd edition. Pearson Education, 2008.
[23] Hynek Hermansky, "Perceptual linear predictive (PLP) analysis of speech," The Journal of the Acoustical Society of America, vol. 87, no. 4, pp. 1737-1752, 1990.
[24] A. Berg, J. Deng, and L. Fei-Fei, Large Scale Visual Recognition Challenge 2010. www.upgrad.com/blog/.