2. “SPEECH-BASED EMOTION RECOGNITION”
MAJOR PROJECT REVIEW
BY
G.HANNAH SANJANA 17P71A1209
MANASA MITTIPALLY 17P71A1218
VAMSHIDHAR SINGH 17P71A1248
UNDER THE GUIDANCE OF
MRS. M. SUPRIYA, ASSOCIATE PROFESSOR
DEPARTMENT OF INFORMATION TECHNOLOGY
SWAMI VIVEKANANDA INSTITUTE OF TECHNOLOGY
Mahbub College Campus, R.P Road, Secunderabad-03
(Affiliated to JNTUH)
2017-2021
3. CONTENTS:
Abstract
Introduction
Existing System
Disadvantages of Existing System
Proposed System
Advantages of Proposed System
System Specifications
UML Diagrams
Output Screens
Conclusion
4. ABSTRACT
Speech emotion recognition is a trending research topic, with the primary aim of
improving human-machine interaction. At present, most work in this area relies on
extracting discriminatory features in order to classify emotions into various
categories.
Most of the present work involves word utterances used for lexical analysis in
emotion recognition. In our project, a technique is used to classify emotions into
the 'Angry', 'Calm', 'Fearful', 'Happy', and 'Sad' categories.
5. ABSTRACT
In previous works, the maximum cross-correlation between audio files is computed
to label the speech data into one of only a few (three) emotion categories. A
further work, developed in MATLAB, identifies the emotion of any audio file passed
as an argument.
A variety of classifiers from the MATLAB Classification Learner toolbox are used,
but they classify only a few emotion categories. The proposed technique paves the
way for a real-time speech emotion recognition prototype built with open-source
tools.
6. INTRODUCTION
Speech emotion recognition is a technology that extracts emotional features from speech
signals and analyzes how those feature parameters change with the speaker's emotional
state. Recognizing emotions from audio signals requires feature extraction followed by
classifier training.
The feature vector is composed of audio-signal elements that characterize the speaker
(such as pitch, volume, and energy), and it is essential for training the classifier
model to accurately recognize specific emotions.
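A minimal sketch of how such per-frame features might be extracted, using plain NumPy and a synthetic tone in place of real speech (the `frame_features` helper, its parameter choices, and the 60-400 Hz pitch search band are illustrative assumptions, not the project's actual code):

```python
import numpy as np

def frame_features(signal, sr=16000, frame_len=512):
    """Per-frame short-time energy and a crude autocorrelation pitch estimate."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        energy = float(np.sum(frame ** 2))           # short-time energy
        # Pitch: lag of the autocorrelation peak within a plausible voice range
        ac = np.correlate(frame, frame, mode='full')[frame_len - 1:]
        lo, hi = sr // 400, sr // 60                 # 60-400 Hz search band
        lag = lo + int(np.argmax(ac[lo:hi]))
        feats.append((energy, sr / lag))             # (energy, pitch in Hz)
    return feats

# A 220 Hz synthetic tone stands in for a voiced speech segment
sr = 16000
t = np.arange(sr) / sr
feats = frame_features(np.sin(2 * np.pi * 220 * t), sr)
print(feats[0])
```

On the synthetic tone, the recovered pitch lands close to the true 220 Hz; on real speech these raw estimates would normally be smoothed and combined with many more features.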
7. EXISTING SYSTEM
A review of existing work in this area shows that most systems rely on lexical
analysis for emotion recognition, classifying emotions into only three categories,
i.e., Angry, Happy, and Neutral. The maximum cross-correlation between the discrete-
time sequences of the audio signals is computed, and the highest degree of correlation
between the testing audio file and a training audio file is used as the key parameter
for identifying a particular emotion type.
The second technique extracts discriminatory features and applies a Cubic SVM
classifier, recognizing only Angry, Happy, and Neutral emotion segments.
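The cross-correlation labeling step described above can be sketched as follows; the synthetic signals, the three emotion labels, and the `best_match` helper are toy stand-ins for illustration, not the original MATLAB implementation:

```python
import numpy as np

def best_match(test_signal, training_signals):
    """Label a test clip with the emotion of the training clip that yields the
    highest peak normalized cross-correlation."""
    best_label, best_score = None, -np.inf
    for label, ref in training_signals.items():
        # Normalize so scores are comparable across clips of different energy
        a = test_signal / (np.linalg.norm(test_signal) + 1e-10)
        b = ref / (np.linalg.norm(ref) + 1e-10)
        score = float(np.max(np.correlate(a, b, mode='full')))
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score

# Toy example: random signals stand in for one training clip per emotion
rng = np.random.default_rng(0)
train = {'Angry': rng.standard_normal(1000),
         'Happy': rng.standard_normal(1000),
         'Neutral': rng.standard_normal(1000)}
test = train['Happy'] + 0.1 * rng.standard_normal(1000)  # noisy copy of 'Happy'
label, score = best_match(test, train)
print(label)
```

This also makes the scalability problem concrete: every test file is correlated against the entire training set, which is exactly why the existing approach is slow.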
8. DISADVANTAGES OF EXISTING SYSTEM:
The system is static in nature and cannot deliver good performance in real-time
settings.
The system is slow, as it compares the correlations of the complete dataset
against a single audio file.
Variable-length audio files cannot be handled.
Long pre-processing steps are required before the model can interpret the audio
signal.
Expensive and not upgradable.
9. PROPOSED SYSTEM
In this project, MFCCs (Mel-frequency cepstral coefficients) are used as the features
for classifying speech data into various emotion categories using artificial neural
networks. Neural networks give us the advantage of classifying many different emotions
from variable-length audio signals in a real-time environment.
This technique strikes a good balance between computational cost and recognition
accuracy in real-time processing.
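The MFCC feature extraction at the heart of the proposed system can be sketched with NumPy and SciPy alone; real projects typically use a library such as librosa, and the frame size, hop, and filter counts below are illustrative defaults rather than the project's settings:

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=26, n_ceps=13):
    # 1. Frame the signal and apply a Hamming window
    frames = [signal[s:s + n_fft] * np.hamming(n_fft)
              for s in range(0, len(signal) - n_fft + 1, hop)]
    frames = np.array(frames)
    # 2. Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # 3. Triangular mel filterbank spanning 0 Hz to sr/2
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # 4. Log mel energies, then DCT to decorrelate -> cepstral coefficients
    energies = np.log(power @ fbank.T + 1e-10)
    return dct(energies, type=2, axis=1, norm='ortho')[:, :n_ceps]

# A 1-second synthetic tone stands in for a loaded speech file
sr = 16000
t = np.arange(sr) / sr
features = mfcc(np.sin(2 * np.pi * 220 * t), sr=sr)
print(features.shape)  # (n_frames, 13)
```

The resulting per-frame coefficient matrix is the kind of input a neural network classifier is then trained on; because the frame axis grows with the clip, variable-length audio is handled naturally.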
10. ADVANTAGES OF PROPOSED SYSTEM:
Can be implemented on any hardware that supports the Python language.
Very fast at processing audio and easy to use.
Variable-length audio files are handled by the system.
24. CONCLUSION
The CNN model was trained, and based on it we were able to predict a person's
emotion from speech.
The trained model achieves an F1 score of 91.04.
'Happy', 'Sad', 'Fearful', 'Calm', and 'Angry' are the five emotions recognized
by this project.
This speech-based emotion recognition can be used to understand the opinions or
sentiments people express about a product, a political issue, etc., by giving
audio as the input to the model.
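For reference, an F1 score like the one reported above is computed per class from precision and recall and then averaged; the toy predictions below are illustrative only, and the 91.04 figure itself comes from the project's own test set:

```python
import numpy as np

EMOTIONS = ['Angry', 'Calm', 'Fearful', 'Happy', 'Sad']
# Toy ground-truth and predicted class indices for a 10-sample test set
y_true = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4])
y_pred = np.array([0, 1, 1, 1, 2, 2, 3, 0, 4, 4])

f1s = []
for c in range(len(EMOTIONS)):
    tp = np.sum((y_pred == c) & (y_true == c))   # true positives for class c
    fp = np.sum((y_pred == c) & (y_true != c))   # false positives
    fn = np.sum((y_pred != c) & (y_true == c))   # false negatives
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
macro_f1 = float(np.mean(f1s))                   # macro-averaged F1
print(f"macro F1: {macro_f1:.4f}")
```

In practice this is usually done with `sklearn.metrics.f1_score`; the explicit loop just shows what the metric measures.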
25. FUTURE ENHANCEMENTS
Making the system more accurate.
Other emotions, such as disgust and surprise, can be added.
Integrating the system with different platforms.