EMO PLAYER
SEMINAR REPORT
Submitted in partial fulfillment of the requirements for the award of
Bachelor of Technology degree in Electronics & Communication Engineering
By
NISAMUDEEN P (MQAMEEC020)
DEPARTMENT OF ELECTRONICS & COMMUNICATION
ENGINEERING
MALABAR COLLEGE OF ENGINEERING & TECHNOLOGY
DESAMANGALAM, PALLUR (P.O),
THRISSUR. KERALA - 679532.
BONAFIDE CERTIFICATE
This is to certify that this Seminar Report is the bonafide work of NISAMUDEEN P
(MQAMEEC020) who carried out the Seminar entitled EMO PLAYER under our
supervision from __________ to ___________.
Mr. VYSHAK VITTAL (Guide, Assistant Professor, ECE Department)
Mr. ANSHAD A.S (HOD, ECE Department)
Name of the Examiners Signature
1)
2)
DECLARATION
I NISAMUDEEN P (MQAMEEC020) hereby declare that the Seminar Report
entitled EMO PLAYER done by me under the guidance of Mr. VYSHAK VITTAL at
Malabar College of Engineering & Technology is submitted in partial fulfillment of the
requirements for the award of Bachelor of Technology degree in Electronics &
Communication Engineering.
DATE:
PLACE: SIGNATURE OF THE CANDIDATE
NISAMUDEEN P
ACKNOWLEDGEMENT
I thank ALMIGHTY GOD, who laid the foundation for knowledge and wisdom, who has always been the source of my strength and inspiration, and who guides me throughout.
I express my sincere thanks to Dr. V. NANDANAN, Principal, for giving me permission to present this seminar work and for providing all the facilities for its successful completion. I express a deep sense of gratitude to Prof. ANSHAD, HOD, Department of Electronics & Communication, for providing the best facilities and atmosphere for creative work, along with guidance and encouragement.
I would like to thank my coordinator, Ms. GOPIKA GOPINATH V, Assistant Professor, and my guide, Mr. VYSHAK VITTAL, Assistant Professor, for their judicious advice, expert suggestions, guidance and support during every stage of this work. I express sincere thanks to all the lecturers in the ECE Department for their valuable guidance and encouragement throughout the work.
I also express my sincere thanks to all my classmates and friends, and my heartfelt gratitude to my loving parents for their encouragement, prayers and inspiration during every stage of this work.
CONTENTS
1. LIST OF FIGURES
2. LIST OF TABLES
3. ABSTRACT
4. CHAPTER 1: INTRODUCTION
5. CHAPTER 2: LITERATURE REVIEW
 2.1 Face detection methods
 2.2 Feature extraction methods
 2.3 Expression Recognition
6. CHAPTER 3: CURRENT MUSIC SYSTEMS
 3.1 Limitations of existing system
7. CHAPTER 4: PROPOSED SYSTEM
 4.1 Emotion Extraction Module
 4.1.1 Face Detection and Tracking
 4.1.2 Facial Expression Recognition
 4.2 Audio Feature Extraction Module
 4.2.1 MIR 1.5 Toolbox
 4.2.2 Chroma Toolbox
 4.2.3 Auditory Toolbox
 4.2.4 MFCC
 4.3 Emotion-Audio Integration Module
 4.3.1 Audio Emotion Recognition
 4.4 Classifier
 4.5 Databases
 4.6 Music
8. CHAPTER 5: Viola and Jones Algorithm
 5.1 Advantages
 5.2 Disadvantages
 5.3 MATLAB code on Frontal CART for object detection
9. CHAPTER 6: Experimental Results
 6.1 EMOTION EXTRACTION
 6.2 AUDIO FEATURE EXTRACTION
 6.3 EMOTION BASED MUSIC PLAYER
10. CHAPTER 7: CONCLUSION
11. CHAPTER 8: APPLICATIONS
12. CHAPTER 9: FUTURE SCOPE
13. CHAPTER 10: REFERENCES
LIST OF FIGURES
1. Framework of recognizing emotions with privileged information
2. Functional block diagram of a basic music system
3. Flow chart of the proposed system
4. Feature extraction from facial image
5. Regions involved in expression analysis
6. Face region grids (bounding box)
7. Audio preprocessing
8. Chroma Toolbox
9. Phases of the MFCC computation
10. Integration of the three modules
11. Mapping of modules
LIST OF TABLES
Table 1: Time Estimation of Various Modules of Emotion Extraction Module
Table 2: Estimated Accuracy for Different Categories of Audio Features
Table 3: Average Time Estimation of Various Modules of Proposed System
ABSTRACT
The human face is an important part of an individual's body, and it plays an especially important role in conveying an individual's behavior and emotional state. Manually segregating a list of songs and generating an appropriate playlist based on an individual's emotional features is a tedious, time-consuming, labor-intensive and uphill task. Various algorithms have been proposed and developed to automate the playlist generation process; however, the existing algorithms are computationally slow, less accurate and sometimes even require additional hardware such as EEG systems or sensors. The proposed system generates a playlist automatically from the extracted facial expression, thereby reducing the effort and time involved in performing this process manually. The proposed system therefore reduces the computational time involved in obtaining the results and the overall cost of the designed system, while increasing its overall accuracy. Testing of the system is done on both user-dependent (dynamic) and user-independent (static) datasets, with facial expressions captured using an inbuilt camera. The accuracy of the emotion detection algorithm used in the system is around 85-90% for real-time images and around 98-100% for static images. On an average calculated estimate, the proposed algorithm takes around 0.95-1.05 s to generate an emotion-based music playlist. Thus, it yields better accuracy in terms of performance and computational time, and reduces the design cost, compared with the algorithms discussed in the literature survey. Implementation and testing of the proposed algorithm are carried out using only an inbuilt camera; hence, the proposed algorithm successfully reduces the overall cost of the system. Also, on average, the proposed algorithm takes about 1.10 s to generate a playlist based on a facial expression, which yields better performance, in terms of computational time, than the algorithms in the existing literature.
CHAPTER 1
INTRODUCTION
Music plays a very important role in enhancing an individual's life: it is an important medium of entertainment for music lovers and listeners, and it sometimes even serves a therapeutic purpose. In today's world, with ever-increasing advancements in the field of multimedia and technology, various music players have been developed with features such as fast forward, reverse, variable playback speed (seek and time compression), local playback, streaming playback with multicast streams, volume modulation and genre classification. Although these features satisfy the user's basic requirements, the user still has to manually browse through the playlist and select songs based on his current mood and behavior. In other words, a user repeatedly faces the need to browse through his playlist according to his mood and emotions. Using traditional music players, a user had to manually browse through his playlist and select songs that would suit his mood and emotional experience. This task was labor intensive, and an individual often faced the dilemma of landing on an appropriate list of songs.
Emotions are synonymous with the aftermath of interplay between an individual’s
cognitive gauging of an event and the corresponding physical response towards it. Among
the various ways of expressing emotions, including human speech and gesture, a facial
expression is the most natural way of relaying them. The ability to understand human
emotions is desirable for human-computer interaction. In the past decade, considerable
amounts of research have been done on emotion recognition from voice, visual behavior, and
physiological signals, separately or jointly. Very good progress has been achieved in this field, and several commercial products have been developed, such as smile detection in cameras. Emotion is a subjective response to an outer stimulus. A facial expression is a
discernible manifestation of the emotive state, cognitive activity, motive, and
psychopathology of a person. Homo sapiens have been blessed with an ability to interpret
and analyze an individual’s emotional state.
The existing systems that provide privileged information for emotion recognition are:
• Audio Emotion Recognition (AER)
• Music Information Retrieval (MIR)
• Emotion recognition with the help of privileged information
AER is a technique that classifies a received audio signal into various classes of emotions and moods by considering its audio features, whereas MIR is a field that extracts critical information from an audio signal by exploring audio features such as pitch, energy, MFCC and flux.
Although both Audio Emotion Recognition (AER) and Music Information Retrieval (MIR) avoid the manual segregation of songs and the manual generation of playlists, neither fully realises a music player controlled by human emotion. The advent of AER and MIR equipped traditional systems with a feature that automatically parsed the playlist into different classes of emotions: AER categorises an audio signal under various classes of emotions based on certain audio features, while MIR extracts crucial information and various audio features (e.g. pitch, energy, flux, MFCC, kurtosis) from an audio signal. Yet such systems did not incorporate mechanisms that enabled a music player to be fully controlled by human emotions.
Although human speech and gesture are common ways of expressing emotions, facial expression is the most ancient and natural way of expressing feelings, emotions and mood. In the above two approaches, AER and MIR, emotion recognition does not depend on the face of the user at all.
The third approach classifies emotions from EEG signals and tags the emotions of videos by integrating privileged information from the EEG feature space and the video feature space, and compares this information with the music playlist system for automatic playlist generation, as shown in Figure 1.
Figure 1: Framework of recognizing emotions with privileged information
The main objective of this paper is to design an efficient and accurate algorithm that generates a playlist based on the current emotional state and behavior of the user. The paper presents a novel approach that removes the burden of manually browsing through a playlist of songs in order to make a selection.
CHAPTER 2
LITERATURE REVIEW
Various techniques and approaches have been proposed and developed to classify the emotional state and behavior of humans. The proposed approaches have focused only on some of the basic emotions.
2.1 Face detection methods
The techniques for face detection can be divided into two groups: holistic, where the face is treated as a whole unit, and analytic, where the co-occurrence of characteristic facial elements is studied. Pantic and Rothkrantz proposed a system which processes images of the frontal and profile face views: vertical and horizontal histogram analysis is used to find the face boundaries, and the face contour is then obtained by thresholding the image with HSV color space values. Kobayashi and Hara used images captured in monochrome mode to find the face brightness distribution, with the position of the face estimated by iris localization. For the purpose of feature recognition, facial features have been categorized into two major categories: appearance-based feature extraction and geometric-based feature extraction. The geometric-based feature extraction technique in the system proposed by Changbo considered only the shape or major prominent points of some important facial features, such as the mouth and eyes; around 58 major landmark points were considered in crafting an Active Shape Model (ASM). Appearance-based features, such as texture, have also been considered in different areas of work and development. An efficient method for coding and implementing extracted facial features, together with a multi-orientation, multi-resolution set of Gabor filters, was proposed by Michael Lyons.
2.2 Feature extraction methods
Pantic and Rothkrantz selected a set of facial points from frontal and profile face images. The expression is measured by the distance between the positions of those points in the initial image (neutral face) and the peak image (affected face).
Cohn developed a geometric feature based system in which the optical flow algorithm is performed only in 13x13 pixel regions surrounding facial landmarks. Shan investigated the Local Binary Pattern (LBP) method for texture encoding in facial expression description; two methods of feature extraction were proposed, in the first of which features are extracted from a fixed set of patches and in the second from the most probable patches found by boosting. In Jung Hyun Kim's work, music mood tags and arousal-valence (A-V) values from a total of 20 subjects were tested and analyzed, and based on the results of the analysis the A-V plane was classified into 8 regions (clusters) depicting mood, using the efficient k-means clustering algorithm. Thayer proposed a very useful 2-dimensional (stress vs. energy) model in which emotions are depicted by a 2-dimensional coordinate system, lying either on the 2 axes or in the 4 quadrants formed by the plot.
2.3 Expression Recognition
The last part of the FER system is based on machine learning theory; precisely, it is the classification task. The input to the classifier is the set of features that were retrieved from the face region in the previous stage. Classification requires supervised training, so the training set should consist of labeled data. There are many different machine learning techniques for the classification task, namely K-Nearest Neighbors, Artificial Neural Networks, Support Vector Machines, Hidden Markov Models, Expert Systems with rule-based classifiers, Bayesian Networks and boosting techniques (AdaBoost, GentleBoost). The field of Facial Expression Recognition (FER) synthesized algorithms that excelled in furnishing such demands: FER enabled computer systems to monitor an individual's emotional state effectively and react appropriately. While various systems have been designed to recognize facial expressions, their time and memory complexity is relatively high, and hence they fail to achieve real-time performance. Their feature extraction techniques are also less reliable and are responsible for reducing the overall accuracy of the system. An accurate statistics-based approach was proposed by Renuka R. Londhe; that paper focused mainly on the study of the changes in curvatures on the face and the intensities of the corresponding pixels of the images.
Artificial Neural Networks (ANN) were used to classify the extracted features into the 6 major universal emotions: anger, disgust, fear, happiness, sadness and surprise.
Some of the drawbacks of the existing systems are as follows:
i. Existing systems are highly complex in terms of time and storage for recognizing
facial expressions in a real environment.
ii. Existing systems lack accuracy in generating a playlist based on the current emotional
experience of a user.
iii. Existing systems employed additional hardware like EEG systems and sensors that
increased the overall cost of the system.
iv. Some of the existing systems imposed a requirement of human speech for generating a playlist in accordance with a user's facial expressions.
Several approaches have been proposed and adopted to classify human emotions successfully. This paper primarily focuses on resolving the drawbacks of the existing systems by designing an automated emotion based music player that generates a customized playlist from the facial features extracted from the user, thus avoiding the use of any additional hardware. It also includes a mood randomizer function that, after some duration, shifts the mood-generated playlist to another randomized playlist of the same mood level. Various approaches have been proposed to reduce the human effort and time needed for manually segregating songs from a playlist into different classes of emotions and moods. Numerous approaches have been designed to extract facial features, or audio features from an audio signal, but very few of the systems designed are capable of generating an emotion based music playlist using human emotions, and the existing designs that can generate an automated playlist rely on additional hardware such as sensors or EEG systems, thereby increasing the cost of the proposed design.
CHAPTER 3
CURRENT MUSIC SYSTEMS
The features available in the existing music players present in computer systems are as
follows:
i. Manual selection of Songs
ii. Party Shuffle
iii. Playlists
iv. Music squares, where the user has to classify the songs manually according to particular emotions, with only four basic emotions: Passionate, Calm, Joyful and Excitement.
3.1 Limitations of existing system:
i. It requires the user to manually select the songs.
ii. Randomly played songs may not match the mood of the user.
iii. The user has to classify the songs into various emotions and then manually select a particular emotion in order to play the songs.
Figure 2: Functional block diagram of a basic music system
CHAPTER 4
PROPOSED SYSTEM
The proposed automatic playlist generation scheme is a combination of multiple schemes. In this paper, we consider the notion of collecting human emotion from the user's
expressions, and explore how this information could be used to improve the user experience
with music players. We present a new emotion based and user-interactive music system,
EMO PLAYER. It aims to provide user-preferred music with emotion awareness. The system
starts recommendation with expert knowledge. If the user does not like the recommendation,
he/she can decline the recommendation and select the desired music himself/herself. This
paper is based on the idea of automating much of the interaction between the music player
and its user. It introduces a "smart" music player, EMO PLAYER that learns its user's
emotions, and tailors its music selections accordingly. After an initial training period, EMO
PLAYER is able to use its internal algorithms to make an educated selection of the song that
would best fit its user's emotion.
The proposed algorithm revolves around an automated music recommendation system that
generates a subset of the original playlist or a customized playlist in accordance with a user’s
facial expressions. A user's facial expression helps the music recommendation system to
decipher the current emotional state of a user and recommend a corresponding subset of the
original playlist. It is composed of three main modules: Facial expression recognition
module, Audio emotion recognition module and a System integration module. Facial
expression recognition and audio emotion recognition modules are two mutually exclusive
modules. Hence, the system integration module maps the two sub-spaces by constructing and
querying a meta-data file. It is composed of meta-data corresponding to each audio file.
Figure 3 depicts the flowchart of the proposed algorithm.
Figure 3: Flow chart of the proposed system
4.1 Emotion Extraction Module
An image of the user is captured using a webcam, or it can be accessed from an image stored on the hard disk. The acquired image undergoes image enhancement in the form of tone mapping in order to restore the original contrast of the image. After enhancement, the image is converted into binary format and the face is detected using the Viola and Jones algorithm, where the Frontal CART property of the algorithm is used; it detects only upright, forward-facing faces, with the merging threshold value set in the range of 16-20. The output of the Viola and Jones face detection block forms the input to the facial feature extraction block. To increase accuracy and to obtain real-time performance, only the features of the eyes and mouth are extracted, as these two are sufficient to depict the emotions accurately. For extracting the features of the mouth and eyes, certain calculations and measurements are taken into consideration. Equations (1)-(4) give the bounding box calculations for extracting the features of the mouth:

X_start(mouth) = X_mid(nose) - (X_end(nose) - X_start(nose))   (1)
X_end(mouth)   = X_mid(nose) + (X_end(nose) - X_start(nose))   (2)
Y_start(mouth) = Y_mid(nose) + 15                              (3)
Y_end(mouth)   = Y_start(mouth) + 103                          (4)

where (X_start(mouth), Y_start(mouth)) and (X_end(mouth), Y_end(mouth)) are the start and end points of the bounding box for the mouth, (X_mid(nose), Y_mid(nose)) is the midpoint of the nose, and X_start(nose) and X_end(nose) are the start and end points of the nose. Classification is performed using a Support Vector Machine (SVM), which classifies the features into 7 classes of emotions.
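A minimal MATLAB sketch of this bounding-box arithmetic is given below. It assumes the nose bounding box comes from a Viola and Jones nose detector in the [x y width height] format used by the Computer Vision Toolbox; the helper name and that format are illustrative assumptions, while the offsets 15 and 103 come directly from equations (3) and (4).

% Sketch only: derive the mouth bounding box from a detected nose box
% using equations (1)-(4). noseBox is assumed to be a [x y width height]
% rectangle such as the one returned by vision.CascadeObjectDetector('Nose').
function mouthBox = mouthFromNose(noseBox)
    xStartNose = noseBox(1);
    xEndNose   = noseBox(1) + noseBox(3);
    xMidNose   = noseBox(1) + noseBox(3) / 2;
    yMidNose   = noseBox(2) + noseBox(4) / 2;

    xStartMouth = xMidNose - (xEndNose - xStartNose);   % equation (1)
    xEndMouth   = xMidNose + (xEndNose - xStartNose);   % equation (2)
    yStartMouth = yMidNose + 15;                        % equation (3)
    yEndMouth   = yStartMouth + 103;                    % equation (4)

    % Express the result as a [x y width height] rectangle
    mouthBox = [xStartMouth, yStartMouth, ...
                xEndMouth - xStartMouth, yEndMouth - yStartMouth];
end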
4.1.1 Face Detection and Tracking
The first part of the system is a module for face detection and landmark localization in the image. The algorithm for face detection is based on the work by Viola and Jones. In this approach the image is represented by a set of Haar-like features; the possible types are two-rectangle, three-rectangle and four-rectangle features, evaluated inside a bounding box. A bounding box is the physical extent of a given object and can be used to graphically define the physical shape of the object. A feature value is calculated by subtracting the sum of the pixels covered by the white rectangle from the sum of the pixels under the gray rectangle. Two-rectangle features detect contrast between two vertically or horizontally adjacent regions, three-rectangle features detect a contrasting region placed between two similar regions, and four-rectangle features detect similar regions placed diagonally. Rectangle features can be computed very rapidly using an intermediate representation of the image called the integral image: the sum over any rectangle requires only 4 references into the integral image. Given this representation of the image in rectangular features, the classifier needs to be trained to decide whether the image contains the searched object (a face) or not. The number of features is much higher than the number of pixels in the original image; however, it has been shown that even a small set of well-chosen features can build a strong classifier. That is why the AdaBoost algorithm can be used for training: each step selects the most discriminative feature, the one that best separates the positive and negative examples. The method is widely used in the area of face detection; however, it can be trained to detect any object. The algorithm is quick and efficient and can be used in real-time applications. The facial image obtained from the face detection stage forms the input to the feature extraction stage. To obtain real-time performance and to reduce time complexity, only the eyes and mouth are considered for expression recognition; the combination of these two features is adequate to convey emotions accurately. Figure 4 depicts the schematic for feature extraction.
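As a concrete illustration of the integral-image idea described above, the short MATLAB sketch below builds the integral image with cumulative sums and evaluates the sum of any rectangle from 4 references; the input file name and the example feature coordinates are placeholders, not values from the report.

% Minimal sketch of the integral image and the 4-reference rectangle sum.
I = double(rgb2gray(imread('face.jpg')));   % assumed RGB input image

% Integral image, zero-padded so that ii(r+1, c+1) = sum of I(1:r, 1:c)
ii = cumsum(cumsum(I, 1), 2);
ii = padarray(ii, [1 1], 0, 'pre');

% Sum of the pixels in rows r1..r2, columns c1..c2 using only 4 references
rectSum = @(r1, c1, r2, c2) ii(r2+1, c2+1) - ii(r1, c2+1) ...
                          - ii(r2+1, c1)   + ii(r1, c1);

% Example two-rectangle (horizontal) Haar-like feature value:
% sum under the white rectangle minus sum under the adjacent gray rectangle
white = rectSum(10, 10, 20, 20);
gray  = rectSum(10, 21, 20, 31);
featureValue = white - gray;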
4.1.2 Facial Expression Recognition
Figure 4: Feature Extraction from Facial Image
All RGB and grayscale images are converted into binary images. The preprocessed image is fed into the face detection block, where face detection is carried out using the Viola and Jones algorithm. The default 'Frontal CART' property, with a merging threshold of 16, is employed to carry out this process; the 'Frontal CART' property only detects frontal faces that are upright and forward facing. Since Viola and Jones by default produces multiple bounding boxes around the face, the merging threshold of 16 helps in coagulating these multiple boxes into a single bounding box. In the case of multiple faces, this threshold keeps the face closest to the camera and filters out the more distant faces.
EMO PLAYER SEMINAR REPORT| 2016
MCET 22 nzmotp@gmail.com
Figure 5: Regions involved in Expression analysis
The binary image obtained from the face detection block forms the input to the feature extraction block, where the eyes and mouth are extracted. The eyes are extracted using the Viola and Jones method; to extract the mouth, however, certain measurements are considered: first the bounding box of the nose is calculated using Viola and Jones, and then the bounding box of the mouth is deduced from it, as sketched below.
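For illustration, a rough MATLAB sketch of this detection chain is given below, assuming the Computer Vision Toolbox cascade detectors. The 'EyePairBig' and 'Nose' models and the image file name are illustrative choices, and the mouth box reuses the mouthFromNose helper sketched earlier in this section.

% Sketch: face, eye and nose detection feeding the mouth-box arithmetic.
img = imread('user_frame.jpg');                     % assumed captured frame

faceDet = vision.CascadeObjectDetector('ClassificationModel', ...
              'FrontalFaceCART', 'MergeThreshold', 16);
eyeDet  = vision.CascadeObjectDetector('EyePairBig');
noseDet = vision.CascadeObjectDetector('Nose');

faceBox = step(faceDet, img);                       % single upright frontal face
faceImg = imcrop(img, faceBox(1, :));               % restrict search to that face

eyeBox  = step(eyeDet,  faceImg);                   % eyes via Viola and Jones
noseBox = step(noseDet, faceImg);                   % nose via Viola and Jones

% Mouth region deduced from the nose box, per equations (1)-(4)
mouthBox = mouthFromNose(noseBox(1, :));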
Figure 6: Face region grids (Bounding box)
4.2 Audio Feature Extraction Module
In this module a list of songs forms the input. Since songs are audio files, they require a certain amount of preprocessing: stereo signals obtained from the Internet are converted to 16-bit PCM mono signals at a sampling rate of 44.1 kHz. The conversion is done using Audacity. The preprocessed signal then undergoes audio feature extraction, where features such as tonality and rhythm are extracted using the MIR 1.5 Toolbox, pitch is extracted using the Chroma Toolbox, and other features such as centroid, spectral flux, spectral roll-off, kurtosis and 15 MFCC coefficients are extracted using the Auditory Toolbox.
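The report performs this conversion in Audacity. Purely for illustration, a roughly equivalent MATLAB sketch of the stereo-to-mono, 16-bit PCM, 44.1 kHz preprocessing step could look as follows (file names are placeholders).

% Sketch: convert a stereo track to a 16-bit PCM mono WAV at 44.1 kHz.
% The report does this step in Audacity; this is only an illustrative
% MATLAB equivalent with placeholder file names.
[x, fs] = audioread('song_stereo.mp3');       % stereo signal from the internet

if size(x, 2) > 1
    x = mean(x, 2);                           % average the channels to get mono
end

targetFs = 44100;
if fs ~= targetFs
    x = resample(x, targetFs, fs);            % resample to 44.1 kHz
end

audiowrite('song_mono.wav', x, targetFs, 'BitsPerSample', 16);  % 16-bit PCM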
Figure 7: Audio Preprocessing
Audio signals are categorized into 8 types, viz. joy, sad, anger, joy-anger, sad-anger, joy-excitement, joy-surprise and others.
1. Songs that are cheerful, energetic and playful are classified under Joy.
2. Songs that are very depressing are classified under Sad.
3. Songs that reflect attitude or revenge are classified under Anger.
4. Songs with anger in a playful mode are classified under the Joy-Anger category.
5. Songs with a very depressed and angry mood are classified under the Sad-Anger category.
6. Songs that reflect the excitement of joy are classified under the Joy-Excitement category.
7. Songs that reflect the surprise of joy are classified under the Joy-Surprise category.
8. All other songs fall under the 'Others' category.
Feature extraction from the converted files is done by three toolboxes: the MIR 1.5 Toolbox, the Chroma Toolbox and the Auditory Toolbox.
4.2.1 MIR 1.5 Toolbox
Features such as tonality, rhythm and structure are extracted using an integrated set of functions written in MATLAB. It is one of the important toolboxes for audio feature extraction. In addition to the basic computational processes, the toolbox also includes higher-level musical feature extraction tools, whose alternative strategies and multiple combinations can be selected by the user. The toolbox was initially conceived in the context of the Brain Tuning project financed by the European Union (FP6-NEST); one main objective was to investigate the relation between musical features, music-induced emotion and the associated neural activity.
4.2.2 Chroma Toolbox
The Chroma Toolbox is a set of MATLAB implementations for extracting various types of novel pitch-based and chroma-based audio features. It contains feature extractors for pitch features as well as parameterized families of variants of chroma-like features.
Figure 8: Chroma Toolbox
4.2.3 Auditory Toolbox
In the Auditory Toolbox, features such as centroid, spectral flux, spectral roll-off, kurtosis and 15 MFCC coefficients are extracted, with the implementation in MATLAB. The MFCCs return a description of the spectral shape of the sound; the computation of the cepstral coefficients follows the scheme presented in Figure 9.
Figure 9: Phases of the MFCC computation
4.2.4 MFCC
MFCC are perceptually motivated features that are also based on the Short-time Fourier
Transform (STFT). After taking the log-amplitude of the magnitude spectrum, the FFT bins
are grouped and smoothed according to the perceptually motivated Mel-frequency scaling.
Finally, in order to decorrelate the resulting feature vectors, a Discrete Cosine Transform
(DCT) is performed. Although typically 13 coefficients are used for speech representation, it
was found that the first five coefficients are adequate for music representation (Tzanetakis,
2002).
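A minimal MATLAB sketch of this pipeline is shown below. The 25 ms frames, 10 ms hop and 40 triangular mel filters are assumptions rather than the Auditory Toolbox's exact settings, and the log is applied before the mel grouping to follow the order described above (many implementations apply it after the filterbank instead).

% Minimal MFCC sketch: STFT -> log-magnitude -> mel grouping -> DCT.
function c = mfcc_sketch(x, fs, numCoeffs)
    frameLen = round(0.025 * fs);                 % 25 ms analysis window
    hop      = round(0.010 * fs);                 % 10 ms hop
    nfft     = 2^nextpow2(frameLen);
    numFilters = 40;

    % Short-time Fourier transform and log-amplitude of the magnitude spectrum
    [S, f] = spectrogram(x, hamming(frameLen), frameLen - hop, nfft, fs);
    logMag = log(abs(S) + eps);

    % Triangular mel filterbank between 0 Hz and fs/2
    hz2mel = @(hz) 2595 * log10(1 + hz / 700);
    mel2hz = @(m) 700 * (10.^(m / 2595) - 1);
    edges  = mel2hz(linspace(hz2mel(0), hz2mel(fs / 2), numFilters + 2));
    fb = zeros(numFilters, numel(f));
    for k = 1:numFilters
        rise = (f - edges(k))   / (edges(k+1) - edges(k));
        fall = (edges(k+2) - f) / (edges(k+2) - edges(k+1));
        fb(k, :) = max(0, min(rise, fall))';
    end

    % Group and smooth the FFT bins with the mel filterbank, then decorrelate via DCT
    melSpec = fb * logMag;
    c = dct(melSpec);
    c = c(1:numCoeffs, :);                        % e.g. the first 13 coefficients
end

For example, mfcc_sketch(x, 44100, 13) would keep the first 13 coefficients per frame, as is typical for speech, while a music application might keep only the first five.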
4.3 Emotion-Audio Integration Module
Emotions extracted for the songs are stored as meta-data in a database, and mapping is performed by querying this meta-data database. The emotion extraction module and the audio feature extraction module are finally mapped and combined using the emotion-audio integration module.
4.3.1 Audio Emotion Recognition
In the music emotion recognition block, the playlist of a user forms the input. An audio signal initially undergoes a certain amount of preprocessing. Since music files acquired from the internet are usually stereo signals, all stereo signals are converted to 16-bit PCM mono signals at a sampling rate of 44.1 kHz; the conversion is performed using Audacity. Converting a stereo signal into a mono signal is crucial to avoid the mathematical complexity of processing the similar content of both channels of a stereo signal. An 80-second window is then extracted from the entire audio signal. Audio files are usually very large in size and computationally expensive in terms of memory; hence, to reduce the memory cost incurred during audio feature extraction, an 80-second window is extracted from each audio file. This also ensures uniformity in the size of each audio file. During extraction, the first 20 seconds of an audio file are discarded, which helps ensure efficient retrieval of significant information from the file. Figure 7 depicts the process of extraction of the 80-second window, and a small sketch of this step follows.
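A minimal MATLAB sketch of the windowing step, assuming a mono signal already at 44.1 kHz and a placeholder file name, is:

% Sketch: skip the first 20 s of the track and keep the next 80 s.
[x, fs] = audioread('song_mono.wav');     % mono signal, 44.1 kHz (see preprocessing)

startSample = round(20 * fs) + 1;         % discard the first 20 seconds
endSample   = min(startSample + round(80 * fs) - 1, numel(x));
window80s   = x(startSample:endSample);   % 80-second analysis window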
This pre-processed signal then undergoes audio feature extraction, where the centroid, spectral flux, spectral roll-off, kurtosis, zero-crossing rate and 13 MFCC coefficients are extracted. The toolboxes used for audio feature extraction are MIR 1.5, the Auditory Toolbox and the Chroma Toolbox. Music-based emotion recognition is then carried out using the Support Vector Machine's 'one vs. other' approach, and audio signals are classified under 6 categories, viz. Joy, Sad, Anger, Sad-Anger, Joy-Anger and Others. While various mood models have been proposed in the literature, they fail to capture the perceptions of a user in real time; they are figments of the theoretical aspects that researchers have associated with different audio signals. The mood model and the classes considered in this paper take into account how a user may associate different moods with an audio signal and a song. While composing a song, an artist may not maintain a uniform mood across the entire excerpt: songs are usually based on a theme or a situation, and a song may convey sadness in its first half while the second half becomes cheerful. Hence the model adopted in this paper takes all these aspects into consideration and generates paired classes for such songs, in addition to the individual classes. The model adopted is as follows: songs that are cheerful, energetic and playful are classified under Joy; songs that are very depressing are classified under Sad; songs that reflect attitude, 'anger associated with patriotism', or revenge are classified under Anger; the Joy-Anger category is associated with songs that possess anger in a playful mode; the Sad-Anger category is composed of songs that revolve around the theme of being extremely depressed and angry; all other songs fall under the 'Others' category; and when a user is detected with emotions such as surprise or fear, songs from the Others category are suggested.
Figure 10: Integration of the three modules (emotion extraction, audio feature extraction and emotion-audio integration, linked through the meta-data database)
Figure 11: Mapping of modules
Figure 11 depicts the mapping between the facial emotion recognition module and the audio emotion recognition module. The name of each file and the emotions associated with the song are recorded in a database as its meta-data, and the final mapping between the two blocks is carried out by querying this meta-data database. Since the facial emotion recognition and audio emotion recognition modules are two mutually exclusive components, system integration is performed using an intermediate mapping module that relates the two blocks with the help of the audio meta-data file. While the fear and surprise categories of facial emotions are mapped onto the 'Others' category of audio emotions, each of the joy, sad and anger categories of facial emotions is mapped onto two categories of audio emotions, as shown in the diagram. During playlist generation, when the system classifies a facial image under one of these three emotions, it considers both the individual and the paired classes; for example, if a facial image is classified under joy, the system will display songs under both the Joy and Joy-Anger categories.
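A minimal sketch of this mapping and querying step in MATLAB is given below. The table and column names are placeholders rather than the report's actual schema, and the paired class for anger is not spelled out in the report, so it is marked as an assumption in the code.

% Sketch: map a detected facial emotion onto audio-emotion categories and
% query the meta-data for songs that fall into those categories.
facialToAudio = containers.Map( ...
    {'joy', 'sad', 'anger', 'fear', 'surprise'}, ...
    {{'Joy', 'Joy-Anger'}, ...        % joy  -> individual class + paired class
     {'Sad', 'Sad-Anger'}, ...        % sad  -> individual class + paired class
     {'Anger', 'Sad-Anger'}, ...      % assumption: paired class chosen for anger
     {'Others'}, ...                  % fear     -> Others
     {'Others'}}, ...                 % surprise -> Others
    'UniformValues', false);

detectedEmotion = 'joy';                              % output of the facial module
targetClasses   = facialToAudio(detectedEmotion);

meta = readtable('audio_metadata.csv');               % placeholder meta-data file
mask = ismember(meta.AudioEmotion, targetClasses);    % songs in the matching classes
playlist = meta.FileName(mask);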
4.4 Classifier
Many classifiers have been applied to expression recognition such as neural network
(NN), support vector machines (SVM), linear discriminant analysis (LDA), K-nearest
neighbor, multinomial logistic ridge regression (MLR), hidden Markov models (HMM), tree
augmented naive Bayes, and others.
In the proposed method, a Support Vector Machine with a Radial Basis Function (RBF) kernel is used as the classifier. The Support Vector Machine is an adaptive learning system which receives labeled training data and transforms it into a higher-dimensional feature space; a separating hyper-plane that maximizes the margin is then computed to determine the best separation between classes. The greatest advantage of the SVM is that, even with a small set of training data, it generalizes well. Shan evaluated the performance of LBP features as facial expression descriptors and obtained a result of 89% using an SVM with an RBF kernel trained on the CK database. Recently, Bartlett conducted similar experiments: they selected 313 image sequences from the Cohn-Kanade database, coming from 90 subjects with 1 to 6 emotions per subject. The facial images were converted into a Gabor magnitude representation using a bank of 40 Gabor filters, and 10-fold cross-validation experiments were performed using SVMs with linear, polynomial and RBF kernels. The linear and RBF kernels performed best, with recognition rates of 84.8% and 86.9% respectively. As a powerful machine learning technique for data classification, the SVM performs an implicit mapping of the data into a higher (possibly infinite) dimensional feature space, and then finds a linear separating hyper-plane with the maximal margin to separate the data in this higher-dimensional space.
SVM finds the hyper-plane that maximizes the distance between the support vectors and the
hyper-plane. SVM allows domain-specific selection of the kernel function. Though new
kernels are being proposed, the most frequently used kernel functions are the linear,
polynomial, and Radial Basis Function (RBF) kernels.
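Purely as an illustration of this kind of classifier, a multi-class SVM with an RBF kernel can be trained in MATLAB (Statistics and Machine Learning Toolbox) roughly as follows; the placeholder feature matrix, the label set and the one-vs-all coding are assumptions rather than the report's exact setup.

% Sketch: multi-class emotion classification with RBF-kernel SVMs.
% X is assumed to be an N-by-D matrix of extracted facial features and
% Y an N-by-1 categorical vector of emotion labels (placeholder data here).
X = randn(100, 20);
Y = categorical(randi(7, 100, 1), 1:7, ...
    {'anger', 'disgust', 'fear', 'joy', 'neutral', 'sad', 'surprise'});

t = templateSVM('KernelFunction', 'rbf', 'Standardize', true);
model = fitcecoc(X, Y, 'Learners', t, 'Coding', 'onevsall');   % one-vs-other SVMs

predicted = predict(model, X(1, :));          % classify one new feature vector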
4.5 Databases
One of the most important aspects of developing any new recognition or detection system is the choice of the database that will be used to test the new system. For the purpose of training, two facial expression databases are used. The first is the FG-NET Facial Expression and Emotion Database [4], which consists of MPEG video files with spontaneously displayed emotions, gathered from 18 subjects (9 female and 9 male). The system has to be trained with captured video frames in which the displayed emotion is very representative; the training set consists of images of seven states, neutral plus six emotional states (surprise, fear, disgust, sadness, happiness and anger). The second database is the Cohn-Kanade Facial Expression Database [5], which contains 486 image sequences displayed by 97 posers. Each sequence displays the emotion from its onset to its peak; however, only the last image of a sequence is used for training. The training and testing sets formed from the Cohn-Kanade database contain images divided into seven classes: neutral, surprise, fear, disgust, sadness, happiness and anger. With appropriate training sets, classifier training can be performed. The database was planned and designed in a general and expandable way: it supports the current needs but also allows, for instance, different mood models (both categorical and dimensional), user accounts, lists of artists, albums, genres, features and classification profiles, and saving different classification and tracking values for the same song based on different combinations of features and classifiers.
4.6 Music
The last and most important part of the system is playing music according to the user's current emotional state. Once the facial expression of the user has been classified, the corresponding emotion state is found. A musical database consisting of songs from a variety of domains and pertaining to a number of emotions is maintained. Every song in each emotion category of the database is assigned a weight, and each detected emotion is likewise assigned a weight. The algorithm finds the song closest in weight to the classified emotion, and this song is then played.
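A minimal sketch of this weight-matching step, with made-up weights purely for illustration, is:

% Sketch: pick the song whose weight is closest to the detected emotion's weight.
songNames   = {'track_a.mp3', 'track_b.mp3', 'track_c.mp3'};   % placeholder songs
songWeights = [0.2, 0.55, 0.9];                                % assumed song weights

emotionWeight = 0.6;                         % weight assigned to the detected emotion

[~, idx] = min(abs(songWeights - emotionWeight));
selectedSong = songNames{idx};               % the song closest in weight is played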
CHAPTER 5
Viola and Jones algorithm
The Viola-Jones object detection framework, proposed in 2001 by Paul Viola and Michael Jones, was the first object detection framework to provide competitive object detection rates in real time. Although it can be trained to detect a variety of object classes, it was motivated primarily by the problem of face detection. The problem to be solved is the detection of faces in an image. A human can do this easily, but a computer needs precise instructions and constraints. To make the task more manageable, Viola-Jones requires full-view, frontal, upright faces: in order to be detected, the entire face must point towards the camera and should not be tilted to either side. While these constraints might seem to diminish the algorithm's utility somewhat, in practice these limits on pose are quite acceptable, because the detection step is most often followed by a recognition step. Here the 'Frontal CART' property is used, which only detects frontal faces that are upright and forward facing; this property is employed with a merging threshold of 16, a value that helps in coagulating the multiple detected boxes into a single bounding box.
5.1 Advantages
• Extremely fast feature computation and efficient feature selection.
• Scale and location invariant detector: instead of scaling the image itself (e.g. pyramid filters), the features are scaled.
• Such a generic detection scheme can be trained to detect other types of objects (e.g. cars, hands).
5.2 Disadvantages
• The detector is most effective only on frontal images of faces.
• It can hardly cope with 45° face rotation, either around the vertical or the horizontal axis.
• It is sensitive to lighting conditions.
• Multiple detections of the same face may occur due to overlapping sub-windows.
5.3 MATLAB code on Frontal CART for object detection
function [] = Viola_Jones_img(Img)
% Viola_Jones_img(Img)
%   Img - input image
%   Example call: Viola_Jones_img(imread('name_of_the_picture.jpg'))

% Frontal CART face detector with a merging threshold of 16, as used above
faceDetector = vision.CascadeObjectDetector('ClassificationModel', ...
    'FrontalFaceCART', 'MergeThreshold', 16);

bboxes = step(faceDetector, Img);   % one [x y w h] box per detected face

figure, imshow(Img), title('Detected faces'); hold on
for i = 1:size(bboxes, 1)
    rectangle('Position', bboxes(i, :), 'LineWidth', 2, 'EdgeColor', 'y');
end
end
CHAPTER 6
Experimental results
Testing and implementation were performed using either MATLAB R2013a or the 2014 release of MATLAB, on a Windows 7/8 32-bit operating system with an Intel Core i3 processor. Facial expression extraction was done on both user-independent and user-dependent datasets: a dataset consisting of facial images of 25 individuals was selected for the user-independent experiment, and a dataset of 10 individuals was selected for the user-dependent experiment. Images of size 4000x3000 were used for the static and dynamic dataset experiments. The implementation and experimentation reported in this paper were carried out using MATLAB R2013a on a Windows 8 32-bit operating system with an Intel i5 M460 (2.53 GHz) processor. For facial emotion recognition, two experiments were carried out:
i. using a user-independent dataset;
ii. using a user-dependent dataset.
6.1 EMOTION EXTRACTION
A user-independent dataset of 30 images and a user-dependent dataset of 5 images were selected for the extraction of emotions. The estimated time for the various modules of the emotion extraction module is given in Table 1.
Table 1 Time Estimation of Various modules of Emotion Extraction Module
6.2 AUDIO FEATURE EXTRACTION
A dataset of around 200 songs was considered for experimentation and testing of the audio feature extraction module; the songs were collected from various Bollywood music sites such as Djmaza.in and Songs.pk. The estimated accuracy for the various categories of emotions is given in Table 2.
Table 2 Estimated Accuracy for different categories of Audio Feature
6.3 EMOTION BASED MUSIC PLAYER
Table 3 gives the average time taken by the proposed algorithm (i.e. the algorithm that generates a playlist based on facial expressions) and by its facial expression recognition and system integration modules.
Table 3 Average Time Estimation of Various modules of Proposed System
CHAPTER 7
CONCLUSION
Experimental results have shown that the time required for audio feature extraction is negligible (around 0.0006 s) and, since the songs are processed and stored beforehand, the total estimated time of the proposed system is essentially the time required for the extraction of facial features (around 0.9994 s). The various classes of emotion also yield a better accuracy rate compared with previously existing systems. The total computational time taken is about 1.000 s, which is very low and thus helps in achieving better real-time performance and efficiency. The system therefore aims at providing Windows users with a cheaper, accurate, emotion-based music system that is free of additional hardware. The emotion-based music system will be of great advantage to users looking for music based on their mood and emotional behavior. It will help reduce the searching time for music, thereby reducing unnecessary computational time and increasing the overall accuracy and efficiency of the system. The system will not only reduce physical stress but will also act as a boon for music therapy systems and may assist a music therapist in treating a patient. With the additional features mentioned above, it will be a complete system for music lovers and listeners. A wide variety of image processing techniques was developed to meet the requirements of the facial expression recognition system. Apart from the theoretical background, this work provides ways to design and implement an emotion-based music player. The proposed system is able to process video of facial behavior, recognize the displayed actions in terms of basic emotions, and then play music based on these emotions. The major strengths of the system are full automation as well as user and environment independence. Even though the system cannot handle occlusions and significant head rotations, head shifts are allowed. In future work, we would like to focus on improving the recognition rate of our system.
CHAPTER 8
APPLICATIONS
Music therapists use music to enhance social or interpersonal, affective, cognitive, and
behavioral functioning. Research indicates that music therapy is effective at reducing muscle
tension and anxiety, and at promoting relaxation, verbalization, interpersonal relationships,
and group cohesiveness. This can set the stage for open communication and provide a
starting place for non-threatening support and processing symptoms associated with or
exacerbated by trauma and disaster, such as the 9/11 event. A therapist can talk with a client,
but a qualified music therapist can use music to actively link a client to their psycho-
emotional state quickly. In certain settings, the active use of music therapy interventions has
resulted in a shorter length of stay (treatment period) and more efficient response to the
client’s overall intervention plan.
The use of songs in music therapy with cancer patients and their families effectively provides
important means for support and tools for change. Human contact and professional support
can often assist in diminishing the suffering involved in cancer. There exists an inherent
association between songs and human contact since lyrics represent melodic verbal
communication. Thus, the use of songs in music therapy can have naturally meaningful
applications.
The proposed algorithm was successful in crafting a mechanism that can find its application in music therapy systems and can help a music therapist treat a patient suffering from disorders such as acute depression, stress or aggression. The system is prone to giving unpredictable results in difficult lighting conditions; removing this drawback is therefore envisaged as part of future work.
CHAPTER 9
FUTURE SCOPE
In the future, we would like to develop a mood-enhancing music player which starts with the user's current emotion (which may be sad) and then plays music of positive emotions, thereby eventually giving the user a joyful feeling. Finally, we would like to improve the time efficiency of our system in order to make it appropriate for use in different applications.
CHAPTER 10
REFERENCES
[1] Y. Chang, C. Hu, R. Feris, and M. Turk, "Manifold based analysis of facial expression," Image and Vision Computing, vol. 24, pp. 605-614, June 2006.
[2] A. Habibzad, Ninavin, and M. K. Mirnia, "A new algorithm to classify face emotions through eye and lip features by using particle swarm optimization."
[3] B. Han, S. Rho, R. B. Dannenberg, and E. Hwang, "SMERS: music emotion recognition using support vector regression," 10th ISMIR, 2009.
[4] A. I. Goldman and C. S. Sripada, "Simulationist models of face-based emotion recognition."
[5] C. A. Cervantes and K.-T. Song, "Embedded design of an emotion-aware music player," IEEE International Conference on Systems, Man, and Cybernetics, pp. 2528-2533, 2013.
[6] F. Guney, "Emotion recognition using face images," Bogazici University, Istanbul, Turkey 34342.
[7] J.-J. Wong and S.-Y. Cho, "Facial emotion recognition by adaptive processing of tree structures."
[8] K. Hevner, "The affective character of the major and minor modes in music," The American Journal of Psychology, vol. 47(1), pp. 103-118, 1935.
[9] S. Strupp, N. Schmitz, and K. Berns, "Visual-based emotion detection for natural man-machine interaction."
[10] M. Lyons and S. Akamatsu, "Coding facial expressions with Gabor wavelets," IEEE Conference on Automatic Face and Gesture Recognition, March 2000.
[11] K.-C. Huang, Y.-H. Kuo, and M.-F. Horng, "Emotion recognition by a novel triangular facial feature extraction method."
[12] S. Dornbush, K. Fisher, K. McKay, A. Prikhodko, and Z. Segall, "XPod: a human activity and emotion aware mobile music player," UMBC Ebiquity, November 2005.
[13] R. R. Londhe and V. P. Pawar, "Analysis of facial expression and recognition based on statistical approach," International Journal of Soft Computing and Engineering (IJSCE), vol. 2, May 2012.
[14] J. A. Russell, "A circumplex model of affect," Journal of Personality and Social Psychology, vol. 39(6), pp. 1161-1178, 1980.
[15] S. Jun, S. Rho, B. Han, and E. Hwang, "A fuzzy inference-based music emotion recognition system," VIE, 2008.
[16] R. E. Thayer, The Biopsychology of Mood and Arousal, Oxford University Press, 1989.
[17] Z. Zeng et al., "A survey of affect recognition methods: audio, visual, and spontaneous expressions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, January 2009.
Your emotions and the impact of your reactions pdfBettina Pickering
 
Facial emotion recognition
Facial emotion recognitionFacial emotion recognition
Facial emotion recognitionAnukriti Dureha
 
Audit free cloud storage via deniable attribute based encryption
Audit free cloud storage via  deniable attribute based encryptionAudit free cloud storage via  deniable attribute based encryption
Audit free cloud storage via deniable attribute based encryptionMano Sriram
 
Emotion recognition using facial expressions and speech
Emotion recognition using facial expressions and speechEmotion recognition using facial expressions and speech
Emotion recognition using facial expressions and speechLakshmi Sarvani Videla
 
Offline signature verification based on geometric feature extraction using ar...
Offline signature verification based on geometric feature extraction using ar...Offline signature verification based on geometric feature extraction using ar...
Offline signature verification based on geometric feature extraction using ar...Cen Paul
 

En vedette (20)

MusicMood - Machine Learning in Automatic Music Mood Prediction Based on Song...
MusicMood - Machine Learning in Automatic Music Mood Prediction Based on Song...MusicMood - Machine Learning in Automatic Music Mood Prediction Based on Song...
MusicMood - Machine Learning in Automatic Music Mood Prediction Based on Song...
 
Music Mood Detection (Lyrics based Approach)
Music Mood Detection (Lyrics based Approach)Music Mood Detection (Lyrics based Approach)
Music Mood Detection (Lyrics based Approach)
 
Facial expression and emotions 1
Facial expression and emotions 1Facial expression and emotions 1
Facial expression and emotions 1
 
MOODetector: Automatic Music Emotion Recognition
MOODetector: Automatic Music Emotion RecognitionMOODetector: Automatic Music Emotion Recognition
MOODetector: Automatic Music Emotion Recognition
 
Facial emotion recognition corporatemeet
Facial emotion recognition corporatemeetFacial emotion recognition corporatemeet
Facial emotion recognition corporatemeet
 
From Music Information Retrieval to Music Emotion Recognition
From Music Information Retrieval to Music Emotion RecognitionFrom Music Information Retrieval to Music Emotion Recognition
From Music Information Retrieval to Music Emotion Recognition
 
Emotion Detection from Text
Emotion Detection from TextEmotion Detection from Text
Emotion Detection from Text
 
Emotion detection from text using data mining and text mining
Emotion detection from text using data mining and text miningEmotion detection from text using data mining and text mining
Emotion detection from text using data mining and text mining
 
Applications of Emotions Recognition
Applications of Emotions RecognitionApplications of Emotions Recognition
Applications of Emotions Recognition
 
SINGING-VOICE SEPARATION FROM MONAURAL RECORDINGS USING ROBUST PRINCIPAL COMP...
SINGING-VOICE SEPARATION FROM MONAURAL RECORDINGS USING ROBUST PRINCIPAL COMP...SINGING-VOICE SEPARATION FROM MONAURAL RECORDINGS USING ROBUST PRINCIPAL COMP...
SINGING-VOICE SEPARATION FROM MONAURAL RECORDINGS USING ROBUST PRINCIPAL COMP...
 
Music player
Music playerMusic player
Music player
 
Final Year Project - Enhancing Virtual Learning through Emotional Agents (Doc...
Final Year Project - Enhancing Virtual Learning through Emotional Agents (Doc...Final Year Project - Enhancing Virtual Learning through Emotional Agents (Doc...
Final Year Project - Enhancing Virtual Learning through Emotional Agents (Doc...
 
Social Networking - An Ethical Hacker's View
Social Networking - An Ethical Hacker's ViewSocial Networking - An Ethical Hacker's View
Social Networking - An Ethical Hacker's View
 
Pengenalan Wajah Dua Dimensi Menggunakan Multi-Layer Perceptron Berdasarkan N...
Pengenalan Wajah Dua Dimensi Menggunakan Multi-Layer Perceptron Berdasarkan N...Pengenalan Wajah Dua Dimensi Menggunakan Multi-Layer Perceptron Berdasarkan N...
Pengenalan Wajah Dua Dimensi Menggunakan Multi-Layer Perceptron Berdasarkan N...
 
Human emotion recognition
Human emotion recognitionHuman emotion recognition
Human emotion recognition
 
Your emotions and the impact of your reactions pdf
Your emotions and the impact of your reactions pdfYour emotions and the impact of your reactions pdf
Your emotions and the impact of your reactions pdf
 
Facial emotion recognition
Facial emotion recognitionFacial emotion recognition
Facial emotion recognition
 
Audit free cloud storage via deniable attribute based encryption
Audit free cloud storage via  deniable attribute based encryptionAudit free cloud storage via  deniable attribute based encryption
Audit free cloud storage via deniable attribute based encryption
 
Emotion recognition using facial expressions and speech
Emotion recognition using facial expressions and speechEmotion recognition using facial expressions and speech
Emotion recognition using facial expressions and speech
 
Offline signature verification based on geometric feature extraction using ar...
Offline signature verification based on geometric feature extraction using ar...Offline signature verification based on geometric feature extraction using ar...
Offline signature verification based on geometric feature extraction using ar...
 

Similaire à Emotion based music player

IRJET - EMO-MUSIC(Emotion based Music Player)
IRJET - EMO-MUSIC(Emotion based Music Player)IRJET - EMO-MUSIC(Emotion based Music Player)
IRJET - EMO-MUSIC(Emotion based Music Player)IRJET Journal
 
Emotion Based Music Player System
Emotion Based Music Player SystemEmotion Based Music Player System
Emotion Based Music Player SystemIRJET Journal
 
SMART MUSIC PLAYER BASED ON EMOTION DETECTION
SMART MUSIC PLAYER BASED ON EMOTION DETECTIONSMART MUSIC PLAYER BASED ON EMOTION DETECTION
SMART MUSIC PLAYER BASED ON EMOTION DETECTIONIRJET Journal
 
B8_Mini project_Final review ppt.pptx
B8_Mini project_Final review ppt.pptxB8_Mini project_Final review ppt.pptx
B8_Mini project_Final review ppt.pptxEgguIqbal
 
IRJET- Implementation of Emotion based Music Recommendation System using SVM ...
IRJET- Implementation of Emotion based Music Recommendation System using SVM ...IRJET- Implementation of Emotion based Music Recommendation System using SVM ...
IRJET- Implementation of Emotion based Music Recommendation System using SVM ...IRJET Journal
 
Mood based Music Player
Mood based Music PlayerMood based Music Player
Mood based Music PlayerIRJET Journal
 
Moodify – Music Player Based on Mood
Moodify – Music Player Based on MoodMoodify – Music Player Based on Mood
Moodify – Music Player Based on MoodIRJET Journal
 
IRJET- The Complete Music Player
IRJET- The Complete Music PlayerIRJET- The Complete Music Player
IRJET- The Complete Music PlayerIRJET Journal
 
IRJET- Study of Effect of PCA on Speech Emotion Recognition
IRJET- Study of Effect of PCA on Speech Emotion RecognitionIRJET- Study of Effect of PCA on Speech Emotion Recognition
IRJET- Study of Effect of PCA on Speech Emotion RecognitionIRJET Journal
 
Nithin Xavier research_proposal
Nithin Xavier research_proposalNithin Xavier research_proposal
Nithin Xavier research_proposalNithin Xavier
 
Mehfil : Song Recommendation System Using Sentiment Detected
Mehfil : Song Recommendation System Using Sentiment DetectedMehfil : Song Recommendation System Using Sentiment Detected
Mehfil : Song Recommendation System Using Sentiment DetectedIRJET Journal
 
Facial recognition to detect mood and recommend songs
Facial recognition to detect mood and recommend songsFacial recognition to detect mood and recommend songs
Facial recognition to detect mood and recommend songsIRJET Journal
 
IRJET- A Survey on Sound Recognition
IRJET- A Survey on Sound RecognitionIRJET- A Survey on Sound Recognition
IRJET- A Survey on Sound RecognitionIRJET Journal
 
A new alley in Opinion Mining using Senti Audio Visual Algorithm
A new alley in Opinion Mining using Senti Audio Visual AlgorithmA new alley in Opinion Mining using Senti Audio Visual Algorithm
A new alley in Opinion Mining using Senti Audio Visual AlgorithmIJERA Editor
 

Similaire à Emotion based music player (20)

IRJET - EMO-MUSIC(Emotion based Music Player)
IRJET - EMO-MUSIC(Emotion based Music Player)IRJET - EMO-MUSIC(Emotion based Music Player)
IRJET - EMO-MUSIC(Emotion based Music Player)
 
Emotion Based Music Player System
Emotion Based Music Player SystemEmotion Based Music Player System
Emotion Based Music Player System
 
SMART MUSIC PLAYER BASED ON EMOTION DETECTION
SMART MUSIC PLAYER BASED ON EMOTION DETECTIONSMART MUSIC PLAYER BASED ON EMOTION DETECTION
SMART MUSIC PLAYER BASED ON EMOTION DETECTION
 
B8_Mini project_Final review ppt.pptx
B8_Mini project_Final review ppt.pptxB8_Mini project_Final review ppt.pptx
B8_Mini project_Final review ppt.pptx
 
IRJET- Implementation of Emotion based Music Recommendation System using SVM ...
IRJET- Implementation of Emotion based Music Recommendation System using SVM ...IRJET- Implementation of Emotion based Music Recommendation System using SVM ...
IRJET- Implementation of Emotion based Music Recommendation System using SVM ...
 
Mood based Music Player
Mood based Music PlayerMood based Music Player
Mood based Music Player
 
Ijsrdv8 i10550
Ijsrdv8 i10550Ijsrdv8 i10550
Ijsrdv8 i10550
 
Audio-
Audio-Audio-
Audio-
 
STUDY ON AIR AND SOUND POLLUTION MONITORING SYSTEM USING IOT
STUDY ON AIR AND SOUND POLLUTION MONITORING SYSTEM USING IOTSTUDY ON AIR AND SOUND POLLUTION MONITORING SYSTEM USING IOT
STUDY ON AIR AND SOUND POLLUTION MONITORING SYSTEM USING IOT
 
Isolation Of Lactic Acid Producing Bacteria And Production Of Lactic Acid Fro...
Isolation Of Lactic Acid Producing Bacteria And Production Of Lactic Acid Fro...Isolation Of Lactic Acid Producing Bacteria And Production Of Lactic Acid Fro...
Isolation Of Lactic Acid Producing Bacteria And Production Of Lactic Acid Fro...
 
Study On An Emotion Based Music Player
Study On An Emotion Based Music PlayerStudy On An Emotion Based Music Player
Study On An Emotion Based Music Player
 
Moodify – Music Player Based on Mood
Moodify – Music Player Based on MoodMoodify – Music Player Based on Mood
Moodify – Music Player Based on Mood
 
IRJET- The Complete Music Player
IRJET- The Complete Music PlayerIRJET- The Complete Music Player
IRJET- The Complete Music Player
 
IRJET- Study of Effect of PCA on Speech Emotion Recognition
IRJET- Study of Effect of PCA on Speech Emotion RecognitionIRJET- Study of Effect of PCA on Speech Emotion Recognition
IRJET- Study of Effect of PCA on Speech Emotion Recognition
 
Nithin Xavier research_proposal
Nithin Xavier research_proposalNithin Xavier research_proposal
Nithin Xavier research_proposal
 
Mehfil : Song Recommendation System Using Sentiment Detected
Mehfil : Song Recommendation System Using Sentiment DetectedMehfil : Song Recommendation System Using Sentiment Detected
Mehfil : Song Recommendation System Using Sentiment Detected
 
Emotion classification for musical data using deep learning techniques
Emotion classification for musical data using deep learning  techniquesEmotion classification for musical data using deep learning  techniques
Emotion classification for musical data using deep learning techniques
 
Facial recognition to detect mood and recommend songs
Facial recognition to detect mood and recommend songsFacial recognition to detect mood and recommend songs
Facial recognition to detect mood and recommend songs
 
IRJET- A Survey on Sound Recognition
IRJET- A Survey on Sound RecognitionIRJET- A Survey on Sound Recognition
IRJET- A Survey on Sound Recognition
 
A new alley in Opinion Mining using Senti Audio Visual Algorithm
A new alley in Opinion Mining using Senti Audio Visual AlgorithmA new alley in Opinion Mining using Senti Audio Visual Algorithm
A new alley in Opinion Mining using Senti Audio Visual Algorithm
 

Dernier

Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...121011101441
 
Churning of Butter, Factors affecting .
Churning of Butter, Factors affecting  .Churning of Butter, Factors affecting  .
Churning of Butter, Factors affecting .Satyam Kumar
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfROCENODodongVILLACER
 
Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxPoojaBan
 
8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitter8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitterShivangiSharma879191
 
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncWhy does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncssuser2ae721
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girlsssuser7cb4ff
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...Chandu841456
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfme23b1001
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)Dr SOUNDIRARAJ N
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleAlluxio, Inc.
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHC Sai Kiran
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...srsj9000
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptSAURABHKUMAR892774
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvLewisJB
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile servicerehmti665
 

Dernier (20)

Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...
 
Churning of Butter, Factors affecting .
Churning of Butter, Factors affecting  .Churning of Butter, Factors affecting  .
Churning of Butter, Factors affecting .
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdf
 
Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptx
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitter8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitter
 
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncWhy does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girls
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdf
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at Scale
 
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECH
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.ppt
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvv
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
 

Emotion based music player

6.2 AUDIO FEATURE EXTRACTION 25
10. CHAPTER 7: CONCLUSION 26
11. CHAPTER 8: APPLICATIONS 27
12. CHAPTER 9: FUTURE SCOPE 28
13. CHAPTER 10: REFERENCES 29
LIST OF FIGURES
1. Framework of recognizing emotions with privileged information 03
2. Functional block diagram for basic music system 07
3. Flow chart 09
4. Feature Extraction from Facial Image 12
5. Regions involved in Expression analysis 13
6. Face region grids (Bounding box) 17
7. Audio Preprocessing 14
8. Chroma Toolbox 16
9. Phases of the MFCC computation 16
10. Integration of three modules 18
11. Mapping of modules 19
LIST OF TABLES
Table 1: Time Estimation of Various modules of Emotion Extraction Module 28
Table 2: Estimated Accuracy for different categories of Audio Feature 29
Table 3: Average Time Estimation of Various modules of Proposed System 29
ABSTRACT

The human face is an important part of an individual's body and plays a central role in conveying an individual's behavior and emotional state. Manually segregating a list of songs and generating an appropriate playlist based on an individual's emotional features is a tedious, time-consuming, labor-intensive and uphill task. Various algorithms have been proposed and developed to automate the playlist generation process; however, the existing algorithms are computationally slow, less accurate and sometimes require additional hardware such as EEG systems or sensors. The proposed system generates a playlist automatically based on the extracted facial expression, thereby reducing the effort and time involved in performing the process manually. It reduces the computational time involved in obtaining the results and the overall cost of the designed system, while improving the overall accuracy.

Testing of the system is done on both user-dependent (dynamic) and user-independent (static) datasets. Facial expressions are captured using an inbuilt camera. The accuracy of the emotion detection algorithm used in the system is around 85-90% for real-time images and around 98-100% for static images. On an average estimate, the proposed algorithm takes around 0.95-1.05 s to generate an emotion-based music playlist; over the full implementation and testing with the inbuilt camera, the average time to generate a playlist from a facial expression is about 1.10 s. Thus it yields better accuracy and computational time, and a lower design cost, compared with the algorithms covered in the literature survey.
CHAPTER 1
INTRODUCTION

Music plays a very important role in enhancing an individual's life: it is an important medium of entertainment for music lovers and listeners, and sometimes even serves a therapeutic purpose. In today's world, with ever-increasing advancements in the field of multimedia and technology, various music players have been developed with features like fast forward, reverse, variable playback speed (seek and time compression), local playback, streaming playback with multicast streams, volume modulation, genre classification and so on. Although these features satisfy the user's basic requirements, the user still faces the task of manually browsing through the playlist and selecting songs based on his current mood and behavior. Using traditional music players, a user has to manually browse through his playlist and select songs that would soothe his mood and emotional experience. This task is labor intensive, and an individual often faces the dilemma of landing on an appropriate list of songs.

Emotions are the aftermath of the interplay between an individual's cognitive appraisal of an event and the corresponding physical response towards it. Among the various ways of expressing emotions, including human speech and gesture, a facial expression is the most natural way of relaying them. The ability to understand human emotions is desirable for human-computer interaction. In the past decade, considerable research has been done on emotion recognition from voice, visual behavior and physiological signals, separately or jointly. Very good progress has been achieved in this field, and several commercial products have been developed, such as smile detection in cameras. Emotion is a subjective response to an outer stimulus. A facial expression is a discernible manifestation of the emotive state, cognitive activity, motive and psychopathology of a person, and human beings are naturally able to interpret and analyze an individual's emotional state.
The existing systems providing privileged information for emotion recognition are:
• Audio Emotion Recognition (AER)
• Music Information Retrieval (MIR)
• Emotion recognition with the help of privileged information (e.g. EEG)

AER is a technique that categorizes a received audio signal under various classes of emotions and moods, based on its audio features. MIR is a field that extracts crucial information from an audio signal by exploring audio features such as pitch, energy, flux, MFCC and kurtosis. The advent of AER and MIR equipped traditional systems with a feature that automatically parsed the playlist into different classes of emotions. Although AER and MIR augmented the capabilities of traditional music players by eradicating the need for manual segregation of the playlist and annotation of songs based on a user's emotion, such systems did not incorporate mechanisms that enabled a music player to be fully controlled by human emotions.

Although human speech and gesture are common ways of expressing emotions, facial expression is the most ancient and natural way of expressing feelings, emotions and mood. In the two methods above, AER and MIR, emotion recognition does not depend on the face of the user at all.
In the third approach, a novel method classifies emotions from EEG signals and tags the emotions of videos by integrating privileged information from the EEG feature space and the video feature space. This information is compared with the music playlist system for automatic generation, as shown in Figure 1.

Figure 1: Framework of recognizing emotions with privileged information (EEG and video feature spaces feeding EEG and video models that output an emotion tag)

The main objective of this paper is to design an efficient and accurate algorithm that generates a playlist based on the current emotional state and behavior of the user. This removes the burden on the user of manually browsing through the playlist of songs to make a selection.
CHAPTER 2
LITERATURE REVIEW

Various techniques and approaches have been proposed and developed to classify the human emotional state of behavior. The proposed approaches have focused only on some of the basic emotions.

2.1 Face detection methods

The techniques for face detection can be divided into two groups: holistic, where the face is treated as a whole unit, and analytic, where the co-occurrence of characteristic facial elements is studied. Pantic and Rothkrantz proposed a system which processes images of frontal and profile face views. Vertical and horizontal histogram analysis is used to find face boundaries; the face contour is then obtained by thresholding the image with HSV color space values. Kobayashi and Hara used images captured in monochrome mode to find the face brightness distribution, with the position of the face estimated by iris localization.

For the purpose of feature recognition, facial features have been categorized into two major categories: appearance-based feature extraction and geometric-based feature extraction. The geometric-based technique considers only the shape or major prominent points of some important facial features, such as the mouth and eyes, in the system proposed by Changbo; around 58 major landmark points were considered in crafting an ASM. Appearance-based features, such as texture, have also been considered in different areas of work; an efficient method for coding and implementing extracted facial features together with a multi-orientation and multi-resolution set of Gabor filters was proposed by Michael Lyons.

2.2 Feature extraction methods

Pantic and Rothkrantz selected a set of facial points from frontal and profile face images. The expression is measured by the distance between the positions of those points in the initial image (neutral face) and the peak image (affected face).
Cohn developed a geometric feature based system in which the optical flow algorithm is performed only in 13x13 pixel regions surrounding facial landmarks. Shan investigated the Local Binary Pattern (LBP) method for texture encoding in facial expression description; two methods of feature extraction were proposed, one extracting features from a fixed set of patches and the other from the most probable patches found by boosting.

The music mood tags and A-V values from a total of 20 subjects were tested and analyzed in Jung Hyun Kim's work, and based on the results of the analysis the A-V plane was classified into 8 regions (clusters) depicting mood, using the efficient k-means clustering algorithm from data mining. Thayer proposed a very useful 2-dimensional (stress vs. energy) model, with emotions depicted in a 2-dimensional coordinate system lying either on the 2 axes or in the 4 quadrants formed by the plot.

2.3 Expression Recognition

The last part of the FER system is based on machine learning theory; precisely, it is the classification task. The input to the classifier is the set of features retrieved from the face region in the previous stage. Classification requires supervised training, so the training set should consist of labeled data. There are many machine learning techniques for the classification task, namely: K-Nearest Neighbors, Artificial Neural Networks, Support Vector Machines, Hidden Markov Models, Expert Systems with rule based classifiers, Bayesian Networks and Boosting techniques (AdaBoost, GentleBoost).

The field of Facial Expression Recognition (FER) has produced algorithms that excel in meeting such demands. FER enables computer systems to monitor an individual's emotional state effectively and react appropriately. While various systems have been designed to recognize facial expressions, their time and memory complexity is relatively high and hence they fail to achieve real-time performance. Their feature extraction techniques are also less reliable and reduce the overall accuracy of the system. An accurate statistics-based approach was proposed by Renuka R. Londhe; that work focused mainly on the study of changes in the curvatures of the face and the intensities of the corresponding pixels of images.
Artificial Neural Networks (ANN) were used to classify the extracted features into 6 major universal emotions: anger, disgust, fear, happiness, sadness and surprise.

Some of the drawbacks of the existing systems are as follows:
i. Existing systems are highly complex in terms of time and storage for recognizing facial expressions in a real environment.
ii. Existing systems lack accuracy in generating a playlist based on the current emotional experience of a user.
iii. Existing systems employ additional hardware like EEG systems and sensors, which increases the overall cost of the system.
iv. Some of the existing systems impose a requirement of human speech for generating a playlist in accordance with a user's facial expressions.

Several approaches have been proposed and adopted to classify human emotions successfully. This paper primarily aims at resolving the drawbacks of the existing systems by designing an automated emotion based music player that generates a customized playlist from the user's extracted facial features, thus avoiding the employment of any additional hardware. It also includes a mood randomizer and appetizer function that shifts the mood-generated playlist to another randomized playlist of the same mood level after some duration.

To reduce the human effort and time needed for manual segregation of songs from a playlist in correlation with different classes of emotions and moods, various approaches have been proposed. Numerous approaches have been designed to extract facial features and audio features from an audio signal, but very few of the designed systems are capable of generating an emotion based music playlist using human emotions, and the existing designs that do generate an automated playlist use additional hardware like sensors or EEG systems, thereby increasing the cost of the proposed design.
CHAPTER 3
CURRENT MUSIC SYSTEMS

The features available in the existing music players present in computer systems are as follows:
i. Manual selection of songs
ii. Party shuffle
iii. Playlists
iv. Music squares, where the user has to classify the songs manually according to particular emotions, for only four basic emotions: passionate, calm, joyful and excited.

3.1 Limitations of the existing system
i. It requires the user to manually select the songs.
ii. Randomly played songs may not match the mood of the user.
iii. The user has to classify the songs into various emotions and then manually select a particular emotion in order to play the songs.

Figure 2: Functional block diagram for a basic music system (selection of playlists followed by selection of songs)
CHAPTER 4
PROPOSED SYSTEM

The proposed automatic playlist generation scheme is a combination of multiple schemes. In this paper, we consider the notion of collecting human emotion from the user's expressions, and explore how this information can be used to improve the user experience with music players. We present a new emotion based, user-interactive music system, EMO PLAYER. It aims to provide user-preferred music with emotion awareness. The system starts its recommendations with expert knowledge; if the user does not like a recommendation, he or she can decline it and select the desired music manually.

This paper is based on the idea of automating much of the interaction between the music player and its user. It introduces a "smart" music player, EMO PLAYER, that learns its user's emotions and tailors its music selections accordingly. After an initial training period, EMO PLAYER is able to use its internal algorithms to make an educated selection of the song that would best fit its user's emotion.

The proposed algorithm revolves around an automated music recommendation system that generates a subset of the original playlist, or a customized playlist, in accordance with a user's facial expressions. The user's facial expression helps the music recommendation system decipher the current emotional state of the user and recommend a corresponding subset of the original playlist. The system is composed of three main modules: a facial expression recognition module, an audio emotion recognition module and a system integration module. Facial expression recognition and audio emotion recognition are two mutually exclusive modules; hence, the system integration module maps the two sub-spaces by constructing and querying a meta-data file, composed of meta-data corresponding to each audio file. Figure 3 depicts the flowchart of the proposed algorithm.
Figure 3: Flow chart of the proposed algorithm
4.1 Emotion Extraction Module

The image of a user is captured using a webcam, or it can be accessed from an image stored on the hard disk. The acquired image undergoes image enhancement in the form of tone mapping in order to restore the original contrast of the image. After enhancement, all images are converted into binary image format and the face is detected using the Viola-Jones algorithm, where the Frontal CART property of the algorithm is used; it detects only upright, forward-facing faces, with a merging threshold set in the range of 16-20.

The output of the Viola-Jones face detection block forms the input to the facial feature extraction block. To increase accuracy and obtain real-time performance, only the features of the eyes and mouth are used, as these are sufficient to depict emotions accurately. For extracting the features of the mouth and eyes, certain calculations and measurements are taken into consideration. Equations (1)-(4) define the bounding box used to extract the mouth from the nose bounding box (a small sketch implementing these equations is given at the end of this section):

X_start(mouth) = X_mid(nose) - (X_end(nose) - X_start(nose))    (1)
X_end(mouth)   = X_mid(nose) + (X_end(nose) - X_start(nose))    (2)
Y_start(mouth) = Y_mid(nose) + 15                               (3)
Y_end(mouth)   = Y_start(mouth) + 103                           (4)

where (X_start(mouth), Y_start(mouth)) and (X_end(mouth), Y_end(mouth)) are the start and end points of the bounding box for the mouth, (X_mid(nose), Y_mid(nose)) is the midpoint of the nose, and X_start(nose), X_end(nose) are the start and end points of the nose. Classification is performed using a Support Vector Machine (SVM), which classifies the expression into 7 classes of emotions.
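As a minimal illustration of equations (1)-(4), the MATLAB sketch below derives the mouth bounding box from a nose bounding box returned by a detector, assuming MATLAB's [x y width height] box convention; the variable names and the noseBox value are hypothetical and only serve to make the arithmetic concrete.

% Hypothetical nose bounding box [x y width height] from a nose detector.
noseBox = [210 240 60 50];

xStartNose = noseBox(1);                      % left edge of the nose box
xEndNose   = noseBox(1) + noseBox(3);         % right edge of the nose box
xMidNose   = noseBox(1) + noseBox(3) / 2;     % horizontal midpoint of the nose
yMidNose   = noseBox(2) + noseBox(4) / 2;     % vertical midpoint of the nose

% Equations (1)-(4): mouth bounding box derived from the nose box.
xStartMouth = xMidNose - (xEndNose - xStartNose);   % (1)
xEndMouth   = xMidNose + (xEndNose - xStartNose);   % (2)
yStartMouth = yMidNose + 15;                        % (3) fixed vertical offset
yEndMouth   = yStartMouth + 103;                    % (4) fixed mouth height

mouthBox = [xStartMouth, yStartMouth, ...
            xEndMouth - xStartMouth, yEndMouth - yStartMouth];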
4.1.1 Face detection and tracking

The first part of the system is a module for face detection and landmark localization in the image. The algorithm for face detection is based on the work by Viola and Jones. In this approach the image is represented by a set of Haar-like features; the possible feature types, evaluated inside a bounding box, are two-rectangle, three-rectangle and four-rectangle features. A bounding box is the physical shape of a given object and can be used to graphically define the physical shapes of objects. A feature value is calculated by subtracting the sum of the pixels covered by the white rectangle from the sum of the pixels under the gray rectangle. Two-rectangle features detect contrast between two vertically or horizontally adjacent regions, three-rectangle features detect a contrasted region placed between two similar regions, and four-rectangle features detect similar regions placed diagonally.

Rectangle features can be computed very rapidly using an intermediate representation of the image called the integral image. A feature can thus be computed rapidly because the value of each rectangle requires only 4 pixel references (a small sketch of this computation is given at the end of this subsection).

Given the representation of the image in rectangle features, a classifier needs to be trained to decide whether the image contains the searched object (a face) or not. The number of features is much higher than the number of pixels in the original image; however, it has been proven that even a small set of well-chosen features can build a strong classifier. That is why the AdaBoost algorithm is used for training: each step selects the most discriminative feature, the one which best separates positive and negative examples. The method is widely used in the area of face detection, but it can be trained to detect any object. The algorithm is quick and efficient and can be used in real-time applications.

The facial image obtained from the face detection stage forms the input to the feature extraction stage. To obtain real-time performance and reduce time complexity, only the eyes and mouth are considered for expression recognition; the combination of these two features is adequate to convey emotions accurately. Figure 4 depicts the schematic for feature extraction.
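To make the integral-image idea concrete, here is a small MATLAB sketch (not taken from the report) that builds an integral image with cumulative sums and evaluates one two-rectangle Haar-like feature using only four lookups per rectangle; the file name and window coordinates are arbitrary assumptions.

% Build an integral image and evaluate one two-rectangle Haar-like feature.
I = imread('face.jpg');                        % hypothetical input image
if size(I, 3) == 3
    I = rgb2gray(I);
end
I = double(I);

ii = cumsum(cumsum(I, 1), 2);                  % ii(r,c) = sum of I(1:r,1:c)
iiPad = padarray(ii, [1 1], 0, 'pre');         % zero row/column so edge rectangles need no special case

% Sum of pixels in the rectangle with corners (r1,c1) and (r2,c2): 4 lookups.
rectSum = @(r1, c1, r2, c2) iiPad(r2+1, c2+1) - iiPad(r1, c2+1) ...
                          - iiPad(r2+1, c1)   + iiPad(r1, c1);

% Two-rectangle feature over a 24x24 window at the top-left corner:
leftHalf  = rectSum(1, 1, 24, 12);
rightHalf = rectSum(1, 13, 24, 24);
featureValue = rightHalf - leftHalf;           % contrast between adjacent regions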
4.1.2 Facial Expression Recognition

Figure 4: Feature extraction from a facial image

All RGB and grayscale images are converted into binary images. The preprocessed image is fed into the face detection block, where face detection is carried out using the Viola-Jones algorithm. The default 'Frontal CART' classification model with a merging threshold of 16 is employed for this process. The 'Frontal CART' model detects only frontal faces that are upright and forward facing. Since Viola-Jones by default produces multiple bounding boxes around the face, the merging threshold of 16 helps in coagulating these multiple boxes into a single bounding box. In the case of multiple faces, this threshold retains the face closest to the camera and filters out the more distant faces.
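A minimal sketch of this detection step using the cascade detector from MATLAB's Computer Vision Toolbox, assuming a test image 'photo.jpg'; the report's merging threshold of 16 is set explicitly, and the annotation step is only for visual checking.

% Detect an upright, forward-facing face with the frontal CART model.
I = imread('photo.jpg');                       % webcam frame or stored image
G = rgb2gray(I);                               % detector operates on intensity values

detector = vision.CascadeObjectDetector('FrontalFaceCART');
detector.MergeThreshold = 16;                  % merge overlapping boxes; keeps the dominant face

faceBox = step(detector, G);                   % [x y width height] per detected face
annotated = insertObjectAnnotation(I, 'rectangle', faceBox, 'Face');
imshow(annotated);                             % visual check of the detection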
Figure 5: Regions involved in expression analysis

The binary image obtained from the face detection block forms the input to the feature extraction block, where the eyes and mouth are extracted. The eyes are extracted using the Viola-Jones method; to extract the mouth, however, certain measurements are considered. First the bounding box for the nose is computed using Viola-Jones, and then the bounding box of the mouth is deduced from the nose bounding box, as given in equations (1)-(4).

Figure 6: Face region grids (bounding box)
4.2 Audio Feature Extraction Module

In this module a list of songs forms the input. As songs are audio files, they require a certain amount of preprocessing. Stereo signals obtained from the internet are converted to 16-bit PCM mono signals at a sampling rate of around 48.6 kHz; the conversion is done using Audacity (an equivalent MATLAB sketch is shown below). The pre-processed signal then undergoes audio feature extraction, where features like rhythm and tonality are extracted using the MIR 1.5 Toolbox, pitch is extracted using the Chroma Toolbox, and other features like centroid, spectral flux, spectral roll-off, kurtosis and 15 MFCC coefficients are extracted using the Auditory Toolbox.

Figure 7: Audio preprocessing
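The report performs this conversion in Audacity; purely as an illustration, an equivalent MATLAB sketch is given here, assuming an input file 'song.wav' and taking 48.6 kHz as the nominal target rate quoted above (a 44.1 kHz rate is quoted later in the report, so treat the value as configurable).

% Stereo -> 16-bit PCM mono conversion with resampling (illustrative only).
[x, fs] = audioread('song.wav');               % stereo signal from the playlist
mono = mean(x, 2);                             % average the two channels -> mono

targetFs = 48600;                              % nominal target sampling rate in Hz
mono = resample(mono, targetFs, fs);           % rational-factor resampling

audiowrite('song_mono.wav', mono, targetFs, 'BitsPerSample', 16);   % 16-bit PCM output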
Audio signals are categorized into 8 types: sad, joy-anger, joy-surprise, joy-excitement, joy, anger, sad-anger and others.
1. Songs that convey cheerfulness, energy and playfulness are classified under joy.
2. Songs that are very depressing are classified under sad.
3. Songs that reflect mere attitude or revenge are classified under anger.
4. Songs with anger in a playful mode are classified under the joy-anger category.
5. Songs with a very depressed mood together with an angry mood are classified under the sad-anger category.
6. Songs which reflect the excitement of joy are classified under the joy-excitement category.
7. Songs which reflect the surprise of joy are classified under the joy-surprise category.
8. All other songs fall under the 'others' category.

The feature extraction from the converted files is done by three toolbox units: the MIR 1.5 Toolbox, the Chroma Toolbox and the Auditory Toolbox.

4.2.1 MIR 1.5 Toolbox

Features like tonality, rhythm and structure are extracted using an integrated set of functions written in MATLAB. It is one of the important toolboxes for audio feature extraction. In addition to the basic computational processes, the toolbox also includes higher-level musical feature extraction tools, whose alternative strategies, and their multiple combinations, can be selected by the user. The toolbox was initially conceived in the context of the Brain Tuning project financed by the European Union (FP6-NEST); one main objective was to investigate the relation between musical features, music-induced emotion and the associated neural activity. A brief usage sketch is given below.
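A hedged sketch of how such rhythm- and tonality-related features might be pulled out with MIRtoolbox 1.5, assuming the toolbox is on the MATLAB path; the function names follow the MIRtoolbox documentation and the file name is an assumption carried over from the preprocessing sketch.

% Illustrative MIRtoolbox 1.5 calls for rhythm- and tonality-related features.
a = miraudio('song_mono.wav');                 % load the preprocessed audio into a MIRtoolbox object

tempoEstimate = mirgetdata(mirtempo(a));       % rhythm: tempo estimate in BPM
keyEstimate   = mirgetdata(mirkey(a));         % tonality: estimated key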
4.2.2 Chroma Toolbox

The Chroma Toolbox consists of MATLAB implementations for extracting various types of novel pitch-based and chroma-based audio features. It contains feature extractors for pitch features as well as parameterized families of variants of chroma-like features.

Figure 8: Chroma Toolbox

4.2.3 Auditory Toolbox

In the Auditory Toolbox, features like centroid, spectral flux, spectral roll-off, kurtosis and 15 MFCC coefficients are extracted, with the implementation in MATLAB. MFCC returns a description of the spectral shape of the sound; the computation of the cepstral coefficients follows the scheme presented in Figure 9. A small from-scratch sketch of two of these frame-level features is given below.

Figure 9: Phases of the MFCC computation
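To make two of the features named above concrete, here is a from-scratch frame-level computation of spectral centroid and spectral roll-off, written directly rather than through the Auditory Toolbox; the file name, frame length and the 85% roll-off threshold are illustrative assumptions.

% Frame-level spectral centroid and spectral roll-off for one audio frame.
[x, fs] = audioread('song_mono.wav');          % preprocessed mono signal (assumed file)
N = 1024;                                      % frame length in samples
frame = x(1:N, 1) .* hamming(N);               % one Hamming-windowed frame

mag = abs(fft(frame));
mag = mag(1:N/2);                              % positive-frequency half of the spectrum
f   = (0:N/2 - 1)' * fs / N;                   % bin centre frequencies in Hz

centroid = sum(f .* mag) / sum(mag);           % spectral centroid (Hz)

cumEnergy = cumsum(mag .^ 2);                  % cumulative spectral energy
rollIdx   = find(cumEnergy >= 0.85 * cumEnergy(end), 1);
rolloff   = f(rollIdx);                        % frequency below which 85% of the energy lies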
4.2.4 MFCC

MFCCs are perceptually motivated features that are based on the Short-Time Fourier Transform (STFT). After taking the log-amplitude of the magnitude spectrum, the FFT bins are grouped and smoothed according to the perceptually motivated Mel-frequency scaling. Finally, in order to decorrelate the resulting feature vectors, a Discrete Cosine Transform (DCT) is performed. Although typically 13 coefficients are used for speech representation, it was found that the first five coefficients are adequate for music representation (Tzanetakis, 2002).

4.3 Emotion-Audio Integration Module

The emotions extracted for the songs are stored as meta-data in the database, and mapping is performed by querying this meta-data. The emotion extraction module and the audio feature extraction module are finally mapped and combined using the emotion-audio integration module.

4.3.1 Audio Emotion Recognition

In the music emotion recognition block, the playlist of a user forms the input. An audio signal initially undergoes a certain amount of preprocessing. Since music files acquired from the internet are usually stereo signals, all stereo signals are converted to 16-bit PCM mono signals at a sampling rate of 44.1 kHz. The conversion is performed using Audacity. Conversion of a stereo signal into a mono signal is crucial to reduce the mathematical complexity of processing the similar content of both channels of a stereo signal.

An 80-second window is then extracted from the entire audio signal. Audio files are usually very large and are computationally expensive in terms of memory; hence, to reduce the memory complexity incurred during audio feature extraction, an 80-second window is extracted from each audio file. This also ensures uniformity in the size of each audio file. During extraction, the first 20 seconds of an audio file are discarded, which helps ensure an efficient retrieval of significant information from the file. Figure 7 depicts the process of extraction of the 80-second window; a short sketch of the extraction follows.
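A minimal sketch of the 80-second window extraction described above, assuming the preprocessed mono file from the earlier step.

% Extract an 80-second window, skipping the first 20 seconds of the track.
[x, fs] = audioread('song_mono.wav');

startSample = round(20 * fs) + 1;                               % discard the first 20 s
stopSample  = min(startSample + round(80 * fs) - 1, length(x)); % keep the next 80 s
window80 = x(startSample:stopSample, 1);                        % 80-second mono excerpt

audiowrite('song_window80.wav', window80, fs, 'BitsPerSample', 16);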
This pre-processed signal then undergoes audio feature extraction, where the spectral centroid, spectral flux, spectral roll-off, kurtosis, zero-crossing rate and 13 MFCC coefficients are extracted. The toolboxes used for audio feature extraction are MIR 1.5, the Auditory Toolbox and the Chroma Toolbox. Music-based emotion recognition is then carried out using the Support Vector Machine's 'one vs. other' approach (a minimal sketch of this setup is given after Figure 10). Audio signals are classified under six categories: Joy, Sad, Anger, Sad-Anger, Joy-Anger and Others.

While various mood models have been proposed in the literature, they fail to capture the perceptions of a user in real time; they are largely theoretical constructs that researchers have associated with different audio signals. The mood model and the classes considered in this report take into account how a user may associate different moods with an audio signal and a song. While composing a song, an artist may not maintain a uniform mood across the entire excerpt: songs are usually based on a theme or a situation, and a song may convey sadness in its first half while the second half becomes cheerful. Hence the model adopted here considers all these aspects and generates paired classes for such songs, in addition to the individual classes. The model adopted is as follows:
• Songs that are cheerful, energetic and playful are classified under Joy.
• Songs that are very depressing are classified under Sad.
• Songs that reflect attitude, 'anger associated with patriotism' or revenge are classified under Anger.
• The Joy-Anger category is associated with songs that express anger in a playful mode.
• The Sad-Anger category is composed of songs that revolve around the theme of being extremely depressed and angry.
• All other songs fall under the 'Others' category; when a user is detected with emotions such as surprise or fear, songs from this category are suggested.

Figure 10: Integration of the three modules (Emotion Extraction Module, Audio Feature Extraction Module and Emotion-Audio integration module, linked through the meta-data database)
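The following is a minimal sketch of the 'one vs. other' classification, assuming a precomputed feature matrix X (one row of audio features per song) and a cell array of labels y over the six categories. It uses fitcsvm from newer MATLAB releases as a stand-in for whatever SVM implementation the original work relied on, so it should be read as an illustration of the scheme rather than the project's actual code.

% One-vs-other SVM classification of songs into six emotion categories.
% X: nSongs x nFeatures matrix of audio features, y: cell array of labels.
categories = {'Joy','Sad','Anger','Sad-Anger','Joy-Anger','Others'};
models = cell(numel(categories), 1);
for k = 1:numel(categories)
    isPos = strcmp(y, categories{k});            % current class vs. the rest
    models{k} = fitcsvm(X, isPos, ...
        'KernelFunction', 'rbf', 'KernelScale', 'auto', 'Standardize', true);
end

% Prediction: pick the category whose binary SVM gives the largest score.
scores = zeros(size(X, 1), numel(categories));
for k = 1:numel(categories)
    [~, s] = predict(models{k}, X);
    scores(:, k) = s(:, 2);                      % score of the positive class
end
[~, idx] = max(scores, [], 2);
predicted = categories(idx);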
Figure 11: Mapping of modules

Figure 11 depicts the mapping between the facial emotion recognition module and the audio emotion recognition module. The name of each file and the emotions associated with the song are recorded in a database as its meta-data, and the final mapping between the two blocks is carried out by querying this meta-data database. Since facial emotion recognition and audio emotion recognition are two mutually exclusive components, system integration is performed using an intermediate mapping module that relates the two blocks with the help of the audio meta-data file. While the fear and surprise categories of facial emotions are mapped onto the 'Others' category of audio emotions, each of the joy, sad and anger categories of facial emotions is mapped onto two categories of audio emotions, as shown in the diagram. During playlist generation, when the system classifies a facial image under one of the three emotions, it considers both the individual and the paired classes. For example, if a facial image is classified under joy, the system will display songs under the Joy and Joy-Anger categories (a minimal sketch of this mapping step is given below).
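A rough sketch of such a mapping and meta-data query in MATLAB is shown below. Only the joy pairing (Joy, Joy-Anger) is spelled out in the report, so the sad and anger pairings here are plausible guesses, and the metadata file name and its column names (Name, Emotion) are assumptions for illustration.

% Map each detected facial emotion to the audio emotion categories it covers.
faceToAudio = containers.Map('KeyType', 'char', 'ValueType', 'any');
faceToAudio('joy')      = {'Joy', 'Joy-Anger'};      % pairing given in the report
faceToAudio('sad')      = {'Sad', 'Sad-Anger'};      % assumed pairing
faceToAudio('anger')    = {'Anger', 'Joy-Anger'};    % assumed pairing
faceToAudio('fear')     = {'Others'};
faceToAudio('surprise') = {'Others'};

meta = readtable('metadata.csv');        % columns assumed: Name, Emotion
detected = 'joy';                        % output of the facial emotion module
wanted = faceToAudio(detected);          % e.g. {'Joy','Joy-Anger'}
playlist = meta.Name(ismember(meta.Emotion, wanted));
disp(playlist);                          % songs suggested for this emotion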
4.4 Classifier
Many classifiers have been applied to expression recognition, such as neural networks (NN), support vector machines (SVM), linear discriminant analysis (LDA), K-nearest neighbour, multinomial logistic ridge regression (MLR), hidden Markov models (HMM), tree-augmented naive Bayes and others. In the proposed method, a Support Vector Machine with a Radial Basis Function (RBF) kernel is used as the classifier. The Support Vector Machine is an adaptive learning system which receives labeled training data and transforms it into a higher-dimensional feature space; a separating hyper-plane is then computed with respect to margin maximization to determine the best separation between classes. The greatest advantage of the SVM is that, even with a small set of training data, it generalizes well.

Shan et al. evaluated the performance of LBP features as facial expression descriptors and obtained a result of 89% using an SVM with an RBF kernel trained on the CK database. More recently, Bartlett et al. conducted similar experiments. They selected 313 image sequences from the Cohn-Kanade database; the sequences came from 90 subjects, with 1 to 6 emotions per subject. The facial images were converted into a Gabor magnitude representation using a bank of 40 Gabor filters. They then performed 10-fold cross-validation experiments using SVMs with linear, polynomial and RBF kernels. The linear and RBF kernels performed best, with recognition rates of 84.8% and 86.9% respectively (a sketch of such an RBF-kernel, cross-validated SVM is given below).

As a powerful machine learning technique for data classification, the SVM performs an implicit mapping of the data into a higher (possibly infinite) dimensional feature space and then finds a linear separating hyper-plane with the maximal margin in this higher-dimensional space, i.e. the hyper-plane that maximizes the distance between the support vectors and the hyper-plane. The SVM allows domain-specific selection of the kernel function. Although new kernels continue to be proposed, the most frequently used kernel functions are the linear, polynomial and Radial Basis Function (RBF) kernels.
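As an illustration of the kind of experiment described above, the sketch below trains an RBF-kernel SVM on precomputed facial feature vectors and estimates accuracy with 10-fold cross-validation. The variables feats and labels are assumed to exist already (for instance from LBP or Gabor feature extraction), and fitcecoc is used here only because the facial data has more than two classes; this is not the authors' code.

% RBF-kernel SVM with 10-fold cross-validation (sketch).
% feats:  nImages x nFeatures matrix of facial descriptors (e.g. LBP/Gabor)
% labels: nImages x 1 categorical vector of expression classes
template = templateSVM('KernelFunction', 'rbf', 'KernelScale', 'auto', ...
                       'Standardize', true);
model = fitcecoc(feats, labels, 'Learners', template);   % multi-class wrapper

cvmodel  = crossval(model, 'KFold', 10);                 % 10-fold cross-validation
accuracy = 1 - kfoldLoss(cvmodel);                       % recognition rate
fprintf('10-fold cross-validation accuracy: %.1f%%\n', 100*accuracy);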
4.5 Databases
One of the most important aspects of developing any new recognition or detection system is the choice of the database that will be used for testing it. For the purpose of training, two facial expression databases are used. The first is the FG-NET Facial Expression and Emotion Database [4], which consists of MPEG video files with spontaneously displayed emotions; the database contains examples gathered from 18 subjects (9 female and 9 male). The system has to be trained with captured video frames in which the displayed emotion is very representative. The training set consists of images of seven states: neutral plus six emotional states (surprise, fear, disgust, sadness, happiness and anger). The second database is the Cohn-Kanade Facial Expression Database [5], which contains 486 image sequences displayed by 97 posers. Each sequence displays the emotion from onset to peak; however, only the last image of a sequence is used for training. The training and testing sets formed from the Cohn-Kanade database contain images divided into the same seven classes: neutral, surprise, fear, disgust, sadness, happiness and anger. With appropriate training sets in place, the classifier training procedure can be performed.

The database was planned and designed in a general and expandable way. It supports the current needs but also allows for different mood models, including different types (both categorical and dimensional), user accounts, lists of artists, albums, genres, features and classification profiles, and for saving different classification and tracking values for the same song based on different combinations of features and classifiers.

4.6 Music
The last and most important part of this system is the playing of music according to the user's current emotional state. Once the facial expression of the user has been classified, the user's corresponding emotional state is found. A musical database consisting of songs from a variety of domains and pertaining to a number of emotions is maintained. Every song in each emotion category of the database is assigned a weight, and each detected emotion is also assigned a weight. The algorithm finds the song closest in weight to the classified emotion, and this song is then played (a sketch of this weight-matching step follows below).
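The report does not specify the exact weighting scheme, so the sketch below only illustrates the idea of picking the song whose stored weight is closest to the weight of the detected emotion; the weight values, file names and playback step are all assumptions.

% Sketch of weight-based song selection within one emotion category.
% songNames/songWeights describe songs already filed under the detected
% emotion; emotionWeight is the weight assigned to the detected emotion.
songNames   = {'track_a.mp3', 'track_b.mp3', 'track_c.mp3'};  % placeholders
songWeights = [0.35, 0.60, 0.80];        % assumed per-song weights
emotionWeight = 0.55;                    % assumed weight of the detected emotion

[~, idx] = min(abs(songWeights - emotionWeight));  % closest match in weight
chosen = songNames{idx};
fprintf('Playing %s\n', chosen);
[y, fs] = audioread(chosen);             % one simple way to play the track
sound(y, fs);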
CHAPTER 5
Viola and Jones algorithm

The Viola–Jones object detection framework, proposed in 2001 by Paul Viola and Michael Jones, was the first object detection framework to provide competitive object detection rates in real time. Although it can be trained to detect a variety of object classes, it was motivated primarily by the problem of face detection. The problem to be solved is the detection of faces in an image. A human can do this easily, but a computer needs precise instructions and constraints. To make the task more manageable, Viola–Jones requires full-view, frontal, upright faces: in order to be detected, the entire face must point towards the camera and should not be tilted to either side. While it may seem that these constraints diminish the algorithm's utility, in practice these limits on pose are quite acceptable, because the detection step is most often followed by a recognition step.
In this work, the Viola–Jones frontal CART model is used; the 'frontal cart' property detects only frontal faces that are upright and forward facing. This model is employed with a merging threshold of 16, a value that helps consolidate multiple overlapping detection boxes into a single bounding box.

5.1 Advantages
• Extremely fast feature computation and efficient feature selection.
• Scale- and location-invariant detector; instead of scaling the image itself (e.g. pyramid filters), the features are scaled.
• Such a generic detection scheme can be trained for the detection of other types of objects (e.g. cars, hands).
5.2 Disadvantages
• The detector is most effective only on frontal images of faces.
• It can hardly cope with 45° face rotation around either the vertical or the horizontal axis.
• It is sensitive to lighting conditions.
• Multiple detections of the same face may be obtained, due to overlapping sub-windows.

5.3 MATLAB code on frontal cart for object detection

function [] = Viola_Jones_img(Img)
% Viola_Jones_img(Img)
%   Img - input image
%   Example call: Viola_Jones_img(imread('name_of_the_picture.jpg'))
faceDetector = vision.CascadeObjectDetector;       % cascade face detector
bboxes = step(faceDetector, Img);                  % detect faces in the image
figure, imshow(Img), title('Detected faces'); hold on
for i = 1:size(bboxes, 1)
    rectangle('Position', bboxes(i,:), 'LineWidth', 2, 'EdgeColor', 'y');
end
end
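Since the text above states that the frontal CART model with a merging threshold of 16 is used, the detector in this function would presumably be configured along the following lines (a sketch; all other detector properties are left at their defaults, and the image name is a placeholder).

% Configure the cascade detector as described in Chapter 5: frontal CART
% classification model and a merge threshold of 16, which consolidates
% overlapping detections into a single bounding box.
faceDetector = vision.CascadeObjectDetector('FrontalFaceCART', ...
                                            'MergeThreshold', 16);
bboxes = step(faceDetector, imread('name_of_the_picture.jpg'));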
CHAPTER 6
Experimental results

Testing and implementation were carried out in MATLAB R2013a (and the 2014 release) on 32-bit Windows 7/8 systems with Intel Core i3 and i5 M460 (2.53 GHz) processors. Facial expression extraction was performed on both user-independent and user-dependent datasets: a dataset consisting of facial images of 25 individuals was selected for the user-independent experiment, and a dataset of 10 individuals was selected for the user-dependent experiment. Images of size 4000×3000 were used for the static and dynamic dataset experiments. For facial emotion recognition, two experiments were carried out:
i. using a user-independent dataset;
ii. using a user-dependent dataset.

6.1 EMOTION EXTRACTION
A user-independent dataset of 30 images and a user-dependent dataset of 5 images were selected for the extraction of emotions. The estimated time for the various modules of the Emotion Extraction Module is given in Table 1.

Table 1: Time estimation of various modules of the Emotion Extraction Module
6.2 AUDIO FEATURE EXTRACTION
A dataset of around 200 songs was considered for the experimentation and testing of the audio feature extraction module; the songs were collected from various Bollywood music sites such as Djmaza.in and Songs.pk. The estimated accuracy for the various categories of emotions is given in Table 2.

Table 2: Estimated accuracy for different categories of audio features

6.3 EMOTION BASED MUSIC PLAYER
Table 3 gives the average time taken by the proposed algorithm (i.e. the algorithm that generates a playlist based on facial expressions) and by its facial expression recognition and system integration modules.

Table 3: Average time estimation of various modules of the proposed system
CHAPTER 7
CONCLUSION

Experimental results have shown that the time required for audio feature extraction is negligible (around 0.0006 s) because the songs are processed and stored beforehand, so the total estimation time of the proposed system is essentially the time required for the extraction of facial features (around 0.9994 s). The various classes of emotion also yield a better accuracy rate than previous existing systems. The overall computational time of about 1.000 s is very small, which helps in achieving better real-time performance and efficiency. The system thus aims at providing Windows users with a cheap, accurate, emotion-based music system that requires no additional hardware.

The Emotion Based Music System will be of great advantage to users looking for music based on their mood and emotional behavior. It will reduce the time spent searching for music, thereby cutting unnecessary computation and increasing the overall accuracy and efficiency of the system. The system will not only reduce physical stress but will also be a boon for music therapy systems and may assist a music therapist in treating a patient. With the additional features mentioned above, it will be a complete system for music lovers and listeners.

A wide variety of image processing techniques was employed to meet the requirements of the facial expression recognition system. Apart from the theoretical background, this work provides ways to design and implement an emotion-based music player. The proposed system is able to process video of facial behavior, recognize the displayed actions in terms of basic emotions and then play music based on these emotions. The major strengths of the system are full automation as well as user and environment independence. Even though the system cannot handle occlusions and significant head rotations, head shifts are allowed. In future work, we would like to focus on improving the recognition rate of the system.
CHAPTER 8
APPLICATIONS

Music therapists use music to enhance social or interpersonal, affective, cognitive and behavioral functioning. Research indicates that music therapy is effective at reducing muscle tension and anxiety, and at promoting relaxation, verbalization, interpersonal relationships and group cohesiveness. This can set the stage for open communication and provide a starting place for non-threatening support and for processing symptoms associated with, or exacerbated by, trauma and disaster, such as the 9/11 event. A therapist can talk with a client, but a qualified music therapist can use music to quickly and actively link a client to their psycho-emotional state. In certain settings, the active use of music therapy interventions has resulted in a shorter length of stay (treatment period) and a more efficient response to the client's overall intervention plan.

The use of songs in music therapy with cancer patients and their families effectively provides important means of support and tools for change. Human contact and professional support can often help diminish the suffering involved in cancer, and there exists an inherent association between songs and human contact, since lyrics represent melodic verbal communication. Thus, the use of songs in music therapy can have naturally meaningful applications.

The proposed algorithm succeeds in crafting a mechanism that can find application in music therapy systems and can help a music therapist treat a patient suffering from disorders such as acute depression, stress or aggression. The system is prone to give unpredictable results in difficult lighting conditions; removing this drawback is therefore planned as part of the future work.
CHAPTER 9
FUTURE SCOPE

In the future we would like to develop a mood-enhancing music player that starts with the user's current emotion (which may be sad) and then plays music of more positive emotions, eventually giving the user a joyful feeling. Finally, we would like to improve the time efficiency of our system in order to make it suitable for use in different applications.
CHAPTER 10
REFERENCES

[1]. Chang, C. Hu, R. Feris and M. Turk, "Manifold based analysis of facial expression", Image and Vision Computing, vol. 24, pp. 605–614, June 2006.
[2]. A. Habibzad, Ninavin and Mir Kamal Mirnia, "A new algorithm to classify face emotions through eye and lip features by using particle swarm optimization".
[3]. Byeong-jun Han, Seungmin Rho, Roger B. Dannenberg and Eenjun Hwang, "SMERS: music emotion recognition using support vector regression", 10th ISMIR, 2009.
[4]. Alvin I. Goldman and Chandra Sekhar Sripada, "Simulationist models of face-based emotion recognition".
[5]. Carlos A. Cervantes and Kai-Tai Song, "Embedded Design of an Emotion-Aware Music Player", IEEE International Conference on Systems, Man, and Cybernetics, pp. 2528–2533, 2013.
[6]. Fatma Guney, "Emotion Recognition using Face Images", Bogazici University, Istanbul, Turkey 34342.
[7]. Jia-Jun Wong and Siu-Yeung Cho, "Facial emotion recognition by adaptive processing of tree structures".
[8]. K. Hevner, "The affective character of the major and minor modes in music", The American Journal of Psychology, vol. 47(1), pp. 103–118, 1935.
[9]. Samuel Strupp, Norbert Schmitz and Karsten Berns, "Visual-Based Emotion Detection for Natural Man-Machine Interaction".
[10]. Michael Lyons and Shigeru Akamatsu, "Coding Facial Expressions with Gabor Wavelets", IEEE Conf. on Automatic Face and Gesture Recognition, March 2000.
[11]. Kuan-Chieh Huang, Yau-Hwang Kuo and Mong-Fong Horng, "Emotion Recognition by a Novel Triangular Facial Feature Extraction Method".
[12]. S. Dornbush, K. Fisher, K. McKay, A. Prikhodko and Z. Segall, "XPod - A Human Activity and Emotion Aware Mobile Music Player", UMBC Ebiquity, November 2005.
[13]. Renuka R. Londhe and Vrushshen P. Pawar, "Analysis of Facial Expression and Recognition Based On Statistical Approach", International Journal of Soft Computing and Engineering (IJSCE), Volume 2, May 2012.
[14]. Russell, "A circumplex model of affect", Journal of Personality and Social Psychology, vol. 39(6), pp. 1161–1178, 1980.
[15]. Sanghoon Jun, Seungmin Rho, Byeong-jun Han and Eenjun Hwang, "A fuzzy inference-based music emotion recognition system", VIE, 2008.
[16]. Thayer, "The Biopsychology of Mood and Arousal", Oxford University Press, 1989.
[17]. Z. Zeng, "A survey of affect recognition methods: Audio, visual, and spontaneous expressions", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, January 2009.