Spotle AI-thon - The AI Global Challenge had 7000+ participants from the best campuses in India and Singapore, who worked on addressing the mental health challenge with AI. The top 10 teams, from IIT Roorkee, CMI, NIT, IIM Indore, Charotar University and DIAT, made it to the final round. This is a showcase Top 10 presentation from Team La Casa De Papel (Saurabh Agarwal and Dhruv Grover), IIT Roorkee.
2. TEAM LA CASA DE PAPEL
Saurabh Agarwal
Undergraduate student at IIT Roorkee.
Dhruv Grover
Undergraduate student at IIT Roorkee.
3. About Us
We are engineering undergrads who wish to change the world for the better through some of the thousand crazy ideas growing in our brains!
4. Problem Statement
The covid pandemic has affected people's mental health in several ways. Here we are given the task of analysing the mental health of people through their tweets.
5. 01 Overview
A clear workflow of the problem.
02 Data cleaning
Includes cleaning data before analysing it.
03 Pre-trained model
This model is used to predict labels on cleaned data.
04 Exploratory Data Analysis
This includes detailed analysis and hypotheses on the data, both before and after labelling.
05 A glimpse of Unsupervised Implementation
A confirmation that the data is present in clusters without outliers, thus validating the cleaning.
6. Overview
01
Input data
Cleaned data
Unsupervised clustering
EDA on data
Prediction using pre-trained model
Analysis of data with labels
7. Data Is The Key
In making the world a global village!
Causing exponential growth in all fields!
Realizing the dream of the AGE OF AI !
Bringing communities together!
So it needs to be cleaned
8. Data Cleaning
02
This is one of the important steps in Twitter sentiment analysis. The following slide explains it clearly.
9. Data cleaning goes as shown in the diagram
● Removing next-line (newline) characters.
● Removing extra white spaces, e.g. converting “and    the” to “and the”.
● Removing hashtags, links etc.
● Removing non-informative sentences.
● Replacing encoded characters with their decoded form.
● Storing hashtags in a Python list to analyze the data.
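The cleaning steps above could be sketched roughly as follows. This is a minimal illustration, not the team's actual code: the function names, the regular expressions, the use of `html.unescape` for decoding, the removal of @-mentions under "etc." and the `min_words` threshold for "non-informative" text are all assumptions.

```python
import html
import re

HASHTAG = re.compile(r"#(\w+)")
URL = re.compile(r"https?://\S+|www\.\S+")
MENTION = re.compile(r"@\w+")  # assumed to fall under "etc."


def clean_tweet(text):
    """Return (cleaned_text, hashtags) for one raw tweet."""
    text = html.unescape(text)  # decode entities such as "&amp;" first
    hashtags = [tag.lower() for tag in HASHTAG.findall(text)]  # keep for analysis
    text = text.replace("\n", " ")  # remove next-line characters
    text = URL.sub(" ", text)  # remove links
    text = MENTION.sub(" ", text)  # remove @-mentions
    text = HASHTAG.sub(" ", text)  # remove hashtags from the text itself
    text = re.sub(r"\s+", " ", text).strip()  # collapse repeated white space
    return text, hashtags


def is_informative(text, min_words=3):
    """Hypothetical filter: treat very short texts as non-informative."""
    return len(text.split()) >= min_words


raw = "Stay safe &amp; calm\nand   the #COVID19 updates https://t.co/abc @who"
cleaned, tags = clean_tweet(raw)
```

Decoding is done before hashtag extraction so that numeric entities such as `&#39;` are not mistaken for hashtags.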
10. Pre-trained model used for EDA labelling
03
Input Data
Embeddings
Dense
Bidirectional LSTM
ReLU
Dropout = 0.5
Dense
Softmax
Output
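To illustrate only the last stage of this model, the sketch below shows how a softmax output can be turned into a single emotion label. The six labels are the ones reported in the prediction tables; their order in `EMOTIONS`, the example scores and the function names are assumptions made for illustration.

```python
import math

# Label order is hypothetical; the slides do not state the class indices.
EMOTIONS = ["joy", "sadness", "anger", "fear", "love", "surprise"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_emotion(logits):
    """Return the most probable emotion and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return EMOTIONS[best], probs[best]


# Made-up scores from the final dense layer for one tweet.
label, confidence = predict_emotion([2.0, 0.5, 0.1, 0.1, -1.0, -2.0])
```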
12. Hashtag analysis
Top 10 most used hashtags in the dataset.
From the graph, covid19 appears to be the most pressing concern right now.
Hypothesis
A reasonable hypothesis can’t be formed from this alone, as tweets about the covid19 situation can be both positive and negative. What can be said for sure is that covid19 is a major concern right now.
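A top-10 hashtag count like this one can be sketched with `collections.Counter`, assuming each tweet's hashtags were stored in a Python list during cleaning. The function name and the sample data are hypothetical.

```python
from collections import Counter


def top_hashtags(hashtags_per_tweet, n=10):
    """Return the n most frequent hashtags, ignoring letter case."""
    counts = Counter(
        tag.lower() for tags in hashtags_per_tweet for tag in tags
    )
    return counts.most_common(n)


# Made-up sample: one list of hashtags per tweet.
sample = [["covid19", "lockdown"], ["COVID19"], ["mentalhealth", "covid19"], []]
top = top_hashtags(sample, n=2)
```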
13. Date wise analysis
The number of tweets posted per day is increasing. The small bars on 13th and 22nd September should not be misread as a drop: we are not given the tweets for the whole 24 hours of these days.
Hypothesis
People's activity on social media is increasing day by day, so their physical interaction with other people may be decreasing, which could have an adverse effect on their mental health.
14. Hour wise analysis
This analysis takes people's daily schedule as its main point. After all, we all like to plan our days. As can be seen from the chart, most of the tweets are posted during working hours.
Hypothesis
People's activity on social media is highest during working hours, so their efficiency at work may be decreasing, which could have an adverse effect on their mental health.
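The date-wise and hour-wise counts can be sketched as below, assuming each tweet carries a timestamp that parses into a `datetime`. The function names are hypothetical and the sample timestamps are made up; they are not taken from the dataset.

```python
from collections import Counter
from datetime import datetime


def tweets_per_day(timestamps):
    """Count tweets for each calendar date."""
    return Counter(ts.date() for ts in timestamps)


def tweets_per_hour(timestamps):
    """Count tweets for each hour of the day (0-23)."""
    return Counter(ts.hour for ts in timestamps)


# Made-up timestamps for illustration only.
sample = [
    datetime.fromisoformat(s)
    for s in (
        "2020-09-14 10:15:00",
        "2020-09-14 11:40:00",
        "2020-09-15 10:05:00",
        "2020-09-15 22:30:00",
    )
]
per_day = tweets_per_day(sample)
per_hour = tweets_per_hour(sample)
```

Days that are only partly covered by the data, such as the first and last day of a collection window, will show smaller counts here too, so they are best compared separately.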
15. Predictions on the tweets from the whole world
S.No.  Emotion    Number of tweets with that emotion
1.     Joy        209905
2.     Sadness    122456
3.     Anger      61184
4.     Fear       56345
5.     Love       30495
6.     Surprise   13607
Considering ‘joy’ and ‘love’ as ‘happy’, about 48.66% of the population is happy. Similarly, considering ‘sadness’, ‘anger’ and ‘fear’ as ‘unhappy’, about 48.58% of the population is unhappy.
Hypothesis
Positive sentiments are slightly more common than negative ones, so the world’s mental health is not in as bad a state as it might have seemed.
16. Predictions on the tweets from India only
S.No.  Emotion    Number of tweets with that emotion
1.     Joy        56573
2.     Sadness    36632
3.     Anger      18774
4.     Fear       12791
5.     Love       7780
6.     Surprise   3106
Considering ‘joy’ and ‘love’ as ‘happy’, about 47.44% of the population is happy. Similarly, considering ‘sadness’, ‘anger’ and ‘fear’ as ‘unhappy’, about 50.27% of the population is unhappy.
Hypothesis
India’s mental health is also not that bad given the pandemic, but it is slightly worse than the world’s mental health.
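The happy and unhappy shares for both tables can be recomputed directly from the tweet counts. The sketch below uses the counts from the two tables and the same grouping (‘joy’ and ‘love’ as happy; ‘sadness’, ‘anger’ and ‘fear’ as unhappy); the function and variable names are hypothetical. ‘Surprise’ belongs to neither group, which is why the two shares do not add up to 100%.

```python
HAPPY = ("joy", "love")
UNHAPPY = ("sadness", "anger", "fear")

WORLD = {
    "joy": 209905, "sadness": 122456, "anger": 61184,
    "fear": 56345, "love": 30495, "surprise": 13607,
}
INDIA = {
    "joy": 56573, "sadness": 36632, "anger": 18774,
    "fear": 12791, "love": 7780, "surprise": 3106,
}


def share(counts, emotions):
    """Percentage of all tweets that carry one of the given emotions."""
    total = sum(counts.values())
    return round(100 * sum(counts[e] for e in emotions) / total, 2)


world_happy = share(WORLD, HAPPY)      # 48.66
world_unhappy = share(WORLD, UNHAPPY)  # 48.58
india_happy = share(INDIA, HAPPY)      # 47.44
india_unhappy = share(INDIA, UNHAPPY)  # 50.27
```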
17. A glimpse of Unsupervised Learning
05
This uses k-means clustering to check whether the data preprocessing has left outliers in the data.
18. K-means clustering
Application
● Each tweet was encoded into a 300-dimensional vector.
● The data was divided into 3 clusters.
Isomap
All the data points were reduced to 2 dimensions using Isomap. They are shown in the figure.
Intuition
As no outlier cluster is formed, we can validate the preprocessing. The two big clusters may correspond to positive and negative emotions, with the third being a mix of both.
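The clustering check can be illustrated with a small pure-Python k-means (Lloyd's algorithm). This is a sketch under stated assumptions, not the team's implementation: toy 2-dimensional points stand in for the 300-dimensional tweet vectors, the farthest-point initialisation is chosen here only to keep the run deterministic, and all function names are hypothetical. The Isomap projection is not covered.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(points) for column in zip(*points)]


def farthest_point_init(points, k):
    """Deterministic init (an assumption): start from the first point, then
    repeatedly add the point farthest from the centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k, max_iterations=100):
    """Lloyd's algorithm: returns (labels, centroids)."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iterations):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = mean_vector(members)
    return labels, centroids


def cluster_sizes(labels, k):
    """Number of points assigned to each cluster."""
    return [labels.count(c) for c in range(k)]


# Toy 2-D stand-ins for the 300-dimensional tweet vectors.
points = [
    [0, 0], [0, 1], [1, 0],
    [10, 10], [10, 11], [11, 10],
    [20, 0], [20, 1], [21, 0],
]
labels, centroids = kmeans(points, k=3)
sizes = cluster_sizes(labels, 3)
```

A cluster that ends up with only a handful of points would hint that outliers survived the cleaning step; clusters of comparable size, as in this toy run, give no such warning.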
19. Conclusion
Around 48% of the world’s population is happy, and around 48% of the world’s population is unhappy.
In India, around 47% of the population is happy, and around 50% is unhappy. So India has a slightly higher share of unhappy people than the world.
We also can’t say that corona is the cause of unhappiness for all the unhappy people, since there are other reasons too.
Therefore we can conclude that neither India’s nor the world’s mental health is as bad as might be expected due to covid19. But India’s mental health is slightly worse than the world’s mental health.
20. THANK YOU !
A Presentation By -
Team La Casa De Papel
Saurabh Agarwal (Email : naman_m@ch.iitr.ac.in)
Dhruv Grover (Email : dhruv_g@ch.iitr.ac.in)