2. Hello!
I am Yi-Shin Chen
Currently in NTHU CS
Intelligent Data Engineering and
Application Lab (IDEA Lab)
You can find me at:
yishin@gmail.com
3. We Promote Diversity at
More than 50 % students come from other countries
Belize
France
St Lucia
Honduras
India
China
Japan
Taiwan
Indonesia
São Tomé
5. “I don’t understand woman!! Their
words are very vague and ambiguous”
From Carlos Argueta, my first foreign Ph.D. graduate
He was the one who chose the topic of sentiment analysis,
and the first in our lab to suffer from depression
8. Natural Language Processing
▷Analyze Part-of-Speech (POS) tagging
▷Understand word meaning
▷Analyze the relationships between words
Need dictionaries & semantic relationships
Word positions affect statement meanings
Need different data for different languages
Example: This is the best thing happened in my life.
POS tags (figure labels): Det., Det., NN, PN, Pre., Verb, Verb, Adj
Difficult
9. Data Mining/Machine Learning
▷Collect massive data
▷Manually annotate training data
▷Analyze data with classifiers
Recollect training data for different
languages
Low recall rates (<<25%)
Easier?
11. Emotion Embedded in Trivia
▷Most trivia is ignored in previous work
• Stop Words are the first batch to be removed
→E.g., often, above, again
• Determiners and pronouns are usually ignored
• Most nouns are considered unimportant
My mom always said school is more important
😒 Angry 😂 Sad 👶 Joy
12. Emotional Mistakes
▷Mistakes everywhere
• Some are careless
→E.g., Luve you
• Some are intentional
→E.g., I’m soooooooo happppppy
▷Mistakes are not recorded in dictionaries
• How to annotate mistakes?
→ Annotation costs A LOT!
13. Children are our mentors
Mumbling from a mom
▷My one-year-old kid can detect my emotion
• Without seeing my face
• I did not change my tone
• How come she is always right?
▷Guessing
• She did not know grammar
• She did not memorize any dictionary
• My statements might have a lot of mistakes
Goal
Multi-lingual
16. Philosophy Slow Life
▷Our students are often delayed for various reasons
▷We do not follow the trends
• Usually against common sense in academia
No POS Tagging
No dictionary
Multilingual
😱
Failure Success
POS Tagging
Multiple dictionaries
One language
17. Teamwork
▷Implementation team
• Coding
• More coding
▷Dreaming team
• Reading papers
• Design
▷Boasting team
• Writing papers
• Preparing presentations
▷Anonymous
19. Subconscious Crowdsourcing
▷Crowdsourcing in subconscious
• Free
• Extract the subconscious from daily-life records
→ Ex1: “computers/companies/product-support/apple” in
Delicious tags
→ Ex2: “Trump” “Nickname generator” in search log
→ Ex3: “School day again #sad” in Twitter
Chun-Hao Chang, Elvis Saravia, and Yi-Shin Chen, “Subconscious Crowdsourcing: A Feasible Data Collection
Mechanism for Mental Disorder Detection on Social Media,” The 2016 IEEE/ACM International Conference on
Advances in Social Networks Analysis and Mining (ASONAM 2016), San Francisco, CA, USA, 18-21 August 2016
21. Subconscious Emotion Big Data
▷Twitter, a good public source
Throwing my phone always calms me down #anger
My sister always makes things look much more worse than they seem >:[ #anger
Why my brother always crabby !?!? #rude #youranadult #anger #issues
WHY DOES MY COMPUTER ALWAYS FREEZE??? NEVER FAILS. #anger
Im wanna crazy,if my life always sucks like this. #anger
Hashtags and emoticons can represent emotion well;
hence they can be treated as annotated answers
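As a rough illustration of treating hashtags as annotated answers, the sketch below pulls an emotion label out of a tweet. The label set `EMOTION_TAGS` and the function name `label_from_hashtags` are assumptions made for this example; the slides only show #anger and #sad tweets and do not list the full set of tags used.

```python
import re

# Assumed label set; the slides do not enumerate the emotion hashtags used.
EMOTION_TAGS = {"anger", "joy", "sad"}

HASHTAG_RE = re.compile(r"#(\w+)")

def label_from_hashtags(tweet):
    """Return the single emotion hashtag found in the tweet, or None."""
    tags = {tag.lower() for tag in HASHTAG_RE.findall(tweet)}
    found = tags & EMOTION_TAGS
    # Skip tweets with no emotion hashtag or with conflicting ones.
    return found.pop() if len(found) == 1 else None

tweets = [
    "Throwing my phone always calms me down #anger",
    "Why my brother always crabby !?!? #rude #youranadult #anger #issues",
    "School day again #sad",
    "No hashtag here",
]
labeled = [(tweet, label_from_hashtags(tweet)) for tweet in tweets]
```

Dropping tweets with conflicting emotion hashtags is a design choice of this sketch, not something the slides specify.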
28. Preprocessing Steps
▷Hints: Remove troublesome ones
o Too short
→ Too short to get important features
o Contain too many hashtags
→ Too much information to process
o Are retweets
→ Increase the complexity
o Have URLs
→ Too much trouble to collect the page data
o Convert user mentions to <usermention> and hashtags to
<hashtag>
→ Remove the identification. We should not peek at the answers!
Big Data anyway
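The filtering hints above could be sketched roughly as follows. The thresholds `MIN_TOKENS` and `MAX_HASHTAGS`, the `RT @` retweet check, and the function name `preprocess` are all assumptions for illustration; the slides give no concrete values.

```python
import re

MIN_TOKENS = 4      # assumed threshold for "too short"
MAX_HASHTAGS = 3    # assumed threshold for "too many hashtags"

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")

def preprocess(tweet):
    """Return the cleaned tweet, or None if it should be dropped."""
    text = tweet.strip()
    if text.startswith("RT @"):                       # retweets
        return None
    if URL_RE.search(text):                           # tweets with URLs
        return None
    if len(HASHTAG_RE.findall(text)) > MAX_HASHTAGS:  # too many hashtags
        return None
    if len(text.split()) < MIN_TOKENS:                # too short
        return None
    # Mask identities and hashtags so the label cannot leak into features.
    text = MENTION_RE.sub("<usermention>", text)
    text = HASHTAG_RE.sub("<hashtag>", text)
    return text
```

In a pipeline like this, the emotion label would have to be read from the hashtags before they are masked, since the masked text no longer carries it.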
29. Basic Guidelines
▷ Identify the commonalities and differences between
the experimental and control groups
• Analyze the frequency of words
→ TF•IDF (Term frequency, inverse document frequency)
• Analyze the co-occurrence between words/patterns
→ Co-occurrence
• Analyze the importance between words
→ Centrality
Graph
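For the first two guidelines, a minimal sketch of TF•IDF and co-occurrence counting is shown below. It assumes the common weighting tf × log(N / df) and counts co-occurrence per document; the slides do not say which variants were used, and the function names are made up for this example.

```python
import math
from collections import Counter
from itertools import combinations

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per doc."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

def cooccurrence(docs):
    """Count how many documents contain each unordered word pair."""
    pairs = Counter()
    for doc in docs:
        pairs.update(combinations(sorted(set(doc)), 2))
    return pairs

docs = [
    "my sister always makes things worse".split(),
    "my computer always freezes".split(),
    "3,000 killed in the world".split(),
]
weights = tf_idf(docs)
pairs = cooccurrence(docs)
```

With these toy documents, "sister" gets a higher weight than "my" in the first document because "my" also appears in the second one.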
30. Graph Construction
▷Construct two graphs
• E.g.
→Emotion one: I love the World of Warcraft new game
→ Not-emotion one: 3,000 killed in the world by ebola
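[Figure: two weighted word graphs. Emotion graph nodes: I, Love, the, World, of, Warcraft, new, game; edge weights as extracted: 0.9, 0.84, 0.65, 0.12, 0.12, 0.53, 0.67, 0.45. Not-emotion graph nodes: 3,000, killed, in, the, world, by, ebola; edge weights as extracted: 0.49, 0.87, 0.93, 0.83, 0.55, 0.25.]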
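One way to picture the construction is the sketch below, which links each word to the word that follows it. The edge weighting (the share of times a target follows a source) and the name `build_graph` are assumptions; the slides show weighted edges but do not define how the weights are computed.

```python
from collections import defaultdict

def build_graph(sentences):
    """Build a directed word graph as {source: {target: weight}}.

    The weight is the share of times `target` follows `source`. This is an
    assumed weighting; the slides do not define the edge weights.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        tokens = sentence.lower().split()
        for src, dst in zip(tokens, tokens[1:]):
            counts[src][dst] += 1
    graph = {}
    for src, targets in counts.items():
        total = sum(targets.values())
        graph[src] = {dst: n / total for dst, n in targets.items()}
    return graph

emotion_graph = build_graph(["I love the World of Warcraft new game"])
neutral_graph = build_graph(["3,000 killed in the world by ebola"])
```

With a single sentence per graph every weight is 1.0; the weights only become informative once many tweets are fed in.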
31. Graph Processes
▷Remove the ones common to both graphs
• Keep the significant ones that appear only in the
emotion graph
▷Analyze the centrality of words
• Betweenness, Closeness, Eigenvector, Degree, Katz
→ Can use free/open software, e.g., Gephi, GraphDB
▷Analyze the cluster degrees
• Clustering Coefficient
Graph
Key patterns
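A small sketch of these graph steps is given below, assuming graphs stored as `{source: {target: weight}}` dictionaries and treating "common" as "the same edge exists in both graphs". Only degree centrality and the local clustering coefficient are shown; the edge weights in the example are made up, and a graph tool would normally be used for the other centrality measures.

```python
def remove_common_edges(emotion, neutral):
    """Keep only the edges of `emotion` that are absent from `neutral`."""
    pruned = {}
    for src, targets in emotion.items():
        kept = {dst: w for dst, w in targets.items()
                if dst not in neutral.get(src, {})}
        if kept:
            pruned[src] = kept
    return pruned

def neighbors(graph):
    """Undirected neighbor sets for every node of a directed graph."""
    nbrs = {}
    for src, targets in graph.items():
        for dst in targets:
            if src != dst:
                nbrs.setdefault(src, set()).add(dst)
                nbrs.setdefault(dst, set()).add(src)
    return nbrs

def degree_centrality(graph):
    """Fraction of the other nodes each node is connected to."""
    nbrs = neighbors(graph)
    n = len(nbrs)
    return {node: (len(adj) / (n - 1) if n > 1 else 0.0)
            for node, adj in nbrs.items()}

def clustering_coefficient(graph):
    """Share of a node's neighbor pairs that are themselves connected."""
    nbrs = neighbors(graph)
    result = {}
    for node, adj in nbrs.items():
        k = len(adj)
        if k < 2:
            result[node] = 0.0
            continue
        links = sum(1 for a in adj for b in adj if a < b and b in nbrs[a])
        result[node] = 2 * links / (k * (k - 1))
    return result

# Illustrative weights only; they are not taken from the slides.
emotion = {"i": {"love": 1.0, "the": 0.5}, "love": {"the": 1.0},
           "the": {"world": 1.0}}
neutral = {"the": {"world": 1.0}}
pruned = remove_common_edges(emotion, neutral)
```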
33. Ranking Emotion Patterns
▷ Rank the emotion patterns for each emotion
• Frequency, exclusiveness, diversity
• One ranked list for each emotion
Sad, Joy, Anger
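The slides name three ranking criteria without defining them, so the sketch below uses assumed definitions: frequency is how often a pattern occurs for an emotion, exclusiveness is the share of the pattern's occurrences that fall under that emotion, and diversity is the number of distinct words filling the wildcard. Combining them by multiplication is also an assumption, as is the name `rank_patterns`.

```python
from collections import Counter, defaultdict

def rank_patterns(observations):
    """observations: iterable of (emotion, pattern, filler) triples.

    Returns {emotion: [patterns, best first]} using the assumed score
    frequency * exclusiveness * diversity.
    """
    freq = Counter()             # (emotion, pattern) -> occurrences
    total = Counter()            # pattern -> occurrences over all emotions
    fillers = defaultdict(set)   # (emotion, pattern) -> distinct fillers
    for emotion, pattern, filler in observations:
        freq[(emotion, pattern)] += 1
        total[pattern] += 1
        fillers[(emotion, pattern)].add(filler)

    scored = defaultdict(list)
    for (emotion, pattern), count in freq.items():
        exclusiveness = count / total[pattern]
        diversity = len(fillers[(emotion, pattern)])
        scored[emotion].append((count * exclusiveness * diversity, pattern))
    return {emotion: [p for _, p in sorted(items, reverse=True)]
            for emotion, items in scored.items()}

observations = [
    ("joy", "* yay !", "birthday"),
    ("joy", "* yay !", "friday"),
    ("joy", "my * always", "dog"),
    ("anger", "my * always", "sister"),
    ("anger", "my * always", "computer"),
    ("anger", "my * always", "brother"),
]
ranked = rank_patterns(observations)
```

This yields one ranked list per emotion, matching the "one ranked list for each emotion" bullet.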
34. Emotion Pattern Samples
Joy:
finally * my
tomorrow !!! *
<hashtag> birthday .+
* yay !
:) * !
princess *
* hehe
prom dress *
Sad:
memories *
* without my
sucks * <hashtag>
* tonight :(
* anymore ..
felt so *
. :( *
* :((
Anger:
my * always
shut the *
teachers *
people say *
-.- *
understand why *
why are *
with these *
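To show how such patterns might be applied to a tweet, the sketch below checks whether a pattern occurs in a token list. It assumes that `*` stands for exactly one arbitrary token and that everything else must match literally; the slides do not spell out the wildcard semantics, and `.+` is not handled here.

```python
def matches(pattern, tokens):
    """True if `pattern` occurs in `tokens`; '*' matches any single token."""
    parts = pattern.split()
    for start in range(len(tokens) - len(parts) + 1):
        window = tokens[start:start + len(parts)]
        if all(p == "*" or p == t for p, t in zip(parts, window)):
            return True
    return False

tokens = "my sister always makes things worse".split()
hit = matches("my * always", tokens)
miss = matches("shut the *", tokens)
```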
47. Basic Guidelines
▷ Identify the commonalities and differences between
the experimental and control groups
• Word/pattern frequency
• Emotion related data (e.g., flipping rates, occurrence rates)
• Social interaction (e.g., retweet, reply)
• Lifestyle (e.g., online time, stay-up or not)
• Age and gender
Features
48. Apply Classifiers
▷ By utilizing the extracted features
▷ Various classifiers
• Neural Networks
• Naïve Bayes and Bayesian Belief Networks
• Support Vector Machines
• Random forest
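As one concrete instance from the list above, here is a minimal multinomial Naïve Bayes sketch written from scratch, using matched emotion patterns as features. The class name, the add-one smoothing, and the tiny training set are assumptions for illustration; the slides do not describe the actual classifier setup or data.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes over feature lists, with add-one smoothing."""

    def fit(self, samples, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for features, label in zip(samples, labels):
            self.feature_counts[label].update(features)
            self.vocab.update(features)
        return self

    def predict(self, features):
        n_samples = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            total = sum(self.feature_counts[label].values())
            score = math.log(count / n_samples)          # log prior
            for feature in features:
                if feature not in self.vocab:
                    continue                              # ignore unseen features
                likelihood = ((self.feature_counts[label][feature] + 1)
                              / (total + len(self.vocab)))
                score += math.log(likelihood)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Toy training data: each sample is the list of patterns matched in one tweet.
samples = [
    ["* yay !", ":) * !"],
    ["* hehe", "* yay !"],
    ["sucks * <hashtag>", "* tonight :("],
    ["my * always", "why are *"],
    ["shut the *", "my * always"],
]
labels = ["joy", "joy", "sad", "anger", "anger"]
model = NaiveBayes().fit(samples, labels)
prediction = model.predict(["my * always"])
```

The other classifiers listed would consume the same feature lists; Naïve Bayes is shown only because it fits in a few lines.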