5. “I don’t understand women!! Their
words are very vague and ambiguous”
From Carlos Argueta, my first foreign Ph.D. graduate
He was the one who chose sentiment analysis as his topic,
and the first in our lab to suffer from depression.
19. Subconscious Crowdsourcing
▷The subconscious wisdom of the crowd
• Free
• Extract shared subconscious signals from people's everyday records
→Ex1: “computers/companies/product-support/apple” in Delicious tags
→Ex2: “鹿茸 馬”, “馬卡茸”, “水母” in search logs
→Ex3: “School day again #sad” on Twitter
Chun-Hao Chang, Elvis Saravia and Yi-Shin Chen, Subconscious Crowdsourcing: A Feasible Data Collection Mechanism for Mental Disorder Detection on Social Media, The 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2016), San Francisco, CA, USA, 18-21 August 2016
21. Big Data of Subconscious Emotions
▷Twitter: currently the easiest source to download in large volumes
Throwing my phone always calms me down #anger
My sister always makes things look much more worse than they seem >:[ #anger
Why my brother always crabby !?!? #rude #youranadult #anger #issues
WHY DOES MY COMPUTER ALWAYS FREEZE??? NEVER FAILS. #anger
Im wanna crazy,if my life always sucks like this. #anger
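Hashtags and emoticons are the strongest markers of emotion, so they can serve as the answers (labels) that manual annotation would otherwise have to provide.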
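A minimal sketch of using hashtags as labels, assuming a hypothetical emotion vocabulary (`EMOTION_HASHTAGS`) and a hypothetical helper `label_tweet`; neither comes from the slides. Tweets with no emotion hashtag, or with conflicting ones, are skipped, and the label hashtag is stripped so it cannot leak into the features. Emoticons such as `>:[` could be handled the same way.

```python
import re

# Hypothetical emotion vocabulary (assumption, not the authors' actual list).
EMOTION_HASHTAGS = {"anger", "sad", "joy", "fear", "surprise", "disgust"}

HASHTAG_RE = re.compile(r"#(\w+)")


def label_tweet(text):
    """Return (text_without_label_tag, label), or None if the tweet has
    no emotion hashtag or more than one distinct emotion hashtag."""
    tags = [t.lower() for t in HASHTAG_RE.findall(text)]
    labels = {t for t in tags if t in EMOTION_HASHTAGS}
    if len(labels) != 1:
        return None  # unlabeled or ambiguous: skip
    label = labels.pop()
    # Remove the label hashtag so a classifier cannot read the answer.
    cleaned = re.sub("#" + re.escape(label) + r"\b", "", text, flags=re.IGNORECASE)
    return " ".join(cleaned.split()), label
```

For example, `label_tweet("School day again #sad")` yields `("School day again", "sad")`, while other hashtags such as `#rude` are left in place.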
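28. Preprocessing after Data Collection
▷Key point: remove tweets that are troublesome or that we cannot handle, i.e., tweets that
o Are too short
→ Too short to extract any features from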
o Contain too many hashtags
→ Too much information, hard to process
o Are retweets
→ They increase the computational cost
o Have URLs
→ The linked content would have to be fetched as well, which is too much work
o Convert user mentions to <usermention> and
hashtags to <hashtag>
→ Remove identifiers so the model cannot peek at the answer
It's big data anyway,
so we can afford to throw some away
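The filtering rules above can be sketched as follows. The thresholds `MIN_TOKENS` and `MAX_HASHTAGS` are hypothetical (the slide gives no values), and detecting retweets by an `RT ` prefix is an assumed heuristic; a real pipeline would more likely rely on the retweet metadata returned by the Twitter API.

```python
import re

MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
URL_RE = re.compile(r"https?://\S+|www\.\S+")

# Hypothetical thresholds; the slide does not specify them.
MIN_TOKENS = 5
MAX_HASHTAGS = 3


def keep_tweet(text):
    """Apply the four removal rules: retweets, URLs, too many hashtags, too short."""
    if text.startswith("RT "):          # assumed retweet heuristic
        return False
    if URL_RE.search(text):
        return False
    if len(HASHTAG_RE.findall(text)) > MAX_HASHTAGS:
        return False
    # Count only ordinary words, not hashtags or mentions.
    words = [w for w in text.split() if not w.startswith(("#", "@"))]
    return len(words) >= MIN_TOKENS


def normalize(text):
    """Replace user mentions and hashtags with placeholder tokens."""
    text = MENTION_RE.sub("<usermention>", text)
    return HASHTAG_RE.sub("<hashtag>", text)


def preprocess(tweets):
    return [normalize(t) for t in tweets if keep_tweet(t)]
```

In practice the emotion label would be read from the hashtags before `normalize` runs, since normalization erases it.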
30. Graph Construction
▷Build two kinds of graphs (an emotion graph & a non-emotion graph)
• E.g.
→Emotional text: I love the World of Warcraft new game
→Non-emotional text: 3,000 killed in the world by ebola
[Figure: word graphs built from the two example sentences, with the words (I, love, the, World, of, Warcraft, new, game / 3,000, killed, in, the, world, by, ebola) as nodes and weighted edges between them]
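A minimal sketch of building one graph per class, assuming each edge links adjacent words and its weight is the count of that word pair divided by the total outgoing count of the source word. This normalization is an assumption for illustration; the slide does not say how its edge weights are computed. `build_graph` is a hypothetical helper.

```python
from collections import defaultdict


def build_graph(sentences):
    """Directed word graph: an edge u -> v for each adjacent word pair,
    weighted by count(u, v) / total outgoing count of u (assumed scheme).
    Words with no successor (e.g. the last word) get no outgoing entry."""
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        tokens = sentence.lower().split()
        for u, v in zip(tokens, tokens[1:]):
            counts[u][v] += 1
    graph = {}
    for u, neighbors in counts.items():
        total = sum(neighbors.values())
        graph[u] = {v: c / total for v, c in neighbors.items()}
    return graph


# One graph from emotional text, one from non-emotional text.
emotion_graph = build_graph(["I love the World of Warcraft new game"])
neutral_graph = build_graph(["3,000 killed in the world by ebola"])
```

With many sentences per class, frequent word transitions get heavier edges, so the two graphs diverge in the patterns they emphasize.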