Boston Dataswap Topic Modeling by Alice Oh
1. Topic Models & Computational Social Science
October 17, 2013
Alice Oh
alice.oh@kaist.edu
aoh@seas.harvard.edu
http://uilab.kaist.ac.kr/members/aliceoh/
Thursday, October 17, 2013
5. Motivation
• What are the topics discussed in the article?
• Is the article related to
• household finances?
• price of gasoline?
• price of Apple stock?
• How would you build an automatic system for answering these questions?
17. Graphical View
Discovered
Topic Distributions
Observed
Discovered
nascar, races, track, raceway, race, cars, fuel, auto, racing
economic, slowdown, sales, recession, costs, spending, save
fans, spectators, sports, leagues, teams, competition
Topics: multinomial over words
Topics
sales xxx slowdown
recession cars races
spending xxx save
costs fuel
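The graphical view can be read as a generative process: each topic is a multinomial over words, each document has its own distribution over topics, and the observed words are drawn from the two. The following is a minimal sketch of that process, not code from the talk. It assumes the three word lists on the slide act as uniform topics, and it uses a symmetric Dirichlet prior with a made-up concentration `alpha`; the topic names and function names are illustrative only.

```python
import random

# Topics as (here uniform) multinomials over words, using the slide's word lists.
# The topic names are labels invented for this sketch.
TOPICS = {
    "racing": ["nascar", "races", "track", "raceway", "race", "cars", "fuel", "auto", "racing"],
    "economy": ["economic", "slowdown", "sales", "recession", "costs", "spending", "save"],
    "sports": ["fans", "spectators", "sports", "leagues", "teams", "competition"],
}

def sample_dirichlet(alpha, k, rng):
    """Draw a k-dimensional symmetric Dirichlet sample via normalized gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(n_words, alpha=0.5, seed=0):
    """Generate one document: pick a topic per word, then a word from that topic."""
    rng = random.Random(seed)
    names = list(TOPICS)
    theta = sample_dirichlet(alpha, len(names), rng)  # the document's topic distribution
    words = []
    for _ in range(n_words):
        topic = rng.choices(names, weights=theta)[0]  # hidden topic assignment
        words.append(rng.choice(TOPICS[topic]))       # observed word
    return dict(zip(names, theta)), words

theta, doc = generate_document(10)
print(theta)
print(doc)
```

Topic modeling runs this process in reverse: only the words are observed, and the topics and per-document topic distributions are discovered.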
18. Do you feel what I feel?
Social Aspects of Emotions in Twitter Conversations
Suin Kim, JinYeong Bak, Alice Oh
ICWSM 2012
19. Twitter conversation data
• Twitter conversation data: approx 220k dyads who “reply” to each other,
1,670k conversational chains (We now have about 5x this amount)
22. Asking Research Questions
Human emotion is typically studied as a within-person, one-direction,
non-repetitive phenomenon; focus has traditionally been on how one
individual feels in reaction to various stimuli at a certain point of
time. But people recognize and inevitably react emotionally and
otherwise to expressions of emotion of other people. We propose
that organizational dyads and groups inhabit emotion cycles:
Emotions of an individual influence the emotions, thoughts and
behaviors of others; others’ reactions can then influence their
future interactions with the individual expressing the original
emotion, as well as that individual’s future emotions and
behaviors. People can mimic the emotions of others, thereby
extending the social presence of a specific emotion, but can also
respond to others’ emotions, extending the range of emotions
present.
23. Topic model with a twist
DF-LDA
• Dirichlet forest prior (Andrzejewski et al.)
• Mixture of Dirichlet tree distributions
• Dirichlet tree: generalization of the Dirichlet distribution
• Knowledge is expressed using Must-link and Cannot-link primitives
• Must-link(love, sweetheart)
• Cannot-link(exciting, bored)
Diagram labels: β, q, η
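In the Dirichlet forest prior, Must-link(u, v) encourages u and v to have similar probability within every topic, while Cannot-link(u, v) discourages them from both having high probability in the same topic. The sketch below does not implement the prior. It only checks a set of already estimated topics against the two primitives, which may help make their meaning concrete. The thresholds `ratio` and `high`, the smoothing `floor`, and the toy probabilities are all assumptions.

```python
def violates_must_link(topic, u, v, ratio=5.0, floor=1e-6):
    """True if u and v have very different probability in this topic."""
    pu = topic.get(u, 0.0) + floor
    pv = topic.get(v, 0.0) + floor
    return max(pu, pv) / min(pu, pv) > ratio

def violates_cannot_link(topic, u, v, high=0.05):
    """True if u and v both have high probability in this topic."""
    return topic.get(u, 0.0) >= high and topic.get(v, 0.0) >= high

def check_topics(topics, must_links, cannot_links):
    """Return (topic index, primitive, pair) for every violated constraint."""
    problems = []
    for k, topic in enumerate(topics):
        for u, v in must_links:
            if violates_must_link(topic, u, v):
                problems.append((k, "Must-link", (u, v)))
        for u, v in cannot_links:
            if violates_cannot_link(topic, u, v):
                problems.append((k, "Cannot-link", (u, v)))
    return problems

# Toy topic-word probabilities (made up for illustration).
topics = [
    {"love": 0.20, "sweetheart": 0.15, "exciting": 0.10, "bored": 0.001},
    {"love": 0.30, "sweetheart": 0.001, "exciting": 0.08, "bored": 0.09},
]
print(check_topics(topics, [("love", "sweetheart")], [("exciting", "bored")]))
```

The first toy topic satisfies both primitives; the second violates both, which is the kind of topic the prior is meant to make unlikely.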
25. Domain knowledge in Dirichlet forest prior
Seed Words
joy: awesom, amaz, wonder, excit, glad, fine, beauti, high, lucki, super, perfect, complet, special, bless, safe, proud
sadness: sorri, bad, aw, sad, wrong, hurt, blue, dead, lost, crush, weak, depress, wors, low, terribl, lone
anticipation: hope, wait, await, inspir, excit, bore, readi, expect, nervou, calm, motiv, prepar, certain, anxiou, optimist, forese
surprise: amaz, wow, wonder, weird, lucki, differ, awkward, confus, holi, strang, shock, odd, embarrass, overwhelm, astound, astonish
acceptance: okai, ok, same, alright, safe, lazi, relax, peac, content, normal, secur, complet, numb, fulfil, comfort, defeat
anger: shit, bitch, ass, mean, damn, mad, jealou, piss, annoi, angri, upset, moron, rage, screw, stuck, irrit
fear: scare, stress, horror, nervou, terror, alarm, behind, panic, fear, afraid, desper, threaten, tens, terrifi, fright, anxiou
disgust: sick, wrong, evil, fat, ugli, horribl, gross, terribl, selfish, miser, pathet, disgust, worthless, aw, asham, fuck
Must-link within a class
Cannot-link between classes
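The two rules on this slide (Must-link within a class, Cannot-link between classes) can be turned into explicit word pairs. The following is a minimal sketch using a small subset of the seed words. Some seed words appear under more than one emotion (for example "amaz" under both joy and surprise); how the original work treats those is not stated on the slide, so leaving them out of the Cannot-links is an assumption of this sketch. The name `build_constraints` is hypothetical.

```python
from itertools import combinations

# A few of the slide's stemmed seed words per emotion class (subset for brevity).
SEEDS = {
    "joy": ["awesom", "glad", "proud", "amaz"],
    "sadness": ["sorri", "sad", "hurt"],
    "surprise": ["wow", "shock", "amaz"],
}

def build_constraints(seeds):
    """Return (must_links, cannot_links) as sets of alphabetically sorted word pairs."""
    # Assumption: words listed under more than one class get no Cannot-links.
    counts = {}
    for words in seeds.values():
        for w in set(words):
            counts[w] = counts.get(w, 0) + 1
    shared = {w for w, c in counts.items() if c > 1}

    must, cannot = set(), set()
    # Must-link: every pair of words inside the same class.
    for words in seeds.values():
        for u, v in combinations(sorted(set(words)), 2):
            must.add((u, v))
    # Cannot-link: every pair of words taken from two different classes.
    for (_, a), (_, b) in combinations(seeds.items(), 2):
        for u in a:
            for v in b:
                if u not in shared and v not in shared:
                    cannot.add(tuple(sorted((u, v))))
    return must, cannot

must, cannot = build_constraints(SEEDS)
print(len(must), "Must-links,", len(cannot), "Cannot-links")
```

With the full seed lists the number of pairs grows quadratically, so a real run would likely need to prune or group constraints.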
26. Anticipation
Topic 125
hope
better
feel
thank
soon
Topic 26
good
thank
hope
miss
29
Topic 146
come
wait
week
day
june
Topic 146
good
day
time
work
Sadness
Topic 6
oh
sorry
haha
know
didnt
Topic 59
hurt
got
good
bad
Joy
Topic 114
omg
love
haha
thank
really
Topic 107
love
thank
follow
wow
17
Topic 106
tweet
reply
didn’t
read
sorry
Topic 155
oh
really
make
feel
70
Topic 159
good
day
hope
morning
thank
Topic 158
love
thank
miss
hug
Anger
Topic 131
lmao
fuck
ass
bitch
shit
Topic 4
ass
yo
lmao
nigga
Disgust
Topic 116
oh
fuck
don’t
ye
ew
Topic 116
look
haha
oh
know
7
Topic 22
don’t
oh
think
yeah
lmao
Topic 174
don’t
think
say
people
21
Topic 19
lmao
shit
damn
fuck
oh
Topic 13
shit
nigga
smh
yea
Surprise
Topic 172
yeag
know
think
true
funny
Topic 89
know
don’t
think
look
Acceptance
Topic 43
ok
oh
thank
cool
okay
Topic 102
know
try
let
ok
Emotion Topics
Topic 199
xx
thank
good
okay
follow
Topic 8
night
love
good
sleep
14
Topic 15
think
don’t
know
make
really
Topic 94
haha
dont
think
really
18
Fear
Topic 48
omg
oh
lmao
shit
scare
Topic 78
happen
heart
attack
hospital
5
Topic 27
don’t
come
night
sleep
outside
Topic 140
time
got
work
day
Neutral
Topic 180
com
www
http
check
youtube
Topic 156
twitter
facebook
people
account
19
Topic 184
account
google
app
work
email
Topic 67
food
chicken
cook
rt
How do we express emotions?
28. A (Love): @amithpr @dhempe @OperaIndia - Would you have any update on
@mrunmaiy's health - hope she is recovering well?
B (neut): @labnol @dhempe she is recovering but slow. The injury is on the spine
therefore worrisome. Still in icu.
A (Sadness): @amithpr thanks for the update.. extremely said to hear that news..
B (neut): @labnol #prayformrun She is a fighter and will come out of this
B (neut): @AyeItsMeiMei just tell ur followers to report her for spam. then she'll be
kicked off twitter
A (Anger): @Jakeosaurous dude I didn't even do shit to her I'm just here tweeting &
she calls me a ugly bitch? I was like oh wow thanks?
B (neut): @AyeItsMeiMei yeah clearly shes so ugly she cant even use her real pic:P
so dont feel bad
A (Love): @Jakeosaurous haha. I don't care. She's getting spammed with hate.
Hahaha. (": thanks though.
B (neut): @AyeItsMeiMei np
Emotion-tagged
conversations
30. Defining “Influence”
User A and User B, in time order:
A (Sadness): Having a tough day today. RIP Harrison. I’ll miss you a ton :/
B (Anticipation): Just pray about it. God will help you.
A (Acceptance): Not really religious, but thanks man. :)
B: If you need talk you know I’m here.
31. Defining “Influence”
The same conversation, with the emotion influencing tweet marked.
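One way to read the example: a tweet from B sits between two tweets from A whose emotions differ, and B's tweet is counted as the one influencing A's emotion. The sketch below follows that reading, which is an interpretation of the slide and not necessarily the authors' exact definition. The tuple layout and the name `find_influences` are hypothetical.

```python
def find_influences(chain):
    """Return (from_emotion, to_emotion, influencing_text) for A-B-A patterns.

    chain: list of (user, emotion, text) tuples in time order;
           emotion is None when the tweet carries no emotion tag.
    """
    found = []
    for first, middle, last in zip(chain, chain[1:], chain[2:]):
        # The outer tweets must come from one user, the middle tweet from the partner.
        if first[0] != last[0] or middle[0] == first[0]:
            continue
        # Both outer tweets need an emotion tag, and the emotion must change.
        if first[1] is None or last[1] is None or first[1] == last[1]:
            continue
        found.append((first[1], last[1], middle[2]))
    return found

chain = [
    ("A", "Sadness", "Having a tough day today. RIP Harrison. I'll miss you a ton :/"),
    ("B", "Anticipation", "Just pray about it. God will help you."),
    ("A", "Acceptance", "Not really religious, but thanks man. :)"),
    ("B", None, "If you need talk you know I'm here."),
]
print(find_influences(chain))
```

Collecting these transitions over many conversations gives pairs such as Sadness → Acceptance, each with a set of influencing tweets that can then be summarized with a topic model.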
32. Disgust → Joy
Sadness → Joy
Acceptance → Anger
Topic 61
watch
new
live
tv
tonight
Topic 63
watch
good
think
know
look
Topic 18
wear
look
think
love
black
Topic 24
love
thank
great
new
look
Topic 31
i’m
got
lmax
shit
da
Topic 13
lmao
shit
nigga
smh
yea
Suggesting
Greeting
Sympathy
Swear words
Emotion Influences
Joy → Sadness
Topic 117
tweet
people
don’t
read
post
Topic 59
hurt
got
bad
pain
feel
Anticipation → Surprise
Topic 96
music
listen
play
song
good
Topic 178
follow
tweet
people
twitter
thank
Complaining
What can you say to make your
partner feel better?
33. Self-disclosure and relationship strength in online
conversations
JinYeong Bak, Suin Kim, and Alice Oh
ACL 2012
34. Methodology
• Twitter Data
• 131K users
• 2M conversations
• Relationship Strength
• Chain frequency (CF)
• Chain length (CL)
• Self-Disclosure
• Personal information
• Open communication
• Profanity
• Analysis with Topic Models
• Latent Dirichlet allocation (LDA, [Blei, JMLR 2003])
• Aspect and sentiment unification model (ASUM, [Jo, WSDM 2011])
2012-07-11
35. Relationship Strength
• Social psychology literature states that relationship strength can be measured by communication frequency and length [Granovetter, 1973; Levin and Cross, 2004]
• CF: chain frequency
• The number of conversational chains between the dyad, averaged per month
• CL: chain length
• The length of conversational chains between the dyad, averaged per month
• Relationship strength
• A high CF or CL for a dyad means the relationship is strong
• A low CF or CL for a dyad means the relationship is weak
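The two measures can be computed directly from a dyad's list of conversational chains. The slide does not spell out how the monthly averaging is done, so the sketch below makes two assumptions: CF divides the number of chains by the length of the observation period, and CL takes the mean chain length within each active month and then averages over those months. The function name and input layout are hypothetical.

```python
from collections import defaultdict

def relationship_strength(chains, n_months):
    """Compute (CF, CL) for one dyad.

    chains: list of (month, n_tweets) pairs, one per conversational chain,
            where month is a label such as "2011-08".
    n_months: length of the observation period in months.
    """
    by_month = defaultdict(list)
    for month, n_tweets in chains:
        by_month[month].append(n_tweets)
    # CF: chains per month over the whole observation period.
    cf = len(chains) / n_months
    # CL: mean chain length within each active month, averaged over those months.
    monthly_means = [sum(v) / len(v) for v in by_month.values()]
    cl = sum(monthly_means) / len(monthly_means) if monthly_means else 0.0
    return cf, cl

# Made-up chains for a single dyad observed over four months.
chains = [("2011-08", 4), ("2011-08", 6), ("2011-09", 3), ("2011-11", 9)]
cf, cl = relationship_strength(chains, n_months=4)
print(cf, cl)
```

Dyads can then be ranked or binned by CF or CL to compare strong and weak relationships.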
36. Self-Disclosure
• Open communication - Openness
• Negative openness
• Nonverbal openness
• Emotional openness
• Receptive openness – difficult to find in tweets
• General-style openness – not clearly defined in the literature
• Personal Information
• Personally Identifiable Information (PII)
• Personally Embarrassing Information (PEI)
• Profanity
• nigga, ass, wtf, lmao
37. Self-Disclosure - Openness
Negative openness
• Method
• We use ASUM with emoticons as seed words [“Aspect and sentiment unification model for online review analysis”, Jo, WSDM’11]
• ASUM is an LDA-based joint model of topic and sentiment
• ASUM takes unannotated data and classifies each sentence (tweet) as positive/negative/neutral
38. Self-Disclosure - Openness
Nonverbal openness
• Method
• We look for emoticons, ‘lol’, ‘xxx’
• Emoticons are like facial expressions -- :) :( :P
• ‘lol’ (laughing out loud) and ‘xxx’ (kisses) are very frequently used in a manner similar to nonverbal openness
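The lookup described above can be approximated with two regular expressions. This is a minimal sketch and not the authors' implementation: the emoticon pattern covers only a handful of common Western-style emoticons, and that choice is an assumption.

```python
import re

# Assumed marker set: a few common emoticons plus the tokens 'lol' and 'xxx'.
EMOTICON = re.compile(r"[:;=][-']?[)(DPp]")
TOKEN = re.compile(r"\b(?:lol|xxx)\b", re.IGNORECASE)

def has_nonverbal_openness(tweet):
    """True if the tweet contains an emoticon, 'lol', or 'xxx'."""
    return bool(EMOTICON.search(tweet) or TOKEN.search(tweet))

for tweet in ["but thanks man. :)", "LOL ok", "Still in icu."]:
    print(tweet, "->", has_nonverbal_openness(tweet))
```

A production version would need a longer emoticon list and some care around URLs and punctuation, which this pattern handles only in the simplest cases.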
39. Self-Disclosure - Openness
Emotional openness
• Method
• Look for tweets that contain common expressions of feeling words [We feel fine (Harris, J, 2009)]
40. Self-Disclosure – Personal Information
Personally Identifiable Information (PII)
Ex) name, location, email address, job, social security number
Personally Embarrassing Information (PEI)
Ex) clinical history, sexual life, job loss, family problem
42. Self-Disclosure – Personal Information
Example of PII, PEI and Profanity topics
• Shown by high probability words in each topic
PII 1    PII 2      PEI 1     PEI 2     PEI 3     Profanity
san      tonight    pants     teeth     family    nigga
live     time       wear      doctor    brother   lmao
state    tomorrow   boobs     dr        sister    shit
texas    good       naked     dentist   uncle     ass
south    ill        wearing   tooth     cousin    bitch
51. Agenda Setting Theory (McCombs & Shaw, 1972)
• Media affects audiences by having an influence on
• What to think about
• How to think about it
• Examples of traditional media studies
• Media affects the outcome of presidential elections (Perloff and Krauss, 1985)
• Media coverage influences the control of infectious diseases (Cui et al., 2008)
• Tone of news articles affects the number of visitors to museums (Zyglidopoulos et
al., 2012)
52. Limitation of Traditional Media Studies
1. Use of traditional off-line newspapers and TV as target media
• Analysis is limited to a small volume over a short duration
• Issues are arbitrarily chosen
2. Use of off-line MIP (Most Important Problems) surveys
• Self-reports are not reliable
• Only a small subset of the population can be surveyed
3. Use of manual coding for content analysis
• You need experts
• It is difficult to replicate and generalize to other domains
53. Computational Analysis of Agenda Setting Theory
1. Use of traditional off-line newspapers and TV as target media
• Crawl online news to get several years’ data
• Use machine learning to automatically discover the important issues
2. Use of off-line MIP (Most Important Problems) surveys
• Look at counts of social media shares
• Look at counts of user comments
3. Use of manual coding for content analysis
• Use unsupervised machine learning to analyze content for tone (polarity) of articles and comments
• Try it for different issues to see whether the ML approach can generalize over many domains
56. DATA STATISTICS
2011.01 – 2013.04
Section     #Articles  #Comments  #Commenters      #Shares
Politics        1,863    174,680       14,106    2,080,889
Business        2,043    130,921       17,791    3,657,544
Opinion         4,820    149,618       30,556    6,620,489
Sports            814     17,282        5,484      712,507
Technology        456     13,571        4,993      570,732
Science           945     50,113       11,114    4,709,041
World           3,673    134,572       14,882    3,534,637
Health          3,060     92,964       18,185    6,001,082
Total          17,674    763,721      117,111   27,886,921
From http://www.npr.org/
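Because the table reports raw counts, a quick way to compare sections is to normalize by the number of articles. The short sketch below uses the figures from the table, checks that the columns add up to the Total row, and prints comments and shares per article. The per-article ratios are derived here for illustration and do not appear on the slide.

```python
# Per-section counts copied from the table: (articles, comments, commenters, shares).
SECTIONS = {
    "Politics": (1863, 174680, 14106, 2080889),
    "Business": (2043, 130921, 17791, 3657544),
    "Opinion": (4820, 149618, 30556, 6620489),
    "Sports": (814, 17282, 5484, 712507),
    "Technology": (456, 13571, 4993, 570732),
    "Science": (945, 50113, 11114, 4709041),
    "World": (3673, 134572, 14882, 3534637),
    "Health": (3060, 92964, 18185, 6001082),
}
TOTAL = (17674, 763721, 117111, 27886921)

# Column-wise sums over all sections, to compare against the Total row.
sums = tuple(sum(column) for column in zip(*SECTIONS.values()))
print("column sums match the Total row:", sums == TOTAL)

for name, (articles, comments, _, shares) in SECTIONS.items():
    print(f"{name:<11}{comments / articles:8.1f} comments/article"
          f"{shares / articles:10.1f} shares/article")
```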
57. Issue Detection using HDP
Section   Issue (labeled using MTurk)       #Articles
Politics  presidential election                   575
          infringement of human rights            195
          race for Washington                     167
          government economics                    274
          presidential campaigns and money        163
          candidate-marriage & immigration        261
          political viewpoints                    157
Business  economic decline under Obama            514
          employment and paid slavery             218
          agriculture                             131
          banks and loan                          198
          stock market and business               166
          housing market                          170
          tax and business                        180
          energy and finance                      222
          new business and running                138
Health    health care reform laws                 349
          vaccination                             189
          HIV and treatment                       496
          medication                              197
          healthcare and costs                    224
          food and obesity                        245
          sleep study and children                210
          food and safety                         223
          health tech and new treatment           125
          mental health in families               117
Detected issues and the number of articles for each issue, for three of the eight sections.
58. ▶ Effects from media exposure
CORRELATION IN ISSUE
61. Content Polarity & Audience Behavior
INFLUENTIAL FACTOR
Tone (Polarity) of article
GOAL
Identify the effects of article tone, positive and negative, on the commenting and
sharing behaviors of the audience
65. For more information
David Blei’s homepage: http://www.cs.princeton.edu/~blei/
David Mimno’s bibliography: http://www.cs.princeton.edu/~mimno/topics.html
videolectures.net – David Blei, Yee-Whye Teh, Michael Jordan
Conferences: NIPS, ICML, UAI, ECML, KDD, EMNLP
Tools: Mallet, Gensim, various LDA libraries
Email me: alice.oh@kaist.edu