Self-disclosure in twitter conversations - talk in QCRI
1. Self-disclosure in Twitter conversations
JinYeong Bak
jy.bak@kaist.ac.kr
Department of Computer Science, KAIST
2. About Me
2 2014-10-23
JinYeong Bak
Ph.D. student at KAIST, U&I Lab
Research interests
Bayesian Data Analysis
Computational Social Science
Research Intern, MSRA, 2013, Supervisor: Chin-Yew Lin
Related publications
Self-Disclosure and Relationship Strength in Twitter Conversations, ACL 2012 (with Suin Kim, Alice Oh)
Self-disclosure topic model for classifying and analyzing Twitter conversations, EMNLP 2014 (with Chin-Yew Lin, Alice Oh)
10. Limitations in Previous Works
Survey
Hand coding
Lab environment
Hard to identify self-disclosure in a naturally occurring, large dataset
16. Self-disclosure: Definition
The verbal expressions by which a person reveals aspects of self to others [Jourard1971b]
Process of making the self known to others [Jourard&Lasakow1958]
30~40% of everyday conversation consists of self-disclosure [Dunbar et al.1997]
17. Self-disclosure: Level
Self-disclosure level [Vondracek et al.1971, Barak et al.2007]
No disclosure (G level)
General information and ideas
Medium disclosure (M level)
General information about self or someone close to him
High disclosure (H level)
Sensitive information about self or someone close to him
18. Self-disclosure: G level
General information and ideas
No information about self or someone close to him
19. Self-disclosure: M level
General information about self or someone close to him
Personal events, age, occupation and family members
20. Self-disclosure: H level
Sensitive information about self or someone close to him
Problematic behaviors of self and family members
Physical appearance, health, death, sexual topics
21. Self-disclosure: Relations
Human relationship
Degree of self-disclosure in a relationship depends on the strength of the relationship [Duck2007]
Strategic self-disclosure can strengthen the relationship
22. Self-disclosure: Relations
Benefits
Can get social support from others [Derlega et al.1993]
Can cope with stress [Derlega et al.1993,Tamir and Mitchell2012]
23. Self-disclosure: Relations
Consideration
Easy to be attacked when private information is opened
Need to manage privacy boundary (e.g. people, topics) [Petronio2002]
24. Limitations in Previous Works
Survey
Asking questions to participants
Cons) Biased by participants’ memory
Hand coding
Analyzing dataset by human
Cons) Cannot apply to large dataset
Lab environment
Experiments held in lab or artificial environment
Cons) Not real/naturally occurring dataset
27. Research Questions
How can we find self-disclosure in a large & naturally occurring corpus automatically?
What are the relations between self-disclosure and social features in a large & naturally occurring corpus?
30. Twitter
Online social networking service
www.twitter.com
200 million users send over 400 million tweets daily (2013.09)
https://twitter.com/NoSyu
32. Conversation in Twitter
Users have conversations on Twitter
https://twitter.com/britneyspears
33. Conversation Topics
Users discuss several topics with others
Soccer
Politics
34. Conversation Topics
Users discuss several topics with others
Places
Family
35. Twitter Conversations
A Twitter conversation
5 or more tweets
At least one reply by each user
https://twitter.com/britneyspears
Example of a Twitter conversation
Twitter conversation data
Aug 2007 to Jul 2013
102K users
2M conversations
17M tweets
38. Self-disclosure: Relations
Human relationship
Degree of self-disclosure in a relationship depends on the strength of the relationship [Duck2007]
Strategic self-disclosure can strengthen the relationship
39. Research Question
Do Twitter conversations also show a similar pattern?
Dyads with high relationship strength show more self-disclosure behavior
Dyads with low relationship strength show less self-disclosure behavior
42. Methodology
Twitter data
131K users
2M conversations
Relationship strength
Conversation frequency (CF)
Conversation length (CL)
Self-disclosure
Personal information
Profanity
Analysis with topic models
Latent Dirichlet allocation (LDA, [Blei, JMLR 2003])
44. Relationship Strength
CF: conversation frequency
The number of conversational chains between the dyad averaged per month
CL: conversation length
The length of conversational chains between the dyad averaged per month
Relationship strength
A high CF or CL for a dyad means the relationship is strong
A low CF or CL for a dyad means the relationship is weak
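A minimal sketch of how CF and CL could be computed for one dyad. The slides do not say how months without any conversation are handled, so this sketch assumes that averages are taken only over calendar months in which the dyad had at least one conversational chain. The function name `relationship_strength` and the toy data are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def relationship_strength(chains):
    """Return (CF, CL) for one dyad.

    chains: list of (start_date, n_tweets) pairs, one per conversational chain.
    Assumption of this sketch: averages run over the calendar months in
    which the dyad had at least one chain.
    """
    by_month = defaultdict(list)
    for start, n_tweets in chains:
        by_month[(start.year, start.month)].append(n_tweets)
    # CF: number of chains per month, averaged over active months.
    cf = mean(len(lengths) for lengths in by_month.values())
    # CL: mean chain length within each month, averaged over active months.
    cl = mean(mean(lengths) for lengths in by_month.values())
    return cf, cl


# Hypothetical dyad: two chains in January, one in February.
chains = [
    (date(2012, 1, 3), 5),
    (date(2012, 1, 20), 7),
    (date(2012, 2, 11), 6),
]
cf, cl = relationship_strength(chains)
print(cf, cl)
```

A dyad with higher `cf` or `cl` than another would then be treated as having the stronger relationship.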
46. Self-disclosure
Personal information
Personally Identifiable Information (PII)
Personally Embarrassing Information (PEI)
Profanity
nigga, ass, wtf, lmao
47. Self-disclosure: Personal Information
Personally Identifiable Information (PII)
Ex) name, location, email address, job, social security number
Personally Embarrassing Information (PEI)
Ex) clinical history, sexual life, job loss, family problem
48. Self-disclosure: Personal Information
Discover topics in each conversation
Use LDA [Blei2003] with k = 300
LDA outputs a topic proportion for each conversation
LDA outputs a multinomial word distribution for each topic
Find related topics
Annotate conversations that best represent each topic
Use Amazon Mechanical Turk
Turkers annotated conversations for
Existence of PII
Existence of PEI
Keywords
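The step of picking conversations that best represent each topic could look roughly like the sketch below. It assumes the topic proportions from LDA are already available as a list of per-conversation vectors, and that "best represent" means the largest proportion for that topic; the slides do not specify the exact criterion. The name `top_conversations` and the toy matrix `theta` are hypothetical.

```python
def top_conversations(theta, topic, n=3):
    """Return indices of the n conversations with the largest proportion of `topic`.

    theta: list of per-conversation topic proportions (one list per conversation).
    """
    ranked = sorted(range(len(theta)), key=lambda d: theta[d][topic], reverse=True)
    return ranked[:n]


# Toy topic proportions for 4 conversations and 3 topics (made-up numbers).
theta = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.1, 0.3],
]
for k in range(3):
    print(k, top_conversations(theta, k, n=2))
```

The selected conversations would then be the ones shown to annotators for PII, PEI, and keyword tagging.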
50. Self-disclosure: Personal Information
Example of PII, PEI and Profanity topics
Shown by high probability words in each topic
PII 1 | PII 2 | PEI 1 | PEI 2 | PEI 3 | Profanity
san | tonight | pants | teeth | family | nigga
live | time | wear | doctor | brother | lmao
state | tomorrow | boobs | dr | sister | shit
texas | good | naked | dentist | uncle | ass
south | ill | wearing | tooth | cousin | bitch
53. Results: Interpretation
PII
When they meet new acquaintances, they use PII to introduce themselves
54. Summary
Used a large corpus of Twitter conversations
Measured relationship strength by conversation frequency and conversation length
Measured self-disclosure by PII, PEI, and profanity
Confirmed hypothesis that stronger relationships show more self-disclosure behaviors in Twitter conversations
55. Weakness of the Paper
Use naïve definition of degree of self-disclosure
PII, PEI, Profanity
Need to use more concrete definition for self-disclosure degree → Self-disclosure level
Use naïve computational method
LDA with post-processing
Need to build more concrete novel method → Self-disclosure Topic Model
60. Difficulties for SD research
Lack of ground-truth dataset of SD level
No tagged dataset for Twitter conversation
No accessible self-disclosure datasets
Lack of study about SD in computational linguistics
Definitions and relations with others in social psychology
Survey or hand-coding
Related word categories in LIWC [Houghton et al.2012]
62. Ground-truth Dataset
Process
Randomly sample 301 Twitter conversations
Ask three judges to annotate them
Tag self-disclosure level to each tweet
Work on a web-based platform
Result
Tagged G: 122, M: 147, H: 32 conversations
Fleiss kappa: 0.68
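For reference, Fleiss' kappa for a fixed number of raters per item can be computed as in the sketch below. The count matrix is toy data, not the actual annotations, and `fleiss_kappa` is an illustrative name.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa.

    counts[i][j]: number of raters who assigned item i to category j.
    Every item must be rated by the same number of raters.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_cats = len(counts[0])
    # Share of all assignments that went to each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_cats)
    ]
    # Per-item agreement: fraction of rater pairs that agree on the item.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_item) / n_items      # observed agreement
    p_exp = sum(p * p for p in p_cat)  # agreement expected by chance
    return (p_bar - p_exp) / (1 - p_exp)


# Toy example: 4 items, 3 judges, categories (G, M, H).
toy_counts = [
    [3, 0, 0],
    [2, 1, 0],
    [0, 3, 0],
    [0, 1, 2],
]
kappa = fleiss_kappa(toy_counts)
print(round(kappa, 3))
```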
[Screenshot of the web-based annotation platform]
64. Assumptions: First person pronouns
First person pronouns are good indicators for self-disclosure
Ex) ‘I’, ‘My’
Used in previous research [Joinson et al.2001, Barak et al.2007]
Observed highly discriminative features between G and M/H in annotated dataset
Unigram | Bigram | Trigram
my | I love | I have a
I | I was | is going to
I’m | I have | to go to
but | my dad | want to go
was | go to | and I was
I’ve | my mom | going to miss
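As an illustration of the first-person pronoun assumption, the sketch below counts first-person tokens in a tweet so they can be used as features. The pronoun list `FIRST_PERSON` is an assumption that extends the examples on the slide (‘I’, ‘my’, ‘I’m’, ‘I’ve’); the exact feature set used in the work is not given here.

```python
import re

# Assumed pronoun list; the slides only give 'I', 'my', "I'm", "I've" as examples.
FIRST_PERSON = {"i", "my", "me", "mine", "myself", "i'm", "i've", "i'll", "i'd"}


def first_person_features(tweet):
    """Count occurrences of first-person pronouns in a tweet."""
    # Normalize curly apostrophes so "I’m" and "I'm" match the same entry.
    text = tweet.lower().replace("\u2019", "'")
    tokens = re.findall(r"[a-z']+", text)
    counts = {}
    for tok in tokens:
        if tok in FIRST_PERSON:
            counts[tok] = counts.get(tok, 0) + 1
    return counts


print(first_person_features("I love my dad, and I\u2019m going to miss him"))
print(first_person_features("Obama wins the vote"))
```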
66. Assumptions: Topics
M and H level have different topics
[General vs. Sensitive] information about self or intimates
67. Assumptions: Topics
Self-disclosure related topics by LDA
Location | Time | Adult | Health | Family | Profanity
san | tonight | pants | teeth | family | nigga
live | time | wear | doctor | brother | lmao
state | tomorrow | boobs | dr | sister | shit
texas | good | naked | dentist | uncle | ass
south | ill | wearing | tooth | cousin | bitch
68. Assumptions: Topics
M and H level have different topics
[General vs. Sensitive] information about self or intimates
Can be formalized as topics
Personally Identifiable Information
General information about self
Ex) name, location, email address, job, …
Secrets
Sensitive information about self
Ex) physical appearance, health, sexuality, death, …
69. Graphical model of Self-Disclosure Topic Model
Self-Disclosure Topic Model (SDTM)
Based on probabilistic topic modeling
Classifying G and M/H level
Observed first-person pronouns
Using learned maximum entropy classifier
Classifying M and H level
Observed words
Using seed words for each level
72. Self-Disclosure Topic Model (SDTM)
Rough description of how to infer self-disclosure in SDTM
[Diagram: Tweet → Maximum Entropy Classifier → G level (Topic Model) or M/H level (Topic Model with Seed Words → M level or H level)]
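The two decisions in the rough description (G vs. M/H, then M vs. H) can be caricatured with plain rules, as in the sketch below. This is not SDTM: SDTM is a probabilistic topic model with a learned maximum entropy classifier, whereas this sketch swaps in hard rules. The pronoun set is an assumption, and the H-level words are the example seed words listed on the Seed Words slides.

```python
import re

# Assumed first-person pronoun set (stand-in for the learned classifier).
FIRST_PERSON = {"i", "my", "me", "i'm", "i've"}
# Example H-level seed words taken from the Seed Words slide.
H_SEEDS = {
    "chubby", "fat", "scar", "acne",
    "addicted", "surgery", "syndrome", "disorder",
    "dead", "died", "suicide", "funeral",
}


def rough_sd_level(tweet):
    """Rule-based stand-in for the two-step decision; returns 'G', 'M', or 'H'."""
    text = tweet.lower().replace("\u2019", "'")
    tokens = set(re.findall(r"[a-z']+", text))
    # Step 1: no first-person pronoun -> treat as general (G).
    if not tokens & FIRST_PERSON:
        return "G"
    # Step 2: a sensitive seed word -> high (H), otherwise medium (M).
    return "H" if tokens & H_SEEDS else "M"


for t in ["Obama will win the vote", "My birthday is tomorrow", "I had surgery last week"]:
    print(t, "->", rough_sd_level(t))
```

Hard rules like these miss most real cases, which is one motivation for modeling the levels jointly with topics instead.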
78. Maximum Entropy Classifier
Learned from annotated dataset
Works better than others (C4.5, Naïve Bayes, SVM with linear, polynomial, and radial basis kernels)
Used to identify aspect and opinions in topic model [Zhao2010]
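A binary maximum entropy classifier is equivalent to logistic regression, so a minimal version can be sketched in a few lines. The training data below is made up (one feature: the number of first-person pronouns in a tweet; label 1 for M/H, 0 for G), and the learning rate and epoch count are arbitrary choices for this sketch, not values from the work.

```python
import math


def train_maxent(X, y, lr=0.5, epochs=500):
    """Binary maximum entropy (logistic regression) via batch gradient ascent."""
    n_feats = len(X[0])
    w = [0.0] * n_feats
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_feats
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))
            err = yi - p  # gradient of the log-likelihood w.r.t. z
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj + lr * g / len(X) for wj, g in zip(w, grad_w)]
        b += lr * grad_b / len(X)
    return w, b


def predict(w, b, x):
    """Return 1 (M/H) if the model score is positive, else 0 (G)."""
    return 1 if b + sum(wj * xj for wj, xj in zip(w, x)) > 0 else 0


# Toy data: feature = count of first-person pronouns; label 1 = M/H, 0 = G.
X = [[0], [0], [1], [2], [0], [3]]
y = [0, 0, 1, 1, 0, 1]
w, b = train_maxent(X, y)
print([predict(w, b, x) for x in X])
```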
79. Seed Words
Seed words are prior knowledge for each level
G level
No seed words (symmetric prior)
M level
Data-driven approach in Twitter conversation
H level
Data-driven approach from external dataset
80. Seed Words
M level
Data-driven approach
Use Twitter conversation dataset
Get frequently occurring trigrams that begin with ‘I’ or ‘my’
Example seed words
Name | Birthday | Location | Occupation
My name is | My birthday is | I live in | My job is
My last name | My birthday party | I lived in | My new job
My real name | My bday is | I live on | My high school
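The data-driven step for M-level seed words could be sketched as below: count trigrams that start with ‘I’ or ‘my’ and keep the most frequent ones. The tokenization, the function name `frequent_first_person_trigrams`, and the toy tweets are assumptions of this sketch; grouping the trigrams into categories such as Name or Location is not covered.

```python
import re
from collections import Counter


def frequent_first_person_trigrams(tweets, top_n=5):
    """Return the top_n most frequent trigrams whose first token is 'i' or 'my'."""
    counts = Counter()
    for tweet in tweets:
        tokens = re.findall(r"[a-z']+", tweet.lower())
        for i in range(len(tokens) - 2):
            if tokens[i] in ("i", "my"):
                counts[" ".join(tokens[i:i + 3])] += 1
    return counts.most_common(top_n)


# Hypothetical tweets, for illustration only.
tweets = [
    "My name is Kim",
    "Hi, my name is Lee",
    "I live in Seoul",
    "my name is short",
]
print(frequent_first_person_trigrams(tweets, top_n=2))
```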
82. Seed Words
H level
Data-driven approach
Use external dataset (Six Billion Secrets)
http://www.sixbillionsecrets.com
Users write and share their secrets
26,523 posts
Extract high ranked word features
Example seed words
Physical appearance | Health condition | Death
chubby | addicted | dead
fat | surgery | died
scar | syndrome | suicide
acne | disorder | funeral
[Image: example of secret posts in Six Billion Secrets]
84. Classifying Performance
Data
Annotated Twitter conversation
Randomly shuffled 80/20 train/test split
Methods
BOW+
Bag of Words + Bigrams + Trigrams features, Maximum entropy
FirstP
Occurrence of first-person pronouns features, Maximum entropy
SEED
Seed words and trigrams features, Maximum entropy
FirstP+SEED
FirstP and SEED feature, Two stage Maximum entropy
SDTM
Self-disclosure Topic Model
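The randomly shuffled 80/20 split under Data might be implemented as in the sketch below. The fixed seed and the choice to split at the level of the 301 annotated conversations (rather than individual tweets) are assumptions of this sketch.

```python
import random


def shuffled_split(items, train_fraction=0.8, seed=0):
    """Shuffle items with a fixed seed and split them into train and test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * train_fraction)
    return items[:cut], items[cut:]


# Indices standing in for the 301 annotated conversations.
train, test = shuffled_split(range(301))
print(len(train), len(test))
```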
93. Self-disclosure & Social features
What are the relations between self-disclosure and social features in Twitter conversations?
Research questions
1. Does high self-disclosure lead to longer conversations?
2. Is there a difference in conversation length patterns over time depending on overall self-disclosure level?
3. Do high self-disclosure users have many conversation partners?
4. Do high self-disclosure users have conversations more frequently?
94. Research Questions
Q1) Does high self-disclosure lead to longer conversations?
95. Research Questions
Q2) Is there a difference in conversation length patterns over time depending on overall self-disclosure level?
[Figure: high SD level dyad vs. low SD level dyad]
96. Research Questions
Q3) Do high self-disclosure users have many conversation partners?
[Figure: high SD level user vs. low SD level user]
97. Research Questions
Q4) Do high self-disclosure users have conversations more frequently?
[Figure: high SD level user vs. low SD level user]
98. Results
High ranked topics in each level (G, M, H levels)
Shown by high probability words in each topic
G 1 | G 2 | M 1 | M 2 | H 1 | H 2
obama | league | send | going | better | ass
he’s | win | email | party | sick | bitch
romney | game | i’ll | weekend | feel | fuck
vote | season | sent | day | throat | yo
right | team | dm | night | cold | shit
president | cup | address | dinner | hope | fucking
99. Results
Q1) Does high self-disclosure lead to longer conversations?
Ans) Positive relation between initial SD level and changes in CL
100. Results
Q2) Is there a difference in CL patterns over time by overall SD level?
Ans) ‘high’ and ‘mid’ groups increase CL over time; the ‘low’ group does not
‘high’ groups talk more in a conversation than ‘mid’ & ‘low’ groups
101. Results
Q3) Do high self-disclosure users have many conversation partners?
Ans) ‘mid’ self-disclosure users have more conversation partners than others
SD level | # Partners | # Conv / Day | Words / Conv | Conv Length
low | 3.33 | 0.46 | 59.17 | 4.13
mid | 3.55 | 0.52 | 61.17 | 4.28
high | 3.47 | 0.54 | 63.26 | 4.45
p-value | <0.001 | <0.001 | <0.1 | <0.001
102. Results
Q4) Do high self-disclosure users have conversations more frequently?
Ans) ‘high’ self-disclosure users have more conversations per day than others (0.54 vs. 0.52 for ‘mid’ and 0.46 for ‘low’, p < 0.001)
103. Results
Finding)
Researchers often look at the number of words in a conversation for its relation with self-disclosure
Conversation length is more significant than # words (p < 0.001 vs. p < 0.1)
104. Summary
Self-disclosure (SD)
Definition from social psychology
Limitations in previous research
Computational approaches for self-disclosure
Twitter conversation dataset
Self-disclosure topic model (SDTM)
Self-disclosure & Social features
Relationship strength over time
Conversation partners and frequency
105. Future Work
Self-disclosure in a user’s timeline tweets
Has positive relations with
Loneliness [Al-Saggaf.2014]
Online social network usage [Trepte.2013]
Predict a user’s
Loneliness, and give social support
Usage patterns in online social networks, and give feedback
Self-disclosure by machine
Makes a dialogue system look more human
Can increase satisfaction in talking-cure dialogue systems
106. Reference
[Jourard1971b] Sidney M Jourard. 1971b. The transparent self (rev. ed.). Princeton, NJ: Van Nostrand.
[Jourard1958] Sidney M Jourard and Paul Lasakow. 1958. Some factors in self-disclosure. The Journal of Abnormal and Social Psychology, 56(1):91.
[Dunbar et al.1997] Robin IM Dunbar, Anna Marriott, and Neil DC Duncan. 1997. Human conversational behavior. Human Nature, 8(3):231–246.
[Vondracek and Vondracek1971] Sarah I Vondracek and Fred W Vondracek. 1971. The manipulation and measurement of self-disclosure in preadolescents. Merrill-Palmer Quarterly of Behavior and Development, 17(1):51–58.
[Chelune and others1979] Gordon J Chelune et al. 1979. Self-disclosure: Origins, patterns, and implications of openness in interpersonal relationships. Jossey-Bass, San Francisco.
[Barak&Gluck-Ofri2007] Azy Barak and Orit Gluck-Ofri. 2007. Degree and reciprocity of self-disclosure in online forums. CyberPsychology & Behavior, 10(3):407–417.
[Jo2011] Yohan Jo and Alice H. Oh. 2011. Aspect and sentiment unification model for online review analysis. In Proceedings of the Fourth ACM International Conference on Web Search and Data Mining. ACM.
107. Reference
[Tamir and Mitchell2012] Diana I Tamir and Jason P Mitchell. 2012. Disclosing information about the self is intrinsically rewarding. Proceedings of the National Academy of Sciences, 109(21):8038–8043.
[Duck2007] Steve Duck. 2007. Human relationships. Sage.
[Bak et al.2012] JinYeong Bak, Suin Kim, and Alice Oh. 2012. Self-disclosure and relationship strength in Twitter conversations. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers-Volume 2, pages 60–64. Association for Computational Linguistics.
[Derlega et al.1993] Valerian J. Derlega, Sandra Metts, Sandra Petronio, and Stephen T. Margulis. 1993. Self-Disclosure, volume 5 of SAGE Series on Close Relationships. SAGE Publications, Inc.
[Wills1985] Thomas Ashby Wills. 1985. Supportive functions of interpersonal relationships.
[Trepte.2013] Sabine Trepte and Leonard Reinecke. 2013. The reciprocal effects of social network site use and the disposition for self-disclosure: A longitudinal study. Computers in Human Behavior, 29(3):1102–1112.
[Harris, J, 2009] Sep Kamvar and Jonathan Harris. 2009. We feel fine: An almanac of human emotion. Simon and Schuster.
108. Reference
[Houghton and Joinson2012] David J Houghton and Adam N Joinson. 2012. Linguistic markers of secrets and sensitive self-disclosure in Twitter. In System Science (HICSS), 2012 45th Hawaii International Conference on, pages 3480–3489. IEEE.
[Steinfield et al.2008] Charles Steinfield, Nicole B Ellison, and Cliff Lampe. 2008. Social capital, self-esteem, and use of online social network sites: A longitudinal analysis. Journal of Applied Developmental Psychology, 29(6):434–445.
[Petronio2002] S. Petronio. 2002. Boundaries of privacy: Dialectics of disclosure. 29. Albany, NY.
[Valkenburg2011] Patti M Valkenburg, Sindy R Sumter, and Jochen Peter. 2011. Gender differences in online and offline self-disclosure in pre-adolescence and adolescence. British Journal of Developmental Psychology.
[Sprecher2012] Susan Sprecher, Stanislav Treger, and Joshua D. Wondra. 2012. Effects of self-disclosure role on liking, closeness, and other impressions in get-acquainted interactions. Journal of Social and Personal Relationships.
[Zhao2010] Wayne Xin Zhao, Jing Jiang, Hongfei Yan, and Xiaoming Li. 2010. Jointly modeling aspects and opinions with a MaxEnt-LDA hybrid. In Proceedings of EMNLP.
109. Thank you!
Any questions or comments?
JinYeong Bak
jy.bak@kaist.ac.kr
Department of Computer Science, KAIST