Geographically distributed collaborative teams often rely on synchronous text-based online communication for accomplishing tasks and maintaining social contact. This technology leaves a trace that can help researchers understand affect expression and dynamics in distributed groups. Although manual labeling of affect in chat logs has shed light on complex group communication phenomena, scaling this process to larger data sets through automation is difficult. We present a pipeline of natural language processing and machine learning techniques that can be used to build automated classifiers of affect in chat logs. Interpreting affect as a dynamic, contextualized process, we explain our development and application of this method to four years of chat logs from a longitudinal study of a multicultural distributed scientific collaboration. With ground truth generated through manual labeling of affect over a subset of the chat logs, our approach can successfully identify many commonly occurring types of affect.
The full paper: http://dx.doi.org/10.1145/2441776.2441813
Statistical Affect Detection in Collaborative Chat
1. Statistical Affect Detection in Collaborative Chat
CSCW 2013: Mining Social Media Data, Feb. 23
Michael Brooks, Katie Kuksenok, Megan K. Torkildson, Daniel Perry, John J. Robinson, Taylor Jackson Scott, Ona Anicello, Ariana Zukowski, Paul Harris, Cecilia R. Aragon
Scientific Collaboration & Creativity Lab
3. June, 2007
6:07:57 Ray cool, it worked [amusement, relief]
6:08:04 Matt woot [excitement, joy]
6:08:07 Ray awesome, I don't think he needs that long of a sleep after turning it off [acceptance, no affect]
6:08:47 We enhanced eready to detect the sticking [no affect]
6:08:58 Matt good job [supportive, acceptance]
6:09:21 seems it did well there [happiness, no affect]
6:09:26 Ray yeah, pretty cool huh? [interest, agreement, happiness]
6:09:43 Matt helps keep me from having to stopaic and restart [no affect]
6:09:55 Ray indeed, that was the point [agreement]
Scientific Collaboration & Creativity Lab 2/27/2013 3
4. Nearby Supernova Factory
• 30 astrophysicists
• US / France
• Daily remote operation of
telescope
• Rely on chat to communicate
7. SNfactory Chat Logs
• Four years of logs - 449,684 messages
• Manual coding for affective expressions
– 27,344 chat messages coded
– 1-5 coders per message
– 30 affect codes
– Multiple codes allowed
Scott et al. SIGDOC 2012. Adapting Grounded Theory to Construct a Taxonomy of Affect in Collaborative Online Chat.
10. Linguistic Inquiry and Word Count (LIWC)
• Detects words for Positive / Negative Emotions
"I wish every day could be sunny and warm. Rain makes me angry." → Positive: 15%, Negative: 8%, …
11. June, 2005
11:44:08 Gabri ok that's better [relief, serenity]
11:44:17 Marcel GREAT ! [excitement, happiness, relief, joy]
11:44:17 Gabri let's start aic and see [anticipation, no affect]
11:44:23 Marcel yes ... [no affect]
11:44:31 Derek Great what? [confusion]
11:44:32 Gabri can you do that? [interest, no affect]
11:44:50 derek.. it seems that now the focus is ok [no affect]
11:45:04 and we can finally start observing [no affect]
11:45:23 Derek Oh good! [happiness, relief, joy]
11:45:48 I have been waiting for this moment, because I want to leave the room and get my midnight snack. ;) [amusement]
11:46:54 Gabri go... [amusement, no affect]
11:47:02 and enjoy your snack [amusement, no affect]
11:47:13 Derek HEhe. [amusement]
11:47:18 I will bring it back here of course. [amusement]
12. The telescope is stuck! >:( [frustration]
The telescope is stuuuuuuuuuck... [annoyance]
The telescope is stuck?? [confusion]
15. Emoticons
Naomi: I think we'd better stopaic... :( [sadness]
Matt: today was a gym + laundry day :) [amusement, happiness]
Marcel: and she can't teach over an ssh-channel ;-) [amusement]
16. Word Sets
Swear Words
Ray: why the **** doesn't stop_script ******* STOP THE ******* SCRIPT [rage]
Matt: ******* ******* ******* I think I broke it [frustration, anger, apprehension, embarrassment]
Negations
Paul: but I wouldn't hazzard a guess [apprehension]
Ray: cannot talk to camera [frustration, no-affect]
17. Character Features
Letter Repetition
Ray: noooooooooooooooo, it must be stopped [annoyance, anger, fear]
Marcel: AAaah too late, they will find meeee [amusement]
Punctuation
Rick: looks like something bad happened here... [apprehension]
Rene: 1 month before max??!? [surprise, confusion, considering]
Capitalization
Marcel: ON TARGET ! [relief, joy]
Paul: we must set-up adopt an EXPLODING STAR [amusement, no-affect]
18. Feature Value
Alice: ok, so where was the ******* SN on the image?

Feature               Value
"ok"                  1
"telescope"           0
"where"               1
"SN"                  1
"image"               1
question marks        1
swears                1
emoticon :)           0
1st person pronouns   0
capitals              2
repetition            0
punctuation           1
length                45
…
19. Feature importance: Confusion

Top features:
???? length, # question marks, "understand", "confus_", "why", "what", "nothing", "wrong", msg. length, "thought"

Messages labeled Confusion:
Ben: ??? - the answer is likely found in the otsim code
Marcel: well ... I'm not so sure ...
Gary: Why do we care at all then?
Ray: ummm I mean how does it get to the header
20. Feature importance: Apprehension

Top features:
"bad", "something", "problem", "we", "seem", "too", msg. length, "not", # 3rd sg. pronouns, # swearing

Messages labeled Apprehension:
Pascal: the problem is than the automated detection will not work ... too much galaxy
Ray: But now bad stuff in window
Ben: pascal, we had a problem with do_fchart
Gabriel: So something is completely wrong
21. Feature importance: Amusement

Top features:
emoticon ";)", emoticon ":)", laughter, emoticon ";-)", "fun", laughter length, "p", # people names, "sleep", "of"

Messages labeled Amusement:
Kevin: hehe
Ray: hahahaah
Stef: lol ok derek :)
Ray: He never sleeps -- you know that.
Pascal: but I think it could be interesting for Extreeeeeeeeeeme photometry study ;-)
22. Specialized Features
• Count words based on the data
• Medium-specific features
– Emoticons, punctuation…
• Context-specific features
– People names, jargon…
• Affect-specific features
– Swearing vs. emoticons
23. September, 2006
5:17:48 Marcel ok, so let's cycle the stuff
5:18:04 Rick ok…
5:18:40 Marcel damn mouse cutandpast
5:19:03 Ray off 1 right? then on 1?
5:19:32 Marcel have you telnet sdsugreen ??
5:19:58 Ray director on lbl2 looks dead
5:20:34 Marcel ok, one thind at a time. have you cycled the baytech on sdsugreen ?
5:20:36 Ray what is best way to revive it
5:20:39 baytech
5:20:40 yes
5:20:46 not sdsu
5:21:08 go ahead and do it I am not evneon this **** shift...grrr
5:21:22 Marcel ok, maybe we have to kill director and restart it mkanually
5:21:32 Ray yeah but that's tricky; all these damn arguments
5:23:53 Rick emile, I have no idea what's going on here
5:23:57 only that it is bad
26. Support Vector Machine
• Accurate
• Fast
• Transparent
[Figure: training messages plotted by # "ok" vs. # swear words, with points where "frustration" applies separated from points where it does not]
29. Interpretability
• How is the classifier making decisions?
• What features are important in the model?
[Figure: the # "ok" vs. # swear words plot again, with the learned line separating messages where "frustration" applies from those where it does not]
33. Sequential Modeling
5:19:58 Ray director on lbl2 looks dead
5:20:34 Marcel ok, one thind at a time. have you cycled the baytech on sdsugreen ?
5:20:36 Ray what is best way to revive it
5:20:39 baytech
5:20:40 yes
5:20:46 not sdsu
5:21:08 go ahead and do it I am not evneon this **** shift...grrr
5:21:22 Marcel ok, maybe we have to kill director and restart it mkanually
5:21:32 Ray yeah but that's tricky; all these damn arguments
5:23:53 Rick emile, I have no idea what's going on here
5:23:57 only that it is bad
35. Affect in Twitter
[Chart: number of tweets over time (EST) on 2/3/2013, split into positive, negative, and neutral, with events marked: kickoff, halftime, game resumes, blackout, game resumes, game over]
36. Classify…
• Positive/negative/neutral sentiment
• Highly granular emotions
• Anything else you can label

In…
• Longer, formal documents (blog posts, reviews)
• Individual sentences
• Instant messages
• Tweets
• Anything else you can put in CSV

github.com/etcgroup/aloe
Download it, use it, & tell us what you think!
Michael Brooks, mjbrooks@uw.edu
http://depts.washington.edu/sccl
Editor's notes
Researchers working with social media have more data available than ever before. There is great potential for new insights, but the data sets are very large and complex. How can we help people understand data sets collected from social media and other online communication? Our research group is studying how a combination of visualization and machine learning can be integrated into a qualitative research workflow to help researchers dig into these new data sources in a rich but also scalable way.
In this paper, we focus on a large collection of chat logs from scientists working together on a specific project. Our group is doing ongoing qualitative research to understand how, when, and why the scientists express emotion, or affect, and how affect relates to creativity and problem solving in this data set. The data set is too large to code manually ourselves, and privacy concerns and the specialized domain knowledge required prevent us from using something like Mechanical Turk. In this talk, I will present some of the issues we have explored around using machine learning to automatically label the data, in support of scalable rich analysis. I will focus on the importance of developing a diverse, specialized feature set and the use of interpretable classification algorithms.
I’ll start by giving a bit of background about the data…
Ray and Matt are discussing a new program that Ray created to automatically un-stick the telescope, saving the scientists a lot of time. Many lines have multiple types of affect, while some lines have no affect.
Most affect codes are very rare. Reliability ranges from 0.4 to 0.8.
Before I go on… LIWC is a popular text analysis tool that can be used for finding emotions or sentiment in text. LIWC processes blocks of text, counting words that belong to specific sets of dictionary words previously determined to have particular meanings. This is called a lexicon-based approach. The words sunny and warm are part of LIWC's Positive Affect lexicon, while angry is part of its Negative Affect lexicon, so LIWC would report that this text has two positive words and one negative word.
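The lexicon-based counting described in this note can be sketched in a few lines. The word lists below are illustrative stand-ins, not LIWC's actual dictionaries:

```python
# Minimal sketch of a LIWC-style lexicon count.
# These word sets are illustrative placeholders, not LIWC's dictionaries.
POSITIVE_WORDS = {"sunny", "warm", "good", "great", "happy"}
NEGATIVE_WORDS = {"angry", "bad", "sad", "stuck"}

def lexicon_percentages(text):
    """Percentage of tokens that fall in each lexicon."""
    tokens = text.lower().replace(".", " ").split()
    pos = sum(1 for t in tokens if t in POSITIVE_WORDS)
    neg = sum(1 for t in tokens if t in NEGATIVE_WORDS)
    return {"positive": 100 * pos / len(tokens),
            "negative": 100 * neg / len(tokens)}

lexicon_percentages("I wish every day could be sunny and warm. Rain makes me angry.")
```

On the slide's example sentence this yields roughly 15% positive and 8% negative tokens, matching the figures shown above.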
For data sets like ours, we believe that this kind of approach is not appropriate. While LIWC's validity has been carefully studied for very narrow domains of English writing, informal online communications such as chat messages and tweets use a lot of domain-specific vocabulary and non-standard textual cues to communicate affect, almost becoming another language entirely. The medium and the context of communication are often critical to correctly understanding emotional content.
Let me illustrate this with a quick example. This is a chat message rewritten three ways. LIWC is not built to recognize expressions such as emoticons or intentionally misspelled words, and punctuation cues are not taken into account. Furthermore, in general English a word like stuck may not have strong emotional connotations, but in our data set it is used when scientists are struggling with telescope problems, which makes it quite an effective way to recognize frustration, for example. LIWC and other tools that use standard English lexicons will miss these signals. So if we aren't going to use a predefined, validated lexicon of affect-laden words, what will we use to recognize affect?
We based our features on a combination of previous literature and our knowledge of this chat data set we were working with.
We look at all of the words that occur anywhere in the training data and select the most common 400-600 of those. Each becomes a feature that our classifiers can use to recognize affect. The words do not come from a predefined list, but from the data itself. This helps us pick up on jargon and other unconventional word usage.
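Selecting the most frequent words from the training data itself, rather than from a predefined list, can be sketched like this (the sample messages are invented):

```python
from collections import Counter

def top_word_features(messages, k=500):
    """Select the k most frequent tokens across the training messages;
    each selected word becomes one count feature."""
    counts = Counter(tok for msg in messages for tok in msg.lower().split())
    return [word for word, _ in counts.most_common(k)]

def word_counts(message, vocab):
    """Feature vector: how often each vocabulary word occurs in a message."""
    tokens = message.lower().split()
    return [tokens.count(w) for w in vocab]

# Invented training messages; jargon like "telescope" surfaces naturally.
vocab = top_word_features(["the telescope is stuck",
                           "the focus is ok",
                           "is the telescope ok?"], k=4)
```

Because the vocabulary is built from the corpus, domain jargon and unconventional spellings become features automatically.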
Using a list of over 2000 punctuation patterns recognized as emoticons, we also add the most frequently occurring emoticons to the feature set.
In addition to these corpus-based features, there are several specific types of words that we look for. For example, we have a feature for the number of swear words in the message, or the number of negation words.
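A word-set count feature of this kind is straightforward; the negation list below is an illustrative placeholder, not the study's actual list:

```python
# Sketch of a word-set count feature (here: negations).
# The set is an illustrative placeholder, not the study's actual list.
NEGATION_WORDS = {"not", "no", "never", "cannot", "can't", "don't", "wouldn't"}

def negation_count(message):
    """Number of negation words appearing in the message."""
    return sum(1 for tok in message.lower().split() if tok in NEGATION_WORDS)

negation_count("cannot talk to camera")  # counts "cannot"
```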
We look at character-level features like the number of repeated consecutive letters, sequences of exclamation points, or the number of capital letters. These are used extensively in chat messages and other informal online communication to signal emotion, mood, or affect.
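These character-level cues can be extracted with a few regular expressions; this is a sketch of the idea, not the study's exact feature definitions:

```python
import re

def char_features(message):
    """Character-level affect cues: repeated letters, punctuation, capitals.
    A sketch, not the study's exact feature definitions."""
    return {
        # runs of 3+ of the same letter, e.g. "noooooo"
        "letter_repetitions": len(re.findall(r"([a-zA-Z])\1{2,}", message)),
        # runs of 2+ exclamation points
        "exclamation_runs": len(re.findall(r"!{2,}", message)),
        "question_marks": message.count("?"),
        "capitals": sum(1 for c in message if c.isupper()),
    }

char_features("noooooooooooooooo, it must be stopped")
```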
Here’s an example to illustrate how this works. On the right is a subset of the features that we extract from the message. In reality the list is about 800 features long.
I’m going to skip ahead for a moment to some results. Once we train and evaluate classifiers for the affect codes that we want to automatically label, one thing we can do is look at which of those 800+ features were actually important. This example shows the top 10 most highly weighted features for the classifier trained to recognize confusion. On the right are a few example messages that our coders labeled with confusion. Clearly, the presence of question marks and certain key words (understand, why, what…) are useful for knowing when someone is confused.
Compare that to the top features for Apprehension. A different set of key words has risen to the top, in addition to the number of third-person singular pronouns and swear words. The examples on the right can help you see how those words are used and why they might be associated with apprehension.
And for amusement, emoticons and laughter expressions were the most useful features. Note that the presence of names of specific scientists was also an important factor in labeling for amusement.
The conclusion we want to stress is that for communication that resembles chat, specialized features are critical for recognizing a wide range of affect codes. Features that were intimately based on the data (word counts and emoticons), and features specific to the communication medium (emoticons and punctuation), were highly utilized. And the usefulness of each feature varied greatly from one type of affect to another.
Now, I’ll explain in more detail how those features are used in classification, and why we strongly recommend using interpretable, transparent classification algorithms for automated or partially automated coding as part of qualitative research. As I’ve said, we focused only on the 13 most frequently used types of affect. We created one binary classifier for each affect code.
This means that the problem facing the classifier is the following: Given Ray’s message “what is the best way to revive it”, does the code frustration apply?
We compared the performance of a wide variety of classification algorithms, a few of which are shown here. We selected a linear support vector machine because it had very promising performance characteristics, but also because it is fast to train and use, and provides a level of transparency into its inner workings not afforded by many other algorithms.
I’ll explain a little about how linear SVMs are used to classify text. Let’s say that you have only two features, the number of “ok”s and the number of swear words. The messages in your training data can each be plotted in this 2D space. In this example there is a pretty clear separation between those that were manually labeled with the frustration code and those which were not. When you train an SVM classifier on this data, it finds a line, such as this one, that best separates the frustrated messages from the non-frustrated messages (according to a particular definition of “best separates”).
Then, given a new unlabeled message with few swear words and a medium number of “ok”s, the classifier can label it as non-frustrated because it falls on that side of the line.
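The two-feature setup described in these notes can be sketched in code. A basic perceptron stands in here for the linear SVM (it also learns a separating line, though not the max-margin one), and the training data is invented for illustration:

```python
# Toy sketch: each message reduced to (# "ok", # swear words), with a
# learned line separating "frustration" from "not frustration".
# A simple perceptron stands in for the linear SVM; data is invented.

def train_linear(X, y, epochs=20, lr=1.0):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            pred = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
            err = target - pred  # -1, 0, or +1
            w[0] += lr * err * x[0]
            w[1] += lr * err * x[1]
            b += lr * err
    return w, b

#    (# "ok", # swears)                1 = "frustration" applies
X = [[3, 0], [2, 0], [4, 1], [0, 3], [1, 4], [0, 5]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_linear(X, y)

def applies(x):
    """Label a new message by which side of the line it falls on."""
    return w[0] * x[0] + w[1] * x[1] + b > 0

applies([2, 1])  # few swears, a medium number of "ok"s → False
```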
This chart shows precision and recall from 10-fold cross-validation for each of our 13 affect codes, using balanced data. Precision is the percentage of the messages that the classifier labeled as positive which were truly supposed to be positive. Recall is the percentage of all truly positive messages that the classifier successfully labeled as positive. So, performance is between 60 and 80% for most codes, with a high of 93% for interest. But how can we know if these classifiers are actually useful for automatically coding chat messages for our research?
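The two metrics defined in this note compute directly from true vs. predicted labels (the example labels below are invented):

```python
def precision_recall(true, pred):
    """Precision: of the messages labeled positive, how many truly were.
    Recall: of the truly positive messages, how many were found."""
    tp = sum(t == 1 and p == 1 for t, p in zip(true, pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(true, pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(true, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Invented example: 10 messages, 4 of 5 positives found, 1 false alarm.
true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
pred = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
precision_recall(true, pred)  # → (0.8, 0.8)
```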
Now, this is what I meant when I said the SVM is relatively transparent or interpretable. Suppose we learned the following model from the data. From this, we can see that swear words have more predictive power for frustration, while the number of “ok”s hardly makes any difference. In other words, by looking at the slope of the line, we can find out which features were the most important.
This is exactly where the tables from earlier came from. Examination of the SVM feature weights gives us a very easy way to gain a measure of insight into how and why the classifier behaves the way it does, which can help us understand how useful it might be for automatic coding.
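Ranking features by the magnitude of their learned weights, as in the feature-importance tables, is a one-liner once the model is trained. The weights here are invented for illustration, not taken from the study's classifiers:

```python
# Sketch: rank features by absolute weight in a trained linear model.
# These weights are invented for illustration.
weights = {'# swear words': 1.8, '"ok"': -0.1, '# question marks': 0.4}
ranked = sorted(weights, key=lambda f: abs(weights[f]), reverse=True)
ranked  # most predictive feature first
```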
And in general, we believe that for this kind of application, understanding how and why the classifier does or doesn’t work may be far more important than optimizing specific classification performance metrics (like precision, recall, accuracy, or F1 score).
Sequential modeling approaches, such as hidden Markov models, can take contextual information into account more directly. Context is clearly important to understanding the emotion communicated in chat messages; looking at messages in isolation can only get you so far.
Further, we are studying how visual analytics and interactive machine learning can be combined to create powerful tools for analyzing large social communication data sets.
Finally, we are extending this work by developing new features and algorithms for processing tweets, where data set sizes easily extend into the millions of messages, and different signals are used to communicate affect.
We have published the code from this study on GitHub, as a Java program called ALOE. ALOE uses the Weka machine learning library, and can easily be extended and used for affect classification and other text classification work. We invite you to try it out and let us know what you think. Questions?