Statistical Affect Detection
      in Collaborative Chat
CSCW 2013: Mining Social Media Data, Feb. 23

 Michael Brooks, Katie Kuksenok, Megan K. Torkildson,
Daniel Perry, John J. Robinson, Taylor Jackson Scott, Ona
Anicello, Ariana Zukowski, Paul Harris, Cecilia R. Aragon

  Scientific Collaboration & Creativity Lab
  2/27/2013
June, 2007
  6:07:57     Ray cool, it worked                          amusement, relief
  6:08:04    Matt woot                                      excitement, joy
  6:08:07     Ray awesome, I don't think he needs that    acceptance, no affect
                  long of a sleep after turning it off

  6:08:47          We enhanced eready to detect the             no affect
                   sticking

  6:08:58    Matt good job                               supportive, acceptance
  6:09:21          seems it did well there                happiness, no affect
  6:09:26     Ray yeah, pretty cool huh?                  interest, agreement,
                                                               happiness

  6:09:43    Matt helps keep me from having to stopaic          no affect
                  and restart

  6:09:55     Ray indeed, that was the point                   agreement



Nearby Supernova Factory
• 30 astrophysicists
• US / France
• Daily remote operation of
  telescope
• Rely on chat to communicate




SNfactory Chat Logs
• Four years of logs - 449,684 messages
• Manual coding for affective expressions
  –   27,344 chat messages coded
  –   1-5 coders per message
  –   30 affect codes
  –   Multiple codes allowed
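
Because several codes can apply to one message, one way to prepare such data for classification is to build a separate yes/no target for each code. The sketch below assumes that layout; the function name `to_binary_targets` is hypothetical, and how the codes of multiple coders were combined is not stated on the slide.

```python
from typing import Dict, List, Set


def to_binary_targets(coded: List[Set[str]], codes: List[str]) -> Dict[str, List[int]]:
    """For each code, build a 0/1 list: 1 where the message carries that code."""
    return {
        code: [1 if code in labels else 0 for labels in coded]
        for code in codes
    }


# Codes copied from the first three rows of the June 2007 excerpt.
coded_messages = [
    {"amusement", "relief"},      # "cool, it worked"
    {"excitement", "joy"},        # "woot"
    {"acceptance", "no affect"},  # "awesome, I don't think he needs that ..."
]

targets = to_binary_targets(coded_messages, ["amusement", "joy", "acceptance"])
for code, column in targets.items():
    print(code, column)
```

Each column can then be treated as its own two-class problem (the code applies or does not apply).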



Scott et al. SIGDOC 2012. Adapting Grounded Theory to Construct a Taxonomy
of Affect in Collaborative Online Chat.


Top 13 Affect Codes
Code            Times Used   Reliability (Kappa)
interest        4351         0.808
amusement       3213         0.611
considering     1763         0.49
agreement       1623         0.491
annoyance       1212         0.77
confusion       1125         0.615
acceptance       975         0.657
apprehension     799         0.529
frustration      541         0.55
supportive       518         0.583
surprise         464         0.543
anticipation     426         0.424
serenity         369         0.602
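
The slide does not say which kappa statistic was used. As an illustration only, the sketch below computes Cohen's kappa for two coders deciding whether a single code applies to each message; the coder decisions are made-up example data.

```python
from typing import Sequence


def cohen_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa for two coders making 0/1 decisions on the same messages."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("both coders must label the same non-empty set of messages")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    p_a = sum(a) / n  # share of messages coder A marked with the code
    p_b = sum(b) / n  # share of messages coder B marked with the code
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)  # agreement expected by chance
    if expected == 1.0:
        return 1.0  # both coders gave one identical constant label
    return (observed - expected) / (1 - expected)


# Hypothetical decisions: 1 = "code applies", 0 = "code does not apply".
coder_1 = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
coder_2 = [1, 0, 0, 0, 1, 0, 0, 1, 1, 0]
kappa = cohen_kappa(coder_1, coder_2)
print(round(kappa, 3))
```

Here the coders agree on 8 of 10 messages, but kappa is lower than 0.8 because some of that agreement would be expected by chance.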


Linguistic Inquiry and Word Count
               (LIWC)
• Detects words for Positive / Negative Emotions


     "I wish every day could be sunny and warm. Rain makes me angry."
     Positive: 15%
     Negative: 8%
     …
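
A rough sketch of this kind of dictionary-based counting follows. The two word sets are small hypothetical stand-ins, not the actual LIWC dictionaries, and `category_percentages` is an illustrative name. With these assumed sets the percentages round to the 15% and 8% shown in the example.

```python
import re
from typing import Dict

# Hypothetical stand-in word lists; the real LIWC dictionaries are far larger.
POSITIVE = {"sunny", "warm"}
NEGATIVE = {"angry"}


def category_percentages(text: str) -> Dict[str, float]:
    """Percentage of words in the text that fall in each emotion word list."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    if total == 0:
        return {"positive": 0.0, "negative": 0.0}
    positive = sum(w in POSITIVE for w in words)
    negative = sum(w in NEGATIVE for w in words)
    return {
        "positive": 100.0 * positive / total,
        "negative": 100.0 * negative / total,
    }


text = "I wish every day could be sunny and warm. Rain makes me angry."
result = category_percentages(text)
print({name: round(value) for name, value in result.items()})
```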




June, 2005
 11:44:08    Gabri ok that's better                                        relief, serenity
 11:44:17   Marcel GREAT !                                             excitement, happiness,
                                                                             relief, joy
 11:44:17    Gabri let's start aic and see                             anticipation, no affect
 11:44:23   Marcel yes ...                                                    no affect
 11:44:31   Derek Great what?                                                confusion
 11:44:32    Gabri can you do that?                                      interest, no affect
 11:44:50           derek.. it seems that now the focus is ok                 no affect
 11:45:04           and we can finally start observing                        no affect
 11:45:23   Derek Oh good!                                              happiness, relief, joy
 11:45:48           I have been waiting for this moment, because I          amusement
                    want to leave the room and get my midnight
                    snack. ;)
 11:46:54    Gabri go...                                                amusement, no affect
 11:47:02           and enjoy your snack                                amusement, no affect
 11:47:13   Derek HEhe.                                                     amusement
 11:47:18           I will bring it back here of course.                    amusement


The telescope is stuck! >:(
   frustration


The telescope is stuuuuuuuuuck...
   annoyance


The telescope is stuck??
   confusion




• Word counts
• Emoticons
• Word sets
   –   Swear words
   –   Pronouns
   –   Negations
   –   Participant names
• Characters
   – Capitalization
   – Letter repetition
   – Punctuation
• Metadata
   – segment duration, length, rate


Emoticons
Naomi: I think we'd better stopaic... :(                  sadness
Matt: today was a gym + laundry day :)                    amusement, happiness
Marcel: and she can't teach over an ssh-channel ;-)       amusement




Word Sets
                              Swear Words
Ray: why the **** doesn't stop_script *******       rage
STOP THE ******* SCRIPT
Matt: ******* ******* ******* I think I broke it    frustration, anger,
                                                    apprehension,
                                                    embarrassment


                                 Negations
Paul: but I wouldn't hazzard a guess                apprehension
Ray: cannot talk to camera                          frustration, no affect




Character Features
                       Letter Repetition
Ray: noooooooooooooooo, it must be stopped        annoyance, anger, fear
Marcel: AAaah too late, they will find meeee      amusement


                           Punctuation
Rick: looks like something bad happened here...     apprehension
Rene: 1 month before max??!?                        surprise, confusion,
                                                    considering


                         Capitalization
Marcel: ON TARGET !                                   relief, joy
Paul: we must set-up adopt an EXPLODING STAR          amusement, no affect



Alice: ok, so where was the ******* SN on the image?

Feature                  Value
“ok”                     1
“telescope”              0
“where”                  1
“SN”                     1
“image”                  1
question marks           1
swears                   1
emoticon :)              0
1st person pronouns      0
capitals                 2
repetition               0
punctuation              1
length                   45
…
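
A minimal sketch of how a message might be turned into such a feature vector. The vocabulary, the pronoun list, and the rule that a run of asterisks counts as one swear word (because the transcript masks them) are all assumptions made for illustration. The "punctuation" feature is left out because the slide does not define it.

```python
import re
from typing import Dict

# Hypothetical vocabulary and pronoun list, chosen to mirror the table rows.
VOCABULARY = ["ok", "telescope", "where", "sn", "image"]
FIRST_PERSON = {"i", "me", "my", "mine", "we", "us", "our", "ours"}


def extract_features(message: str) -> Dict[str, int]:
    """Count word, word-set, and character features for one chat message."""
    lowered = message.lower()
    words = re.findall(r"[a-z']+", lowered)
    features = {f'word "{w}"': words.count(w) for w in VOCABULARY}
    features["question marks"] = message.count("?")
    # Assumption: swear words appear masked as runs of asterisks.
    features["swears"] = len(re.findall(r"\*{2,}", message))
    features['emoticon ":)"'] = message.count(":)")
    features["1st person pronouns"] = sum(w in FIRST_PERSON for w in words)
    features["capitals"] = sum(ch.isupper() for ch in message)
    # A letter repeated three or more times in a row, e.g. "stuuuuck".
    features["repetition"] = len(re.findall(r"([a-z])\1{2,}", lowered))
    features["length"] = len(message)
    return features


message = "ok, so where was the ******* SN on the image?"
features = extract_features(message)
for name, value in features.items():
    print(f"{name:22} {value}")
```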
Feature importance: Confusion
• Top features
   – ???? length
   – # question marks
   – "understand"
   – "confus_"
   – "why"
   – "what"
   – "nothing"
   – "wrong"
   – msg. length
   – "thought"
• Messages labeled Confusion
   – Ben: ??? - the answer is likely found in the otsim code
   – Marcel: well ... I'm not so sure ...
   – Gary: Why do we care at all then?
   – Ray: ummm I mean how does it get to the header


Feature importance: Apprehension
• Top features
   – "bad"
   – "something"
   – "problem"
   – "we"
   – "seem"
   – "too"
   – msg. length
   – "not"
   – # 3rd sg. pronouns
   – # swearing
• Messages labeled Apprehension
   – Pascal: the problem is than the automated detection will not work ... too much galaxy
   – Ray: But now bad stuff in window
   – Ben: pascal, we had a problem with do_fchart
   – Gabriel: So something is completely wrong


Feature importance: Amusement
• Top features
   – emoticon ";)"
   – emoticon ":)"
   – laughter
   – emoticon ";-)"
   – "fun"
   – laughter length
   – "p"
   – # people names
   – "sleep"
   – "of"
• Messages labeled Amusement
   – Kevin: hehe
   – Ray: hahahaah
   – Stef: lol ok derek :)
   – Ray: He never sleeps -- you know that.
   – Pascal: but I think it could be interesting for Extreeeeeeeeeeme photometry study ;-)


Specialized Features
• Count words based on the data
• Medium-specific features
   – Emoticons, punctuation…
• Context-specific features
   – People names, jargon…
• Affect-specific features
   – Swearing vs. emoticons




September, 2006
5:17:48   Marcel ok, so let's cycle the stuff
5:18:04     Rick ok…
5:18:40   Marcel damn mouse cutandpast
5:19:03      Ray off 1 right? then on 1?
5:19:32   Marcel have you telnet sdsugreen ??
5:19:58      Ray director on lbl2 looks dead
5:20:34   Marcel ok, one thind at a time. have you cycled the baytech on sdsugreen ?
5:20:36      Ray what is best way to revive it
5:20:39            baytech
5:20:40            yes
5:20:46            not sdsu
5:21:08            go ahead and do it I am not evneon this **** shift...grrr
5:21:22   Marcel ok, maybe we have to kill director and restart it mkanually
5:21:32      Ray yeah but that's tricky; all these damn arguments
5:23:53     Rick emile, I have no idea what's going on here
5:23:57            only that it is bad


Classifier       F-measure   Precision   Recall   Accuracy
Naïve Bayes      0.650       0.637       0.691    0.637
Logistic Reg.    0.730       0.731       0.731    0.730
SVM (SMO)        0.759       0.766       0.751    0.761
C4.5 (J48)       0.700       0.724       0.680    0.710
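
For reference, the four measures in the table can be computed for one code from true and predicted 0/1 labels as sketched below. The labels are made-up example data, and `binary_metrics` is an illustrative name. How the table's figures were averaged across codes or folds is not stated on the slide, so they need not satisfy the single-code F-measure identity exactly.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall, F-measure (F1), and accuracy for one code."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)  # code applies, predicted
    fp = sum(t == 0 and p == 1 for t, p in pairs)  # predicted but does not apply
    fn = sum(t == 1 and p == 0 for t, p in pairs)  # applies but missed
    tn = sum(t == 0 and p == 0 for t, p in pairs)  # correctly left out
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall) if precision + recall else 0.0
    )
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Hypothetical labels: 1 = "code applies", 0 = "code does not apply".
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 1, 1]
metrics = binary_metrics(y_true, y_pred)
print({name: round(value, 3) for name, value in metrics.items()})
```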




Support Vector Machine
• Accurate
• Fast
• Transparent

[Figure: messages plotted by # “ok” and # swear words, marked by whether “frustration” applies or does not apply]
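
The two-feature picture can be read as a linear decision rule: a weighted sum of the feature counts is compared with a threshold. The sketch below assumes a linear model, and its weights and bias are invented for illustration, not taken from the trained classifier.

```python
def frustration_score(n_ok: int, n_swears: int,
                      w_ok: float = -1.0, w_swear: float = 2.0,
                      bias: float = -1.0) -> float:
    """Signed score from a linear rule; the weights are hypothetical."""
    return w_ok * n_ok + w_swear * n_swears + bias


def frustration_applies(n_ok: int, n_swears: int) -> bool:
    """The code applies when the message falls on the positive side of the line."""
    return frustration_score(n_ok, n_swears) > 0


# (number of "ok", number of swear words) for three hypothetical messages.
for n_ok, n_swears in [(0, 2), (2, 0), (1, 1)]:
    print(n_ok, n_swears, frustration_score(n_ok, n_swears),
          frustration_applies(n_ok, n_swears))
```

Because each feature contributes through a single weight, the sign and size of a weight show directly how that feature pushes the decision.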


[Figure: the plot repeated with a new, unlabeled message marked “?”]
[Figure: precision and recall, on a 0.0 to 1.0 scale, for each of the 13 codes: interest, amusement, considering, agreement, annoyance, confusion, acceptance, apprehension, frustration, supportive, surprise, anticipation, serenity]
Interpretability
• How is the classifier making decisions?
• What features are important in the model?

[Figure: messages plotted by # “ok” and # swear words, marked by whether “frustration” applies or does not apply]
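
The slides do not state how feature importance was computed. One common approach for a linear model, shown here as an assumption, is to rank features by the absolute value of their learned weights. The weights below are invented for illustration.

```python
from typing import Dict, List, Tuple


def top_features(weights: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k features whose weights have the largest magnitude."""
    ranked = sorted(weights.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:k]


# Hypothetical weights for an "amusement" classifier.
amusement_weights = {
    'emoticon ";)"': 1.8,
    "laughter": 1.2,
    '"fun"': 0.7,
    "# swear words": -0.9,
    '"of"': 0.1,
}

ranking = top_features(amusement_weights)
for name, weight in ranking:
    print(f"{name:16} {weight:+.1f}")
```

A large negative weight ranks highly too: it marks a feature that argues against the code.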


Interpretable Classifiers
• Explain classification errors
• Suggest improvement strategies
• Discover interesting anomalies




Future Work




Sequential Modeling
5:19:58      Ray director on lbl2 looks dead
5:20:34   Marcel ok, one thind at a time. have you cycled the baytech on sdsugreen ?
5:20:36      Ray what is best way to revive it
5:20:39           baytech
5:20:40           yes
5:20:46           not sdsu
5:21:08           go ahead and do it I am not evneon this **** shift...grrr
5:21:22   Marcel ok, maybe we have to kill director and restart it mkanually
5:21:32      Ray yeah but that's tricky; all these damn arguments
5:23:53     Rick emile, I have no idea what's going on here
5:23:57           only that it is bad




Interactive Visual Analysis




Affect in Twitter
[Figure: number of tweets (0 to 45,000) over time (EST) on 2/3/2013, split into positive, negative, and neutral; annotated events: kickoff, halftime, blackout, game resumes (twice), game over]
Classify…
• Positive/negative/neutral sentiment
• Highly granular emotions
• Anything else you can label

In…
• longer, formal documents (blog posts, reviews)
• individual sentences
• instant messages
• tweets
• Anything else you can put in CSV

github.com/etcgroup/aloe
Download it, use it, & tell us what you think!

Michael Brooks
mjbrooks@uw.edu
http://depts.washington.edu/sccl


         Scientific Collaboration & Creativity Lab   2/27/2013        36
Statistical Affect Detection in Collaborative Chat

Contenu connexe

En vedette

1.1 menyatakan kuantiti secara intuitif
1.1 menyatakan kuantiti secara intuitif1.1 menyatakan kuantiti secara intuitif
1.1 menyatakan kuantiti secara intuitifshahfira
 
Presentacion historia 1
Presentacion historia 1Presentacion historia 1
Presentacion historia 1salon36ulsa
 
Pokok krismas
Pokok krismasPokok krismas
Pokok krismasshahfira
 
Numerasi k1 (mengenal_nombor)
Numerasi k1 (mengenal_nombor)Numerasi k1 (mengenal_nombor)
Numerasi k1 (mengenal_nombor)shahfira
 
Numerasi k2 (membilang_11_-_20)
Numerasi k2 (membilang_11_-_20)Numerasi k2 (membilang_11_-_20)
Numerasi k2 (membilang_11_-_20)shahfira
 
Numerasi k2 (membilang_0_-_10)
Numerasi k2 (membilang_0_-_10)Numerasi k2 (membilang_0_-_10)
Numerasi k2 (membilang_0_-_10)shahfira
 
Numerasi k2 (membilang 0 -10)
Numerasi k2 (membilang 0 -10)Numerasi k2 (membilang 0 -10)
Numerasi k2 (membilang 0 -10)shahfira
 
Numerasi k2 (membilang_11_-_20)
Numerasi k2 (membilang_11_-_20)Numerasi k2 (membilang_11_-_20)
Numerasi k2 (membilang_11_-_20)shahfira
 
Numerasi k1 (pra_nombor)
Numerasi k1 (pra_nombor)Numerasi k1 (pra_nombor)
Numerasi k1 (pra_nombor)shahfira
 

En vedette (13)

IPL
IPLIPL
IPL
 
Unit 1
Unit 1Unit 1
Unit 1
 
1.1 menyatakan kuantiti secara intuitif
1.1 menyatakan kuantiti secara intuitif1.1 menyatakan kuantiti secara intuitif
1.1 menyatakan kuantiti secara intuitif
 
Urok1
Urok1Urok1
Urok1
 
Presentacion historia 1
Presentacion historia 1Presentacion historia 1
Presentacion historia 1
 
Pokok krismas
Pokok krismasPokok krismas
Pokok krismas
 
Robocop
RobocopRobocop
Robocop
 
Numerasi k1 (mengenal_nombor)
Numerasi k1 (mengenal_nombor)Numerasi k1 (mengenal_nombor)
Numerasi k1 (mengenal_nombor)
 
Numerasi k2 (membilang_11_-_20)
Numerasi k2 (membilang_11_-_20)Numerasi k2 (membilang_11_-_20)
Numerasi k2 (membilang_11_-_20)
 
Numerasi k2 (membilang_0_-_10)
Numerasi k2 (membilang_0_-_10)Numerasi k2 (membilang_0_-_10)
Numerasi k2 (membilang_0_-_10)
 
Numerasi k2 (membilang 0 -10)
Numerasi k2 (membilang 0 -10)Numerasi k2 (membilang 0 -10)
Numerasi k2 (membilang 0 -10)
 
Numerasi k2 (membilang_11_-_20)
Numerasi k2 (membilang_11_-_20)Numerasi k2 (membilang_11_-_20)
Numerasi k2 (membilang_11_-_20)
 
Numerasi k1 (pra_nombor)
Numerasi k1 (pra_nombor)Numerasi k1 (pra_nombor)
Numerasi k1 (pra_nombor)
 

Dernier

The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024TopCSSGallery
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Mark Simos
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Jeffrey Haguewood
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...BookNet Canada
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfAarwolf Industries LLC
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxfnnc6jmgwh
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...Karmanjay Verma
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 

Dernier (20)

The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdf
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 

Statistical Affect Detection in Collaborative Chat

  • 1. Statistical Affect Detection in Collaborative Chat CSCW 2013: Mining Social Media Data, Feb. 23 Michael Brooks, Katie Kuksenok, Megan K. Torkildson, Daniel Perry, John J. Robinson, Taylor Jackson Scott, Ona Anicello, Ariana Zukowski, Paul Harris, Cecilia R. Aragon Scientific Collaboration & Creativity Lab
  • 2. Scientific Collaboration & Creativity Lab 2/27/2013 2
  • 3. June, 2007
      6:07:57 Ray: cool, it worked [amusement, relief]
      6:08:04 Matt: woot [excitement, joy]
      6:08:07 Ray: awesome, I don't think he needs that long of a sleep after turning it off [acceptance, no affect]
      6:08:47 We enhanced eready to detect the sticking [no affect]
      6:08:58 Matt: good job [supportive, acceptance]
      6:09:21 seems it did well there [happiness, no affect]
      6:09:26 Ray: yeah, pretty cool huh? [interest, agreement, happiness]
      6:09:43 Matt: helps keep me from having to stopaic and restart [no affect]
      6:09:55 Ray: indeed, that was the point [agreement]
  • 4. Nearby Supernova Factory • 30 astrophysicists • US / France • Daily remote operation of telescope • Rely on chat to communicate
  • 7. SNfactory Chat Logs • Four years of logs - 449,684 messages • Manual coding for affective expressions – 27,344 chat messages coded – 1-5 coders per message – 30 affect codes – Multiple codes allowed. Scott et al. SIGDOC 2012. Adapting Grounded Theory to Construct a Taxonomy of Affect in Collaborative Online Chat.
  • 9. Top 13 Affect Codes
      Code           Times Used   Reliability (Kappa)
      interest       4351         0.808
      amusement      3213         0.611
      considering    1763         0.49
      agreement      1623         0.491
      annoyance      1212         0.77
      confusion      1125         0.615
      acceptance     975          0.657
      apprehension   799          0.529
      frustration    541          0.55
      supportive     518          0.583
      surprise       464          0.543
      anticipation   426          0.424
      serenity       369          0.602
  • 10. Linguistic Inquiry and Word Count (LIWC) • Detects words for Positive / Negative Emotions
      Example: "I wish every day could be sunny and warm. Rain … makes me angry."
      Positive: 15%
      Negative: 8%
  • 11. June, 2005
      11:44:08 Gabri: ok that's better [relief, serenity]
      11:44:17 Marcel: GREAT ! [excitement, happiness, relief, joy]
      11:44:17 Gabri: let's start aic and see [anticipation, no affect]
      11:44:23 Marcel: yes ... [no affect]
      11:44:31 Derek: Great what? [confusion]
      11:44:32 Gabri: can you do that? [interest, no affect]
      11:44:50 derek.. it seems that now the focus is ok [no affect]
      11:45:04 and we can finally start observing [no affect]
      11:45:23 Derek: Oh good! [happiness, relief, joy]
      11:45:48 I have been waiting for this moment, because I want to leave the room and get my midnight snack. ;) [amusement]
      11:46:54 Gabri: go... [amusement, no affect]
      11:47:02 and enjoy your snack [amusement, no affect]
      11:47:13 Derek: HEhe. [amusement]
      11:47:18 I will bring it back here of course. [amusement]
  • 12. The same message written three ways:
      The telescope is stuck! >:( [frustration]
      The telescope is stuuuuuuuuuck... [annoyance]
      The telescope is stuck?? [confusion]
  • 13. • Word counts • Emoticons • Word sets – Swear words – Pronouns – Negations – Participant names • Characters – Capitalization – Letter repetition – Punctuation • Metadata – segment duration, length, rate
  • 15. Emoticons
      Naomi: I think we'd better stopaic... :( [sadness]
      Matt: today was a gym + laundry day :) [amusement, happiness]
      Marcel: and she can't teach over an ssh-channel ;-) [amusement]
  • 16. Word Sets
      Swear Words
      Ray: why the **** doesn't stop_script ******* STOP THE ******* SCRIPT [rage]
      Matt: ******* ******* ******* I think I broke it [frustration, anger, apprehension, embarrassment]
      Negations
      Paul: but I wouldn't hazzard a guess [apprehension]
      Ray: cannot talk to camera [frustration, no-affect]
  • 17. Character Features
      Letter Repetition
      Ray: noooooooooooooooo, it must be stopped [annoyance, anger, fear]
      Marcel: AAaah too late, they will find meeee [amusement]
      Punctuation
      Rick: looks like something bad happened here... [apprehension]
      Rene: 1 month before max??!? [surprise, confusion, considering]
      Capitalization
      Marcel: ON TARGET ! [relief, joy]
      Paul: we must set-up adopt an EXPLODING STAR [amusement, no-affect]
  • 18. Alice: ok, so where was the ******* SN on the image?
      Feature               Value
      “ok”                  1
      “telescope”           0
      “where”               1
      “SN”                  1
      “image”               1
      question marks        1
      swears                1
      emoticon :)           0
      1st person pronouns   0
      capitals              2
      repetition            0
      punctuation           1
      length                45
      …
  • 19. Feature importance: Confusion
      Top features: ???? length; # question marks; "understand"; "confus_"; "why"; "what"; "nothing"; "wrong"; msg. length; "thought"
      Messages labeled Confusion:
      Ben: ??? - the answer is likely found in the otsim code
      Marcel: well ... I'm not so sure ...
      Gary: Why do we care at all then?
      Ray: ummm I mean how does it get to the header
  • 20. Feature importance: Apprehension
      Top features: "bad"; "something"; "problem"; "we"; "seem"; "too"; msg. length; "not"; # 3rd sg. pronouns; # swearing
      Messages labeled Apprehension:
      Pascal: the problem is than the automated detection will not work ... too much galaxy
      Ray: But now bad stuff in window
      Ben: pascal, we had a problem with do_fchart
      Gabriel: So something is completely wrong
  • 21. Feature importance: Amusement
      Top features: emoticon ";)"; emoticon ":)"; laughter; emoticon ";-)"; "fun"; laughter length; "p"; # people names; "sleep"; "of"
      Messages labeled Amusement:
      Kevin: hehe
      Ray: hahahaah
      Stef: lol ok derek :)
      Ray: He never sleeps -- you know that.
      Pascal: but I think it could be interesting for Extreeeeeeeeeeme photometry study ;-)
  • 22. Specialized Features • Count words based on the data • Medium-specific features – Emoticons, punctuation… • Context-specific features – People names, jargon… • Affect-specific features – Swearing vs. emoticons
  • 23. September, 2006
      5:17:48 Marcel: ok, so let's cycle the stuff
      5:18:04 Rick: ok…
      5:18:40 Marcel: damn mouse cutandpast
      5:19:03 Ray: off 1 right? then on 1?
      5:19:32 Marcel: have you telnet sdsugreen ??
      5:19:58 Ray: director on lbl2 looks dead
      5:20:34 Marcel: ok, one thind at a time. have you cycled the baytech on sdsugreen ?
      5:20:36 Ray: what is best way to revive it
      5:20:39 baytech
      5:20:40 yes
      5:20:46 not sdsu
      5:21:08 go ahead and do it I am not evneon this **** shift...grrr
      5:21:22 Marcel: ok, maybe we have to kill director and restart it mkanually
      5:21:32 Ray: yeah but that's tricky; all these damn arguments
      5:23:53 Rick: emile, I have no idea what's going on here
      5:23:57 only that it is bad
  • 25. Classifier performance
      Classifier      F-measure   Precision   Recall   Accuracy
      Naïve Bayes     0.650       0.637       0.691    0.637
      Logistic Reg.   0.730       0.731       0.731    0.730
      SVM (SMO)       0.759       0.766       0.751    0.761
      C4.5 (J48)      0.700       0.724       0.680    0.710
  • 26. Support Vector Machine • Accurate • Fast • Transparent
      [Scatter plot with axes # “ok” and # swear words; points marked “frustration” applies / “frustration” does not apply]
  • 27. Support Vector Machine • Accurate • Fast • Transparent
      [Scatter plot with axes # “ok” and # swear words; points marked “frustration” applies / “frustration” does not apply; one point marked “?”]
  • 28. [Bar chart: Precision and Recall, on a 0.0 to 1.0 scale, for each affect code: interest, amusement, considering, agreement, annoyance, confusion, acceptance, apprehension, frustration, supportive, surprise, anticipation, serenity]
  • 29. Interpretability • How is the classifier making decisions? • What features are important in the model?
      [Scatter plot with axes # “ok” and # swear words; points marked “frustration” applies / “frustration” does not apply]
  • 31. Interpretable Classifiers • Explain classification errors • Suggest improvement strategies • Discover interesting anomalies
  • 32. Future Work
  • 33. Sequential Modeling
      5:19:58 Ray: director on lbl2 looks dead
      5:20:34 Marcel: ok, one thind at a time. have you cycled the baytech on sdsugreen ?
      5:20:36 Ray: what is best way to revive it
      5:20:39 baytech
      5:20:40 yes
      5:20:46 not sdsu
      5:21:08 go ahead and do it I am not evneon this **** shift...grrr
      5:21:22 Marcel: ok, maybe we have to kill director and restart it mkanually
      5:21:32 Ray: yeah but that's tricky; all these damn arguments
      5:23:53 Rick: emile, I have no idea what's going on here
      5:23:57 only that it is bad
  • 34. Interactive Visual Analysis
  • 35. Affect in Twitter
      [Chart: Number of Tweets (0 to 45000) over Time (EST), 2/3/2013, split into positive, negative, and neutral; annotated events: kickoff, halftime, game resumes, blackout, game resumes, game over]
  • 36. Classify… • Positive/negative/neutral sentiment • Highly granular emotions • Anything else you can label
      In… • longer, formal documents (blog posts, reviews) • individual sentences • instant messages • tweets • Anything else you can put in CSV
      github.com/etcgroup/aloe
      Download it, use it, & tell us what you think!
      Michael Brooks, mjbrooks@uw.edu, http://depts.washington.edu/sccl

Editor's notes

  1. Researchers working with social media have more data available than ever before. There is great potential for new insights, but the data sets are very large and complex. How can we help people understand data sets collected from social media and other online communication? Our research group is studying how a combination of visualization and machine learning can be integrated into a qualitative research workflow to help researchers dig into these new data sources in a rich but also scalable way.
  2. In this paper, we focus on a large collection of chat logs from scientists working together on a specific project. Our group is doing ongoing qualitative research to understand how, when, and why the scientists express emotion, or affect, and how affect relates to creativity and problem solving in this data set. The data set is too large for us to code manually, and privacy and specialized domain knowledge prevent us from using something like Mechanical Turk. In this talk, I will present some of the issues we have explored around using machine learning to automatically label the data, in support of scalable rich analysis. I will focus on the importance of developing a diverse, specialized feature set and the use of interpretable classification algorithms.
  3. I’ll start by giving a bit of background about the data…
  4. Ray and Matt are discussing a new program that Ray created to automatically un-stick the telescope, saving the scientists a lot of time. Many lines have multiple types of affect, while some lines have no affect.
  5. Most affect codes are very rare. Reliability ranges from 0.4 to 0.8.
  6. Before I go on… LIWC is a popular text analysis tool that can be used for finding emotions or sentiment in text. LIWC processes blocks of text, counting words that belong to specific sets of dictionary words that have been previously determined to have particular meanings. This is called a lexicon-based approach. The words sunny and warm are part of LIWC’s Positive Affect lexicon, while angry is part of its Negative Affect lexicon. So, LIWC would output that this text has two positive words and one negative word.
  7. For data sets like ours, we believe that this kind of approach is not appropriate. While LIWC’s validity has been carefully studied for very narrow domains of English writing, informal online communications such as chat messages and tweets use a lot of domain-specific vocabulary and non-standard textual cues to communicate affect, almost becoming another language entirely. The medium and the context of communication are often critical to correctly understanding emotional content.
  8. Let me illustrate this with a quick example. This is a chat message rewritten three ways. LIWC is not built to recognize expressions such as emoticons or intentionally misspelled words. Punctuation cues are not taken into account. Furthermore, in general English, a word like stuck may not have strong emotional connotations, but in our data set, it is used when scientists are struggling with telescope problems. It is therefore quite an effective way to recognize frustration, for example. LIWC and other tools that use standard English lexicons will miss out on these signals. So if we aren’t going to use a predefined, validated lexicon of affect-laden words, what will we use to recognize affect?
  9. We based our features on a combination of previous literature and our knowledge of the chat data set we were working with.
  10. We look at all of the words that occur anywhere in the training data and select the most common 400 to 600 of those. Each becomes a feature that our classifiers can use to recognize affect. The words do not come from a predefined list, but from the data itself. This helps us pick up on jargon and other unconventional word usage.
  11. Using a list of over 2000 punctuation patterns recognized as emoticons, we also add the most frequently occurring emoticons to the feature set.
  12. In addition to these corpus-based features, we have several specific types of words that we look for. So, we have a feature for the number of swear words in the message, or the number of negation words.
  13. We look at character-level features like the number of repeated consecutive letters, sequences of exclamation points, or the number of capital letters. These are used extensively in chat messages and other informal online communication to signal emotion, mood, or affect.
  14. Here’s an example to illustrate how this works. On the right is a subset of the features that we extract from the message. In reality the list is about 800 features long.
  15. I’m going to skip ahead for a moment to some results. Once we train and evaluate classifiers for the affect codes that we want to automatically label, one thing we can do is look and see which of those 800+ features were actually important. This example shows the top 10 most highly weighted features for the classifier trained to recognize confusion. On the right are a few example messages that our coders labeled with confusion. Clearly, the presence of question marks and certain key words (understand, why, what…) is useful for knowing when someone is confused.
  16. Compare that to the top features for Apprehension. A different set of key words has risen to the top, in addition to the number of third-person singular pronouns and swear words. The examples on the right can help you see how those words are used and why they might be associated with apprehension.
  17. And for amusement, emoticons and laughter expressions were the most useful features. Note that the presence of names of specific scientists was also an important factor in labeling for amusement.
  18. The conclusion we want to stress is that for communication that resembles chat, specialized features are critical for recognizing a wide range of affect codes. Features based closely on the data (word counts and emoticons), as well as features specific to the communication medium (emoticons and punctuation), were heavily used. And the usefulness of each feature varied greatly from one type of affect to another.
  19. Now, I’ll explain in more detail how those features are used in classification, and why we strongly recommend using interpretable, transparent classification algorithms for automated or partially automated coding as part of qualitative research. As I’ve said, we focused only on the 13 most frequently used types of affect. We created one binary classifier for each affect code.
  20. This means that the problem facing the classifier is the following: Given Ray’s message “what is the best way to revive it”, does the code frustration apply?
  21. We compared the performance of a wide variety of classification algorithms, a few of which are shown here. We selected a linear support vector machine because it had very promising performance characteristics, but also because it is fast to train and use, and provides a level of transparency into its inner workings not afforded by many other algorithms.
  22. I’ll explain a little about how linear SVMs are used to classify text. Let’s say that you have only two features, #ok and #swears. The messages in your training data can each be plotted in this 2D space. In this example there is a pretty clear separation between those that were manually labeled with the frustration code and those that were not. When you train an SVM classifier on this data, it finds a line that best separates the frustrated messages from the non-frustrated messages (according to a particular definition of “best separates”), such as this one.
  23. Then, given a new unlabeled message with few swear words and a medium number of “ok”s, the classifier can label it as non-frustrated because it falls on that side of the line.
  24. This chart shows precision and recall from 10-fold cross validation for each of our 13 affect codes, using balanced data. Precision is the percentage of the messages the classifier labeled as positive that were truly positive. Recall is the percentage of the truly positive messages that the classifier successfully labeled as positive. So, performance is between 60 and 80% for most codes, with a high of 93% for interest. But how can we know if these classifiers are actually useful for automatically coding chat messages for our research?
  25. Now, this is what I meant when I said the SVM is relatively transparent or interpretable. Suppose we learned the following model from the data. From this, we can see that swear words have more predictive power for frustration, while the number of “ok”s hardly makes any difference. In other words, by looking at the slope of the line, we can find out which features were the most important.
  26. This is exactly where these tables from earlier came from. Examination of the SVM feature weights gives us a very easy way to gain a measure of insight into how and why the classifier behaves the way it does, which can help us understand how useful it might be for automatic coding.
  27. And in general, we believe that for this kind of application, understanding how and why the classifier does or doesn’t work may be far more important than optimizing specific classification performance metrics (like precision, recall, accuracy, or F1 score).
  28. Sequential modeling approaches, such as hidden Markov models: context is clearly important to understanding the emotion communicated in chat messages. Looking at messages in isolation can only get you so far. Sequential modeling techniques can more directly take contextual information into account.
  29. Further, we are studying how visual analytics and interactive machine learning can be combined to create powerful tools for analyzing large social communication data sets.
  30. Finally, we are extending this work by developing new features and algorithms for processing tweets, where data set size can easily extend into the millions of messages, and different signals are used to communicate affect.
  31. We have published the code from this study on GitHub, as a Java program called ALOE. ALOE uses the Weka machine learning library, and can easily be extended and used for affect classification and other text classification work. We invite you to try it out and let us know what you think. Questions?
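
The lexicon-based approach described in note 6 can be sketched in a few lines of Python. This is only an illustration: the two word sets below are hypothetical stand-ins containing just the words named in the note, not LIWC's actual dictionaries, and the example sentence uses only the words visible on the slide.

```python
import re

# Hypothetical toy lexicons holding only the words named in the talk;
# LIWC's real dictionaries are much larger and are not reproduced here.
POSITIVE = {"sunny", "warm"}
NEGATIVE = {"angry"}


def lexicon_scores(text):
    """Return (positive %, negative %) of the words found in each toy lexicon."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0, 0.0
    positive = sum(word in POSITIVE for word in words)
    negative = sum(word in NEGATIVE for word in words)
    return 100.0 * positive / len(words), 100.0 * negative / len(words)


# The slide's example sentence, using only its visible words.
pos_pct, neg_pct = lexicon_scores(
    "I wish every day could be sunny and warm. Rain makes me angry."
)

# A chat-style message scores zero on both lexicons, which is the
# limitation that note 8 points out.
stuck_scores = lexicon_scores("The telescope is stuuuuuuuuuck...")
```

With these toy lexicons, two of the thirteen words are positive and one is negative, which rounds to the 15% and 8% shown on the LIWC slide.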
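
Notes 10, 12, and 13 describe building features from the data itself. A minimal Python sketch of that idea follows. It is not ALOE's implementation (ALOE is a Java program); the function names, the feature names, the tokenizer, and the small negation list are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical negation list; the word sets used in the study are not given in the talk.
NEGATIONS = {"not", "no", "cannot", "never", "wouldn't", "can't"}


def tokenize(message):
    """Lowercase a message and split it into simple word tokens."""
    return re.findall(r"[a-z']+", message.lower())


def top_words(messages, n):
    """Pick the n most common words in the training messages (note 10)."""
    counts = Counter(word for message in messages for word in tokenize(message))
    return [word for word, _ in counts.most_common(n)]


def extract_features(message, vocab):
    """Turn one message into a dictionary of named numeric features."""
    words = tokenize(message)
    counts = Counter(words)
    features = {"word:" + word: counts[word] for word in vocab}
    # Word-set feature (note 12).
    features["negations"] = sum(word in NEGATIONS for word in words)
    # Character-level features (note 13).
    features["capitals"] = sum(ch.isupper() for ch in message)
    features["question_marks"] = message.count("?")
    features["punctuation_runs"] = len(re.findall(r"[!?.]{2,}", message))
    features["repeated_letters"] = len(re.findall(r"([a-zA-Z])\1{2,}", message))
    features["length"] = len(message)
    return features


training = [
    "The telescope is stuck! >:(",
    "The telescope is stuuuuuuuuuck...",
    "The telescope is stuck??",
]
vocab = top_words(training, 3)
example = extract_features("The telescope is stuuuuuuuuuck...", vocab)
```

Because the vocabulary comes from the training messages rather than a fixed dictionary, jargon such as "stopaic" would become a feature automatically once it is frequent enough.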
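
Notes 22 to 26 explain how a trained linear SVM labels a message and how its weights reveal which features matter. The sketch below shows only the prediction and inspection steps, in Python, for the two-feature frustration example. The weights and bias are invented for illustration; in the study they would come from training (the classifier table lists Weka's SMO), which is not reproduced here.

```python
# Hypothetical weights for a two-feature linear model of "frustration".
# Real values would be learned from labeled messages.
WEIGHTS = {"swears": 1.5, "ok": -0.1}
BIAS = -1.0


def decision_value(features):
    """Signed score w . x + b; its sign says which side of the line a message is on."""
    return sum(WEIGHTS[name] * features.get(name, 0) for name in WEIGHTS) + BIAS


def applies(features):
    """Label the message with the code when it falls on the positive side."""
    return decision_value(features) > 0


def rank_features(weights):
    """Order feature names by absolute weight, most influential first."""
    return sorted(weights, key=lambda name: abs(weights[name]), reverse=True)


frustrated = applies({"swears": 2, "ok": 0})
not_frustrated = applies({"swears": 0, "ok": 2})
ranking = rank_features(WEIGHTS)
```

With these made-up weights, the count of swear words dominates the decision and the count of "ok" barely moves it, which mirrors the reading of the line's slope in note 25.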
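
The definitions of precision and recall in note 24, and the F-measure reported in the classifier table, can be written out directly. The Python sketch below uses the standard definitions; the five predictions and labels are made up for illustration and are not data from the study.

```python
def precision_recall(predicted, actual):
    """Compute precision and recall for one binary affect code."""
    true_pos = sum(p and a for p, a in zip(predicted, actual))
    false_pos = sum(p and not a for p, a in zip(predicted, actual))
    false_neg = sum(a and not p for p, a in zip(predicted, actual))
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up labels for five messages: did the classifier (predicted) and the
# human coders (actual) apply the code?
predicted = [True, True, False, False, False]
actual = [True, False, True, True, False]
precision, recall = precision_recall(predicted, actual)
f1 = f_measure(precision, recall)
```

Here one of the two predicted positives is correct (precision 0.5) and one of the three true positives is found (recall about 0.33).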