When socialbots attack:
Modeling susceptibility of users in online social networks
      Claudia Wagner, Silvia Mitter, Christian Körner, Markus Strohmaier
                                                         Lyon, 16.4.2012
What are socialbots?
A socialbot is a piece of software that controls a user
account in an online social network and passes itself off as
a human being
                                                     Danger of socialbots
         Social Engineering
              Gaining access to secure objects by exploiting human
              psychology rather than using hacking techniques
              Harvest private user data such as email addresses, phone
              numbers, and other personal data that have monetary
              value
         Spread Misinformation
              Ratkiewicz et al. describe the use of Twitter bots to run
              smear campaigns during the 2010 U.S. midterm elections.

    J. Ratkiewicz, M. Conover, M. Meiss, B. Goncalves, S. Patil, A. Flammini, and F. Menczer. Truthy:
    mapping the spread of astroturf in microblog streams. In Proceedings of the 20th international
    conference companion on World wide web, WWW '11, pages
Danger of socialbots
   Snowball effects
        Boshmaf et al. show that
        Facebook can be infiltrated by
        socialbots sending friend
        requests: 102 socialbots, 6
        weeks, 3,517 friend requests and
        2,079 infections
        Average reported acceptance
        rate: 59.1%, rising to 80% depending
        on how many mutual friends the
        socialbots had with the infiltrated
        users
Y. Boshmaf, I. Muslukhov, K. Beznosov, and M. Ripeanu. The socialbot network. In Proceedings
of the 27th Annual Computer Security Applications Conference, page 93. ACM Press, Dec 2011.
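
The average acceptance rate quoted above can be checked directly from the two counts on the slide. This is only an arithmetic sanity check; the variable names are illustrative.

```python
# Counts quoted on the slide (Boshmaf et al.): friend requests sent and accepted.
friend_requests = 3517
infections = 2079

# Acceptance rate = accepted requests / sent requests.
acceptance_rate = infections / friend_requests
print(f"average acceptance rate: {acceptance_rate:.1%}")
```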
Experimental Setup
How likely will she be infected by a bot?
Whom shall we protect to avoid large-scale infiltration due to snowball effects?
Who is a bot? Whom shall we eliminate?
Is she a bot?
src: http://adobeairstream.com/green/a-natural-predicament-sustainability-in-the-21st-century/
Experimental Setup
Two-stage approach
  Predict Infections (binary classification task)
    Who is susceptible to bot attacks – i.e. who gets
    infected?


  Predict Infection level (regression task)
    How susceptible is a user – i.e. how often does a user
    interact with bots?


Dataset: Social Bot Challenge 2011
Social Bot Challenge 2011
Competition organized by Tim Hwang
Aim was to develop socialbots that persuade 500 randomly selected Twitter
users (targets) to interact with them
Targets have a topic in common: cats
Teams got points if targets replied to, mentioned, retweeted or
followed their lead bot
14 days during which teams were allowed to develop their social
bots.
Game started on Jan 23rd, 2011 (day 1) and ended on Feb 5th, 2011
(day 14)
On the 30th of January (day 8) the teams were allowed to update
their codebase
Dataset
[Figure: number of susceptible users (axis ticks 0 to 80) per day of the challenge (days 2 to 14)]
Feature Engineering
How likely will this user become infected?
User Network
Behavior
Content
Network Features
3 directed networks: Follow, retweet and interaction
(retweet, reply, mention and follow) network
Hub and Authority Score (HITS)
  High authority score node has many incoming edges from
  nodes with a high hub score
  High hub score node has many outgoing edges to nodes
  with a high authority score
In-degree and Out-degree
Clustering Coefficient
  number of actual links between the neighbors of a node
  divided by the number of possible links between them
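
The network features above can be sketched in plain Python on a toy directed edge list. This is a minimal illustration, not the authors' implementation: the edge list is invented, the HITS routine is a basic power iteration, and treating a node's neighbors as the union of its in- and out-neighbors in the clustering coefficient is an assumption made here.

```python
import math
from collections import defaultdict

def degrees(edges):
    """In-degree and out-degree per node for directed (source, target) edges."""
    in_deg, out_deg = defaultdict(int), defaultdict(int)
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    return in_deg, out_deg

def hits(edges, iterations=50):
    """Basic HITS power iteration; returns (hub, authority) score dicts."""
    nodes = {n for edge in edges for n in edge}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: sum of hub scores of the nodes pointing at this node.
        auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            auth[dst] += hub[src]
        # Hub: sum of authority scores of the nodes this node points to.
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_hub[src] += auth[dst]
        # Normalize both vectors to unit length so scores stay bounded.
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        auth = {n: v / a_norm for n, v in auth.items()}
        hub = {n: v / h_norm for n, v in new_hub.items()}
    return hub, auth

def clustering_coefficient(edges, node):
    """Actual links between a node's neighbors / possible links between them.

    Assumption: neighbors are in- and out-neighbors combined, and the number
    of possible directed links between k neighbors is k * (k - 1).
    """
    edge_set = set(edges)
    neighbors = set()
    for src, dst in edge_set:
        if src == node and dst != node:
            neighbors.add(dst)
        if dst == node and src != node:
            neighbors.add(src)
    k = len(neighbors)
    if k < 2:
        return 0.0
    actual = sum(1 for s, d in edge_set
                 if s in neighbors and d in neighbors and s != d)
    return actual / (k * (k - 1))

# Toy follow network (hypothetical): an edge (u, v) means u follows v.
toy_edges = [("a", "c"), ("b", "c"), ("d", "c"), ("a", "b"), ("c", "a")]
in_deg, out_deg = degrees(toy_edges)
hub, auth = hits(toy_edges)
cc_c = clustering_coefficient(toy_edges, "c")
```

In this toy graph, node "c" is followed by three others and ends up with the highest authority score, while "a", which follows both "c" and "b", ends up with the highest hub score.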
Behavioral Features
      Informational Coverage
      Conversational Coverage
      Question Coverage
      Social Diversity
      Informational Diversity
      Temporal Diversity
      Lexical Diversity
      Topical Diversity
C. Wagner and M. Strohmaier. The wisdom in tweetonomies: Acquiring latent conceptual structures
from social awareness streams. In Proc. of the Semantic Search 2010 Workshop, April 2010.
Linguistic Features
      LIWC uses a word count strategy searching for over
      2300 words
      Words have previously been categorized into over 70
      linguistic dimensions.
          standard language categories
          (e.g., articles, prepositions, pronouns including first person
          singular, first person plural, etc.)
          psychological processes (e.g., positive and negative emotion
          categories, cognitive processes such as use of causation
          words, self-discrepancies),
          relativity-related words (e.g., time, verb tense, motion, space)
          traditional content dimensions
          (e.g., sex, death, home, occupation).
J. Pennebaker, M. Mehl, and K. Niederhoffer. Psychological aspects of natural language use: Our words,
our selves. Annual review of psychology, 54(1):547-577, 2003.
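
The word count strategy can be sketched as follows. The category dictionary here is a tiny hypothetical stand-in (the real LIWC lexicon is a licensed resource and is not reproduced), and reporting each category as a share of all words in the text is an assumption of this sketch.

```python
import re
from collections import Counter

# Hypothetical toy dictionary; NOT the LIWC lexicon.
TOY_CATEGORIES = {
    "negation": {"not", "never", "no"},
    "death": {"bury", "kill"},
    "i": {"i", "me", "my"},
}

def category_shares(text, categories=TOY_CATEGORIES):
    """Share of words in `text` that fall into each category."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for word in words:
        for name, vocabulary in categories.items():
            if word in vocabulary:
                counts[name] += 1
    total = len(words)
    return {name: (counts[name] / total if total else 0.0)
            for name in categories}

shares = category_shares("I never bury my cat")
```

A word may belong to several categories at once, which is why every category is checked for every word rather than stopping at the first match.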
Feature Computation
For all targets we computed the features using all
tweets they authored during the challenge (up to the
point in time at which they became infected) and a
snapshot of the follow network that was
recorded on the 26th of January (day 4)
We only used targets who became susceptible on
day 7 or later
Features do not contain any future information (such
as tweets or social relations that were created
after a user became infected)
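
A minimal sketch of the "no future information" rule, assuming tweets are dictionaries with a hypothetical `created_at` timestamp and that a tweet posted at the exact moment of infection is excluded. The day numbering follows the slides (Jan 23rd, 2011 is day 1); the user names and timestamps are invented.

```python
from datetime import date, datetime

CHALLENGE_START = date(2011, 1, 23)  # day 1 according to the slides

def challenge_day(d):
    """Map a calendar date to its 1-based day of the challenge."""
    return (d - CHALLENGE_START).days + 1

def usable_tweets(tweets, infected_at):
    """Keep only tweets authored before the user became infected."""
    return [t for t in tweets if t["created_at"] < infected_at]

def eligible_targets(infection_times, min_day=7):
    """Targets whose first infection happened on day `min_day` or later."""
    return {user for user, ts in infection_times.items()
            if challenge_day(ts.date()) >= min_day}

# Hypothetical toy data.
infection_times = {
    "alice": datetime(2011, 1, 30, 12, 0),  # day 8
    "bob": datetime(2011, 1, 25, 9, 0),     # day 3
}
alice_tweets = [
    {"text": "my cat is asleep", "created_at": datetime(2011, 1, 24, 8, 0)},
    {"text": "who is this?", "created_at": datetime(2011, 1, 31, 8, 0)},
]
kept = usable_tweets(alice_tweets, infection_times["alice"])
eligible = eligible_targets(infection_times)
```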
Predict Infections
Binary Classification of users into susceptible and non-
susceptible
Train 6 classifiers
97 Features
Compare classifiers via 10-fold cross-validation
Balanced dataset
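
One way to set up such an evaluation is sketched below. The slides do not say how the dataset was balanced, so random undersampling of the larger class is an assumption here, and the labels are invented.

```python
import random

def balance(labels, seed=0):
    """Indices of a class-balanced subset (undersampling the larger class)."""
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    n = min(len(positives), len(negatives))
    return sorted(rng.sample(positives, n) + rng.sample(negatives, n))

def k_fold_splits(indices, k=10, seed=0):
    """Yield (train, test) index lists; each index is tested exactly once."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# Hypothetical labels: 1 = susceptible, 0 = non-susceptible.
toy_labels = [1] * 30 + [0] * 70
balanced = balance(toy_labels)
splits = list(k_fold_splits(balanced, k=10))
```

Each classifier would then be trained on `train` and scored on `test` for every split, with the per-fold scores averaged for the comparison.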
Feature Ranking
AUC value as
ranking criterion
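
Ranking features by AUC can be sketched as follows: each feature is used on its own as a score, and its AUC is the probability that a randomly chosen susceptible user has a higher value than a randomly chosen non-susceptible one. Ordering by `max(AUC, 1 - AUC)`, so that inversely related features also rank highly, is an assumption of this sketch, and the feature values are invented.

```python
def auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct pair
    return wins / (len(positives) * len(negatives))

def rank_features(features, labels):
    """Sort feature names by how well each one separates the classes alone."""
    scored = {name: auc(values, labels) for name, values in features.items()}
    return sorted(scored.items(),
                  key=lambda item: max(item[1], 1.0 - item[1]),
                  reverse=True)

# Hypothetical toy data: 1 = susceptible, 0 = non-susceptible.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_features = {
    "out_degree": [9, 7, 8, 3, 4, 2],
    "noise": [1, 5, 3, 2, 6, 4],
}
ranking = rank_features(toy_features, toy_labels)
```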
Top 10 Features
[Figure: ten panels, one per feature, each with classes 0 and 1 on the x-axis. Panels: out-degree, verb, conv variety, conv coverage, present, affect, personal pronoun, i, conv balance, motion. Annotations: Social and active; Meformer; Communicative and open; Emotional.]
Predict Level of Infection
Which factors are correlated with users’
susceptibility score?
Susceptibility score
  counts the number of interactions between a target and
  any lead bot
Method: Regression Trees
  can handle strongly nonlinear relationships with high-order
  interactions and different variable types

Fit the model to 75% of the susceptible users
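
The core step of a regression tree is choosing a threshold on one feature that best separates low from high target values. The sketch below finds a single such split by minimizing the summed squared error of the two sides; it is one split only, not a full tree and not the authors' implementation, and the data are invented.

```python
def sum_squared_error(values):
    """Squared deviation of the values from their own mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(x, y):
    """Return (threshold, error) for the best split x < t versus x >= t."""
    best = None
    # Candidate thresholds are the observed values (skipping the smallest,
    # which would leave the left side empty).
    for threshold in sorted(set(x))[1:]:
        left = [yi for xi, yi in zip(x, y) if xi < threshold]
        right = [yi for xi, yi in zip(x, y) if xi >= threshold]
        error = sum_squared_error(left) + sum_squared_error(right)
        if best is None or error < best[1]:
            best = (threshold, error)
    return best

# Hypothetical toy data: one feature value and a susceptibility score per user.
feature = [0.1, 0.2, 0.3, 0.6, 0.7, 0.8]
score = [1, 1, 1, 3, 4, 3]
threshold, error = best_split(feature, score)
```

A full tree repeats this search over all features and then recursively inside each side of the chosen split until a stopping rule is met.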
Predicting Levels of Susceptibility
Users who
• use more negation words (e.g. not, never, no),
• tweet more regularly (i.e. have a high temporal balance)
• use more words related to the topic death (e.g. bury, coffin, kill)
tend to interact more often with bots
[Figure: regression tree. Node 1 splits on negemo (< 0.40068 / >= 0.40068); node 2 splits on temp_bal (< 0.37025 / >= 0.37025); node 3 splits on death (< -0.16389 / >= -0.16389). Leaves: Node 4 (n = 25), Node 5 (n = 7), Node 6 (n = 9), Node 7 (n = 15), each with axis ticks 2 to 8.]
Predicting Levels of Susceptibility
  Rank correlation of hold-out users given their real
  susceptibility level and their predicted susceptibility level
  (Kendall τ up to 0.45)
  Goodness of fit (R² up to 0.3)
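
Both evaluation measures can be computed in a few lines. The slides do not state which Kendall variant was used; the tau-b form below, which corrects for ties, is an assumption chosen because susceptibility levels contain many tied values. The numbers are invented.

```python
def kendall_tau_b(x, y):
    """Kendall rank correlation with tie correction (tau-b)."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both: drops out of both denominator terms
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denominator = ((untied + tied_x_only) * (untied + tied_y_only)) ** 0.5
    return (concordant - discordant) / denominator if denominator else 0.0

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical hold-out users: real versus predicted susceptibility level.
real = [1, 1, 2, 3, 5]
predicted = [1.2, 1.0, 1.8, 2.5, 3.5]
tau = kendall_tau_b(real, predicted)
fit = r_squared(real, predicted)
```

Kendall τ only asks whether users are ordered correctly, while R² also penalizes predictions that are off in magnitude, so the two can diverge on the same model.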


Potential Reasons:
  Dataset is too small (we only had 81 susceptible users
  and 61% of them had level 1, 17% had level 2, 10% had
  level 3, very few users had more than 3 interactions)
Summary & Conclusions
Approach to identify susceptible users
Features of all three types contributed to the
identification
Users are more likely to be susceptible if
  they are emotional Meformers
  they use Twitter mainly for communicating
  their communications are not focused on a small circle of
  friends
  they are social and active (i.e., interact with many others)
Summary & Conclusions
Active Twitter users are more susceptible
  They are more likely to see the messages/requests of
  social bots
  But we expected that they would develop some skills to
  distinguish socialbots from humans by using Twitter
  frequently


Predicting users’ susceptibility score is difficult
  More data and further experiments are required
Future Work
Repeating experiments on larger datasets


Taxonomy of social bot strategies
  Massive numbers of con-messages (brute force)
  Manipulation of messages through false retweets (changing pro-
  to con messages)
  Diverting attention by adding con-hashtags to pro-hashtags


Susceptibility of users to different strategies
Experimental Setup
Emotional Meformers who are active, communicative and social
are more likely to be infected




                                             THANK YOU

                              claudia.wagner@joanneum.at
                                 http://claudiawagner.info




src: http://adobeairstream.com/green/a-natural-predicament-sustainability-in-the-21st-century/

How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 

When socialbots attack: Modeling susceptibility of users in online social networks

  • 1. When socialbots attack: Modeling susceptibility of users in online social networks. Claudia Wagner, Silvia Mitter, Christian Körner, Markus Strohmaier. Lyon, 16.4.2012
  • 2. What are socialbots? A socialbot is a piece of software that controls a user account in an online social network and passes itself off as a human being.
  • 3. Danger of socialbots. Social Engineering: Gaining access to secure objects by exploiting human psychology rather than using hacking techniques. Harvest private user data such as email addresses, phone numbers, and other personal data that have monetary value. Spread Misinformation: Ratkiewicz et al. describe the use of Twitter bots to run smear campaigns during the 2010 U.S. midterm elections. J. Ratkiewicz, M. Conover, M. Meiss, B. Goncalves, S. Patil, A. Flammini, and F. Menczer. Truthy: mapping the spread of astroturf in microblog streams. In Proceedings of the 20th international conference companion on World wide web, WWW '11, pages
  • 4. Danger of socialbots. Snowball effects: Boshmaf et al. show that Facebook can be infiltrated by social bots sending friend requests. 102 socialbots, 6 weeks, 3,517 friend requests and 2,079 infections. Average reported acceptance rate: 59.1%, up to 80% depending on how many mutual friends the social bots had with the infiltrated users. Y. Boshmaf, I. Muslukhov, K. Beznosov, and M. Ripeanu. The socialbot network. In Proceedings of the 27th Annual Computer Security Applications Conference, page 93. ACM Press, Dec 2011.
  • 5. Experimental Setup. How likely will she be infected by a bot? Whom shall we protect to avoid large-scale infiltration due to snowball effects? Who is a bot? Whom shall we eliminate? Is she a bot? src: http://adobeairstream.com/green/a-natural-predicament-sustainability-in-the-21st-century/
  • 6. Experimental Setup: Two-stage approach. (1) Predict infections (binary classification task): Who is susceptible to bot attacks, i.e. who gets infected? (2) Predict infection level (regression task): How susceptible is a user, i.e. how often does a user interact with bots? Dataset: Social Bot Challenge 2011.
  • 7. Social Bot Challenge 2011: Competition organized by Tim Hwang. The aim was to develop socialbots that persuade 500 randomly selected Twitter users (targets) to interact with them. Targets have a topic in common: cats. Teams got points if targets replied to, mentioned, retweeted or followed their lead bot. 14 days during which teams were allowed to develop their social bots. The game started on Jan 23rd 2011 (day 1) and ended on Feb 5th 2011 (day 14). On the 30th of January (day 8) the teams were allowed to update their codebase.
  • 8. Dataset. [Figure: number of users who became susceptible (y-axis, "#users susceptible") per day of the challenge (x-axis, "days")]
  • 9. Feature Engineering: How likely will this user become infected? User: Network, Behavior, Content.
  • 10. Network Features: 3 directed networks: follow, retweet and interaction (retweet, reply, mention and follow) network. Hub and Authority Score (HITS): a node with a high authority score has many incoming edges from nodes with a high hub score; a node with a high hub score has many outgoing edges to nodes with a high authority score. In-degree and Out-degree. Clustering Coefficient: number of actual links between the neighbors of a node divided by the number of possible links between them.
  • 11. Behavioral Features: Informational Coverage, Conversational Coverage, Question Coverage, Social Diversity, Informational Diversity, Temporal Diversity, Lexical Diversity, Topical Diversity. C. Wagner and M. Strohmaier. The wisdom in tweetonomies: Acquiring latent conceptual structures from social awareness streams. In Proc. of the Semantic Search 2010 Workshop, April 2010.
  • 12. Linguistic Features: LIWC uses a word count strategy, searching for over 2300 words. Words have previously been categorized into over 70 linguistic dimensions: standard language categories (e.g., articles, prepositions, pronouns including first person singular, first person plural, etc.), psychological processes (e.g., positive and negative emotion categories, cognitive processes such as use of causation words, self-discrepancies), relativity-related words (e.g., time, verb tense, motion, space), and traditional content dimensions (e.g., sex, death, home, occupation). J. Pennebaker, M. Mehl, and K. Niederhoffer. Psychological aspects of natural language use: Our words, our selves. Annual review of psychology, 54(1):547-577, 2003.
  • 13. Feature Computation: For all targets we computed the features by using all tweets they authored during the challenge (up to the point in time where they became infected) and a snapshot of the follow network which was recorded on the 26th of January (day 4). We only used targets which became susceptible at day 7 or later. Features do not contain any future information (such as tweets or social relations which were created after a user became infected).
  • 14. Predict Infections: Binary classification of users into susceptible and non-susceptible. Train 6 classifiers. 97 features. Compare classifiers via 10-fold cross-validation. Balanced dataset.
  • 15. Feature Ranking: AUC value as ranking criterion.
  • 16. Top 10 Features. [Figure: per-class distributions (classes 0 and 1) of the ten top-ranked features: out-degree, verb, conv variety, conv coverage, present, affect, personal pronoun, i, conv balance, motion. Slide annotations: Social and active; Meformer; Communicative and open; Emotional.]
  • 17. Predict Level of Infection: Which factors are correlated with users' susceptibility score? The susceptibility score counts the number of interactions between a target and any lead bot. Method: Regression trees, which can handle strongly nonlinear relationships with high-order interactions and different variable types. Fit the model to 75% of the susceptible users.
  • 18. Predicting Levels of Susceptibility: Users who use more negation words (e.g. not, never, no), tweet more regularly (i.e. have a high temporal balance), and use more words related to the topic death (e.g. bury, coffin, kill) tend to interact more often with bots. [Figure: regression tree. Node 1 splits on negemo (< 0.40068 vs. >= 0.40068), node 2 on temp_bal (< 0.37025 vs. >= 0.37025), node 3 on death (< −0.16389 vs. >= −0.16389). Leaves: Node 4 (n = 25), Node 5 (n = 7), Node 6 (n = 9), Node 7 (n = 15).]
  • 19. Predicting Levels of Susceptibility: Rank correlation of hold-out users given their real susceptibility level and their predicted susceptibility level (Kendall τ up to 0.45). Goodness of fit (R² up to 0.3). Potential reasons: Dataset is too small (we only had 81 susceptible users; 61% of them had level 1, 17% had level 2, 10% had level 3, and very few users had more than 3 interactions).
  • 20. Summary & Conclusions: Approach to identify susceptible users. Features of all three types contributed to the identification. Users are more likely to be susceptible if: they are emotional Meformers; they use Twitter mainly for communicating; their communications are not focused on a small circle of friends; they are social and active (i.e., interact with many others).
  • 21. Summary & Conclusions: Active Twitter users are more susceptible. They are more likely to see the messages/requests of social bots. But we expected that, by using Twitter frequently, they would develop some skill in distinguishing social bots from humans. Predicting users' susceptibility score is difficult. More data and further experiments are required.
  • 22. Future Work: Repeating experiments on larger datasets. Taxonomy of social bot strategies: massive numbers of con-messages (brute force); manipulation of messages through false retweets (changing pro- to con-messages); diverting attention by adding con-hashtags to pro-hashtags. Susceptibility of users for different strategies.
  • 23. Emotional Meformers who are active, communicative and social are more likely to be infected. THANK YOU. claudia.wagner@joanneum.at http://claudiawagner.info src: http://adobeairstream.com/green/a-natural-predicament-sustainability-in-the-21st-century/
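
Slide 10 defines the clustering coefficient as the number of actual links between the neighbors of a node divided by the number of possible links between them. The following is a minimal Python sketch of that definition, not the authors' code. It assumes an undirected view of the network (the slides use directed networks, for which the number of possible links would differ), and the function name `clustering_coefficient` and the toy graph `adj` are hypothetical.

```python
from itertools import combinations


def clustering_coefficient(adj, node):
    """Local clustering coefficient of `node` in an undirected graph.

    `adj` maps each node to the set of its neighbors (assumed symmetric).
    """
    neighbors = adj[node] - {node}
    k = len(neighbors)
    if k < 2:
        # With fewer than two neighbors there are no possible links.
        return 0.0
    # Count the neighbor pairs that are themselves connected.
    actual = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    possible = k * (k - 1) / 2
    return actual / possible


# Hypothetical toy network, for illustration only.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

print(clustering_coefficient(adj, "a"))
```

In the toy graph, node "a" has three neighbors, of which only "b" and "c" are linked, so one of three possible links exists.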

Editor's notes

  1. What distinguishes socialbots from self-declared bots is that they hide the fact that they're robots and usually try to pursue a variety of latent goals, such as spreading information or influencing users. Tim Hwang defined a socialbot as a machine with social impact.
  2. And finally, recent research has shown that socialbots are extremely dangerous due to snowball effects. The more users a bot has infected in a network, the more easily it can infect new users in that network. Boshmaf et al. conducted a very controversial experiment in which they set up a network of 102 Facebook bots that sent friend requests to others over a period of 6 weeks. Their results show how a network of bots can infect Facebook users. Interestingly, the average acceptance rate of friend requests was 59.1%; it depends on how many mutual friends the socialbots had with the infiltrated users and can increase up to 80%.
  3. So what can we do to prevent large-scale infiltrations due to social bot attacks? The traditional approach is to try to identify bots and eliminate them. In our work we suggest a complementary approach which aims to identify users who are most susceptible to social bot attacks. We wanted to know if these users show special characteristics and
  4. To answer this question we use a 2-stage approach. First we aim to identify users who are susceptible to bot attacks in general, i.e., users who became affected. We were interested in whether these users show any specific characteristics or whether these users are average users like you and me.
  5. In our experiment we used data from the Social Bot Challenge 2011, which is a competition that was organized by...
  6. The dataset we received contained all tweets published by the targets and bots during the challenge, and snapshots of the follow network between these users at different points in time. The figure shows how many users became susceptible on each day. One can see that most targets became susceptible at day 1. One possible explanation is the auto-follow feature which some of the targets might have used.
  7. Since we were interested in the factors that impact whether a user gets infected or not, we first had to design features that describe potential factors. In our work we used 3 different types of features: features that are based on user networks, features that are based on users' tweeting behavior, and features that are based on the linguistics of users' tweet content.
  8. For the network features we created 3 different types of user networks from our dataset and computed the following measures on these 3 networks.
  9. Coverage-based measures describe e.g. how many messages of a user contain links, are conversational, or contain question marks. Diversity-based measures describe e.g. with how many different users a user communicates and how evenly distributed a user's communication efforts are across these users. A user who communicates with many users equally much would have a high social diversity, while a user who tends to communicate with a small circle of friends has a low social diversity.
  10. Linguistic Inquiry and Word Count (LIWC): By mapping words in tweets to these 2300 words one gets linguistic annotations of tweets, which we used as features.
  11. We computed our features for each target user based on all tweets the target user authored during the challenge up to the point when they became infected. That means we did not take into account any information from after a user was already infected, which is important since we want to predict infections. Therefore we need to ensure that we do not take any future information into account which could falsify our results. For the follow-network-based features we used a snapshot from day 4; all our sample users became susceptible at day 7 or later.
  12. So our first aim was to identify users who are likely to become infected. That means we had a binary classification problem and our aim was to distinguish susceptible from non-susceptible users. We balanced our dataset, compared 6 classifiers and conducted a 10-fold cross-validation. Our results show that a generalized boosted regression classifier performed best. Therefore we used this classifier to further inspect which variables were most useful for differentiating between...
  13. We used the best-performing classification model to further inspect which features were most useful for differentiating between... One can see from this slide that the most important feature is the out-degree of a user node in the interaction network. It is interesting to note that the top 3 features contain one network feature, one linguistic feature and one behavioral feature, which shows that all 3 types of features seem to contribute to our task. The ROC curve plots the true positive rate vs. the false positive rate. The ideal would be an area under the ROC curve of 1.
  14. We further inspected the feature distributions of the top 20 features for each user class (i.e. susceptible and non-susceptible) to gain further insights into how features of susceptible users are distributed and how different their distributions are from the distribution of non-susceptible users. Best network feature: out-degree of the interaction network, i.e. users who actively create interactions with others are more likely to become infected. Best linguistic feature: verbs and present tense. Best behavioral feature: conversational variety and coverage.
  15. After having identified users who will become infected during an attack, we also want to predict their level of infection: i.e. does the user interact just once with the bot or do they develop a close friendship relation? That means the aim of our second task is to predict how often they interacted with a bot. To address this question we used regression trees since they can handle...
  16. By fitting the model to our dataset we learned the following tree structure, which shows which features and thresholds are used internally by the model. The leaves show the distribution of the susceptibility score of the users who were used as samples for this branch. From this tree structure we can see that…
  17. To assess the quality of this model we measured the rank correlation of hold-out users given their real susceptibility score and given their predicted susceptibility scores. The correlation coefficient was pretty low and also the R-squared value of the model was pretty low. One potential reason for that is the size of our dataset and that we did not have many samples of users who had lengthy discussions with bots.
  18. So let me start concluding my talk. What I have presented to you today is an approach to identify susceptible users. We have introduced a variety of features which can capture characteristics of users who are indeed more susceptible to bot attacks than others.
  19. The fact that active Twitter users are more susceptible is on the one hand not really surprising since... But on the other hand it is surprising since one would expect that active users develop some sort of skill to distinguish between...
  20. We hope that our research will not only inform modern social media security systems but also support the development of good socialbots which are e.g. used to increase the fitness level of a community.
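
Note 9 describes coverage measures (the share of a user's messages with some property) and social diversity (how evenly a user's communication is spread across other users). The sketch below illustrates both ideas in Python. The notes do not give the exact formulas, so using Shannon entropy for diversity is an assumption, and the function names, the predicates and the sample data are all hypothetical.

```python
import math
from collections import Counter


def coverage(tweets, predicate):
    """Fraction of tweets for which `predicate` holds (0.0 if no tweets)."""
    if not tweets:
        return 0.0
    return sum(1 for t in tweets if predicate(t)) / len(tweets)


def social_diversity(recipients):
    """Shannon entropy (in bits) of the distribution of messages over recipients.

    Assumption: entropy stands in for "how evenly distributed" communication is;
    the original measure may be defined differently.
    """
    counts = Counter(recipients)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical sample data, for illustration only.
tweets = [
    "Is this your cat?",
    "@bob nice photo",
    "http://example.org new post",
    "Who is in?",
]
question_coverage = coverage(tweets, lambda t: "?" in t)
conversational_coverage = coverage(tweets, lambda t: t.startswith("@"))

evenly_spread = social_diversity(["@a", "@b", "@c", "@d"])
small_circle = social_diversity(["@a", "@a", "@a", "@a"])
print(question_coverage, conversational_coverage, evenly_spread, small_circle)
```

Under this assumption, a user who addresses four different users equally often scores higher than a user who only ever addresses one, matching the high and low social diversity cases described in note 9.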