Machine Classification and Analysis of Suicide-Related Communication on Twitter
Presentation @ ACM Hypertext 2015
Pete Burnap, Gualtiero (Walter) Colombo & Jonathan Scourfield
Social Data Science Lab
School of Computer Science and Informatics & School of Social Sciences
Cardiff University
@pbFeed @socdatalab
Social Data Science Lab - @socdatalab
•  Formed in 2015 out of the Collaborative Online Social Media Observatory (COSMOS) programme of work (cosmosproject.net)
•  Mission is to continue the work of COSMOS in democratising access to big social data (e.g. Twitter, Foursquare, Instagram) across the academic, private, public and third sectors.
•  A significant proportion of research funding has been awarded to collect and analyse social media data in the context of societal safety and security, e.g. social tension, hate speech, crime reporting and fear of crime, and suicidal ideation
•  Working with the Metropolitan Police, the Department of Health and the Food Standards Agency
The Problem
•  Our previous research has studied online social networks as “social machines” that enable the spread of malicious or potentially dangerous information (e.g. rumour, hate speech, malware)
•  Concern about suicide and the Internet has shifted from dedicated suicide websites to general social media platforms
•  Previous research has shown spikes in recorded suicide rates linked to increased risk factors (e.g. celebrity suicide)
The Problem
•  Normalisation of suicidal language (Daine et al., 2013)
•  To date, research has tended to rely either on human coding of online content, which is difficult to scale to large volumes, or on suicide notes, which may reflect a different state of mind
•  Social media analysis has yet to distinguish between different types of suicidal communication
Research Aims
•  To explore the potential of natural language processing and machine learning for automatically identifying and differentiating suicide-related communication in very large social media data sets
•  This would enable those responsible for supporting safety and wellbeing (e.g. the Samaritans) to form a more realistic picture of the volume of suicidal information online and possibly to identify emerging ‘clusters’
•  While computation is essential, the work was driven from the start by a strong understanding of suicidal communication and language, developed with established suicide researchers
Developing a classifier for suicide-related social media content
•  Anonymised data from suicide discussion fora
•  Human annotated: ‘is this person suicidal?’
•  Identify terms and phrases (TF-IDF) from ‘suicidal texts’
•  Automated collection of data from Twitter and Tumblr using the TF-IDF terms
•  Human-annotated sample (n = 2,000: 1k Twitter + 1k Tumblr) using a coding frame
•  c1: Evidence of possible suicidal intent
•  c2: Campaigning (i.e. petitions etc.)
•  c3: Flippant reference to suicide
•  c4: Information or support
•  c5: Memorial or condolence
•  c6: Reporting news of someone’s suicide (excluding suicide bombings)
•  c7: None of the above
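The TF-IDF step (finding distinctive terms in ‘suicidal texts’, then using them as search terms) can be sketched as follows. This is a minimal illustration under assumptions, not the authors' pipeline: the tokenizer, the 1 to 3-gram range, scoring each term by its highest TF-IDF value in any document, and the helper names `tfidf_scores`, `top_terms` and `matches_search_terms` are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lower-case and keep simple word tokens (apostrophes allowed).
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens: list[str], n: int) -> list[str]:
    # Contiguous n-token phrases, joined with single spaces.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_scores(docs: list[str], max_n: int = 3) -> dict[str, float]:
    """Score every 1..max_n-gram by its highest TF-IDF value in any document (assumed aggregation)."""
    doc_counts = []
    for doc in docs:
        toks = tokenize(doc)
        doc_counts.append(Counter(g for n in range(1, max_n + 1) for g in ngrams(toks, n)))
    df = Counter(term for counts in doc_counts for term in counts)  # document frequency
    scores: dict[str, float] = {}
    for counts in doc_counts:
        total = sum(counts.values())
        for term, count in counts.items():
            tfidf = (count / total) * math.log(len(docs) / df[term])
            scores[term] = max(scores.get(term, 0.0), tfidf)
    return scores


def top_terms(scores: dict[str, float], k: int) -> list[str]:
    # Highest score first; ties broken alphabetically for determinism.
    return [t for t, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]


def matches_search_terms(post: str, terms: list[str]) -> bool:
    # Hypothetical collection filter: keep a post if any search phrase occurs as whole words.
    padded = " " + " ".join(tokenize(post)) + " "
    return any(" " + t + " " in padded for t in terms)
```

A term that occurs in every document gets an IDF of zero, so generic words drop to the bottom of the ranking and distinctive phrases rise to the top.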
Features
(Set 1) Lexical characteristics of the sentences used, such as Parts of Speech (POS), and other structural language features, such as the most frequently used words and phrases. References to self and to others are also captured with POS; previous research has identified these terms as evident within suicidal communication
(Set 2) Sentiment, affective and emotional features of the terms used within the text, and their levels. Emotions such as fear, anger and general aggressiveness are particularly prominent in suicidal communication (WordNet Affect)
(Set 3) Language expressed in short, informal text such as social media posts limited to a small number of characters. These features were extracted from annotated Tumblr posts
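To make Sets 1 and 2 concrete, here is a toy feature extractor. It is only a sketch: the study used POS tagging and WordNet Affect, whereas the pronoun lists and the three-category `AFFECT_LEXICON` below are hypothetical stand-ins chosen for illustration, and `extract_features` is not the authors' code.

```python
import re
from collections import Counter

# Hypothetical stand-ins: the study used a POS tagger for self/other references
# and WordNet Affect for emotion categories; these small lists only show the shape.
SELF_REFS = {"i", "i'm", "im", "me", "my", "mine", "myself"}
OTHER_REFS = {"you", "he", "she", "they", "him", "her", "them", "his", "their"}
AFFECT_LEXICON = {
    "fear": {"scared", "afraid", "terrified"},
    "anger": {"angry", "hate", "furious"},
    "misery": {"miserable", "hopeless", "alone"},
}


def extract_features(text: str) -> dict[str, float]:
    tokens = re.findall(r"[a-z']+", text.lower())
    n = max(len(tokens), 1)  # avoid division by zero for empty posts
    counts = Counter(tokens)
    features = {
        # Set 1 (sketch): share of tokens referring to self vs. others.
        "self_ref_ratio": sum(counts[w] for w in SELF_REFS) / n,
        "other_ref_ratio": sum(counts[w] for w in OTHER_REFS) / n,
    }
    # Set 2 (sketch): raw counts of words in each affect category.
    for category, words in AFFECT_LEXICON.items():
        features[f"affect_{category}"] = float(sum(counts[w] for w in words))
    return features
```

For "I feel so alone and scared, I hate myself", three of the nine tokens are self-references, and the fear, anger and misery counts are each 1.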
Machine Classification
•  Key question: what are the features of suicidal ideation, and what are the features of the other classes?
•  Accuracy is important, but explanatory value is also crucial
•  Methods used for the classifier
• Probabilistic (Naïve Bayes), non-probabilistic linear (linear SVM) and rule-based (Decision Tree) machine classifiers
• Principal Components Analysis (1444 to 255 features)
• Improvement with an ‘ensemble’ classifier designed to incorporate diverse principal components (Rotation Forest)
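The dimensionality-reduction step (1444 to 255 features) can be illustrated with plain PCA computed by singular value decomposition in NumPy. This is a sketch under assumptions: the random matrix stands in for the real feature matrix, features are only mean-centred (not standardised), `pca_reduce` is a hypothetical helper, and because the slide does not say how the 255 components were chosen, `k` is simply fixed here. It does not implement Rotation Forest, which applies PCA to random subsets of features to build a differently rotated training set for each tree in the ensemble.

```python
import numpy as np


def pca_reduce(X: np.ndarray, k: int) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Project the rows of X onto the top-k principal components.

    Returns (scores, components, explained_variance_ratio).
    k must not exceed min(n_samples, n_features).
    """
    Xc = X - X.mean(axis=0)  # centre each feature column
    # Economy SVD: rows of vt are principal axes, sorted by singular value (descending).
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    components = vt[:k]
    scores = Xc @ components.T
    variance = s ** 2
    return scores, components, variance[:k] / variance.sum()


# Usage with a random stand-in for a (posts x features) matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 1444))
scores, components, ratio = pca_reduce(X, k=255)
print(scores.shape, components.shape)  # (300, 255) (255, 1444)
```

Each row of `components` is one principal component: a weight for every one of the 1444 input features, which is what makes the components inspectable for explanatory value.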
Results (all)
Results (suicidal ideation)
Classifier accuracy
PCA (combined)
P   0.321   0.345   0.762   0.507
R   0.641   0.385   0.205   0.436
F   0.427   0.364   0.323   0.469
Table 3: Confusion matrix for the best performing classification model
(columns: classified as)
      c1   c2   c3   c4   c5   c6   c7
c1    57    0   16    0    0    0    5
c2     0   19    2    4    0    3    0
c3    13    1  142    0    0    5   16
c4     0    4    5   20    0    3    3
c5     1    1    1    0   31    1    1
c6     0    6    7    6    2   80    3
c7    18    0   20    1    2    4   98
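Per-class precision, recall and F1 can be recomputed from Table 3. This sketch assumes rows are the true class and columns are ‘classified as’ (the usual Weka layout); because F1 is symmetric in precision and recall, the F1 values do not depend on that choice. `per_class_metrics` and `accuracy` are illustrative helpers, not part of the original work.

```python
# Table 3 confusion matrix. Assumed layout: rows = true class, columns = "classified as".
LABELS = ["c1", "c2", "c3", "c4", "c5", "c6", "c7"]
CONFUSION = [
    [57, 0, 16, 0, 0, 0, 5],
    [0, 19, 2, 4, 0, 3, 0],
    [13, 1, 142, 0, 0, 5, 16],
    [0, 4, 5, 20, 0, 3, 3],
    [1, 1, 1, 0, 31, 1, 1],
    [0, 6, 7, 6, 2, 80, 3],
    [18, 0, 20, 1, 2, 4, 98],
]


def f1(precision: float, recall: float) -> float:
    # Harmonic mean of precision and recall.
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)


def per_class_metrics(m: list[list[int]]) -> dict[str, tuple[float, float, float]]:
    """Return {label: (precision, recall, F1)} for each class."""
    n = len(m)
    result = {}
    for i in range(n):
        tp = m[i][i]
        predicted = sum(m[row][i] for row in range(n))  # column total
        actual = sum(m[i])  # row total
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        result[LABELS[i]] = (p, r, f1(p, r))
    return result


def accuracy(m: list[list[int]]) -> float:
    # Correct predictions (diagonal) over all predictions.
    return sum(m[i][i] for i in range(len(m))) / sum(sum(row) for row in m)
```

For c1 this gives precision 57/89 ≈ 0.640, recall 57/78 ≈ 0.731 and F1 ≈ 0.683, close to but not identical to the 0.690 quoted on the slide, which does not state how its figure was derived. Overall accuracy is 447/601 ≈ 0.744. The same `f1` helper reproduces the F row of the P/R/F table from its P and R rows, up to rounding.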
6. DISCUSSION
In this section we analyse the main feature components produced by running the PCA procedure on the combined set that resulted in the best set of results, as shown in Tables 1
F-measure: c1 = 0.690, all classes: 0.728
Predictive Features
Table 5: Principal components per class
c1 - Evidence of possible suicidal intent
0.185 word list1 end it all 521 + 0.185 end it all + 0.179 it all now + 0.179 all now + 0.175 it all
0.149 word list1 want to be dead 554 - 0.133 - 0.129 i think + 0.125 word list1 to commit suicide 547 + 0.114 really
0.149 word list1 want to be dead 554 + 0.145 wn affect11 alarm 496 - 0.123 number of adverb superlative 211 - 0.121 word list7 relationship 780 + 0.118 regEx class6 +.+report.+ 701
0.153 thinking about killing + 0.153 about killing myself + 0.153 about killing + 0.147 so im + 0.147 wn affect11 misery 314
0.119 number of predeterminers 206 + 0.117 regEx class1 +.+((cutting|depres|sui)|these|bad|sad).+(thoughts|feel).+ 667 + 0.115 wn domain astrology 160 - 0.106 bombing
0.231 regEx class1 +.+(\bdie).+(\bmy).+\bsleep.+ + 0.177 word list want to be dead 554 - 0.155 wn domain dentistry 113 - 0.146 wn affect11 security 277 - 0.129 wn affect11 admiration
c2 - Campaigning (i.e. petitions etc.)
0.25 word list2 support 746 - 0.134 wn domain racing 84
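Each row of Table 5 is a principal component written as a weighted sum of input features. A minimal sketch of scoring a post against the first c1 component, using the loadings shown on the slide. Assumptions: the slide lists only the leading terms (a full component has a weight for every input feature), the trailing numbers after some feature names are dropped, features are binary presence indicators, and the usual centring or standardisation step is skipped; `component_score` is a hypothetical helper.

```python
# Leading loadings of the first c1 component in Table 5 (other terms omitted on the slide).
C1_COMPONENT_1 = {
    "word list1 end it all": 0.185,
    "end it all": 0.185,
    "it all now": 0.179,
    "all now": 0.179,
    "it all": 0.175,
}


def component_score(features: dict[str, float], loadings: dict[str, float]) -> float:
    # Dot product between the post's feature values and the component's loadings;
    # features missing from the post count as 0, extra features are ignored.
    return sum(weight * features.get(name, 0.0) for name, weight in loadings.items())


# A post containing "end it all" (and hence "it all") but not "it all now".
post = {"word list1 end it all": 1.0, "end it all": 1.0, "it all": 1.0}
print(round(component_score(post, C1_COMPONENT_1), 3))  # 0.545
```

Large positive loadings on overlapping n-grams of one phrase are why a single expression such as ‘end it all’ can dominate a component.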
Explanatory features
•  Word lists and regular expressions (regex) extracted from online suicide-related discussion forums and other microblogging websites provide effective ‘clues’ for the suicidal ideation class
•  Lexical and grammatical features such as POS tags appear mostly ineffective
•  ‘Affective’ language is highly relevant (such as the terms represented in WordNet, a lexical database of ‘cognitive synonyms’) and represents well the affective and emotional states associated with this particular type of language
•  Sentiment scores generated by sentiment analysis software also appear ineffective: they are rarely, if ever, included in the principal components predictive of each class
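Word-list and regex ‘clues’ of this kind can be turned into features as follows. A sketch only: `WORD_LIST_1` reuses three phrases that appear in Table 5 but is not the study's actual list, the two patterns are modelled on the regEx class1 entries in Table 5 with the leading `+` dropped and `\b` read as a word boundary, and `lexicon_features` is a hypothetical helper.

```python
import re

# Hypothetical word list built from phrases visible in Table 5; the study's lists
# were extracted from suicide-related forums and microblogging sites.
WORD_LIST_1 = ["end it all", "want to be dead", "to commit suicide"]

# Patterns modelled on the "regEx class1" entries in Table 5 (interpretation assumed).
REGEX_CLASS_1 = [
    re.compile(r".+(\bdie).+(\bmy).+\bsleep.+"),
    re.compile(r".+((cutting|depres|sui)|these|bad|sad).+(thoughts|feel).+"),
]


def lexicon_features(text: str) -> dict[str, int]:
    t = text.lower()
    # One binary feature per phrase, matched on word boundaries.
    features = {
        f"word_list1:{phrase}": int(re.search(r"\b" + re.escape(phrase) + r"\b", t) is not None)
        for phrase in WORD_LIST_1
    }
    # Count how many class-1 patterns fire anywhere in the post.
    features["regex_class1_hits"] = sum(1 for p in REGEX_CLASS_1 if p.search(t))
    return features
```

Unlike a sentiment score, each of these features names a concrete phrase or pattern, which is what makes the resulting components readable.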
Networks of Suicidal Ideation
“…shortest path of retweets of suicidal ideation was higher than previous studies that reported on general retweet path length. Our results found an average of 5, while other research reported metrics between 2 and 4.8.”
Colombo, G., Burnap, P., Hodorog, A. and Scourfield, J. (2015) ‘Analysing the connectivity and communication of suicidal users on Twitter’, Computer Communications - available open access http://tinyurl.com/suicidenetworks
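The average shortest retweet path quoted above is a standard graph statistic. A minimal breadth-first-search sketch, assuming an unweighted directed graph stored as an adjacency dict (the edge direction, e.g. retweeter to original poster, is an assumption) and averaging only over ordered pairs where a path exists; it does not reproduce the paper's network or its exact method.

```python
from collections import deque


def bfs_distances(graph: dict[str, list[str]], source: str) -> dict[str, int]:
    # Hop counts from source to every reachable node (unweighted edges).
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, []):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def average_shortest_path(graph: dict[str, list[str]]) -> float:
    # Mean distance over all ordered (source, target) pairs with a path between them.
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    total = pairs = 0
    for s in nodes:
        for t, d in bfs_distances(graph, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


# Toy retweet chain a -> b -> c -> d: distances 1+2+3+1+2+1 over 6 pairs.
chain = {"a": ["b"], "b": ["c"], "c": ["d"]}
print(round(average_shortest_path(chain), 3))  # 1.667
```

Skipping unreachable pairs is one common convention; treating the graph as undirected or handling disconnected components differently would change the figure.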
Thanks
Questions?
@pbFeed

Contenu connexe

Tendances

Who to follow and why: link prediction with explanations
Who to follow and why: link prediction with explanationsWho to follow and why: link prediction with explanations
Who to follow and why: link prediction with explanationsNicola Barbieri
 
Microposts2015 - Social Spam Detection on Twitter
Microposts2015 - Social Spam Detection on TwitterMicroposts2015 - Social Spam Detection on Twitter
Microposts2015 - Social Spam Detection on Twitterazubiaga
 
Conversation Practices and Network Structure in Twitter
Conversation Practices and Network Structure in TwitterConversation Practices and Network Structure in Twitter
Conversation Practices and Network Structure in TwitterLuca Rossi
 
Epidemiological Modeling of News and Rumors on Twitter
Epidemiological Modeling of News and Rumors on TwitterEpidemiological Modeling of News and Rumors on Twitter
Epidemiological Modeling of News and Rumors on TwitterParang Saraf
 
Slides: Epidemiological Modeling of News and Rumors on Twitter
Slides: Epidemiological Modeling of News and Rumors on TwitterSlides: Epidemiological Modeling of News and Rumors on Twitter
Slides: Epidemiological Modeling of News and Rumors on TwitterParang Saraf
 

Tendances (6)

Who to follow and why: link prediction with explanations
Who to follow and why: link prediction with explanationsWho to follow and why: link prediction with explanations
Who to follow and why: link prediction with explanations
 
Microposts2015 - Social Spam Detection on Twitter
Microposts2015 - Social Spam Detection on TwitterMicroposts2015 - Social Spam Detection on Twitter
Microposts2015 - Social Spam Detection on Twitter
 
Conversation Practices and Network Structure in Twitter
Conversation Practices and Network Structure in TwitterConversation Practices and Network Structure in Twitter
Conversation Practices and Network Structure in Twitter
 
Epidemiological Modeling of News and Rumors on Twitter
Epidemiological Modeling of News and Rumors on TwitterEpidemiological Modeling of News and Rumors on Twitter
Epidemiological Modeling of News and Rumors on Twitter
 
Social media analysis project
Social media analysis projectSocial media analysis project
Social media analysis project
 
Slides: Epidemiological Modeling of News and Rumors on Twitter
Slides: Epidemiological Modeling of News and Rumors on TwitterSlides: Epidemiological Modeling of News and Rumors on Twitter
Slides: Epidemiological Modeling of News and Rumors on Twitter
 

Similaire à Machine Classification and Analysis of Suicide-Related Communication on Twitter

Data-Driven Threat Intelligence: Metrics on Indicator Dissemination and Sharing
Data-Driven Threat Intelligence: Metrics on Indicator Dissemination and SharingData-Driven Threat Intelligence: Metrics on Indicator Dissemination and Sharing
Data-Driven Threat Intelligence: Metrics on Indicator Dissemination and SharingAlex Pinto
 
Language of Politics on Twitter - 03 Analysis
Language of Politics on Twitter - 03 AnalysisLanguage of Politics on Twitter - 03 Analysis
Language of Politics on Twitter - 03 AnalysisYelena Mejova
 
Threat Intelligence Baseada em Dados: Métricas de Disseminação e Compartilham...
Threat Intelligence Baseada em Dados: Métricas de Disseminação e Compartilham...Threat Intelligence Baseada em Dados: Métricas de Disseminação e Compartilham...
Threat Intelligence Baseada em Dados: Métricas de Disseminação e Compartilham...Alexandre Sieira
 
Social Media Analytics
Social Media AnalyticsSocial Media Analytics
Social Media AnalyticsMuhammad Rifqi
 
CansecWest2019: Infosec Frameworks for Misinformation
CansecWest2019: Infosec Frameworks for MisinformationCansecWest2019: Infosec Frameworks for Misinformation
CansecWest2019: Infosec Frameworks for Misinformationbodaceacat
 
Terp breuer misinfosecframeworks_cansecwest2019
Terp breuer misinfosecframeworks_cansecwest2019Terp breuer misinfosecframeworks_cansecwest2019
Terp breuer misinfosecframeworks_cansecwest2019bodaceacat
 
Misinfosec frameworks Cansecwest 2019
Misinfosec frameworks Cansecwest 2019Misinfosec frameworks Cansecwest 2019
Misinfosec frameworks Cansecwest 2019bodaceacat
 
Future of AI-powered automation in business
Future of AI-powered automation in businessFuture of AI-powered automation in business
Future of AI-powered automation in businessLouis Dorard
 
Predicting user demographics in social networks - Invited Talk at University ...
Predicting user demographics in social networks - Invited Talk at University ...Predicting user demographics in social networks - Invited Talk at University ...
Predicting user demographics in social networks - Invited Talk at University ...Nikolaos Aletras
 
Sharing is Caring: Medindo a Eficácia de Comunidades de Compartilhamento de T...
Sharing is Caring: Medindo a Eficácia de Comunidades de Compartilhamento de T...Sharing is Caring: Medindo a Eficácia de Comunidades de Compartilhamento de T...
Sharing is Caring: Medindo a Eficácia de Comunidades de Compartilhamento de T...Alexandre Sieira
 
And then there were ... Large Language Models
And then there were ... Large Language ModelsAnd then there were ... Large Language Models
And then there were ... Large Language ModelsLeon Dohmen
 
1. Choose a Case and Complete the Project PlanHospital to Research.docx
1. Choose a Case and Complete the Project PlanHospital to Research.docx1. Choose a Case and Complete the Project PlanHospital to Research.docx
1. Choose a Case and Complete the Project PlanHospital to Research.docxjeremylockett77
 
Matthew_Davis_Slides.pptx
Matthew_Davis_Slides.pptxMatthew_Davis_Slides.pptx
Matthew_Davis_Slides.pptxreenarocky
 
Team CDTW Capstone Presentation
Team CDTW Capstone Presentation Team CDTW Capstone Presentation
Team CDTW Capstone Presentation Todd Rutherford
 
CDTW Capstone Presentation
CDTW Capstone Presentation CDTW Capstone Presentation
CDTW Capstone Presentation Todd Rutherford
 
Machine Learning ICS 273A
Machine Learning ICS 273AMachine Learning ICS 273A
Machine Learning ICS 273Abutest
 
Machine Learning ICS 273A
Machine Learning ICS 273AMachine Learning ICS 273A
Machine Learning ICS 273Abutest
 
bsides NOVA 2017 So You Want to Be a Cyber Threat Analyst eh?
bsides NOVA 2017 So You Want to Be a Cyber Threat Analyst eh?bsides NOVA 2017 So You Want to Be a Cyber Threat Analyst eh?
bsides NOVA 2017 So You Want to Be a Cyber Threat Analyst eh?Anthony Melfi
 
Tim Estes - Generating dynamic social networks from large scale unstructured ...
Tim Estes - Generating dynamic social networks from large scale unstructured ...Tim Estes - Generating dynamic social networks from large scale unstructured ...
Tim Estes - Generating dynamic social networks from large scale unstructured ...Digital Reasoning
 

Similaire à Machine Classification and Analysis of Suicide-Related Communication on Twitter (20)

Data-Driven Threat Intelligence: Metrics on Indicator Dissemination and Sharing
Data-Driven Threat Intelligence: Metrics on Indicator Dissemination and SharingData-Driven Threat Intelligence: Metrics on Indicator Dissemination and Sharing
Data-Driven Threat Intelligence: Metrics on Indicator Dissemination and Sharing
 
Language of Politics on Twitter - 03 Analysis
Language of Politics on Twitter - 03 AnalysisLanguage of Politics on Twitter - 03 Analysis
Language of Politics on Twitter - 03 Analysis
 
Threat Intelligence Baseada em Dados: Métricas de Disseminação e Compartilham...
Threat Intelligence Baseada em Dados: Métricas de Disseminação e Compartilham...Threat Intelligence Baseada em Dados: Métricas de Disseminação e Compartilham...
Threat Intelligence Baseada em Dados: Métricas de Disseminação e Compartilham...
 
Social Media Analytics
Social Media AnalyticsSocial Media Analytics
Social Media Analytics
 
CansecWest2019: Infosec Frameworks for Misinformation
CansecWest2019: Infosec Frameworks for MisinformationCansecWest2019: Infosec Frameworks for Misinformation
CansecWest2019: Infosec Frameworks for Misinformation
 
Terp breuer misinfosecframeworks_cansecwest2019
Terp breuer misinfosecframeworks_cansecwest2019Terp breuer misinfosecframeworks_cansecwest2019
Terp breuer misinfosecframeworks_cansecwest2019
 
Misinfosec frameworks Cansecwest 2019
Misinfosec frameworks Cansecwest 2019Misinfosec frameworks Cansecwest 2019
Misinfosec frameworks Cansecwest 2019
 
Future of AI-powered automation in business
Future of AI-powered automation in businessFuture of AI-powered automation in business
Future of AI-powered automation in business
 
Predicting user demographics in social networks - Invited Talk at University ...
Predicting user demographics in social networks - Invited Talk at University ...Predicting user demographics in social networks - Invited Talk at University ...
Predicting user demographics in social networks - Invited Talk at University ...
 
Sharing is Caring: Medindo a Eficácia de Comunidades de Compartilhamento de T...
Sharing is Caring: Medindo a Eficácia de Comunidades de Compartilhamento de T...Sharing is Caring: Medindo a Eficácia de Comunidades de Compartilhamento de T...
Sharing is Caring: Medindo a Eficácia de Comunidades de Compartilhamento de T...
 
And then there were ... Large Language Models
And then there were ... Large Language ModelsAnd then there were ... Large Language Models
And then there were ... Large Language Models
 
1. Choose a Case and Complete the Project PlanHospital to Research.docx
1. Choose a Case and Complete the Project PlanHospital to Research.docx1. Choose a Case and Complete the Project PlanHospital to Research.docx
1. Choose a Case and Complete the Project PlanHospital to Research.docx
 
Matthew_Davis_Slides.pptx
Matthew_Davis_Slides.pptxMatthew_Davis_Slides.pptx
Matthew_Davis_Slides.pptx
 
Team CDTW Capstone Presentation
Team CDTW Capstone Presentation Team CDTW Capstone Presentation
Team CDTW Capstone Presentation
 
CDTW Capstone Presentation
CDTW Capstone Presentation CDTW Capstone Presentation
CDTW Capstone Presentation
 
Cyber Portents and Precursors
Cyber Portents and PrecursorsCyber Portents and Precursors
Cyber Portents and Precursors
 
Machine Learning ICS 273A
Machine Learning ICS 273AMachine Learning ICS 273A
Machine Learning ICS 273A
 
Machine Learning ICS 273A
Machine Learning ICS 273AMachine Learning ICS 273A
Machine Learning ICS 273A
 
bsides NOVA 2017 So You Want to Be a Cyber Threat Analyst eh?
bsides NOVA 2017 So You Want to Be a Cyber Threat Analyst eh?bsides NOVA 2017 So You Want to Be a Cyber Threat Analyst eh?
bsides NOVA 2017 So You Want to Be a Cyber Threat Analyst eh?
 
Tim Estes - Generating dynamic social networks from large scale unstructured ...
Tim Estes - Generating dynamic social networks from large scale unstructured ...Tim Estes - Generating dynamic social networks from large scale unstructured ...
Tim Estes - Generating dynamic social networks from large scale unstructured ...
 

Dernier

2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge GraphsEleniIlkou
 
20240509 QFM015 Engineering Leadership Reading List April 2024.pdf
20240509 QFM015 Engineering Leadership Reading List April 2024.pdf20240509 QFM015 Engineering Leadership Reading List April 2024.pdf
20240509 QFM015 Engineering Leadership Reading List April 2024.pdfMatthew Sinclair
 
Pune Airport ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready...
Pune Airport ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready...Pune Airport ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready...
Pune Airport ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready...tanu pandey
 
在线制作约克大学毕业证(yu毕业证)在读证明认证可查
在线制作约克大学毕业证(yu毕业证)在读证明认证可查在线制作约克大学毕业证(yu毕业证)在读证明认证可查
在线制作约克大学毕业证(yu毕业证)在读证明认证可查ydyuyu
 
VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...
VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...
VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...SUHANI PANDEY
 
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)Delhi Call girls
 
Busty Desi⚡Call Girls in Vasundhara Ghaziabad >༒8448380779 Escort Service
Busty Desi⚡Call Girls in Vasundhara Ghaziabad >༒8448380779 Escort ServiceBusty Desi⚡Call Girls in Vasundhara Ghaziabad >༒8448380779 Escort Service
Busty Desi⚡Call Girls in Vasundhara Ghaziabad >༒8448380779 Escort ServiceDelhi Call girls
 
Russian Call Girls Pune (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...
Russian Call Girls Pune  (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...Russian Call Girls Pune  (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...
Russian Call Girls Pune (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...SUHANI PANDEY
 
Real Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirtReal Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirtrahman018755
 
➥🔝 7737669865 🔝▻ mehsana Call-girls in Women Seeking Men 🔝mehsana🔝 Escorts...
➥🔝 7737669865 🔝▻ mehsana Call-girls in Women Seeking Men  🔝mehsana🔝   Escorts...➥🔝 7737669865 🔝▻ mehsana Call-girls in Women Seeking Men  🔝mehsana🔝   Escorts...
➥🔝 7737669865 🔝▻ mehsana Call-girls in Women Seeking Men 🔝mehsana🔝 Escorts...nirzagarg
 
"Boost Your Digital Presence: Partner with a Leading SEO Agency"
"Boost Your Digital Presence: Partner with a Leading SEO Agency""Boost Your Digital Presence: Partner with a Leading SEO Agency"
"Boost Your Digital Presence: Partner with a Leading SEO Agency"growthgrids
 
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting High Prof...
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting  High Prof...VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting  High Prof...
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting High Prof...singhpriety023
 
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...Neha Pandey
 
Call Girls Sangvi Call Me 7737669865 Budget Friendly No Advance BookingCall G...
Call Girls Sangvi Call Me 7737669865 Budget Friendly No Advance BookingCall G...Call Girls Sangvi Call Me 7737669865 Budget Friendly No Advance BookingCall G...
Call Girls Sangvi Call Me 7737669865 Budget Friendly No Advance BookingCall G...roncy bisnoi
 
Microsoft Azure Arc Customer Deck Microsoft
Microsoft Azure Arc Customer Deck MicrosoftMicrosoft Azure Arc Customer Deck Microsoft
Microsoft Azure Arc Customer Deck MicrosoftAanSulistiyo
 
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...Delhi Call girls
 
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrStory Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrHenryBriggs2
 

Dernier (20)

2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
 
20240509 QFM015 Engineering Leadership Reading List April 2024.pdf
20240509 QFM015 Engineering Leadership Reading List April 2024.pdf20240509 QFM015 Engineering Leadership Reading List April 2024.pdf
20240509 QFM015 Engineering Leadership Reading List April 2024.pdf
 
Pune Airport ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready...
Pune Airport ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready...Pune Airport ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready...
Pune Airport ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready...
 
在线制作约克大学毕业证(yu毕业证)在读证明认证可查
在线制作约克大学毕业证(yu毕业证)在读证明认证可查在线制作约克大学毕业证(yu毕业证)在读证明认证可查
在线制作约克大学毕业证(yu毕业证)在读证明认证可查
 
📱Dehradun Call Girls Service 📱☎️ +91'905,3900,678 ☎️📱 Call Girls In Dehradun 📱
📱Dehradun Call Girls Service 📱☎️ +91'905,3900,678 ☎️📱 Call Girls In Dehradun 📱📱Dehradun Call Girls Service 📱☎️ +91'905,3900,678 ☎️📱 Call Girls In Dehradun 📱
📱Dehradun Call Girls Service 📱☎️ +91'905,3900,678 ☎️📱 Call Girls In Dehradun 📱
 
VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...
VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...
VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...
 
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
 
Thalassery Escorts Service ☎️ 6378878445 ( Sakshi Sinha ) High Profile Call G...
Thalassery Escorts Service ☎️ 6378878445 ( Sakshi Sinha ) High Profile Call G...Thalassery Escorts Service ☎️ 6378878445 ( Sakshi Sinha ) High Profile Call G...
Thalassery Escorts Service ☎️ 6378878445 ( Sakshi Sinha ) High Profile Call G...
 
Busty Desi⚡Call Girls in Vasundhara Ghaziabad >༒8448380779 Escort Service
Busty Desi⚡Call Girls in Vasundhara Ghaziabad >༒8448380779 Escort ServiceBusty Desi⚡Call Girls in Vasundhara Ghaziabad >༒8448380779 Escort Service
Busty Desi⚡Call Girls in Vasundhara Ghaziabad >༒8448380779 Escort Service
 
Russian Call Girls Pune (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...
Russian Call Girls Pune  (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...Russian Call Girls Pune  (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...
Russian Call Girls Pune (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...
 
Real Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirtReal Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirt
 
➥🔝 7737669865 🔝▻ mehsana Call-girls in Women Seeking Men 🔝mehsana🔝 Escorts...
➥🔝 7737669865 🔝▻ mehsana Call-girls in Women Seeking Men  🔝mehsana🔝   Escorts...➥🔝 7737669865 🔝▻ mehsana Call-girls in Women Seeking Men  🔝mehsana🔝   Escorts...
➥🔝 7737669865 🔝▻ mehsana Call-girls in Women Seeking Men 🔝mehsana🔝 Escorts...
 
"Boost Your Digital Presence: Partner with a Leading SEO Agency"
"Boost Your Digital Presence: Partner with a Leading SEO Agency""Boost Your Digital Presence: Partner with a Leading SEO Agency"
"Boost Your Digital Presence: Partner with a Leading SEO Agency"
 
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting High Prof...
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting  High Prof...VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting  High Prof...
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting High Prof...
 
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
 
Call Girls Sangvi Call Me 7737669865 Budget Friendly No Advance BookingCall G...
Call Girls Sangvi Call Me 7737669865 Budget Friendly No Advance BookingCall G...Call Girls Sangvi Call Me 7737669865 Budget Friendly No Advance BookingCall G...
Call Girls Sangvi Call Me 7737669865 Budget Friendly No Advance BookingCall G...
 
6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...
6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...
6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...
 
Microsoft Azure Arc Customer Deck Microsoft
Microsoft Azure Arc Customer Deck MicrosoftMicrosoft Azure Arc Customer Deck Microsoft
Microsoft Azure Arc Customer Deck Microsoft
 
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...
 
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrStory Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
 

Machine Classification and Analysis of Suicide-Related Communication on Twitter

  • 1. Machine Classification and Analysis of Suicide- Related Communication on Twitter Presentation @ ACM Hypertext 2015 Pete Burnap, Gualtiero (Walter) Colombo & Jonathan Scourfield Social Data Science Lab School of Computer Science and Informatics & School of Social Sciences Cardiff University @pbFeed @socdatalab
  • 2. Social Data Science Lab - @socdatalab •  Formed in 2015 out of the Collaborative Online Social Media Observatory (COSMOS) programme of work (cosmosproject.net) •  Mission is to continue the work of COSMOS in democratising access to big social data (e.g. Twitter, Foursquare, Instagram) amongst the academic, private, public and third sectors. •  A significant proportion of research funds have been awarded to collect and analyse social media data in the contexts of Societal Safety and Security e.g. social tension, hate speech, crime reporting and fear of crime, suicidal ideation •  Working with Metropolitan Police, Department of Health, Food Standards Agency
  • 3. The Problem •  Our previous research has studied online social networks as “social machines” that enable spread of malicious or potentially dangerous information (e.g. rumour, hate speech, malware) •  Concern about suicide and Internet has moved from dedicated suicide websites to general social media platforms •  Previous research has shown spikes in recorded suicide rates due to increased risk factors (e.g. celebrity suicide)
  • 4. The Problem •  Normalisation of suicidal language (Daine et al., 2013) •  To date research has tended to rely on human coding of online content – difficult to scale to ‘volume’, or suicide notes (different state of mind?) •  Social media analysis has yet to distinguish between different types of suicidal communication
  • 5. Research Aims •  To explore the potential of natural language processing and machine learning for automated identification and differentiation of suicide-related communication in very large social media data sets •  This would enable those responsible for supporting safety and wellbeing (e.g. samaritans) to establish a more realistic idea of the volume of suicidal information online and possibly identify emerging ‘clusters’ •  While computation is essential, the work was driven from the s tart by a strong understanding of suicidal communication/language with established suicide researchers
  • 6. Developing a classifier for suicide-related social media content •  Anonymised data from suicide discussion fora •  Human annotated – ‘is this person suicidal?’ •  Identify (TF.IDF) terms & phrases from ‘suicidal texts’ •  Automated collection of data from Twitter & Tumblr using TF.IDF terms •  Human annotated sample (n=2000 1k Twitter + 1k Tumblr) – coding frame •  c1: Evidence of possible suicidal intent •  c2: Campaigning (i.e. petitions etc.) •  c3: Flippant reference to suicide •  c4: Information or support •  c5: Memorial or condolence •  c6: Reporting news of someone’s suicide (not bombing) •  c7: None of the above
  • 7. Features (Set 1) Lexical characteristics of sentences used, such as the Parts of Speech (POS), and other language structural features, such as the the most frequently used words and phrases. References to self and others are also captured with POS – these terms have been identified in previous research as being evident within suicidal communication (Set 2) Sentiment, affective and emotional features and levels of the terms used within the text. Emotions such as fear, anger and general aggressiveness are particularly prominent in suicidal communication (WordNet Affect) (Set 3) Language expressed in short, informal text such as social media posts within a limited number of characters. These were extracted from annotated Tumblr posts
  • 8. Machine Classification •  The key question here is: what are the features of suicidal ideation, and what are the features of the other classes? •  Accuracy is important, but explanatory value is also crucial •  Methods used for the classifier • Probabilistic (Naïve Bayes), non-probabilistic linear (linear SVM) and rule-based (Decision Tree) machine classifiers • Principal Component Analysis (reducing 1,444 features to 255) • Improvement with an ‘ensemble’ classifier designed to incorporate diverse principal components (Rotation Forest)
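The PCA step (1,444 features reduced to 255) can be illustrated with a small, dependency-free implementation. The slides do not say how PCA was computed; this sketch assumes power iteration with deflation purely to keep the example in the standard library, and it does not show Rotation Forest. A real pipeline would use an optimised linear-algebra library.

```python
import random


def mean_center(rows):
    """Subtract each column's mean so every feature is centred on zero."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centred):
    """Sample covariance matrix (d x d) of mean-centred rows."""
    n, d = len(centred), len(centred[0])
    return [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
            for i in range(d)]


def power_iteration(m, iters=500, seed=0):
    """Dominant eigenvector and eigenvalue of a symmetric PSD matrix m."""
    d = len(m)
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]  # positive, non-zero start
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break  # m is (numerically) zero: no variance left
        v = [x / norm for x in w]
    # Rayleigh quotient v^T m v (v has unit length after normalisation)
    eigval = sum(v[i] * m[i][j] * v[j] for i in range(d) for j in range(d))
    return v, eigval


def pca(rows, k):
    """Top-k principal components (unit vectors) and the variance along each.

    Components are extracted one at a time; after each, the covariance matrix
    is deflated (C <- C - lambda * v v^T) so the next pass finds the next one.
    """
    c = covariance(mean_center(rows))
    d = len(c)
    components, variances = [], []
    for _ in range(k):
        v, lam = power_iteration(c)
        components.append(v)
        variances.append(lam)
        c = [[c[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return components, variances


def project(rows, components):
    """Scores of each mean-centred row on each component (n x k)."""
    return [[sum(x * w for x, w in zip(r, comp)) for comp in components]
            for r in mean_center(rows)]
```

Keeping only the top k components (here k would be 255) replaces each post's long, sparse feature vector with k scores, and each component's loadings can be read back to see which original features it combines.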
  • 11. Classifier accuracy
PCA (combined):
  P  0.321  0.345  0.762  0.507
  R  0.641  0.385  0.205  0.436
  F  0.427  0.364  0.323  0.469
Table 3: Confusion matrix for the best performing classification model
  classified as →   c1   c2   c3   c4   c5   c6   c7
  c1                57    0   16    0    0    0    5
  c2                 0   19    2    4    0    3    0
  c3                13    1  142    0    0    5   16
  c4                 0    4    5   20    0    3    3
  c5                 1    1    1    0   31    1    1
  c6                 0    6    7    6    2   80    3
  c7                18    0   20    1    2    4   98
6. DISCUSSION: In this section we analyse the main feature components produced by running the PCA procedure on the combined set that resulted in the best set of results, as shown in Tables 1
F-measure: c1 = 0.690, all classes: 0.728
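Per-class precision, recall and F-measure can be read off a confusion matrix such as Table 3. This sketch assumes rows are the annotated class and columns the predicted class (the usual Weka-style layout); if the orientation is the other way round, precision and recall swap but the F-measure is unchanged. The function names are hypothetical.

```python
CLASSES = ["c1", "c2", "c3", "c4", "c5", "c6", "c7"]
# Counts from Table 3; rows and columns both follow the c1..c7 order above.
CONFUSION = [
    [57, 0, 16, 0, 0, 0, 5],
    [0, 19, 2, 4, 0, 3, 0],
    [13, 1, 142, 0, 0, 5, 16],
    [0, 4, 5, 20, 0, 3, 3],
    [1, 1, 1, 0, 31, 1, 1],
    [0, 6, 7, 6, 2, 80, 3],
    [18, 0, 20, 1, 2, 4, 98],
]


def per_class_metrics(matrix):
    """(precision, recall, F-measure) per class, assuming rows = true class."""
    k = len(matrix)
    results = []
    for i in range(k):
        tp = matrix[i][i]
        row_total = sum(matrix[i])                       # items truly in class i
        col_total = sum(matrix[r][i] for r in range(k))  # items predicted as i
        precision = tp / col_total if col_total else 0.0
        recall = tp / row_total if row_total else 0.0
        # Harmonic mean of P and R simplifies to 2*TP / (row + column totals)
        f = 2 * tp / (row_total + col_total) if row_total + col_total else 0.0
        results.append((precision, recall, f))
    return results


def accuracy(matrix):
    """Share of all items on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


for name, (p, r, f) in zip(CLASSES, per_class_metrics(CONFUSION)):
    print(f"{name}: P={p:.3f} R={r:.3f} F={f:.3f}")
print(f"accuracy={accuracy(CONFUSION):.3f}")
```

Off-diagonal cells show where confusions happen; for example, the c1 row and column make it easy to see how often suicidal ideation is mixed up with flippant references (c3) or ‘none of the above’ (c7).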
  • 12. Predictive Features
Table 5: Principal components per class
c1 - Evidence of possible suicidal intent
•  0.185 word list1 ‘end it all’ 521 + 0.185 ‘end it all’ + 0.179 ‘it all now’ + 0.179 ‘all now’ + 0.175 ‘it all’
•  0.149 word list1 ‘want to be dead’ 554 - 0.133 - 0.129 ‘i think’ + 0.125 word list1 ‘to commit suicide’ 547 + 0.114 ‘really’
•  0.149 word list1 ‘want to be dead’ 554 + 0.145 wn affect11 alarm 496 - 0.123 number of adverb superlative 211 - 0.121 word list7 relationship 780 + 0.118 regEx class6 +.+report.+ 701
•  0.153 ‘thinking about killing’ + 0.153 ‘about killing myself’ + 0.153 ‘about killing’ + 0.147 ‘so im’ + 0.147 wn affect11 misery 314
•  0.119 number of predeterminers 206 + 0.117 regEx class1 +.+ ((cutting|depres|sui)|these|bad|sad).+(thoughts|feel) .+ 667 + 0.115 wn domain astrology 160 - 0.106 ‘bombing’
•  0.231 regEx class1 +.+(\bdie).+(\bmy).+\bsleep.+ + 0.177 word list ‘want to be dead’ 554 - 0.155 wn domain dentistry 113 - 0.146 wn affect11 security 277 - 0.129 wn affect11 admiration
c2 - Campaigning (i.e. petitions etc.)
•  0.25 word list2 support 746 - 0.134 wn domain racing 84
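Each row of Table 5 is a weighted sum of features. As a simplified sketch, the first c1 component listed can be evaluated for a post by adding up the loadings of the features the post triggers. This assumes binary (present/absent) features, ignores the centring and scaling PCA would apply, and treats ‘word list1 end it all’ as a plain phrase match; the feature keys and the one-phrase `WORD_LIST_1` are hypothetical.

```python
# Loadings copied from the first c1 component in Table 5; the feature keys
# are hypothetical names for those table entries.
C1_COMPONENT = {
    "wordlist1:end it all": 0.185,
    "ngram:end it all": 0.185,
    "ngram:it all now": 0.179,
    "ngram:all now": 0.179,
    "ngram:it all": 0.175,
}

WORD_LIST_1 = {"end it all"}  # hypothetical excerpt of word list 1


def binary_features(text):
    """Binary features: word-list hits plus the 2- and 3-grams of the text."""
    tokens = text.lower().split()
    feats = set()
    for n in (2, 3):
        for i in range(len(tokens) - n + 1):
            feats.add("ngram:" + " ".join(tokens[i:i + n]))
    padded = " " + " ".join(tokens) + " "
    for phrase in WORD_LIST_1:
        if " " + phrase + " " in padded:
            feats.add("wordlist1:" + phrase)
    return feats


def component_score(text, loadings):
    """Sum the loadings of the features present (uncentred simplification)."""
    present = binary_features(text)
    return sum(w for name, w in loadings.items() if name in present)


print(component_score("i just want to end it all now", C1_COMPONENT))
```

A post containing the ‘end it all’ family of phrases scores high on this component, which is why the component is interpretable as a c1 signal; unrelated posts score zero.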
  • 13. Explanatory features •  Word lists and regular expressions (regex) extracted from online suicide-related discussion forums and other microblogging websites provide ‘clues’ that are effective for the suicidal ideation class •  Lexical and grammatical features such as POS appear mostly ineffective •  ‘Affective’ language is very relevant (such as the terms represented in the WordNet lexical database of ‘cognitive synonyms’) and represents well the affective and emotional states associated with this type of language •  Sentiment scores generated by sentiment analysis tools also appear ineffective: they are rarely, if ever, included within the principal components predictive of each class
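Word-list and regex ‘clues’ of this kind reduce to simple binary features. This sketch adapts the c1 ‘thoughts/feel’ regex from Table 5 (dropping the leading ‘+’ and the stray spaces of the extracted text, an assumption about its intended form) and uses the three ‘word list1’ phrases visible in Table 5 as a stand-in for the full forum-derived lists.

```python
import re

# Adapted from a c1 regEx feature in Table 5: the leading '+' and stray
# spaces of the extracted text are dropped (an assumption about its form).
THOUGHTS_PATTERN = re.compile(r".+((cutting|depres|sui)|these|bad|sad).+(thoughts|feel).+")

# Phrases taken from the 'word list1' entries shown in Table 5; the full
# forum-derived word lists are not reproduced here.
WORD_LIST = ["end it all", "to commit suicide", "want to be dead"]


def clue_features(text):
    """Binary clue features: 1 if the pattern or phrase occurs, else 0."""
    lowered = text.lower()
    feats = {"regex_thoughts": int(THOUGHTS_PATTERN.search(lowered) is not None)}
    for phrase in WORD_LIST:
        # Word boundaries stop 'end it all' matching inside 'friend it all'
        hit = re.search(r"\b" + re.escape(phrase) + r"\b", lowered)
        feats["wordlist:" + phrase] = int(hit is not None)
    return feats


print(clue_features("Lately I have these bad thoughts and want to be dead"))
```

Because each clue is a transparent yes/no test, a component that loads heavily on such features is easy to explain, which fits the slide's point about explanatory value.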
  • 14. Networks of Suicidal Ideation “…shortest path of retweets of suicidal ideation was higher than previous studies that reported on general retweet path length. Our results found an average of 5, while other research reported metrics between 2 and 4.8.” Colombo, G., Burnap, P., Hodorog, A. and Scourfield, J. (2015) ‘Analysing the connectivity and communication of suicidal users on Twitter’, Computer Communications - available open access http://tinyurl.com/suicidenetworks
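The average shortest-path statistic quoted above can be illustrated with breadth-first search. This sketch assumes an unweighted, directed retweet graph with an edge from the retweeted user to the retweeting user, averaged over all reachable ordered pairs; the cited paper's exact definition may differ, and the user names are hypothetical.

```python
from collections import deque


def shortest_paths_from(graph, source):
    """BFS hop counts from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def average_shortest_path(graph):
    """Mean hop count over ordered pairs (u, v), u != v, v reachable from u."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    total, pairs = 0, 0
    for u in nodes:
        for v, d in shortest_paths_from(graph, u).items():
            if v != u:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


# Hypothetical retweet cascade: alice is retweeted by bob, bob by carol, ...
retweets = {"alice": ["bob"], "bob": ["carol"], "carol": ["dave"]}
print(average_shortest_path(retweets))
```

Longer cascades push the average up, so a higher mean path length suggests that suicidal-ideation content travels further through chains of retweets rather than only one hop from the original poster.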