Computational Social Science as the Ultimate Web Intelligence

Kno.e.sis Projects at the Intersection of Big Data, AI, Social Good and Health
Panel at Web Intelligence 2018
Prof. Amit Sheth
LexisNexis Ohio Eminent Scholar
Executive Director, Kno.e.sis - Ohio Center of Excellence in
Knowledge-enabled Computing & BioHealth Innovation

Big Data | Social Media | AI
Harnessing Twitter ‘Big Data’ for Automatic Emotion Identification
2.5 M tweets with machine learning algorithms
Trends
Emotions
eDrugTrends - Identify emerging trends in cannabis and synthetic cannabinoid use in the U.S.
Web forum data & tweets with NLP, ML & Semantic Web technologies
Intents
Sentiments
Hazards SEES - Cross-modal aggregation of multi-modal & multi-disciplinary data to support human efforts in disaster management
Extracting Diverse Sentiment Expressions with Target-Dependent Polarity from Twitter
Opinions
400 000 tweets with an optimization model
People
Places
Times

Gender-Based Violence in 140 Characters or Fewer: A #BigData Case Study of Twitter
14 million tweets collected from Twitter over a period of 10 months
1. Hemant Purohit, Tanvi Banerjee, Andrew Hampton, Valerie L. Shalin, Nayanesh Bhandutia, and Amit Sheth. Gender-based violence in 140 characters or fewer: A #BigData case study of Twitter. First Monday, Volume 21, Number 1, 4 January 2016.

Outcomes of Analysis
◎ Trends of GBV tweets across 5 countries: USA, India, Philippines, Nigeria, South Africa.
◎ Three thematic groups of GBV tweets: physical violence, sexual violence, and harmful practices.
◎ Nigeria has the highest percentage of tweets with URLs in comparison to the other countries.
◎ Several possible explanations:
○ Literacy
○ Credibility of the public press
○ Possibility that reliance on external resources somehow reduces the threat of being identified as the responsible party
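The URL comparison above comes down to a per-country share of tweets that contain a link. A minimal sketch of that calculation follows; the regular expression, the function name `url_share_by_country`, and the toy tweets are illustrative assumptions, not the pipeline used in the study.

```python
import re
from collections import defaultdict

# Assumption: a tweet "has a URL" if it contains an http(s) link.
URL_PATTERN = re.compile(r"https?://\S+")

def url_share_by_country(tweets):
    """Return {country: percentage of tweets containing at least one URL}."""
    totals = defaultdict(int)
    with_url = defaultdict(int)
    for country, text in tweets:
        totals[country] += 1
        if URL_PATTERN.search(text):
            with_url[country] += 1
    return {c: 100.0 * with_url[c] / totals[c] for c in totals}

# Hypothetical toy sample, not data from the study.
sample = [
    ("Nigeria", "Report on violence http://example.org/a"),
    ("Nigeria", "Read this https://example.org/b"),
    ("USA", "This has to stop"),
    ("USA", "New report https://example.org/c"),
]
shares = url_share_by_country(sample)
```

On real data the same function would be applied to (country, text) pairs after geolocating tweets, a step this sketch does not cover.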

Context-Aware Harassment Detection on Social Media
24 000 tweets collected
Supervised ML methods used
1. Mohammadreza Rezvan, Saeedeh Shekarpour, Lakshika Balasuriya, Krishnaprasad Thirunarayan, Valerie L. Shalin, Amit Sheth. A Quality Type-aware Annotated Corpus and Lexicon for Harassment Research. Web Science, WebSci 2018, Amsterdam, The Netherlands, May 27-30, 2018.
2. Mohammadreza Rezvan, Saeedeh Shekarpour, Krishnaprasad Thirunarayan, Valerie L. Shalin, Amit Sheth (2018). Analyzing and learning the language for different types of harassment.
Knoesis wiki for Context-Aware Harassment Detection on Social Media

Outcomes and Insights
Lexicon
Covering different types of harassment content
● Sexual
● Political
● Racial
● Intellectual
● Appearance-related
● General
Tweets
24 000 non-redundant annotated tweets, of which 3000 are labeled as harassing
Features
A combination of features resulted in the best accuracy
○ TF-IDF
○ word2vec
○ paragraph2vec
○ LIWC vector
ML Methods
Gradient Boosting Machine (GBM) outperformed SVM, KNN and NB
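Combining feature types, as listed on this slide, usually means concatenating one vector per feature family into a single vector per tweet. The sketch below shows that idea with a plain TF-IDF computation and a lexicon-hit count standing in for the other feature families; the helper names, the tiny lexicon, and the example texts are hypothetical, and the classifier itself (for example a GBM) is not shown.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for whitespace-tokenized docs."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n_docs = len(tokenized)
    # Document frequency: number of docs containing each term.
    df = Counter(t for doc in tokenized for t in set(doc))
    idf = {t: math.log(n_docs / df[t]) for t in vocab}
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[t] / len(doc) * idf[t] for t in vocab])
    return vocab, vectors

def combine(*feature_blocks):
    """Concatenate per-document feature blocks into one vector per document."""
    return [sum((list(block) for block in row), []) for row in zip(*feature_blocks)]

# Hypothetical inputs for illustration only.
docs = ["you are stupid", "you are kind"]
lexicon = {"stupid"}
lexicon_counts = [[sum(t in lexicon for t in d.split())] for d in docs]

vocab, tfidf = tfidf_vectors(docs)
features = combine(tfidf, lexicon_counts)  # TF-IDF columns followed by the lexicon count
```

Embedding-based blocks such as word2vec or paragraph2vec vectors would be passed to `combine` in the same way, as additional per-document lists.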

1. Gaur, Manas, Ugur Kursuncu, Amanuel Alambo, Amit Sheth, Raminta Daniulaityte, Krishnaprasad Thirunarayan, and Jyotishman Pathak. "Let Me Tell You About Your
Mental Health!: Contextualized Classification of Reddit Posts to DSM-5 for Web-based Intervention." In Proceedings of the 27th ACM CIKM 2018.
[Figure: Patient, Clinician, EMR, Insight, DSM-5 & Drug Abuse Ontology, Improved Healthcare]
Classification of Reddit Content to DSM-5 for Web-based Intervention
3 million posts from 270K Reddit users, collected from 2005 to 2015 and classified with zero-shot learning
Provide clinicians with insights about their patients
Knoesis wiki for Modeling Social Behavior for Healthcare
Utilization in Depression
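The slide mentions zero-shot learning without describing the mechanism. As a generic illustration of the idea only (not the method of the cited paper), a post can be assigned to the category whose textual description it most resembles, with no labeled training posts. The bag-of-words similarity, the function names, and the label descriptions below are all assumptions made for this sketch; the descriptions are not DSM-5 text.

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words term counts for a lowercased, whitespace-tokenized text."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count mappings."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def zero_shot_label(post, label_descriptions):
    """Pick the label whose description is most similar to the post."""
    post_vec = bow(post)
    return max(label_descriptions,
               key=lambda label: cosine(post_vec, bow(label_descriptions[label])))

# Hypothetical label descriptions for illustration; NOT DSM-5 text.
labels = {
    "depressive": "sad hopeless tired sleep worthless",
    "substance": "drug alcohol craving withdrawal use",
}
predicted = zero_shot_label("i feel hopeless and tired all day", labels)
```

In practice the bag-of-words vectors would typically be replaced by learned embeddings, and the label descriptions by terms drawn from a domain knowledge source.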

Outcomes & Insights
Our methods reduced the false alarm rate to 3%-5% by incorporating domain knowledge and slang terms found in social media data
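The slide does not define the false alarm rate. Assuming the common definition, the share of true negatives that are wrongly flagged, FP / (FP + TN), it can be computed as in this sketch. The labels below are made up to land on 5% and are not results from the project.

```python
def false_alarm_rate(y_true, y_pred):
    """False positive rate: FP / (FP + TN), with 1 = flagged and 0 = not flagged."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return fp / (fp + tn) if (fp + tn) else 0.0

# Hypothetical labels: 20 negatives (one wrongly flagged) and 5 positives.
y_true = [0] * 20 + [1] * 5
y_pred = [1] + [0] * 19 + [1] * 4 + [0]
rate = false_alarm_rate(y_true, y_pred)  # 1 false alarm out of 20 negatives
```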

Views: People - Content - Network
The information in a user's tweets reflects an intent that depends on the user type: personal accounts share opinions, retail accounts promote related products for sale, and media accounts disseminate information.
Proper incorporation of each view is essential to better represent the characteristics of users.
User Modeling in Marijuana-related Communications
Multimodality
- The information shared in different formats contributes to the meaning: text, image, emoji, interactions.
- Translation of image and emoji to textual representation using state-of-the-art tools such as EmojiNet.
People: user description, emoji, profile pictures.
Content: text, emoji.
Network: interactions with other users (retweets and mentions).
🏈
😉
🍔
1. Ugur Kursuncu, Manas Gaur, Usha Lokala, Anurag Illendula, Krishnaprasad Thirunarayan, Raminta Daniulaityte, Amit Sheth, and I. Budak Arpinar. "'What's ur type?' Contextualized Classification of User Types in Marijuana-related Communications using Compositional Multiview Embedding." In Proceedings of IEEE International Conference on Web Intelligence, 2018.
Knoesis wiki for eDrugTrends

Outcomes & Insights
◎ Incorporation of multimodal data, specifically profile pictures and network interactions, contributes significantly to the classification of users.
◎ Multimodality significantly improves the classification performance in the case of an imbalanced dataset, e.g., profile pictures of users.
◎ Composition of the embeddings of views (e.g., person, content, network) provides a more coherent representation of users.
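One simple way to compose per-view embeddings into a single user representation is to normalize each view and concatenate the results, so that no view dominates because of its scale. The sketch below illustrates only that generic idea; the actual composition used in the cited paper may differ, and the function names and toy vectors are assumptions.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (all-zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def compose_views(people, content, network):
    """Concatenate L2-normalized view embeddings into one user vector."""
    composed = []
    for view in (people, content, network):
        composed.extend(l2_normalize(view))
    return composed

# Hypothetical toy embeddings for one user (people, content, network views).
user_vec = compose_views([3.0, 4.0], [1.0, 0.0, 0.0], [0.0, 2.0])
```

The composed vector has one segment per view, which makes it easy to ablate a view by zeroing or dropping its segment.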
F-scores for each user type
#  Features        Personal  Media  Retail
1  Tweet + Desc    0.95      0.42   0.73
2  w/ Composition  0.94      0.18   0.71
3  w/ Metadata     0.94      0.17   0.72
4  w/ Image        0.97      0.72   0.87
5  w/ Network      0.98      0.73   0.91
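For readers unfamiliar with per-class F-scores, the sketch below computes them from true and predicted user types, assuming the table reports the usual F1 measure, 2TP / (2TP + FP + FN); the slide does not state which F-score variant was used. The labels are invented for illustration and do not reproduce the reported numbers.

```python
def f1_per_class(y_true, y_pred):
    """Return {label: F1}, where F1 = 2*TP / (2*TP + FP + FN) for each label."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores

# Hypothetical labels for seven users.
y_true = ["Personal", "Personal", "Personal", "Personal", "Media", "Retail", "Retail"]
y_pred = ["Personal", "Personal", "Personal", "Media", "Media", "Retail", "Personal"]
scores = f1_per_class(y_true, y_pred)
```

Reporting the score per class, as the table does, makes weak minority classes visible that a single accuracy figure would hide.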

Fusing Visual, Textual and Connectivity Clues for Studying Mental Health
Knoesis wiki for Modeling Social Behavior for Healthcare Utilization in Depression
Develop a multimodal framework that employs statistical techniques to fuse heterogeneous sets of features, obtained by processing visual, textual and user interaction data, in order to identify depressive behavior and infer demographics.
1. Amir Hossein Yazdavar, Mohammad Saied Mahdavinejad, Goonmeet Bajaj, Krishnaprasad Thirunarayan, Jyotishman Pathak and Amit Sheth. Fusing Visual, Textual and
Connectivity Clues for Studying Mental Health in Population. In: 30th International Conference on World Wide Web (Submitted WWW-2019)
◎ How well does the content of posted images (colors, aesthetics and facial presentation) reflect depressive behavior?
◎ Does the choice of profile picture show any psychological traits of a depressed online persona? Is it reliable enough to represent demographic information such as age and gender?
◎ Are there any underlying common themes among depressed individuals, generated using multimodal content, that can be used to detect depression reliably?

Outcomes & Insights
Characterizing linguistic patterns in two aspects: depressive behavior and age distribution
[Figure: Gender biases and depressive behavior association. Chi-square test; color code: blue = association, red = repulsion; size = amount of each cell's contribution.]
[Figure: The age distribution for depressed and control users in the ground-truth dataset.]
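The association plot described above matches a common way of visualizing a chi-square test: each cell's Pearson residual, (observed - expected) / sqrt(expected), is positive for association and negative for repulsion, and its squared value as a share of the chi-square statistic gives the cell's contribution. The sketch below assumes that reading of the figure; the contingency table is hypothetical and all margins are assumed to be nonzero.

```python
import math

def pearson_residuals(table):
    """Pearson residuals (observed - expected) / sqrt(expected) per cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        residuals.append([])
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            residuals[i].append((observed - expected) / math.sqrt(expected))
    return residuals

def contributions(residuals):
    """Return (percentage contribution of each cell, chi-square statistic)."""
    chi2 = sum(r * r for row in residuals for r in row)
    return [[100.0 * r * r / chi2 for r in row] for row in residuals], chi2

# Hypothetical 2x2 counts: rows = two groups, columns = (depressed, control).
table = [[30, 10], [20, 40]]
residuals = pearson_residuals(table)
percent, chi2 = contributions(residuals)
```

Here a positive residual would be drawn in blue and a negative one in red, with the marker size proportional to the cell's percentage contribution.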

Outcomes & Insights
[Figure: Explanation of the log-odds prediction of the outcome (0.31) for a sample user. The y-axis shows the outcome probability (depressed or control); the bar labels indicate the log-odds impact of each feature.]
[Figure: Ranking of features obtained from different modalities with the Boruta algorithm.]
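In an additive explanation of the kind described in the first caption, per-feature log-odds impacts are summed with a base value, and the total is mapped to a probability with the logistic function. The sketch below assumes that 0.31 is the total log-odds for the sample user; the base value, the feature names, and the individual impacts are hypothetical numbers chosen to add up to 0.31.

```python
import math

def sigmoid(log_odds):
    """Map log-odds to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical base value and per-feature log-odds impacts.
base_value = -0.20
feature_impacts = {"feature_a": 0.40, "feature_b": 0.25, "feature_c": -0.14}

log_odds = base_value + sum(feature_impacts.values())  # totals about 0.31
probability = sigmoid(log_odds)                        # a little above 0.5
```

Positive log-odds therefore correspond to a probability above 0.5, and each bar in such a plot shows how far one feature moves the total up or down.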

Create value from data that supports action
Big Data & AI
What can we do that is unique?
Emotions, Sentiments, Intentions
Derive Insights
Scale to identify important & relevant issues to humankind
Floods, Earthquake, Wildfires, Tsunami
Derive insights from data
Do more exercise
Reduce sugar intake
Increase water intake
More at: http://knoesis.org/projects, http://bit.ly/Kapproach
