The Italian Hate Map: semantic content analytics for social good - Cataldo Musto, Giovanni Semeraro, Pasquale Lops, Marco de Gemmis (Università degli Studi di Bari ‘Aldo Moro’, Italy - SWAP Research Group) - I-CiTies 2015
2015 CINI Annual Workshop on ICT for Smart Cities and Communities Palermo (Italy) - October 29-30, 2015
1. The Italian Hate Map:
semantic content analytics for social good
Cataldo Musto, Giovanni Semeraro, Pasquale Lops, Marco de Gemmis
(Università degli Studi di Bari ‘Aldo Moro’, Italy - SWAP Research Group)
I-CiTies 2015
2015 CINI Annual Workshop on ICT
for Smart Cities and Communities
Palermo (Italy) - October 29-30, 2015
2.
Cataldo Musto, Giovanni Semeraro, Pasquale Lops, Marco de Gemmis
The Italian Hate Map: semantic content analytics for social good. iCities 2015 Workshop, Palermo (Italy) 30.10.2015
3.
The Italian Hate Map
http://users.humboldt.edu/mstephens/hate/hate_map.html
Inspired by the Hate Map built at Humboldt State University
Joint research with a team of psychologists from Rome University and a non-profit agency focused on human rights
5.
(Not a new idea) Map of cholera in London, 1854
red = cholera cases
blue = water
The Italian Hate Map
6.
Research Question:
Is it possible to extract and process social media content to detect intolerant posts and identify the most at-risk areas of Italy?
The Italian Hate Map
7.
A framework for real-time
Semantic Analysis of Social Streams
CrowdPulse
8.
CrowdPulse features
- Social Data Extraction
- Semantic Tagging
- Sentiment Analysis
- Processing & Visualization
9.
workflow
CrowdPulse
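The CrowdPulse workflow can be pictured as a chain of stages, each taking a batch of posts and returning an enriched or reduced batch. The following is a minimal Python sketch under that assumption only: `Post`, `run_pipeline` and the two toy stages are hypothetical names chosen for illustration, not CrowdPulse's actual code or API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Post:
    """Hypothetical record for one extracted social post."""
    text: str
    coords: Optional[Tuple[float, float]] = None  # (lat, lon) when geotagged
    entities: List[str] = field(default_factory=list)  # to be filled by semantic tagging
    polarity: Optional[str] = None  # to be filled by sentiment analysis


# A stage receives a batch of posts and returns a (possibly smaller) batch.
Stage = Callable[[List[Post]], List[Post]]


def run_pipeline(posts: List[Post], stages: List[Stage]) -> List[Post]:
    """Apply the stages in order, e.g. extraction -> tagging -> sentiment -> processing."""
    for stage in stages:
        posts = stage(posts)
    return posts


def extract_seed_term(posts: List[Post]) -> List[Post]:
    """Toy Step 1: keep posts whose words include the seed term 'nano'."""
    return [p for p in posts if "nano" in p.text.lower().split()]


def keep_geotagged(posts: List[Post]) -> List[Post]:
    """Toy Step 4: a map can only use posts that carry coordinates."""
    return [p for p in posts if p.coords is not None]


if __name__ == "__main__":
    batch = [
        Post("quel nano ...", (41.90, 12.50)),
        Post("offerta iPod nano"),
        Post("buongiorno", (45.46, 9.19)),
    ]
    print(len(run_pipeline(batch, [extract_seed_term, keep_geotagged])))  # 1
```

Keeping every stage behind the same signature makes it easy to insert, reorder or drop a step (for instance running sentiment analysis before semantic tagging) without touching the others.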
10.
Step 1: Social Data Extraction
CrowdPulse
11.
Step 1: Social Data Extraction
Extraction: Source, Heuristics
CrowdPulse
13.
Step 1: Social Data Extraction
Extraction: Source, Heuristics
Heuristics: Content, User, Geo, Content+Geo, Page, Group
Examples: #icities2015, #democrats, #traffic, @barack_obama, @comunepalermo, #earthquake
CrowdPulse
14.
Step 1: Social Data Extraction
Extraction: Source, Heuristics
Heuristics: Content, User, Geo, Content+Geo, Page, Group
Examples: #www2015, #democrats, #traffic, @barack_obama, @comunefi, #earthquake
We only extract public content.
CrowdPulse
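The extraction heuristics can be read as filters over public posts: Content rules match hashtags or keywords, User rules match accounts, Geo rules match a geographic area, and Content+Geo rules require both. The sketch below assumes exactly that reading; `Heuristic`, `matches` and the rough Palermo bounding box are hypothetical, and the Page and Group sources are not modelled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

LatLon = Tuple[float, float]
BBox = Tuple[float, float, float, float]  # (min_lat, min_lon, max_lat, max_lon)


@dataclass
class Post:
    text: str
    author: str
    is_public: bool
    coords: Optional[LatLon] = None


@dataclass
class Heuristic:
    """Hypothetical extraction rule; leave a field empty to disable that check."""
    terms: Tuple[str, ...] = ()  # Content: hashtags or keywords
    users: Tuple[str, ...] = ()  # User: account handles
    bbox: Optional[BBox] = None  # Geo: bounding box


def in_bbox(point: LatLon, bbox: BBox) -> bool:
    lat, lon = point
    min_lat, min_lon, max_lat, max_lon = bbox
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


def matches(post: Post, rule: Heuristic) -> bool:
    if not post.is_public:  # only public content is ever extracted
        return False
    checks = []
    if rule.terms:
        words = set(post.text.lower().split())
        checks.append(any(t.lower() in words for t in rule.terms))
    if rule.users:
        checks.append(post.author.lower() in {u.lower() for u in rule.users})
    if rule.bbox is not None:
        checks.append(post.coords is not None and in_bbox(post.coords, rule.bbox))
    # Content+Geo is a rule with both terms and bbox: every configured check must pass.
    return bool(checks) and all(checks)


# Rough, illustrative bounding box around Palermo (not an official boundary).
PALERMO_BBOX: BBox = (38.00, 13.20, 38.25, 13.45)
traffic_in_palermo = Heuristic(terms=("#traffic",), bbox=PALERMO_BBOX)
```

A rule with no checks configured matches nothing, so an empty `Heuristic()` cannot accidentally pull in the whole stream.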
15.
Use Case
Heuristics: Twitter content
- 76 intolerant seed terms, defined by the team of psychologists
- 5 intolerance dimensions: violence (against women), racism, homophobia, disability, anti-Semitism
CROWDPULSE SETTINGS
The Italian Hate Map
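With these settings, each Tweet can be assigned to the dimensions whose seed terms it contains. The slides do not list the 76 seed terms, so the lexicon below is a hypothetical placeholder (placing nano under disability is an assumption for illustration only), and `dimensions_for` is a made-up helper showing plain keyword matching.

```python
import re
from typing import Dict, List, Set

# Hypothetical stand-in for the psychologists' 76 seed terms, which the slides do not list.
# Placing "nano" under disability is an assumption made only for this illustration.
SEED_TERMS: Dict[str, Set[str]] = {
    "violence": {"seed_violence"},
    "racism": {"seed_racism"},
    "homophobia": {"seed_homophobia"},
    "disability": {"nano"},
    "anti-semitism": {"seed_antisemitism"},
}

TOKEN = re.compile(r"\w+")


def dimensions_for(text: str) -> List[str]:
    """Return the sorted intolerance dimensions whose seed terms occur in the text."""
    tokens = set(TOKEN.findall(text.lower()))
    return sorted(dim for dim, terms in SEED_TERMS.items() if tokens & terms)
```

Note that `dimensions_for("Offerta iPod nano 16GB")` also returns `["disability"]`: pure keyword matching cannot tell the senses of nano apart, which is exactly the noise the next steps address.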
16.
Use Case
Extracted content (seed term: nano, “midget”)
Tweet about an Italian minister
Tweet about iPod nano
Tweet about an Italian football player
CROWDPULSE SETTINGS
The Italian Hate Map
17.
Use Case
Tweet about an Italian minister
Tweet about iPod nano
Tweet about an Italian football player
CROWDPULSE SETTINGS
The Italian Hate Map
Many non-intolerant Tweets are extracted!
18.
Use Case
Sentiment Analysis and Semantic Tagging of the content
CROWDPULSE SETTINGS
The Italian Hate Map
19. Semantic Tagging: Motivations
Keyword-based representation introduces a lot of noise into the analysis.
nano: (midget) or (iPod nano)?
20. Semantic Tagging: Motivations
“E’ inutile, il mio nano non segnerà mai” (“It’s no use, my midget will never score”)
INTOLERANT or NOT INTOLERANT?
21. Step 2: Semantic Tagging
Solution: semantic processing of extracted content
• Entity Linking Algorithms (1), (2)
• Input: textual content
• Output: identification and disambiguation of the entities mentioned in the text.
(1) TAGME: http://tagme.di.unipi.it
(2) DBpedia Spotlight: http://spotlight.dbpedia.org
CrowdPulse
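To show what disambiguation means for the nano example, here is a toy, overlap-based (simplified Lesk-style) linker. It is not TAGME's or DBpedia Spotlight's algorithm, both of which rely on much richer Wikipedia-derived evidence; the sense inventory, the entity labels and `link` are all hypothetical.

```python
import re
from typing import Dict, Optional, Set

# Hypothetical sense inventory: mention -> {entity label: cue words seen near that sense}.
SENSES: Dict[str, Dict[str, Set[str]]] = {
    "nano": {
        "Dwarfism": {"basso", "statura", "altezza", "piccolo"},
        "IPod_Nano": {"ipod", "apple", "musica", "offerta", "gb"},
    },
}

TOKEN = re.compile(r"\w+")


def link(mention: str, text: str) -> Optional[str]:
    """Pick the candidate sense whose cue words overlap most with the surrounding text."""
    candidates = SENSES.get(mention.lower())
    if not candidates:
        return None
    context = set(TOKEN.findall(text.lower())) - {mention.lower()}
    best, best_score = None, 0
    for entity, cues in candidates.items():
        score = len(context & cues)
        if score > best_score:
            best, best_score = entity, score
    return best  # None when the context gives no evidence for any sense
```

For instance, `link("nano", "Offerta iPod nano 16 GB")` resolves to `"IPod_Nano"` because three context words match that sense and none match the other.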
22.
Use Case
Non-intolerant Tweets are detected and filtered out.
CROWDPULSE SETTINGS
The Italian Hate Map
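Once each Tweet carries the entities returned by an entity linker, non-intolerant Tweets can be dropped by checking whether any linked entity is an intolerant sense of a seed term. A minimal sketch, assuming a hypothetical set of "intolerant" entity labels and a hypothetical `keep_intolerant` helper:

```python
from typing import Iterable, List, Sequence, Tuple

# Hypothetical set of entity labels treated as the intolerant senses of the seed terms.
INTOLERANT_SENSES = frozenset({"Dwarfism"})


def keep_intolerant(annotated: Iterable[Tuple[str, Sequence[str]]]) -> List[str]:
    """Keep a Tweet only if at least one of its linked entities is an intolerant sense.

    `annotated` pairs each Tweet text with the entity labels returned for it.
    """
    return [text for text, entities in annotated if INTOLERANT_SENSES.intersection(entities)]
```

In this sketch a Tweet with no linked entity is discarded, a precision-oriented choice; a recall-oriented variant could instead keep such Tweets for manual review.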
23.
CrowdPulse
Step 3: Sentiment Analysis
24.
Sentiment Analysis
Motivations
Is this content conveying any opinion?
25.
Sentiment Analysis
Motivations
Is this content conveying any opinion?
This is a crucial issue if findings about people have to be drawn.
26.
Sentiment Analysis
Definition
“It is the field of study that analyzes people’s opinions, sentiments, evaluations, appraisals, attitudes, and emotions towards entities such as products, services, organizations, individuals, issues, events, topics, and their attributes.” (*)
(*) Bing Liu. Sentiment Analysis and Opinion Mining. Synthesis Lectures on Human Language Technologies, Morgan & Claypool, 2012.
We concentrated on the polarity detection task
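Polarity detection assigns each Tweet a positive, negative or neutral label. The slides do not say which classifier CrowdPulse uses, so the sketch below swaps in a simple lexicon-based scorer with one-word negation handling; the tiny Italian lexicon, `polarity` and `keep_negative` are hypothetical illustrations, not the system's method.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical toy Italian lexicon; it only illustrates lexicon-based polarity scoring.
LEXICON: Dict[str, int] = {
    "bello": 1, "grazie": 1, "bravo": 1,
    "odio": -1, "schifo": -1, "inutile": -1,
}
NEGATORS = frozenset({"non", "mai"})
TOKEN = re.compile(r"\w+")


def polarity(text: str) -> str:
    """Sum word scores; a negator flips the next sentiment-bearing word."""
    score, negate = 0, False
    for tok in TOKEN.findall(text.lower()):
        if tok in NEGATORS:
            negate = True
            continue
        value = LEXICON.get(tok, 0)
        if value:
            score += -value if negate else value
            negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def keep_negative(texts: Iterable[str]) -> List[str]:
    """Discard positive and neutral Tweets, keeping only negative ones."""
    return [t for t in texts if polarity(t) == "negative"]
```

For example, `polarity("non è bello")` is `"negative"`: the negator carries over the neutral word "è" and flips the score of "bello".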
27.
Use Case
Tweets with positive or neutral sentiment are detected and filtered out.
CROWDPULSE SETTINGS
The Italian Hate Map
28.
Use Case
CROWDPULSE SETTINGS
The Italian Hate Map
29.
CrowdPulse
Step 4: Processing
30.
Use Case
We have to build a map, so we
only need geotagged content
CROWDPULSE SETTINGS
The Italian Hate Map
31.
Use Case
CROWDPULSE SETTINGS
The Italian Hate Map
Definition of heuristics to increase the
number of geotagged Tweets
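The slides do not say which heuristics were defined. One plausible heuristic, sketched here purely as an assumption, keeps a Tweet's native coordinates when present and otherwise tries to resolve the free-text location in the author's profile against a gazetteer. `GAZETTEER`, `resolve_location` and the rounded city coordinates are illustrative, not part of CrowdPulse.

```python
from typing import Dict, Optional, Tuple

LatLon = Tuple[float, float]

# Hypothetical mini-gazetteer with approximate city-centre coordinates;
# a real one would need every Italian municipality plus spelling variants.
GAZETTEER: Dict[str, LatLon] = {
    "palermo": (38.1157, 13.3615),
    "roma": (41.9028, 12.4964),
    "bari": (41.1171, 16.8719),
}


def resolve_location(tweet_coords: Optional[LatLon], profile_location: str) -> Optional[LatLon]:
    """Return coordinates for a Tweet, or None if it cannot be placed on the map."""
    if tweet_coords is not None:  # a native geotag always wins
        return tweet_coords
    # Profile locations are free text such as "Bari, Italia": try each comma-separated part.
    for part in profile_location.lower().split(","):
        coords = GAZETTEER.get(part.strip())
        if coords is not None:
            return coords
    return None
```

The trade-off is precision: a profile location says where users claim to live, not where they posted from, so such Tweets are placed at city level at best.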
32.
Use Case
The Italian Hate Map
Dimension        #Tweets     #Geo     %Geo
Homophobia       110,774    8,501    7.66%
Racism           154,170    1,940    1.24%
Violence       1,102,494   28,886    2.62%
Disability       479,654    3,410    0.75%
Anti-Semitism      6,000    1,150   18.03%
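The %Geo column reads as the share of extracted Tweets that carry a location. As a worked check on the Violence row, and on the total volume (which matches the “almost 2,000,000” figure in the conclusions):

```latex
\%\mathrm{Geo} = \frac{\#\mathrm{Geo}}{\#\mathrm{Tweets}} \times 100,
\qquad
\text{Violence: } \frac{28{,}886}{1{,}102{,}494} \times 100 \approx 2.62\%

\sum \#\mathrm{Tweets} = 110{,}774 + 154{,}170 + 1{,}102{,}494 + 479{,}654 + 6{,}000 = 1{,}853{,}092
```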
33.
CrowdPulse
Step 4: Data Visualization
34.
Use Case
CROWDPULSE OUTPUT
The Italian Hate Map
Maps: Violence against women; Disability
based on OpenStreetMap
35.
Use Case
CROWDPULSE OUTPUT
The Italian Hate Map
Maps: Racism; Homophobia
based on OpenStreetMap
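Before drawing maps like these, geotagged Tweets have to be aggregated by area. The slides do not describe how CrowdPulse does this; the sketch assumes a simple square grid measured in degrees and scales each cell's count against the busiest cell. `grid_counts`, `intensities` and the 0.5° default cell size are hypothetical choices.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple

Cell = Tuple[int, int]


def grid_counts(points: Iterable[Tuple[float, float]], cell_deg: float = 0.5) -> Counter:
    """Bin (lat, lon) points into square cells of `cell_deg` degrees and count them."""
    counts: Counter = Counter()
    for lat, lon in points:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell] += 1
    return counts


def intensities(counts: Counter) -> Dict[Cell, float]:
    """Scale counts to [0, 1] so the densest cell gets the strongest colour."""
    peak = max(counts.values(), default=0)
    if not peak:
        return {}
    return {cell: n / peak for cell, n in counts.items()}
```

Scaling by raw counts tends to highlight densely populated areas; dividing each cell by its overall Tweet volume would be one alternative if the goal is a rate rather than a volume.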
36. Conclusions
Crowdsourcing-based approach
1. Social content containing the seed terms is extracted and processed in real time.
2. Semantic processing is exploited to delete non-intolerant Tweets.
3. Sentiment analysis is used to filter out Tweets with irony.
4. An Analytics Console is used to build real-time hate maps.
Almost 2,000,000 pieces of social content were extracted and analyzed.
The Italian Hate Map
37. Lessons Learned
38.
Lessons Learned
The Italian Hate Map
Given the maps and the output of the linguistic analysis of intolerant Tweets (co-occurrences between terms, time lapse, etc.), the team of psychologists defined some guidelines to tackle and prevent intolerant behaviors.
These guidelines were freely distributed to public administrations in early 2015.
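The linguistic analysis mentions co-occurrences between terms. A minimal way to compute them, assuming a co-occurrence means two distinct terms appearing in the same Tweet, is to count unordered term pairs once per Tweet. `cooccurrences` is a hypothetical helper, and the tokenisation (lower-cased `\w+` runs with optional stop-word removal) is a simplifying assumption.

```python
import re
from collections import Counter
from itertools import combinations
from typing import FrozenSet, Iterable

TOKEN = re.compile(r"\w+")


def cooccurrences(texts: Iterable[str], stopwords: FrozenSet[str] = frozenset()) -> Counter:
    """Count unordered term pairs appearing in the same Tweet (each pair once per Tweet)."""
    pairs: Counter = Counter()
    for text in texts:
        terms = sorted(set(TOKEN.findall(text.lower())) - stopwords)
        pairs.update(combinations(terms, 2))  # sorted terms give a canonical pair order
    return pairs
```

Sorting the distinct terms first means ("a", "b") and ("b", "a") are always counted as the same pair.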
39. Lessons Learned
Pipeline of state-of-the-art techniques: Semantic Processing, Sentiment Analysis, Machine Learning, Data Visualization
Use Case: The Italian Hate Map
DEFINITION OF A FRAMEWORK FOR REAL-TIME SEMANTIC CONTENT ANALYSIS
Thanks to the huge availability of textual data, very complex phenomena can be analyzed in a totally new way.