Geographic knowledge discovery (PhD Theme) by Roberto Zagal
1. Geographical Knowledge Discovery applied to the Social Perception of Pollution in Mexico City
Roberto Zagal, Instituto Politecnico Nacional, ESCOM-IPN
Felix Mata, Instituto Politecnico Nacional, UPIITA-IPN
Christophe Claramunt, Naval Academy Research Institute
5. Introduction (4)
What about inconsistency?
Id | Type | Description
1 | Tweet (newspaper1) | The index of IMECAS is 135 #CDMX
2 | Tweet (Newspaper2) | @ the #contamination of air is 127 IMECAS #CDMX #bad #news
6. Related work
• The social data problem has been addressed through:
1. KDD and social mining
2. Formal publications (news media) guiding the classification of the interests of social media users [1]
3. Opinion mining and topic modeling [2]
But not with a GKD approach that crosses data layers.
7. Goal
Know how to discover the certainty level of information by crossing geographic and social information.
9. Data extraction: Sample tweet (Phase 1)
Id | Type | Description
1 | Tweet (newspaper1) | The index of IMECAS is 135 #CDMX
2 | Tweet (Newspaper2) | @ the #contamination of air is 127 IMECAS #CDMX #bad #news
We consider tweets from accounts that periodically report air pollution data.
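The extraction and cleaning steps above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the seed hashtag list, the stop-word subset, and the helper names are assumptions.

```python
import re

# Hashtags used to seed the extraction (illustrative; the slides mention
# #CDMX and #AirPollution as examples).
SEED_TAGS = {"#cdmx", "#airpollution", "#contamination"}

def is_pollution_candidate(tweet_text):
    """Return True if the tweet contains any of the seed hashtags."""
    tags = {t.lower() for t in re.findall(r"#\w+", tweet_text)}
    return bool(tags & SEED_TAGS)

def clean(tweet_text):
    """Minimal cleaning: lowercase, strip mentions, tokenize, drop stop words."""
    stop_words = {"the", "of", "is", "a", "an", "in"}  # illustrative subset
    text = re.sub(r"@\w+", "", tweet_text.lower())
    tokens = re.findall(r"[#\w]+", text)
    return [t for t in tokens if t not in stop_words]

tweet = "The index of IMECAS is 135 #CDMX"
print(is_pollution_candidate(tweet))   # True
print(clean(tweet))                    # ['index', 'imecas', '135', '#cdmx']
```

Stemming is omitted here; in practice a stemmer would be applied to the token list.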
10. Data extraction: Domain detection (Phase 1)
Id | Type | Description
2 | Tweet (Newspaper2) | @ #contamination air is 127 IMECAS #CDMX #bad #news
The post is related to a pollution topic.
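The matching idea behind domain detection (synonymy plus subclass links, as in the "contamination" → "Pollution" and IMECAS → "IndexOfAirQuality" examples) can be sketched with a toy class hierarchy. The dictionaries below are hypothetical stand-ins for the actual ontology:

```python
# Toy ontology fragment (hypothetical; only the matching mechanism is
# taken from the slides).
SYNONYMS = {"contamination": "Pollution", "smog": "Pollution"}
SUBCLASS_OF = {"IMECAS": "IndexOfAirQuality", "IndexOfAirQuality": "Pollution"}

def detect_domain(tokens):
    """Map tweet tokens to ontology classes by synonymy and subclass links."""
    matched = set()
    for tok in tokens:
        word = tok.lstrip("#")
        if word.lower() in SYNONYMS:
            matched.add(SYNONYMS[word.lower()])
        cls = word.upper() if word.upper() in SUBCLASS_OF else None
        while cls:                      # walk up the class hierarchy
            matched.add(cls)
            cls = SUBCLASS_OF.get(cls)
    return matched

tokens = ["#contamination", "air", "127", "IMECAS", "#CDMX"]
print(detect_domain(tokens))  # includes 'Pollution', the generic class
```

A tweet whose matched classes include "Pollution" is flagged as related to the pollution topic.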
11. Preprocessing (Phase 2)
• Emotion detection [3]
• Location extraction
Id | Type | Description
2 | Tweet (Newspaper2) | @ #contamination air is 127 IMECAS #CDMX #bad #news
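Both preprocessing steps can be sketched as below; the word/emoticon lexicons and the place-hashtag gazetteer are invented for illustration (the paper uses a dedicated emotion-detection method [3]):

```python
import re

# Tiny illustrative lexicons, not the paper's actual resources.
NEGATIVE = {"#bad", "bad", ":("}
POSITIVE = {"#good", "good", ":)"}
KNOWN_PLACES = {"#taxqueña", "#indiosverdes"}   # hypothetical gazetteer

def detect_emotion(text):
    """Classify a post as positive/negative/neutral by words or emoticons."""
    tokens = set(text.lower().split())
    if tokens & NEGATIVE:
        return "negative"
    if tokens & POSITIVE:
        return "positive"
    return "neutral"

def extract_location(text, metadata=None):
    """Prefer geotag metadata; fall back to place hashtags in the text."""
    if metadata and metadata.get("geo"):
        return metadata["geo"]
    for tag in re.findall(r"#\w+", text.lower()):
        if tag in KNOWN_PLACES:
            return tag
    return None  # no location: later phases use only the publication time

tweet = "@ #contamination air is 127 IMECAS #CDMX #bad #news"
print(detect_emotion(tweet))      # negative
print(extract_location(tweet))    # None
```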
12. Classification: C5 algorithm (Phase 3)
• If we detect to which category each set of data belongs:
• Health and Pollution, Transport and Pollution
Then we can select which data sources should be crossed with the tweet, in order to discover knowledge.
Id | Description | Category
2 | @ #contamination air is 127 IMECAS #CDMX #bad #news | Health and pollution
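C5.0 grows decision trees by information gain. The pure-Python "decision stump" below illustrates that idea on invented training tweets; it is a sketch of the mechanism, not the C5.0 implementation or the paper's training data:

```python
import math
from collections import Counter

# Invented training data for illustration only.
train = [
    ("imecas 135 cdmx air quality index", "Health and pollution"),
    ("contamination air imecas bad health", "Health and pollution"),
    ("traffic jam bus metro pollution", "Transport and pollution"),
    ("cars smog transport avenue pollution", "Transport and pollution"),
]

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_word(data):
    """Word whose presence/absence yields the lowest weighted entropy."""
    words = {w for text, _ in data for w in text.split()}
    def split_entropy(w):
        yes = [lab for text, lab in data if w in text.split()]
        no = [lab for text, lab in data if w not in text.split()]
        n = len(data)
        return (len(yes) / n) * entropy(yes) + (len(no) / n) * entropy(no)
    return min(sorted(words), key=split_entropy)

def classify(text, data, word):
    """Majority label on the side of the split the new text falls into."""
    present = word in text.split()
    side = [lab for t, lab in data if (word in t.split()) == present]
    return Counter(side).most_common(1)[0][0]

w = best_word(train)
tweet = "contamination air is 127 imecas cdmx bad"
print(classify(tweet, train, w))
```

A real C5.0 tree recurses on further attributes instead of stopping at one split; the gain criterion is the shared idea.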
13. Crossing data (Phase 4)
• Example 1:
• Inconsistencies in tweets 1 and 2?
Id | Type | Description
1 | Tweet (Newspaper1) | The index of IMECAS is 135 #CDMX
2 | Tweet (Newspaper2) | @ the #contamination of air is 127 IMECAS #CDMX
Which one is correct?
14. How do we know which tweet is correct?
Answer:
It was classified in the domain of Health and pollution (in Phase 3).
Then the official data from health and pollution reports are selected to be crossed with the tweet (in Phase 4).
28/10/16
15. Crossing data (Phase 4)
• Data are crossed considering different attributes; from the tweet we take the date and hour of publication.
• When these are crossed with the date and hour from official air quality reports, a match is found.
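The crossing step can be sketched as a value-and-time match against official readings. The station records below are hypothetical stand-ins for the CDMX air quality data, and the one-hour tolerance is an assumption:

```python
from datetime import datetime

# Hypothetical official records: (station, timestamp, IMECA value).
official = [
    ("Taxqueña",      datetime(2016, 10, 28, 10, 0), 135),
    ("Indios Verdes", datetime(2016, 10, 28, 15, 0), 127),
    ("Centro",        datetime(2016, 10, 28, 15, 0), 140),
]

def cross(tweet_value, tweet_time, records, max_hours=1):
    """Find official readings matching the tweet's IMECA value and hour."""
    matches = []
    for station, ts, value in records:
        close_in_time = abs((ts - tweet_time).total_seconds()) <= max_hours * 3600
        if value == tweet_value and close_in_time:
            matches.append(station)
    return matches

# Tweet 2: "... 127 IMECAS #CDMX", published 28/10/16 at 15:00, no location.
print(cross(127, datetime(2016, 10, 28, 15, 0), official))  # ['Indios Verdes']
```

A match both confirms the reported value and recovers a probable location (the matching station) for a tweet that carried none.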
16. Crossing data (Phase 4)
We discovered that both tweets are correct, but with different locations (the location is not included in the original tweet).
Id | Type | Description
1 | Tweet (newspaper1) | The index of IMECAS is 135 #CDMX #Taxqueña, 10:00 hours
2 | Tweet (Newspaper2) | The #contamination of air is 127 IMECAS #CDMX #Indios Verdes, 15:00 hours
Knowledge discovered!
17. Other preliminary results
• Following the same approach
• Knowledge discovered: which topics are talked about, by region
Topic | Geographic region | Period
Health | South, West | March–June
Transport | North, East | January–December
Policy and programs | Center | January–December
Pollution | Surrounding Mexico City | January–June
Public roads | Surrounding Mexico City | January–December
18. Conclusions and future work
• The integration of the geographical and temporal dimensions allows us to discover data correlations; this knowledge can increase the certainty of some information in social networks.
• The main contribution is that the domain discovery and classification of information is a key element of new approaches for discovering geographic information.
19. Conclusions and future work
• Future work
• Use clustering or deep learning approaches to improve the classification process
• Location detection is a hard problem; other machine learning methods for social media can be tested [4, 5]
• How can we improve geographic knowledge discovery when there are no explicit links between traditional data sources and social sources?
21. References
[1] Jonghyun Han, Hyunju Lee, Characterizing the interests of social media users: Refinement of a topic model for
incorporating heterogeneous media, Information Sciences, Volumes 358–359, 1 September 2016, Pages 112-128, ISSN
0020-0255.
[2] Schubert, E., Weiler, M., & Kriegel, H. P. (2014, August). Signitrend: scalable detection of emerging topics in textual
streams by hashed significance thresholds. In Proceedings of the 20th ACM SIGKDD international conference on
Knowledge discovery and data mining (pp. 871-880). ACM.
[3] Architecture for analysis of feelings in Facebook with semantic approach (Spanish), pp. 59–69; rec. 2014-06-22; acc. 2014-07-21. Research in Computing Science 75 (2014). http://www.rcs.cic.ipn.mx/rcs/2014_75/
[4] Ting Hua, Liang Zhao, Feng Chen, Chang-Tien Lu, and Naren Ramakrishnan. 2016. How events unfold: spatiotemporal
mining in social media. SIGSPATIAL Special 7, 3 (January 2016), 19-25. DOI=http://dx.doi.org/10.1145/2876480.2876485
[5] Takeshi Sakaki, Makoto Okazaki, and Yutaka Matsuo. Earthquake shakes twitter users: real-time event detection by social
sensors. In Proceedings of the 19th International Conference on World Wide Web, pages 851–860. ACM, 2010.
Editor's notes
SLIDE 1:
1.- Good morning.
2.- My name is Roberto. I'm a PhD student at the National Polytechnic Institute in Mexico City.
3.- Thanks for the invitation to be here today.
4.- I will be talking about "Geographical Knowledge Discovery applied to the Social Perception of Pollution in Mexico City".
5.- This research is advised by Dr. Felix Mata and Dr. Christophe Claramunt.
7.- In recent years, air pollution in Mexico City has increased considerably.
8.- Air pollution is a problem that requires analysis across multiple domains of knowledge, because we now have more information spread over increasingly complex data sources.
SLIDE 2:
Currently, social networks are becoming increasingly relevant as a means of diffusing and sharing citizen views.
In order to discover new knowledge about air pollution, we need to consider data from different sources, such as:
government, social groups, social media, and other web data.
In social media, people make comments and observations that might reflect important views on different topics related to air pollution.
SLIDE 3:
1.- We reviewed three representative and heterogeneous data sources:
2.- The Government of Mexico City, because it generates information about pollution in traditional databases. This information is trustworthy.
3.- News media, an important element, because it provides a valuable source for deriving on-the-fly citizens' opinions.
4.- For example, people in social networks express complaints, opinions, problem reports, and observations regarding the air pollution topic.
5.- We consider social networks an instantaneous picture of the social perception of air pollution.
6.- Now, the question is: how can we cross this information to discover new, confident knowledge about pollution?
SLIDE 4:
1. Information produced by institutions has a degree of certainty and veracity; it is assumed to be true.
2. But.
3. Can all information produced in social networks be trusted?
4. What is the level of certainty of the information produced in social networks, relative to other sources?
5. This is the problem statement of this preliminary investigation.
SLIDE 5:
1. The information sometimes needs to be verified to know whether it is correct or not.
2. For example:
3. We have an inconsistency in the following two tweets about air quality.
4. IMECAS is the acronym of the Metropolitan Index of Air Quality in Mexico City.
5. In tweet 1, a newspaper reports that the IMECAS index is one hundred thirty-five (135).
6. In tweet 2, a newspaper reports that the IMECAS index is one hundred twenty-seven (127).
7. Which one has the correct information?
8. How can we detect and resolve the inconsistency in the information?
SLIDE 6:
1.- These papers do not have an explicit relation with the geographic dimension.
2.- And they do not explore the certainty of information.
SLIDE 7:
1. It means that we can discover the level of certainty of the publications that appear in social media
2. by crossing these data with additional formal data sources.
4. Geographic information can be used as a linker to different data sources.
SLIDE 8:
1.- We propose a GKD framework for air pollution that includes four phases:
2.- Data extraction: oriented toward getting information from social sources and newspapers.
3.- The preprocessing phase: includes location and sentiment detection.
4.- Classification: categorizes the data into specific topics.
5.- Crossing data: helps to detect the level of information certainty.
SLIDE 9:
1.- For extraction, we consider tweets from accounts that periodically report air pollution data, for example digital newspapers of Mexico.
2.- Extraction continues using initial key phrases and hashtags, like #CDMX or #AirPollution.
4.- Afterwards, data cleaning is performed, including tokenization, stop-word removal, and stemming.
SLIDE 10:
1. Domain detection semantically pre-classifies tweets into a category of pollution, for example:
2. In tweet 2, the term "contamination" matches the "pollution" class by synonymy.
3. Next, the word IMECAS matches the class "IMECAS", which is a subclass of "IndexOfAirQuality".
4. We can say that the post is related to a pollution topic, which is a generic class.
5. It is possible that the tweet belongs to a more specific category that describes the nature of the post.
SLIDE 11:
1. In this part, we detect whether the post is related to a positive or negative feeling through words or emoticons.
This detection is useful for identifying trends in the social perception of a specific topic of pollution, for example positive tweets that talk about politics and pollution.
2. Regarding the location of the tweet, we assume that each tweet contains metadata about its place and time of publication.
3. Sometimes a tweet does not contain explicit or implicit information that allows its location to be defined. In this case, only the time of publication is considered in the following phases.
SLIDE 12:
1. If we detect to which specific category each set of data belongs,
we can select the data sources that should be crossed with the tweet, in order to discover new knowledge and certainty.
2. Tweet 2 is classified in a more specific category: health and pollution.
3. We chose C5 because it is one of the algorithms that has shown good performance in knowledge discovery in databases.
Slide 13:
At this stage, quantitative and qualitative values are separated.
1) Using the ontology, we can identify and separate terms like IMECAS, Air, and Pollution.
2) The numerical IMECA value is separated.
3) Now, we know that this value must be in a range from 0 to 201 according to the definition of the IMECA index. If this holds, we can say that we have found a valid air quality value.
4) It is possible that this approach does not work in some cases.
5) The tweets do not contain information about their location, but we consider the time of publication.
6) Using the IMECA value and the time of the tweet, we proceed to search for matches in government data sources on air quality.
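The value-separation and range check described above can be sketched as follows; the regular expression and the treatment of the range boundaries are assumptions, only the 0–201 validity range comes from the notes:

```python
import re

def extract_imeca(tweet_text):
    """Return the IMECA value in the tweet if it is in the valid 0-201 range."""
    # Accept both "127 IMECAS" and "IMECAS is 135" phrasings (assumed patterns).
    m = re.search(r"(\d+)\s*IMECAS?|IMECAS?\s*(?:is\s*)?(\d+)", tweet_text, re.I)
    if not m:
        return None
    value = int(m.group(1) or m.group(2))
    return value if 0 <= value <= 201 else None

print(extract_imeca("@ the #contamination of air is 127 IMECAS #CDMX"))  # 127
print(extract_imeca("The index of IMECAS is 135 #CDMX"))                 # 135
print(extract_imeca("index IMECAS is 500 #CDMX"))                        # None (out of range)
```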
Slide 14:
1. Through the categorization of the tweet, we know that we can cross information with the air quality database, because the tweet is related to pollution and public health topics.
SLIDE 15:
1.- The air quality data is provided by the environmental monitoring ministry of the CDMX government.
SLIDE 16:
1. The tweet has no location, but we use its time component.
2. We find a match in official data using the IMECA value.
3. Then, the official data help us to discover the tweet location.
SLIDE 17:
1. In these additional results, we can see the classification of tweets by topic and location.
2. These results show the trend of social perception in certain subjects and geographic areas.
Slide 18:
1.- The integration of the geographic and temporal dimensions.
2.- The domain discovery.