Geographical Knowledge
Discovery
applied to the
Social Perception of Pollution
in Mexico City
Roberto Zagal, Instituto Politecnico Nacional, ESCOM-IPN
Felix Mata, Instituto Politecnico Nacional, UPIITA-IPN
Christophe Claramunt, Naval Academy Research Institute
Introduction (1)
• Traditionally, pollution data has been produced by institutions, government, and vendors
• But now pollution data is also produced by individuals
Introduction (2)
Information about pollution is expressed in different ways by:
− Government
− News media
− People in social networks
Introduction (3)
But…
What about the certainty of this
information?
Introduction (4)

What about inconsistency?
Id | Type | Description
1 | Tweet (Newspaper1) | The index of IMECAS is 135 #CDMX
2 | Tweet (Newspaper2) | @ the #contamination of air is 127 IMECAS #CDMX #bad #news
Related work
• The social data problem has been addressed through:
1. KDD and social mining
2. Formal publications (news media) that guide the classification of the interests of social media users [1]
3. Opinion mining and topic modeling [2]
But not with a GKD approach that crosses data layers
Goal
Know how to discover the certainty level of information by crossing geographic and social information
Proposed solution:
GKD Framework for Air Pollution Data
[Framework diagram: Phase 1, Phase 2, Phase 3]
Data extraction: Sample tweet (Phase 1)
Id | Type | Description
1 | Tweet (Newspaper1) | The index of IMECAS is 135 #CDMX
2 | Tweet (Newspaper2) | @ the #contamination of air is 127 IMECAS #CDMX #bad #news
We consider tweets from accounts that periodically report air pollution data
Data extraction: Domain Detection (Phase 1)
Id | Type | Description
2 | Tweet (Newspaper2) | @ #contamination air is 127 IMECAS #CDMX #bad #news
The post is related to a pollution topic
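A minimal sketch of how such a domain check could work, assuming a simple keyword match. The slides do not say how domain detection was implemented, so the term list and the function names below are hypothetical.

```python
import re

# Hypothetical keyword list; the slides do not state which terms were used.
POLLUTION_TERMS = {"imeca", "imecas", "contamination", "contaminación", "pollution"}


def tokens(text):
    """Return the lowercase word tokens of a post (a hashtag loses its leading '#')."""
    return {t.lower() for t in re.findall(r"\w+", text)}


def is_pollution_post(text):
    """True when the post shares at least one token with the keyword list."""
    return bool(tokens(text) & POLLUTION_TERMS)


tweet = "@ #contamination air is 127 IMECAS #CDMX #bad #news"
print(is_pollution_post(tweet))
```

A keyword match is only a stand-in here; any text classifier could take its place.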
Preprocessing (Phase 2)
• Emotion detection [3]
• Location extraction
Id | Type | Description
2 | Tweet (Newspaper2) | @ #contamination air is 127 IMECAS #CDMX #bad #news
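As an illustration of the location extraction step, the sketch below looks up hashtags in a small gazetteer and pulls the reported IMECA value out of the text. The gazetteer, the regular expression, and the function names are assumptions made for this example, not the method used by the authors.

```python
import re

# Hypothetical gazetteer mapping hashtag text to place names.
GAZETTEER = {
    "cdmx": "Mexico City",
    "taxqueña": "Taxqueña",
    "indiosverdes": "Indios Verdes",
}


def extract_locations(text):
    """Return the place names whose hashtag appears in the post."""
    hashtags = re.findall(r"#(\w+)", text)
    return [GAZETTEER[h.lower()] for h in hashtags if h.lower() in GAZETTEER]


def extract_imeca(text):
    """Return the number next to the word IMECA(S), or None if there is none."""
    pattern = r"(\d+)\s*IMECAS?\b|\bIMECAS?\b\D*?(\d+)"
    match = re.search(pattern, text, flags=re.IGNORECASE)
    if match is None:
        return None
    return int(match.group(1) or match.group(2))


tweet = "@ #contamination air is 127 IMECAS #CDMX #bad #news"
print(extract_locations(tweet), extract_imeca(tweet))
```

Note that the hashtag only gives the city here; the station-level location is missing from the tweet, which is what the later crossing step recovers.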
• If we detect which category each set of data belongs to:
  • Health and Pollution, Transport and Pollution
Then we can select which data sources should be crossed with the tweet in order to discover knowledge
Classification C5 algorithm (Phase 3)
Id | Description | Category
2 | @ #contamination air is 127 IMECAS #CDMX #bad #news | Health and pollution
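The link between a category and the data sources to cross can be sketched as a lookup table. The code below is not C5: it uses a simple keyword count as a stand-in classifier, and the keyword lists and the transport source names are hypothetical. Only the health and pollution sources come from the slides.

```python
import re

# Illustrative keyword lists (assumptions, not features taken from the slides).
CATEGORY_TERMS = {
    "Health and pollution": {"imecas", "contamination", "bad", "health"},
    "Transport and pollution": {"traffic", "metro", "bus", "transport"},
}

# Sources to cross per category. The first entry follows the slides;
# the second is a hypothetical placeholder.
SOURCES = {
    "Health and pollution": ["health reports", "pollution reports"],
    "Transport and pollution": ["transport reports", "pollution reports"],
}


def categorize(text):
    """Pick the category whose keyword list overlaps the post the most."""
    words = set(re.findall(r"\w+", text.lower()))
    return max(CATEGORY_TERMS, key=lambda c: len(words & CATEGORY_TERMS[c]))


tweet = "@ #contamination air is 127 IMECAS #CDMX #bad #news"
category = categorize(tweet)
print(category, SOURCES[category])
```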
Crossing data (Phase 4)
• Example 1:
  • Are tweets 1 and 2 inconsistent?
Id | Type | Description
1 | Tweet (Newspaper1) | The index of IMECAS is 135 #CDMX
2 | Tweet (Newspaper2) | @ the #contamination of air is 127 IMECAS #CDMX
Which one is correct?
How do we know which tweet is correct?
Answer:
The tweet was classified in the domain Health and pollution (in Phase 3).
Then the official data from health reports and pollution reports are selected to be crossed with the tweet (in Phase 4).
28/10/16
Crossing data (Phase 4)
• Data are crossed on different attributes; the date and hour of publication are taken from the tweet
• When these are crossed with the date and hour from official air quality reports, a match is found
We discovered that both tweets are correct but refer to different locations (the location is not included in the original tweets)
Id | Type | Description | Location | Time
1 | Tweet (Newspaper1) | The index of IMECAS is in 135 #CDMX | #Taxqueña | 10:00 hours
2 | Tweet (Newspaper2) | The #contaminación of air is in 127 IMECAS #CDMX | #Indios Verdes | 15:00 hours
Knowledge discovered!
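The crossing step can be sketched as a join between tweets and official readings on the publication hour and the reported value. This is a minimal sketch under stated assumptions: the record layout, the function names, the calendar date, the publication minutes, and the third official reading are all made up for illustration. Only the station names, hours, and IMECA values of the two matching readings come from the example above.

```python
from datetime import datetime

# Hypothetical official air-quality readings. The date is a placeholder and
# the last row is invented so that the join has something to reject.
official_reports = [
    {"station": "Taxqueña", "time": datetime(2016, 1, 1, 10, 0), "imeca": 135},
    {"station": "Indios Verdes", "time": datetime(2016, 1, 1, 15, 0), "imeca": 127},
    {"station": "Taxqueña", "time": datetime(2016, 1, 1, 15, 0), "imeca": 110},
]

# Tweets reduced to the attributes used for crossing (publication time, value).
tweets = [
    {"id": 1, "published": datetime(2016, 1, 1, 10, 5), "imeca": 135},
    {"id": 2, "published": datetime(2016, 1, 1, 15, 20), "imeca": 127},
]


def to_hour(timestamp):
    """Truncate a timestamp to the start of its hour."""
    return timestamp.replace(minute=0, second=0, microsecond=0)


def cross_with_reports(tweet, reports):
    """Return the stations whose reading matches the tweet's hour and value."""
    hour = to_hour(tweet["published"])
    return [
        r["station"]
        for r in reports
        if r["time"] == hour and r["imeca"] == tweet["imeca"]
    ]


for tweet in tweets:
    print(tweet["id"], cross_with_reports(tweet, official_reports))
```

An empty result would mean that no official reading supports the tweet, while a single station both confirms the value and supplies the missing location.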
Other preliminary results
• Following the same approach
• Knowledge discovered: which topics are discussed in each region
Topic | Geographic | Period
Health | South, West | March-June
Transport | North, East | January-December
Policy and programs | Center | January-December
Pollution | Surrounding Mexico City | January-June
Public roads | Surrounding Mexico City | January-December
Conclusions and Future work
• The integration of the geographical and temporal dimensions allows us to discover data correlations; this knowledge can increase the certainty of some information in social networks.
• The main contribution is the domain discovery and classification of information, which is a key element of new approaches for discovering geographic information.
Conclusions and future work
• Future work
  • Use clustering or deep learning approaches to improve the classification process
  • Location detection is a hard problem. Other machine learning methods for social media can be tested [4, 5]
  • How can we improve geographic knowledge discovery when there are no explicit links between traditional data sources and social sources?
Many Thanks!
Questions?
Roberto Zagal
zagalmmx@gmail.com
IPN, México
References
[1] Jonghyun Han, Hyunju Lee. Characterizing the interests of social media users: Refinement of a topic model for incorporating heterogeneous media. Information Sciences, Volumes 358–359, 1 September 2016, pages 112–128. ISSN 0020-0255.
[2] E. Schubert, M. Weiler, H. P. Kriegel. SigniTrend: scalable detection of emerging topics in textual streams by hashed significance thresholds. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 871–880. ACM, August 2014.
[3] ... architecture for analysis of feelings in Facebook with semantic approach (in Spanish). Research in Computing Science 75 (2014), pages 59–69. Received 2014-06-22; accepted 2014-07-21. http://www.rcs.cic.ipn.mx/rcs/2014_75/
[4] Ting Hua, Liang Zhao, Feng Chen, Chang-Tien Lu, Naren Ramakrishnan. How events unfold: spatiotemporal mining in social media. SIGSPATIAL Special 7, 3 (January 2016), pages 19–25. DOI: http://dx.doi.org/10.1145/2876480.2876485
[5] Takeshi Sakaki, Makoto Okazaki, Yutaka Matsuo. Earthquake shakes Twitter users: real-time event detection by social sensors. In Proceedings of the 19th International Conference on World Wide Web, pages 851–860. ACM, 2010.
Film cover research (1).pptxsdasdasdasdasdasa
 
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
 
Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170
 
Contact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New DelhiContact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New Delhi
 
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
 
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
 
PHP-based rendering of TYPO3 Documentation
PHP-based rendering of TYPO3 DocumentationPHP-based rendering of TYPO3 Documentation
PHP-based rendering of TYPO3 Documentation
 
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
 
Hot Sexy call girls in Rk Puram 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in  Rk Puram 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in  Rk Puram 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Rk Puram 🔝 9953056974 🔝 Delhi escort Service
 
Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24
 
NSX-T and Service Interfaces presentation
NSX-T and Service Interfaces presentationNSX-T and Service Interfaces presentation
NSX-T and Service Interfaces presentation
 
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
 

Geographic knowledge discovery (PhD Theme) by Roberto Zagal

  • 1. Geographical Knowledge Discovery applied to the Social Perception of Pollution in Mexico City. Roberto Zagal, Instituto Politecnico Nacional, ESCOM-IPN; Felix Mata, Instituto Politecnico Nacional, UPIITA-IPN; Christophe Claramunt, Naval Academy Research Institute
  • 2. Introduction (1): Traditionally, pollution data has been produced by institutions, government and vendors. But now pollution data is produced by individuals, too.
  • 3. Introduction (2): Information about pollution is expressed in different ways by: government; news media; people in social networks.
  • 4. Introduction (3): But what about the certainty of this information?
  • 5. Introduction (4): What about inconsistency? Id 1, Tweet, Newspaper1: "The index of IMECAS is 135 #CDMX". Id 2, Tweet, Newspaper2: "@ the #contamination of air is 127 IMECAS #CDMX #bad #new ☹".
  • 6. Related work: The social data problem has been addressed through: 1. KDD and social mining; 2. Formal publications (news media) guiding the classification of the interests of social media users [1]; 3. Opinion mining and topic modeling [2]. But not through a GKD approach that crosses data layers.
  • 7. Goal: Know how to discover the certainty level of information by crossing geographic and social information.
  • 8. Solution proposed: GKD framework for air pollution data. Phase 1, Phase 2, Phase 3.
  • 9. Data extraction: Sample tweet (Phase 1). Id 1, Tweet, Newspaper1: "The index of IMECAS is 135 #CDMX". Id 2, Tweet, Newspaper2: "@ the #contamination of air is 127 IMECAS #CDMX #bad #news ☹". We consider tweets from accounts that periodically report air pollution data.
  • 10. Data extraction: Domain detection (Phase 1). Id 2, Tweet, Newspaper2: "@ #contamination air is 127 IMECAS #CDMX #bad #new". The post is related to a pollution topic.
  • 11. Preprocessing (Phase 2): Emotion detection [3]; location extraction. Id 2, Tweet, Newspaper2: "@ #contamination air is 127 IMECAS #CDMX #bad #new ☹".
  • 12. Classification, C5 algorithm (Phase 3): If we detect which category each set of data belongs to (Health and Pollution, Transport and Pollution), then we can select which data sources should be crossed with the tweet in order to discover knowledge. Id 2: "@ #contamination air is 127 IMECAS #CDMX #bad #new ☹". Category: Health and pollution.
  • 13. Crossing data (Phase 4). Example 1: Are tweets 1 and 2 inconsistent? Id 1, Tweet, Newspaper1: "The index of IMECAS is 135 #CDMX". Id 2, Tweet, Newspaper2: "@ the #contamination of air is 127 IMECAS #CDMX ☹". Which is correct?
  • 14. Crossing data (Phase 4): How do we know which tweet is correct? Answer: the tweet was classified in the domain of Health and pollution (in Phase 3). The official data from health reports and pollution reports are then selected to be crossed with the tweet (in Phase 4). 28/10/16
  • 15. Crossing data (Phase 4): Data are crossed considering different attributes; the date and hour of publication are taken from the tweet. When these are crossed with the date and hour from official air quality reports, a match is found.
  • 16. Crossing data (Phase 4): We discovered that both tweets are correct but refer to different locations (the location is not included in the original tweet). Id 1, Tweet, Newspaper1: "The index of IMECAS is in 135 #CDMX #Taxqueña", 10:00 hours. Id 2, Tweet, Newspaper2: "The #contaminación of air is in 127 IMECAS #CDMX ☹ #Indios Verdes", 15:00 hours. Knowledge discovered!
  • 17. Other preliminary results, following the same approach. Knowledge discovered: which topics are discussed in each region.
      Topic | Geographic area | Period
      Health | South, West | March-June
      Transport | North, East | January-December
      Policy and programs | Center | January-December
      Pollution | Surrounding Mexico City | January-June
      Public roads | Surrounding Mexico City | January-December
  • 18. Conclusions and future work: The integration of the geographical and temporal dimensions allows us to discover data correlations; this knowledge can increase the certainty of some information in social networks. The main contribution is domain discovery: the classification of information is a key element of new approaches to discovering geographic information.
  • 19. Conclusions and future work. Future work: Use clustering or deep learning approaches to improve the classification process. Location detection is a hard problem; other machine learning methods for social media could be tested [4, 5]. How can we improve geographic knowledge discovery when there are no explicit links between traditional data sources and social sources?
  • 21. References
      [1] Jonghyun Han, Hyunju Lee. Characterizing the interests of social media users: Refinement of a topic model for incorporating heterogeneous media. Information Sciences, Volumes 358–359, 1 September 2016, pp. 112–128. ISSN 0020-0255.
      [2] Schubert, E., Weiler, M., & Kriegel, H. P. (2014, August). SigniTrend: scalable detection of emerging topics in textual streams by hashed significance thresholds. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 871–880). ACM.
      [3] ... architecture for analysis of feelings in Facebook with semantic approach (Spanish). Research in Computing Science 75 (2014), pp. 59–69. http://www.rcs.cic.ipn.mx/rcs/2014_75/
      [4] Ting Hua, Liang Zhao, Feng Chen, Chang-Tien Lu, and Naren Ramakrishnan. 2016. How events unfold: spatiotemporal mining in social media. SIGSPATIAL Special 7, 3 (January 2016), 19–25. DOI=http://dx.doi.org/10.1145/2876480.2876485
      [5] Takeshi Sakaki, Makoto Okazaki, and Yutaka Matsuo. Earthquake shakes Twitter users: real-time event detection by social sensors. In Proceedings of the 19th International Conference on World Wide Web, pp. 851–860. ACM, 2010.
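
The crossing step of Phase 4 (slides 14 to 16) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `OFFICIAL_READINGS` table, the `cross_tweet` function, the 30-minute tolerance, the calendar date and the "Centro" row are all hypothetical. Only the two value, time and station combinations for Taxqueña and Indios Verdes come from slide 16.

```python
from datetime import datetime, timedelta

# Hypothetical official readings: (station, measurement time, IMECA value).
# The first two rows mirror the matches reported on slide 16; the third row
# and the calendar date are invented for illustration.
OFFICIAL_READINGS = [
    ("Taxqueña", datetime(2016, 1, 1, 10, 0), 135),
    ("Indios Verdes", datetime(2016, 1, 1, 15, 0), 127),
    ("Centro", datetime(2016, 1, 1, 15, 0), 98),
]


def cross_tweet(value, published_at, readings=OFFICIAL_READINGS,
                tolerance=timedelta(minutes=30)):
    """Return the stations whose official reading agrees with the tweet.

    A reading agrees when its IMECA value equals the tweet's value and its
    timestamp lies within `tolerance` of the tweet's publication time.
    """
    return [
        station
        for station, measured_at, official_value in readings
        if official_value == value
        and abs(measured_at - published_at) <= tolerance
    ]


# Tweet 1 reports 135 at 10:00, tweet 2 reports 127 at 15:00.
print(cross_tweet(135, datetime(2016, 1, 1, 10, 0)))  # ['Taxqueña']
print(cross_tweet(127, datetime(2016, 1, 1, 15, 0)))  # ['Indios Verdes']
```

An empty result would mean that no official reading supports the tweet, which is one possible signal of low certainty; how the framework actually scores certainty is not specified in the slides.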

Editor's notes

  1. SLIDE 1: 1.- Good morning. 2.- My name is Roberto. I'm a PhD student at the National Polytechnic Institute in Mexico City. 3.- Thank you for the invitation to be here today. 4.- I will be talking about "Geographical Knowledge Discovery applied to the Social Perception of Pollution in Mexico City". 5.- This research is advised by Dr. Felix Mata and Dr. Christophe Claramunt. 7.- In recent years, air pollution in Mexico City has increased considerably. 8.- Air pollution is a problem that requires analysis across multiple domains of knowledge, because we now have more information, held in more complex data sources.
  2. SLIDE 2: Social networks are becoming increasingly relevant as a means of disseminating and sharing citizens' views. To discover new knowledge about air pollution, we need to consider data from different sources, such as government, social groups, social media and other web data. In social media, people make comments and observations that might reflect important information on different topics related to air pollution.
  3. SLIDE 3: 1.- We reviewed three representative and heterogeneous data sources. 2.- The government of Mexico City, because it generates information about pollution in traditional databases. This information is trustworthy. 3.- News media, an important element because it provides a valuable source for deriving citizens' opinions on the fly. 4.- People in social networks, who express complaints, opinions, reports of problems and observations regarding air pollution. 5.- We consider social networks an instantaneous picture of the social perception of air pollution. 6.- Now, the question is: how can we cross this information to discover new, reliable knowledge about pollution?
  4. SLIDE 4: 1. Information produced by institutions has a degree of certainty and veracity; it is assumed to be true. 2. But: 3. Can all information produced in social networks be trusted? 4.- What is the level of certainty of the information produced in social networks relative to other sources? 5.- This is the problem statement of this preliminary investigation.
  5. SLIDE 5: 1. Information sometimes needs to be verified to know whether it is correct. 2. For example: 3. We have an inconsistency in the following two tweets about air quality. 4. IMECAS is the acronym for the Metropolitan Index of Air Quality in Mexico City. 5. In tweet 1, a newspaper reports that the IMECAS index is one hundred thirty-five (135). 6. In tweet 2, a newspaper reports that the IMECAS index is one hundred twenty-seven (127). 7. Which one has the correct information? 8. How can we detect and resolve the inconsistency in the information?
  6. SLIDE 6: 1.- The papers have no explicit relation to the geographic dimension 2.- and they do not explore the certainty of information.
  7. SLIDE 7: 1. This means that we can discover the level of certainty of the publications that appear in social media 2. by crossing these data with other, formal data sources. 4. Geographic information can be used as a link between different data sources.
  8. SLIDE 8: 1.- We propose a GKD framework for air pollution that includes four phases: 2.- Data extraction, which is oriented to getting information from social sources and newspapers. 3.- Processing, which includes location and sentiment detection. 4.- Classification, which categorizes the data into specific topics. 5.- Crossing data, which helps to detect the level of certainty of the information.
  9. SLIDE 9: 1.- For extraction, we consider tweets from accounts that periodically report air pollution data, for example digital newspapers in Mexico. 2.- Extraction continues using initial key phrases and hashtags, like #CDMX or #AirPollution. 4.- Afterwards, the data are cleaned: this includes tokenization, stop-word removal and stemming.
  10. SLIDE 10: 1. Domain detection semantically pre-classifies tweets into a pollution category, for example: 2. In tweet 2, the term "contamination" matches the "pollution" class by synonymy. 3. Next, the word IMECAS matches the class "IMECAS", which is a subclass of "IndexOfAirQuality". 4. We can say that the post is related to a pollution topic, which is a generic class. 5. It is possible that the tweet belongs to a more specific category that describes the nature of the post.
  11. SLIDE 11: 1. In this part, we detect whether the post expresses a positive or negative feeling, using words or emoticons. This detection is useful for identifying trends in the social perception of a specific pollution topic, for example positive tweets about politics and pollution. 2. Regarding the location of the tweet, we assume that each tweet's metadata contains its place and time of publication. 3. Sometimes a tweet does not contain explicit or implicit information that allows its location to be determined. In this case, only the time of publication is considered in the following phases.
  12. SLIDE 12: 1. If we detect which specific category each set of data belongs to, we can select the data sources that should be crossed with the tweet in order to discover new knowledge and certainty. 2. Tweet 2 is classified in a more specific category: health and pollution. 3. We chose C5 because it is one of the algorithms that have shown good performance in knowledge discovery in databases.
  13. SLIDE 13: At this stage, quantitative and qualitative values are separated. 1) Using the ontology, we can identify and separate terms like IMECAS, Air and Pollution. 2) The numerical IMECA value is separated. 3) Now, we know that this value must be in a range from 0 to 201 according to the definition of the IMECA index. If it is, we can say that we have found a valid air quality value. 4) It is possible that this approach does not work in some cases. 5) The tweets do not contain information about their location, but we consider the time of publication. 6) Using the IMECA value and the time of the tweet, we search for matches in government data sources on air quality.
  14. SLIDE 14: 1. Through the categorization of the tweet, we know that we can cross its information with the air quality database, because it is related to pollution and public health topics.
  15. SLIDE 15: 1.- The air quality data are provided by the environmental monitoring ministry of the CDMX government.
  16. SLIDE 16: 1. The tweet has no location, but using its time component 2. we find a match in the official data using the IMECA value. 3. Then, the official data help us to discover the tweet's location.
  17. SLIDE 17: 1. In these additional results, we can see the classification of tweets by topic and location. 2. These results show the trend of social perception for certain subjects and geographical areas.
  18. Slide 18. 1.- The information of dimensions. 2.- The domain discovery.
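
The separation of qualitative terms and quantitative values described in the notes for slides 10 and 13 can be sketched as follows. This is an illustrative sketch only: the `DOMAIN_TERMS` dictionary stands in for the ontology and is an assumption, as are the function name `split_tweet` and the regular expressions. The class names "Pollution" and "IMECAS" and the 0 to 201 validity range are taken from the notes.

```python
import re

# Hypothetical stand-in for the ontology: surface term -> class.
# "contamination" maps to "Pollution" by synonymy, as in the slide 10 notes.
DOMAIN_TERMS = {
    "contamination": "Pollution",
    "contaminación": "Pollution",
    "pollution": "Pollution",
    "imecas": "IMECAS",
    "imeca": "IMECAS",
    "air": "Air",
}

# Valid range of the index as stated in the slide 13 notes.
IMECA_MIN, IMECA_MAX = 0, 201


def split_tweet(text):
    """Split a tweet into (ontology classes, valid IMECA values).

    Classes come from a dictionary lookup on lowercased word tokens;
    values are the standalone integers that fall inside the valid range.
    """
    tokens = re.findall(r"\w+", text.lower())
    classes = sorted({DOMAIN_TERMS[t] for t in tokens if t in DOMAIN_TERMS})
    numbers = [int(n) for n in re.findall(r"\b[0-9]+\b", text)]
    values = [n for n in numbers if IMECA_MIN <= n <= IMECA_MAX]
    return classes, values


tweet = "@ the #contamination of air is 127 IMECAS #CDMX #bad #new"
print(split_tweet(tweet))  # (['Air', 'IMECAS', 'Pollution'], [127])
```

A number outside the range, such as 999, is discarded, which corresponds to the case where no valid air quality value is found in the tweet.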