SlideShare a Scribd company logo
1 of 13
Download to read offline
Geo-statistical Exploration of 
Milano Datasets 
Irene Celino and Gloria Re Calegari 
CEFRIEL 
1 
The 13th International Semantic Web Conference 2014 
Riva del Garda, Italy 
19 – 23 October 2014 
Semantic Statistics workshop 
19th October 2014
Research goal 
Long term goal 
• Compare data related to the same city but obtained from heterogeneous 
sources. Do they provide the same ‘picture’ of the city? 
• Can we update a dataset (expensive and time consuming, updated once every 
5/10 years) with another dataset (cheaper and always up to date)? 
Presentation target 
• Comparison of two heterogeneous datasets referring to the same city (Milan) to 
discover if they have any intrinsic correlation 
Semantic Statistics - ISWC 2014 - Riva del Garda, Italy 2
Datasets available 
• Telecom (phone activity data) provided for their “Big Data Challenge” 
• Milan + surroundings 
• Nov-dic 2013 
• Grid of 10.000 cells 
• Activity recorded every ten minutes 
• A footprint for each cell 
• ISTAT - Italian National Statistical Institute 
• demographic data of 2011 and 2001 
• divided by sex, age and nationality 
Analysis performed using data in their original format (GeoJSON, csv) and serialization in RDF format 
at the end of the analysis. 
Semantic Statistics - ISWC 2014 - Riva del Garda, Italy 3
Datasets available – different spatial granularity 
ISTAT – 50 surrounding towns + 9 
internal Milano districts 
Telecom – grid of 10.000 cells (250 x 
250 m each) 
Pre-processing of 
data required. 
Mapping of 
Telecom data into 
municipality 
granularity. 
Semantic Statistics - ISWC 2014 - Riva del Garda, Italy 4
Methodology of analysis 
Goal of analysis: do the two datasets have the same intrinsic meaning? 
Unsupervised clustering to group data in each dataset: 
• k-means (euclidean distance) 
• Adaptive k-means (cosine distance) 
Comparison of the two clustering results to find correlation between the two datasets. 
Validation of the comparison: 
• Rand Index 
• Kappa Index 
(the closer to 1 the index is, the stronger the correlation between the data clusterings) 
Semantic Statistics - ISWC 2014 - Riva del Garda, Italy 5
Experiments- Telecom vs ISTAT (range on total poulation) 
Telecom - Kmeans 6 classes 
Rand 
Index 
Kappa 
Index 
0,23 0,23 
ISTAT 2011 - range 6 classes 
Only partial 
correlation 
Try to add 
information to total 
population (feature 
vector with 
distribution of 
populatin divided by 
sex, age and 
nationality) 
Semantic Statistics - ISWC 2014 - Riva del Garda, Italy 6
Experiments - Telecom vs ISTAT (2011 fine-grained population) 
Telecom - Kmeans 4 classes 
Rand 
Index 
Kappa 
Index 
0,77 0,81 
ISTAT 2011 – Kmeans 4 classes 
Good correlation 
Try to add historical 
information (2001 
datasets). 
Does adding 
temporal dynamic 
improve corrrelation? 
Semantic Statistics - ISWC 2014 - Riva del Garda, Italy 7
Telecom - Kmeans 4 classes 
Rand 
Index 
Kappa 
Index 
0,77 0,81 
ISTAT 2011 + 2001 – Kmeans 4 classes 
Good correlation. 
The same as the 
previous test. 
Experiments- Telecom vs ISTAT (2011+2001 fine-grained population) 
Semantic Statistics - ISWC 2014 - Riva del Garda, Italy 8
Clustering results serialization 
Results of 
clustering 
analysis 
Geographic format 
RDF format 
Geo + RDF format 
GeoJSON file 
RDF Data Cube vocabulary (adding in the 
Observation definition concepts of clustering 
algo and cluster number) 
Enrich GeoJSON with power of 
JSON-LD -> ‘’GeoJSON-LD’’ 
Semantic Statistics - ISWC 2014 - Riva del Garda, Italy 9
‘’GeoJSON-LD’’ 
‘’GeoJSON-LD’’ is obtained by adding the ‘@context’ prefix to the GeoJSON file. 
The prefix specify how to interpret GeoJSON tags as RDF resources 
@context prefix GeoJSON file 
Semantic Statistics - ISWC 2014 - Riva del Garda, Italy 10
‘’GeoJSON-LD’’ pros and cons 
Correct interpretation of the geographic information 
in the GeoJSON-LD file 
Problems in the interpretation of the RDF information. 
‘’GeoJSON-LD’’ does not interpret correctly a lists of lists 
(polygon coordinates representation) 
GeoJSON-LD is correctly interpreted as GeoJSON by 
GIS 
GeoJSON list of lists 
RDF serialization 
Semantic Statistics - ISWC 2014 - Riva del Garda, Italy 11
Conclusion and future works 
• Identification of a correlation between phone activity data and 
demographic information 
• Further tests using different datasets (land use, Point of interests) 
• Find efficient method for handling the different granularity levels of the 
datasets 
• Method for handling temporal aspect of phone activity data (we lost temporal 
information during clustering process) 
• Find a method to overcome ‘’GeoJSON-LD’’ serialization problem 
Semantic Statistics - ISWC 2014 - Riva del Garda, Italy 12
Thank you! Any question? 
Further details at: http://swa.cefriel.it/geo/ 
Irene Celino and Gloria Re Calegari 
CEFRIEL 
Semantic Statistics - ISWC 2014 - Riva del Garda, Italy 13

More Related Content

What's hot

The RIPE NCC, Internet Measurements and IXPs
The RIPE NCC, Internet Measurements and IXPsThe RIPE NCC, Internet Measurements and IXPs
The RIPE NCC, Internet Measurements and IXPsRIPE NCC
 
Dr Richard Fry - Using R as a GIS
Dr Richard Fry - Using R as a GISDr Richard Fry - Using R as a GIS
Dr Richard Fry - Using R as a GISShaun Lewis
 
Creating and Utilizing Linked Open Statistical Data for the Development of Ad...
Creating and Utilizing Linked Open Statistical Data for the Development of Ad...Creating and Utilizing Linked Open Statistical Data for the Development of Ad...
Creating and Utilizing Linked Open Statistical Data for the Development of Ad...Evangelos Kalampokis
 
RIPE Atlas
RIPE AtlasRIPE Atlas
RIPE AtlasRIPE NCC
 
A Data Scientist Exploration in the World of Heterogeneous Open Geospatial Data
A Data Scientist Exploration in the World of Heterogeneous Open Geospatial DataA Data Scientist Exploration in the World of Heterogeneous Open Geospatial Data
A Data Scientist Exploration in the World of Heterogeneous Open Geospatial DataGloria Re Calegari
 
From Data to City Indicators: A Knowledge Graph for Supporting Automatic Gene...
From Data to City Indicators: A Knowledge Graph for Supporting Automatic Gene...From Data to City Indicators: A Knowledge Graph for Supporting Automatic Gene...
From Data to City Indicators: A Knowledge Graph for Supporting Automatic Gene...Henrique O. Santos
 
Using R to Visualize Spatial Data: R as GIS - Guy Lansley
Using R to Visualize Spatial Data: R as GIS - Guy LansleyUsing R to Visualize Spatial Data: R as GIS - Guy Lansley
Using R to Visualize Spatial Data: R as GIS - Guy LansleyGuy Lansley
 
So much data so many uses
So much data so many usesSo much data so many uses
So much data so many usesGailBordenTech
 

What's hot (8)

The RIPE NCC, Internet Measurements and IXPs
The RIPE NCC, Internet Measurements and IXPsThe RIPE NCC, Internet Measurements and IXPs
The RIPE NCC, Internet Measurements and IXPs
 
Dr Richard Fry - Using R as a GIS
Dr Richard Fry - Using R as a GISDr Richard Fry - Using R as a GIS
Dr Richard Fry - Using R as a GIS
 
Creating and Utilizing Linked Open Statistical Data for the Development of Ad...
Creating and Utilizing Linked Open Statistical Data for the Development of Ad...Creating and Utilizing Linked Open Statistical Data for the Development of Ad...
Creating and Utilizing Linked Open Statistical Data for the Development of Ad...
 
RIPE Atlas
RIPE AtlasRIPE Atlas
RIPE Atlas
 
A Data Scientist Exploration in the World of Heterogeneous Open Geospatial Data
A Data Scientist Exploration in the World of Heterogeneous Open Geospatial DataA Data Scientist Exploration in the World of Heterogeneous Open Geospatial Data
A Data Scientist Exploration in the World of Heterogeneous Open Geospatial Data
 
From Data to City Indicators: A Knowledge Graph for Supporting Automatic Gene...
From Data to City Indicators: A Knowledge Graph for Supporting Automatic Gene...From Data to City Indicators: A Knowledge Graph for Supporting Automatic Gene...
From Data to City Indicators: A Knowledge Graph for Supporting Automatic Gene...
 
Using R to Visualize Spatial Data: R as GIS - Guy Lansley
Using R to Visualize Spatial Data: R as GIS - Guy LansleyUsing R to Visualize Spatial Data: R as GIS - Guy Lansley
Using R to Visualize Spatial Data: R as GIS - Guy Lansley
 
So much data so many uses
So much data so many usesSo much data so many uses
So much data so many uses
 

Similar to Geo-statistical Exploration of Milano Datasets

City Data Dating: emerging affinities between diverse urban datasets
City Data Dating: emerging affinities between diverse urban datasetsCity Data Dating: emerging affinities between diverse urban datasets
City Data Dating: emerging affinities between diverse urban datasetsGloria Re Calegari
 
2013 GISCO Track, Quality Assessment and Improvement for Addressed Locations ...
2013 GISCO Track, Quality Assessment and Improvement for Addressed Locations ...2013 GISCO Track, Quality Assessment and Improvement for Addressed Locations ...
2013 GISCO Track, Quality Assessment and Improvement for Addressed Locations ...GIS in the Rockies
 
Lemmens kessler-agile-linked data v3-slideshare
Lemmens kessler-agile-linked data v3-slideshareLemmens kessler-agile-linked data v3-slideshare
Lemmens kessler-agile-linked data v3-slideshareRob Lemmens
 
Data Ecosystems for Geospatial Data
Data Ecosystems for Geospatial DataData Ecosystems for Geospatial Data
Data Ecosystems for Geospatial DataSlim Turki, Dr.
 
Hotspot Analysis with QGIS - FOSS4G-IT 2017
Hotspot Analysis with QGIS  - FOSS4G-IT 2017Hotspot Analysis with QGIS  - FOSS4G-IT 2017
Hotspot Analysis with QGIS - FOSS4G-IT 2017Daniele Oxoli
 
Approaches to representing and delivering geospatial data in the semantic Web...
Approaches to representing and delivering geospatial data in the semantic Web...Approaches to representing and delivering geospatial data in the semantic Web...
Approaches to representing and delivering geospatial data in the semantic Web...Paul Box
 
An evaluation of SimRank and Personalized PageRank to build a recommender sys...
An evaluation of SimRank and Personalized PageRank to build a recommender sys...An evaluation of SimRank and Personalized PageRank to build a recommender sys...
An evaluation of SimRank and Personalized PageRank to build a recommender sys...Paolo Tomeo
 
Egu2013 final
Egu2013 finalEgu2013 final
Egu2013 finalsirf13
 
02 -how-will-inspire-influence-local-authorities-and-spatial-planning
02 -how-will-inspire-influence-local-authorities-and-spatial-planning02 -how-will-inspire-influence-local-authorities-and-spatial-planning
02 -how-will-inspire-influence-local-authorities-and-spatial-planningKarel Charvat
 
Towards emergency vehicle routing using Geolinked Open Data: the case study o...
Towards emergency vehicle routing using Geolinked Open Data: the case study o...Towards emergency vehicle routing using Geolinked Open Data: the case study o...
Towards emergency vehicle routing using Geolinked Open Data: the case study o...Sergio Consoli
 
Big Data Day LA 2016/ Data Science Track - The Evolving Data Science Landscap...
Big Data Day LA 2016/ Data Science Track - The Evolving Data Science Landscap...Big Data Day LA 2016/ Data Science Track - The Evolving Data Science Landscap...
Big Data Day LA 2016/ Data Science Track - The Evolving Data Science Landscap...Data Con LA
 
Murphy, P. - Defining functional areas in Canada through open source packages
Murphy, P. - Defining functional areas in Canada through open source packagesMurphy, P. - Defining functional areas in Canada through open source packages
Murphy, P. - Defining functional areas in Canada through open source packagesOECDregions
 
Individual movements and geographical data mining. Clustering algorithms for ...
Individual movements and geographical data mining. Clustering algorithms for ...Individual movements and geographical data mining. Clustering algorithms for ...
Individual movements and geographical data mining. Clustering algorithms for ...Beniamino Murgante
 
Online Index Extraction from Linked Open Data Sources
Online Index Extraction from Linked Open Data SourcesOnline Index Extraction from Linked Open Data Sources
Online Index Extraction from Linked Open Data SourcesFabio Benedetti
 
03 SensLog – solution for sensors and VGI
03 SensLog – solution for sensors and VGI03 SensLog – solution for sensors and VGI
03 SensLog – solution for sensors and VGIplan4all
 
Location Intelligence for Italian and UK Justice and Public Safety - BIWASumm...
Location Intelligence for Italian and UK Justice and Public Safety - BIWASumm...Location Intelligence for Italian and UK Justice and Public Safety - BIWASumm...
Location Intelligence for Italian and UK Justice and Public Safety - BIWASumm...Iconsulting
 
TEAMS 6, 7 and 8
TEAMS 6, 7 and 8TEAMS 6, 7 and 8
TEAMS 6, 7 and 8plan4all
 
LokeshShanmuganandam_BigData_FinalProjectReport
LokeshShanmuganandam_BigData_FinalProjectReportLokeshShanmuganandam_BigData_FinalProjectReport
LokeshShanmuganandam_BigData_FinalProjectReportlokesh shanmuganandam
 

Similar to Geo-statistical Exploration of Milano Datasets (20)

City Data Dating: emerging affinities between diverse urban datasets
City Data Dating: emerging affinities between diverse urban datasetsCity Data Dating: emerging affinities between diverse urban datasets
City Data Dating: emerging affinities between diverse urban datasets
 
2013 GISCO Track, Quality Assessment and Improvement for Addressed Locations ...
2013 GISCO Track, Quality Assessment and Improvement for Addressed Locations ...2013 GISCO Track, Quality Assessment and Improvement for Addressed Locations ...
2013 GISCO Track, Quality Assessment and Improvement for Addressed Locations ...
 
Lemmens kessler-agile-linked data v3-slideshare
Lemmens kessler-agile-linked data v3-slideshareLemmens kessler-agile-linked data v3-slideshare
Lemmens kessler-agile-linked data v3-slideshare
 
Data Ecosystems for Geospatial Data
Data Ecosystems for Geospatial DataData Ecosystems for Geospatial Data
Data Ecosystems for Geospatial Data
 
Hotspot Analysis with QGIS - FOSS4G-IT 2017
Hotspot Analysis with QGIS  - FOSS4G-IT 2017Hotspot Analysis with QGIS  - FOSS4G-IT 2017
Hotspot Analysis with QGIS - FOSS4G-IT 2017
 
Approaches to representing and delivering geospatial data in the semantic Web...
Approaches to representing and delivering geospatial data in the semantic Web...Approaches to representing and delivering geospatial data in the semantic Web...
Approaches to representing and delivering geospatial data in the semantic Web...
 
An evaluation of SimRank and Personalized PageRank to build a recommender sys...
An evaluation of SimRank and Personalized PageRank to build a recommender sys...An evaluation of SimRank and Personalized PageRank to build a recommender sys...
An evaluation of SimRank and Personalized PageRank to build a recommender sys...
 
03 --metadata
03 --metadata03 --metadata
03 --metadata
 
Egu2013 final
Egu2013 finalEgu2013 final
Egu2013 final
 
02 -how-will-inspire-influence-local-authorities-and-spatial-planning
02 -how-will-inspire-influence-local-authorities-and-spatial-planning02 -how-will-inspire-influence-local-authorities-and-spatial-planning
02 -how-will-inspire-influence-local-authorities-and-spatial-planning
 
GISrock1_DG_REwork
GISrock1_DG_REworkGISrock1_DG_REwork
GISrock1_DG_REwork
 
Towards emergency vehicle routing using Geolinked Open Data: the case study o...
Towards emergency vehicle routing using Geolinked Open Data: the case study o...Towards emergency vehicle routing using Geolinked Open Data: the case study o...
Towards emergency vehicle routing using Geolinked Open Data: the case study o...
 
Big Data Day LA 2016/ Data Science Track - The Evolving Data Science Landscap...
Big Data Day LA 2016/ Data Science Track - The Evolving Data Science Landscap...Big Data Day LA 2016/ Data Science Track - The Evolving Data Science Landscap...
Big Data Day LA 2016/ Data Science Track - The Evolving Data Science Landscap...
 
Murphy, P. - Defining functional areas in Canada through open source packages
Murphy, P. - Defining functional areas in Canada through open source packagesMurphy, P. - Defining functional areas in Canada through open source packages
Murphy, P. - Defining functional areas in Canada through open source packages
 
Individual movements and geographical data mining. Clustering algorithms for ...
Individual movements and geographical data mining. Clustering algorithms for ...Individual movements and geographical data mining. Clustering algorithms for ...
Individual movements and geographical data mining. Clustering algorithms for ...
 
Online Index Extraction from Linked Open Data Sources
Online Index Extraction from Linked Open Data SourcesOnline Index Extraction from Linked Open Data Sources
Online Index Extraction from Linked Open Data Sources
 
03 SensLog – solution for sensors and VGI
03 SensLog – solution for sensors and VGI03 SensLog – solution for sensors and VGI
03 SensLog – solution for sensors and VGI
 
Location Intelligence for Italian and UK Justice and Public Safety - BIWASumm...
Location Intelligence for Italian and UK Justice and Public Safety - BIWASumm...Location Intelligence for Italian and UK Justice and Public Safety - BIWASumm...
Location Intelligence for Italian and UK Justice and Public Safety - BIWASumm...
 
TEAMS 6, 7 and 8
TEAMS 6, 7 and 8TEAMS 6, 7 and 8
TEAMS 6, 7 and 8
 
LokeshShanmuganandam_BigData_FinalProjectReport
LokeshShanmuganandam_BigData_FinalProjectReportLokeshShanmuganandam_BigData_FinalProjectReport
LokeshShanmuganandam_BigData_FinalProjectReport
 

Recently uploaded

Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPsWebinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPsCEPTES Software Inc
 
Exploratory Data Analysis - Dilip S.pptx
Exploratory Data Analysis - Dilip S.pptxExploratory Data Analysis - Dilip S.pptx
Exploratory Data Analysis - Dilip S.pptxDilipVasan
 
basics of data science with application areas.pdf
basics of data science with application areas.pdfbasics of data science with application areas.pdf
basics of data science with application areas.pdfvyankatesh1
 
How I opened a fake bank account and didn't go to prison
How I opened a fake bank account and didn't go to prisonHow I opened a fake bank account and didn't go to prison
How I opened a fake bank account and didn't go to prisonPayment Village
 
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理pyhepag
 
Supply chain analytics to combat the effects of Ukraine-Russia-conflict
Supply chain analytics to combat the effects of Ukraine-Russia-conflictSupply chain analytics to combat the effects of Ukraine-Russia-conflict
Supply chain analytics to combat the effects of Ukraine-Russia-conflictJack Cole
 
Fuzzy Sets decision making under information of uncertainty
Fuzzy Sets decision making under information of uncertaintyFuzzy Sets decision making under information of uncertainty
Fuzzy Sets decision making under information of uncertaintyRafigAliyev2
 
Easy and simple project file on mp online
Easy and simple project file on mp onlineEasy and simple project file on mp online
Easy and simple project file on mp onlinebalibahu1313
 
2024 Q1 Tableau User Group Leader Quarterly Call
2024 Q1 Tableau User Group Leader Quarterly Call2024 Q1 Tableau User Group Leader Quarterly Call
2024 Q1 Tableau User Group Leader Quarterly Calllward7
 
Data analytics courses in Nepal Presentation
Data analytics courses in Nepal PresentationData analytics courses in Nepal Presentation
Data analytics courses in Nepal Presentationanshikakulshreshtha11
 
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理pyhepag
 
AI Imagen for data-storytelling Infographics.pdf
AI Imagen for data-storytelling Infographics.pdfAI Imagen for data-storytelling Infographics.pdf
AI Imagen for data-storytelling Infographics.pdfMichaelSenkow
 
一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理cyebo
 
一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理pyhepag
 
Atlantic Grupa Case Study (Mintec Data AI)
Atlantic Grupa Case Study (Mintec Data AI)Atlantic Grupa Case Study (Mintec Data AI)
Atlantic Grupa Case Study (Mintec Data AI)Jon Hansen
 
Artificial_General_Intelligence__storm_gen_article.pdf
Artificial_General_Intelligence__storm_gen_article.pdfArtificial_General_Intelligence__storm_gen_article.pdf
Artificial_General_Intelligence__storm_gen_article.pdfscitechtalktv
 
2024 Q2 Orange County (CA) Tableau User Group Meeting
2024 Q2 Orange County (CA) Tableau User Group Meeting2024 Q2 Orange County (CA) Tableau User Group Meeting
2024 Q2 Orange County (CA) Tableau User Group MeetingAlison Pitt
 
一比一原版纽卡斯尔大学毕业证成绩单如何办理
一比一原版纽卡斯尔大学毕业证成绩单如何办理一比一原版纽卡斯尔大学毕业证成绩单如何办理
一比一原版纽卡斯尔大学毕业证成绩单如何办理cyebo
 

Recently uploaded (20)

Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPsWebinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
 
Exploratory Data Analysis - Dilip S.pptx
Exploratory Data Analysis - Dilip S.pptxExploratory Data Analysis - Dilip S.pptx
Exploratory Data Analysis - Dilip S.pptx
 
basics of data science with application areas.pdf
basics of data science with application areas.pdfbasics of data science with application areas.pdf
basics of data science with application areas.pdf
 
How I opened a fake bank account and didn't go to prison
How I opened a fake bank account and didn't go to prisonHow I opened a fake bank account and didn't go to prison
How I opened a fake bank account and didn't go to prison
 
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
 
Machine Learning for Accident Severity Prediction
Machine Learning for Accident Severity PredictionMachine Learning for Accident Severity Prediction
Machine Learning for Accident Severity Prediction
 
Supply chain analytics to combat the effects of Ukraine-Russia-conflict
Supply chain analytics to combat the effects of Ukraine-Russia-conflictSupply chain analytics to combat the effects of Ukraine-Russia-conflict
Supply chain analytics to combat the effects of Ukraine-Russia-conflict
 
Fuzzy Sets decision making under information of uncertainty
Fuzzy Sets decision making under information of uncertaintyFuzzy Sets decision making under information of uncertainty
Fuzzy Sets decision making under information of uncertainty
 
Easy and simple project file on mp online
Easy and simple project file on mp onlineEasy and simple project file on mp online
Easy and simple project file on mp online
 
2024 Q1 Tableau User Group Leader Quarterly Call
2024 Q1 Tableau User Group Leader Quarterly Call2024 Q1 Tableau User Group Leader Quarterly Call
2024 Q1 Tableau User Group Leader Quarterly Call
 
Data analytics courses in Nepal Presentation
Data analytics courses in Nepal PresentationData analytics courses in Nepal Presentation
Data analytics courses in Nepal Presentation
 
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
 
AI Imagen for data-storytelling Infographics.pdf
AI Imagen for data-storytelling Infographics.pdfAI Imagen for data-storytelling Infographics.pdf
AI Imagen for data-storytelling Infographics.pdf
 
一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理
 
一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理
 
Atlantic Grupa Case Study (Mintec Data AI)
Atlantic Grupa Case Study (Mintec Data AI)Atlantic Grupa Case Study (Mintec Data AI)
Atlantic Grupa Case Study (Mintec Data AI)
 
Artificial_General_Intelligence__storm_gen_article.pdf
Artificial_General_Intelligence__storm_gen_article.pdfArtificial_General_Intelligence__storm_gen_article.pdf
Artificial_General_Intelligence__storm_gen_article.pdf
 
Abortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotec
Abortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotecAbortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotec
Abortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotec
 
2024 Q2 Orange County (CA) Tableau User Group Meeting
2024 Q2 Orange County (CA) Tableau User Group Meeting2024 Q2 Orange County (CA) Tableau User Group Meeting
2024 Q2 Orange County (CA) Tableau User Group Meeting
 
一比一原版纽卡斯尔大学毕业证成绩单如何办理
一比一原版纽卡斯尔大学毕业证成绩单如何办理一比一原版纽卡斯尔大学毕业证成绩单如何办理
一比一原版纽卡斯尔大学毕业证成绩单如何办理
 

Geo-statistical Exploration of Milano Datasets

  • 1. Geo-statistical Exploration of Milano Datasets Irene Celino and Gloria Re Calegari CEFRIEL 1 The 13th International Semantic Web Conference 2014 Riva del Garda, Italy 19 – 23 October 2014 Semantic Statistics workshop 19th October 2014
  • 2. Research goal Long term goal • Compare data related to the same city but obtained from heterogeneous sources. Do they provide the same ‘picture’ of the city? • Can we update a dataset (expensive and time consuming, updated once every 5/10 years) with another dataset (cheaper and always up to date)? Presentation target • Comparison of two heterogeneous datasets referring to the same city (Milan) to discover if they have any intrinsic correlation Semantic Statistics - ISWC 2014 - Riva del Garda, Italy 2
  • 3. Datasets available • Telecom (phone activity data) provided for their “Big Data Challenge” • Milan + surroundings • Nov-dic 2013 • Grid of 10.000 cells • Activity recorded every ten minutes • A footprint for each cell • ISTAT - Italian National Statistical Institute • demographic data of 2011 and 2001 • divided by sex, age and nationality Analysis performed using data in their original format (GeoJSON, csv) and serialization in RDF format at the end of the analysis. Semantic Statistics - ISWC 2014 - Riva del Garda, Italy 3
  • 4. Datasets available – different spatial granularity ISTAT – 50 surrounding towns + 9 internal Milano districts Telecom – grid of 10.000 cells (250 x 250 m each) Pre-processing of data required. Mapping of Telecom data into municipality granularity. Semantic Statistics - ISWC 2014 - Riva del Garda, Italy 4
  • 5. Methodology of analysis Goal of analysis: do the two datasets have the same intrinsic meaning? Unsupervised clustering to group data in each dataset: • k-means (euclidean distance) • Adaptive k-means (cosine distance) Comparison of the two clustering results to find correlation between the two datasets. Validation of the comparison: • Rand Index • Kappa Index (the closer to 1 the index is, the stronger the correlation between the data clusterings) Semantic Statistics - ISWC 2014 - Riva del Garda, Italy 5
  • 6. Experiments- Telecom vs ISTAT (range on total poulation) Telecom - Kmeans 6 classes Rand Index Kappa Index 0,23 0,23 ISTAT 2011 - range 6 classes Only partial correlation Try to add information to total population (feature vector with distribution of populatin divided by sex, age and nationality) Semantic Statistics - ISWC 2014 - Riva del Garda, Italy 6
  • 7. Experiments - Telecom vs ISTAT (2011 fine-grained population) Telecom - Kmeans 4 classes Rand Index Kappa Index 0,77 0,81 ISTAT 2011 – Kmeans 4 classes Good correlation Try to add historical information (2001 datasets). Does adding temporal dynamic improve corrrelation? Semantic Statistics - ISWC 2014 - Riva del Garda, Italy 7
  • 8. Telecom - Kmeans 4 classes Rand Index Kappa Index 0,77 0,81 ISTAT 2011 + 2001 – Kmeans 4 classes Good correlation. The same as the previous test. Experiments- Telecom vs ISTAT (2011+2001 fine-grained population) Semantic Statistics - ISWC 2014 - Riva del Garda, Italy 8
  • 9. Clustering results serialization Results of clustering analysis Geographic format RDF format Geo + RDF format GeoJSON file RDF Data Cube vocabulary (adding in the Observation definition concepts of clustering algo and cluster number) Enrich GeoJSON with power of JSON-LD -> ‘’GeoJSON-LD’’ Semantic Statistics - ISWC 2014 - Riva del Garda, Italy 9
  • 10. ‘’GeoJSON-LD’’ ‘’GeoJSON-LD’’ is obtained by adding the ‘@context’ prefix to the GeoJSON file. The prefix specify how to interpret GeoJSON tags as RDF resources @context prefix GeoJSON file Semantic Statistics - ISWC 2014 - Riva del Garda, Italy 10
  • 11. ‘’GeoJSON-LD’’ pros and cons Correct interpretation of the geographic information in the GeoJSON-LD file Problems in the interpretation of the RDF information. ‘’GeoJSON-LD’’ does not interpret correctly a lists of lists (polygon coordinates representation) GeoJSON-LD is correctly interpreted as GeoJSON by GIS GeoJSON list of lists RDF serialization Semantic Statistics - ISWC 2014 - Riva del Garda, Italy 11
  • 12. Conclusion and future works • Identification of a correlation between phone activity data and demographic information • Further tests using different datasets (land use, Point of interests) • Find efficient method for handling the different granularity levels of the datasets • Method for handling temporal aspect of phone activity data (we lost temporal information during clustering process) • Find a method to overcome ‘’GeoJSON-LD’’ serialization problem Semantic Statistics - ISWC 2014 - Riva del Garda, Italy 12
  • 13. Thank you! Any question? Further details at: http://swa.cefriel.it/geo/ Irene Celino and Gloria Re Calegari CEFRIEL Semantic Statistics - ISWC 2014 - Riva del Garda, Italy 13