SlideShare une entreprise Scribd logo
1  sur  36
Integrating EO with Official
Statistics using Machine
Learning in Mexico
(Work in progress)
November 5
INEGI
abel.coronado@inegi.org.mx
UNECE Machine Learning Project
UNECE Machine Learning Project
This is one of the pilot projects in the Machine Learning Project of the UNECE High-Level
Group on Modernization of Official Statistics.
UNECE Machine Learning Project
UNECE Machine Learning Project
Some Objectives
• Investigate and demonstrate the value added of ML in the production of official statistics,
where "value added" is increase in relevance, better overall quality or reduction in costs.
• Advance the capability of national statistical organisations to use ML in the production of
official statistics.
• Enhance collaboration between statistical organisations in the development and application
of ML.
Imagery Pilot Project
Imagery Pilot Project
Objective of this Imagery Pilot Project (Practical Application)
Expand the use of imagery data in the production of official statistics through the further
development of knowledge and sharing of ML solutions and practices.
Training
2010 Census
(Random Sample)
Annual Satellite Image
Summary Machine Learning
Annual classifications in
Non-Census years
Urban-Rural Classes2010 Imagery
Imagery Pilot Project
Imagery Pilot Project
• This pilot project proposes the use of Landsat satellite data for the mapping of urban and rural areas
in non-census years, using as ground truth an urban and rural grid of 1km x 1km generated from the
field data of the 2010 Population Census at block level and the updated Georeferenced Business
Register to date 2010.
• This pilot project seeks to take advantage of the knowledge, data, and technologies available to show
an example of the use of satellite images in the generation of geographic information that can be
used to monitor the change in urban extent and if the final product meets the appropriate quality,
explore the path to incorporate it into the continuous urban-rural monitoring processes of INEGI.
• Monitoring the growth of the urban and rural areas of a country would enable the adjustments of
models and samples of the household and business surveys as well as to plan censuses. As the
application matures and improves, it could produce new sets of official statistics on its own.
Urban and Rural Locations in Mexico
4, 525 Urban Locations
187, 722 Rural Locations
Locations in Mexico
Draft of Imagery Pipeline
Draft of Imagery Pipeline
Imagery Pipeline
In order to have a road map.
We build an abstract machine learning pipeline for Imagery
Based on:
IBM CRISP-DM
Cross Industry Standard Process for Data Mining
&
Microsoft Data Science lifecycle
InKyung Choi; Refactoring
Draft Imagery Pipeline
Imagery Pipeline
Pilot Project Pipeline
Establish the problem to be solved
Establish the problem to be solved
Monitor the growth of locations of Mexico (Urban and Rural), which would generate a more
timely input for :
• Cartography update
• Estimation of the population in non-census years
• Related with:
• SDG Indicator 11.3.1: Ratio of land consumption rate to population growth rate
• SDG Indicator 15.3.1: Proportion of land that is degraded over total land area
Identify satellite data to address the problem
Identify satellite data to address the problem
In our case, the data to monitor the growth of cities can be Landsat images (NASA & USGS):
• They are open data.
• There is a constant record since the 70s, although they are available from 1985 to date.
• The spatial resolution of Landsat images is 30 meters.
• Temporal resolution is 16 days and 8 days with combined satellites.
• Spectral resolution, in this pilot project we use 6 spectral bands
• All the data we use is Analysis Ready Data (ARD)
• We take advantage that we have just built the Mexican Geospatial Data Cube, with all this
information.
Translate the problem into statistical problem
Translate the problem into statistical problem
We define that is a classification problem:
• Unit of analysis: 1km x 1km squares covering the whole country: 1’ 975, 719 .
• 3 classes were designated:
• Urban
• Rural
• None of the above
Obtain ground-truth data; 2010 Census Data
Obtain ground-truth data
Obtain layers of geographical information:
• Georeferenced Population Census 2010 (Block Level Aggregation)
• Georeferenced Economic Census (Economic Unit Level)
• Visual Inspection
Obtain ground-truth data; 1km x 1km Grid
Obtain ground-truth data
Build the polygons (1 km – 1 km)
1’ 975, 719 Polygons
Obtain ground-truth data; Urban-Rural Grid
Obtain ground-truth data
Assign Labels
Obtain satellite imagery data
Obtain satellite imagery data
• 32 TB of Images in external discs.
• 90 TB decompressed
• March 4, 2019
• The images are ARD, Analysis Ready Data.
• In essence ARD means that the pixels of the entire time series are aligned and
comparable.
Obtain satellite imagery data
Obtain satellite imagery data
• 36 Years
Obtain satellite imagery data
Obtain satellite imagery data
• 2010 same year of the Census
Use the ODC to generate annual summaries
Obtain satellite imagery data
• 2010 same year of the Census
2,737,273,075 pixels – 35 Gb
Geomedians
Geomedians
Geomedians
Integrate ground-truth data with satellite data
Integrate ground-truth data with satellite data
1 km x 1 km tagged polygon
33 x 33 pixels
30 meters of resolution
33 Pixels
6Bands
6 swir 2------
5 swir 1------
4 nir---------
3 red---------
2 green-------
1 blue--------
Python
Function
Multidimensional Array
NumPy
Take a Random Sample
Take a Random Sample
Define features and generate a dataset to be used for analysis
Define features and generate a dataset to be used for analysis
• Feature Engineering
30,000 tagged regions
1km x 1km
33 pixels x 33 pixels x 6 Bands
30,000
4,305
30,000 rows in a data matrix
?
Define features and generate a dataset to be used for analysis
Define features and generate a dataset to be used for analysis
• Feature Engineering
30,000 tagged regions
1km x 1km
33 pixels x 33 pixels x 6 Bands
30,000
4,305
30,000 rows in a data matrix
Auto Machine Learning
https://epistasislab.github.io/tpot/
Evaluate ML
Random Sample
30,000 Elements
Machine Learning
Result of TPOT
2010 National
Classification
78% Accuracy
Training
2010 Geomedian
Classification
30,000
4,305
30,000 rows in a data matrix
Detailed results
Detailed results; First Iteration & Second Iteration
Next Steps
Finish Third Iteration
At this moment finishing third iteration
we seek to improve improve 2 or 3 %
Next Steps
Next Steps
• Finish a third iteration, no later than one month.
• The grid is also an important innovation and is evolving, it is likely
that we will have a new version very soon and we should update
the classification.
• Apply the model to un-labelled years, for example 2019 first
semester.
• Validate a sample of the un-labelled year, requesting support for
the area of visual interpretation of the geography division, to have
a measure of product quality.
• Hold more meetings with the area of sociodemographic statistics,
involve them in the exercise to improve the potential benefit in the
population estimate.
• Hold more meetings with the area of statistical research, to
receive feedback and observations that can improve the product.
• Hold meetings with the cartographic update area, to receive
feedback.
GRACIAS!
@abxda
Integrating eo with official statistics using machine learning in mexico geo week

Contenu connexe

Tendances

Luo-IGARSS2011-2385.ppt
Luo-IGARSS2011-2385.pptLuo-IGARSS2011-2385.ppt
Luo-IGARSS2011-2385.pptgrssieee
 
Identifying Land Patterns from Satellite Images using Deep Learning
Identifying Land Patterns from Satellite Images using Deep LearningIdentifying Land Patterns from Satellite Images using Deep Learning
Identifying Land Patterns from Satellite Images using Deep LearningSoumyadeep Debnath
 
Processing of raw astronomical data of large volume by map reduce model
Processing of raw astronomical data of large volume by map reduce modelProcessing of raw astronomical data of large volume by map reduce model
Processing of raw astronomical data of large volume by map reduce modelSergey Gerasimov
 
Swiss Territorial Data Lab - geo Data Science - colloque FHNW
Swiss Territorial Data Lab - geo Data Science - colloque FHNWSwiss Territorial Data Lab - geo Data Science - colloque FHNW
Swiss Territorial Data Lab - geo Data Science - colloque FHNWRaphael Rollier
 
GEM Risk: main achievements during the first implementation phase
GEM Risk: main achievements during the first implementation phaseGEM Risk: main achievements during the first implementation phase
GEM Risk: main achievements during the first implementation phaseGlobal Earthquake Model Foundation
 
Using Deep Learning to Derive 3D Cities from Satellite Imagery
Using Deep Learning to Derive 3D Cities from Satellite ImageryUsing Deep Learning to Derive 3D Cities from Satellite Imagery
Using Deep Learning to Derive 3D Cities from Satellite ImageryAstraea, Inc.
 
A new generation of data to monitor landscapes across the tropics
A new generation of data to monitor landscapes across the tropicsA new generation of data to monitor landscapes across the tropics
A new generation of data to monitor landscapes across the tropicsCIFOR-ICRAF
 
State of the Map US 2018: Analytic Support to Mapping Contributors
State of the Map US 2018: Analytic Support to Mapping ContributorsState of the Map US 2018: Analytic Support to Mapping Contributors
State of the Map US 2018: Analytic Support to Mapping Contributorsrlewis48
 
Spectral_classification_of_WorldView2_multiangle_sequence.pptx
Spectral_classification_of_WorldView2_multiangle_sequence.pptxSpectral_classification_of_WorldView2_multiangle_sequence.pptx
Spectral_classification_of_WorldView2_multiangle_sequence.pptxgrssieee
 
The Seismic Hazard Modeller’s Toolkit: An Open-Source Library for the Const...
The Seismic Hazard Modeller’s Toolkit:  An Open-Source Library for the Const...The Seismic Hazard Modeller’s Toolkit:  An Open-Source Library for the Const...
The Seismic Hazard Modeller’s Toolkit: An Open-Source Library for the Const...Global Earthquake Model Foundation
 
Automated Summarisation of Big Data, useR! 2018
Automated Summarisation of Big Data, useR! 2018  Automated Summarisation of Big Data, useR! 2018
Automated Summarisation of Big Data, useR! 2018 Amy Stringer
 
2nd e-ROSA Stakeholder Workshop: By, EO Based Global Public goods
2nd e-ROSA Stakeholder Workshop: By, EO Based Global Public goods2nd e-ROSA Stakeholder Workshop: By, EO Based Global Public goods
2nd e-ROSA Stakeholder Workshop: By, EO Based Global Public goodse-ROSA
 
How to set achievable tree canopy goals
How to set achievable tree canopy goalsHow to set achievable tree canopy goals
How to set achievable tree canopy goalsJosh Behounek
 
07 a70110 remotesensingandgisapplications
07 a70110 remotesensingandgisapplications07 a70110 remotesensingandgisapplications
07 a70110 remotesensingandgisapplicationsimaduddin91
 
Land Use/Land Cover Detection
Land Use/Land Cover DetectionLand Use/Land Cover Detection
Land Use/Land Cover DetectionWansoo Im
 
Bayesian Hilbert Maps for Dynamic Continuous Occupancy Mapping
Bayesian Hilbert Maps for Dynamic Continuous Occupancy MappingBayesian Hilbert Maps for Dynamic Continuous Occupancy Mapping
Bayesian Hilbert Maps for Dynamic Continuous Occupancy MappingRansalu Senanayake
 

Tendances (18)

Luo-IGARSS2011-2385.ppt
Luo-IGARSS2011-2385.pptLuo-IGARSS2011-2385.ppt
Luo-IGARSS2011-2385.ppt
 
Identifying Land Patterns from Satellite Images using Deep Learning
Identifying Land Patterns from Satellite Images using Deep LearningIdentifying Land Patterns from Satellite Images using Deep Learning
Identifying Land Patterns from Satellite Images using Deep Learning
 
Processing of raw astronomical data of large volume by map reduce model
Processing of raw astronomical data of large volume by map reduce modelProcessing of raw astronomical data of large volume by map reduce model
Processing of raw astronomical data of large volume by map reduce model
 
Swiss Territorial Data Lab - geo Data Science - colloque FHNW
Swiss Territorial Data Lab - geo Data Science - colloque FHNWSwiss Territorial Data Lab - geo Data Science - colloque FHNW
Swiss Territorial Data Lab - geo Data Science - colloque FHNW
 
GEM Risk: main achievements during the first implementation phase
GEM Risk: main achievements during the first implementation phaseGEM Risk: main achievements during the first implementation phase
GEM Risk: main achievements during the first implementation phase
 
Using Deep Learning to Derive 3D Cities from Satellite Imagery
Using Deep Learning to Derive 3D Cities from Satellite ImageryUsing Deep Learning to Derive 3D Cities from Satellite Imagery
Using Deep Learning to Derive 3D Cities from Satellite Imagery
 
A new generation of data to monitor landscapes across the tropics
A new generation of data to monitor landscapes across the tropicsA new generation of data to monitor landscapes across the tropics
A new generation of data to monitor landscapes across the tropics
 
Casos de uso do FME no IBGE
Casos de uso do FME no IBGECasos de uso do FME no IBGE
Casos de uso do FME no IBGE
 
State of the Map US 2018: Analytic Support to Mapping Contributors
State of the Map US 2018: Analytic Support to Mapping ContributorsState of the Map US 2018: Analytic Support to Mapping Contributors
State of the Map US 2018: Analytic Support to Mapping Contributors
 
Spectral_classification_of_WorldView2_multiangle_sequence.pptx
Spectral_classification_of_WorldView2_multiangle_sequence.pptxSpectral_classification_of_WorldView2_multiangle_sequence.pptx
Spectral_classification_of_WorldView2_multiangle_sequence.pptx
 
The Seismic Hazard Modeller’s Toolkit: An Open-Source Library for the Const...
The Seismic Hazard Modeller’s Toolkit:  An Open-Source Library for the Const...The Seismic Hazard Modeller’s Toolkit:  An Open-Source Library for the Const...
The Seismic Hazard Modeller’s Toolkit: An Open-Source Library for the Const...
 
Automated Summarisation of Big Data, useR! 2018
Automated Summarisation of Big Data, useR! 2018  Automated Summarisation of Big Data, useR! 2018
Automated Summarisation of Big Data, useR! 2018
 
2nd e-ROSA Stakeholder Workshop: By, EO Based Global Public goods
2nd e-ROSA Stakeholder Workshop: By, EO Based Global Public goods2nd e-ROSA Stakeholder Workshop: By, EO Based Global Public goods
2nd e-ROSA Stakeholder Workshop: By, EO Based Global Public goods
 
How to set achievable tree canopy goals
How to set achievable tree canopy goalsHow to set achievable tree canopy goals
How to set achievable tree canopy goals
 
Big data in GIS Environment
Big data in GIS Environment Big data in GIS Environment
Big data in GIS Environment
 
07 a70110 remotesensingandgisapplications
07 a70110 remotesensingandgisapplications07 a70110 remotesensingandgisapplications
07 a70110 remotesensingandgisapplications
 
Land Use/Land Cover Detection
Land Use/Land Cover DetectionLand Use/Land Cover Detection
Land Use/Land Cover Detection
 
Bayesian Hilbert Maps for Dynamic Continuous Occupancy Mapping
Bayesian Hilbert Maps for Dynamic Continuous Occupancy MappingBayesian Hilbert Maps for Dynamic Continuous Occupancy Mapping
Bayesian Hilbert Maps for Dynamic Continuous Occupancy Mapping
 

Similaire à Integrating eo with official statistics using machine learning in mexico geo week

Smart Urban Planning Support through Web Data Science on Open and Enterprise ...
Smart Urban Planning Support through Web Data Science on Open and Enterprise ...Smart Urban Planning Support through Web Data Science on Open and Enterprise ...
Smart Urban Planning Support through Web Data Science on Open and Enterprise ...Gloria Re Calegari
 
Singapore 3D Map for a Smart Nation
Singapore 3D Map for a Smart NationSingapore 3D Map for a Smart Nation
Singapore 3D Map for a Smart NationMonica Moran
 
NextGEOSS: The Next Generation European Data Hub and Cloud Platform for Earth...
NextGEOSS: The Next Generation European Data Hub and Cloud Platform for Earth...NextGEOSS: The Next Generation European Data Hub and Cloud Platform for Earth...
NextGEOSS: The Next Generation European Data Hub and Cloud Platform for Earth...Wolfgang Ksoll
 
Carpita metulini 111220_dssr_bari_version2
Carpita metulini 111220_dssr_bari_version2Carpita metulini 111220_dssr_bari_version2
Carpita metulini 111220_dssr_bari_version2University of Salerno
 
Introduction to Computational Statistics
Introduction to Computational StatisticsIntroduction to Computational Statistics
Introduction to Computational StatisticsSetia Pramana
 
Analysis of Educational Robotics activities using a machine learning approach
Analysis of Educational Robotics activities using a machine learning approachAnalysis of Educational Robotics activities using a machine learning approach
Analysis of Educational Robotics activities using a machine learning approachLorenzo Cesaretti
 
[DSC Europe 22] Land Boundary Delineation using U-net - Theophilus Aidoo
[DSC Europe 22] Land Boundary Delineation using U-net - Theophilus Aidoo[DSC Europe 22] Land Boundary Delineation using U-net - Theophilus Aidoo
[DSC Europe 22] Land Boundary Delineation using U-net - Theophilus AidooDataScienceConferenc1
 
12 SuperAI on Supercomputers
12 SuperAI on Supercomputers12 SuperAI on Supercomputers
12 SuperAI on SupercomputersRCCSRENKEI
 
System for Land-Based Emissions Estimation in Kenya
System for Land-Based Emissions Estimation in KenyaSystem for Land-Based Emissions Estimation in Kenya
System for Land-Based Emissions Estimation in Kenyaipcc-media
 
flat_presentation_time_evolving_OD_matrix_estimation
flat_presentation_time_evolving_OD_matrix_estimationflat_presentation_time_evolving_OD_matrix_estimation
flat_presentation_time_evolving_OD_matrix_estimationLuís Moreira-Matias
 
Open Land Use - the Current Status and Steps Forward
Open Land Use - the Current Status and Steps ForwardOpen Land Use - the Current Status and Steps Forward
Open Land Use - the Current Status and Steps Forwardplan4all
 
Crowd sourcing gis for global urban area mapping
Crowd sourcing gis for global urban area mappingCrowd sourcing gis for global urban area mapping
Crowd sourcing gis for global urban area mappingHiroyuki Miyazaki
 

Similaire à Integrating eo with official statistics using machine learning in mexico geo week (20)

Machine learning and Satellite Images
Machine learning and Satellite ImagesMachine learning and Satellite Images
Machine learning and Satellite Images
 
ONS local presents clustering
ONS local presents clusteringONS local presents clustering
ONS local presents clustering
 
Smart Urban Planning Support through Web Data Science on Open and Enterprise ...
Smart Urban Planning Support through Web Data Science on Open and Enterprise ...Smart Urban Planning Support through Web Data Science on Open and Enterprise ...
Smart Urban Planning Support through Web Data Science on Open and Enterprise ...
 
Singapore 3D Map for a Smart Nation
Singapore 3D Map for a Smart NationSingapore 3D Map for a Smart Nation
Singapore 3D Map for a Smart Nation
 
Multimedia Mining
Multimedia Mining Multimedia Mining
Multimedia Mining
 
Tanay Chaudhari_SIP PPT
Tanay Chaudhari_SIP PPTTanay Chaudhari_SIP PPT
Tanay Chaudhari_SIP PPT
 
NextGEOSS: The Next Generation European Data Hub and Cloud Platform for Earth...
NextGEOSS: The Next Generation European Data Hub and Cloud Platform for Earth...NextGEOSS: The Next Generation European Data Hub and Cloud Platform for Earth...
NextGEOSS: The Next Generation European Data Hub and Cloud Platform for Earth...
 
Carpita metulini 111220_dssr_bari_version2
Carpita metulini 111220_dssr_bari_version2Carpita metulini 111220_dssr_bari_version2
Carpita metulini 111220_dssr_bari_version2
 
Introduction to Computational Statistics
Introduction to Computational StatisticsIntroduction to Computational Statistics
Introduction to Computational Statistics
 
ML & Decision Making
ML & Decision MakingML & Decision Making
ML & Decision Making
 
Analysis of Educational Robotics activities using a machine learning approach
Analysis of Educational Robotics activities using a machine learning approachAnalysis of Educational Robotics activities using a machine learning approach
Analysis of Educational Robotics activities using a machine learning approach
 
[DSC Europe 22] Land Boundary Delineation using U-net - Theophilus Aidoo
[DSC Europe 22] Land Boundary Delineation using U-net - Theophilus Aidoo[DSC Europe 22] Land Boundary Delineation using U-net - Theophilus Aidoo
[DSC Europe 22] Land Boundary Delineation using U-net - Theophilus Aidoo
 
mohamed saber c.v
mohamed saber c.vmohamed saber c.v
mohamed saber c.v
 
mohamed saber c.v
mohamed saber c.vmohamed saber c.v
mohamed saber c.v
 
12 SuperAI on Supercomputers
12 SuperAI on Supercomputers12 SuperAI on Supercomputers
12 SuperAI on Supercomputers
 
System for Land-Based Emissions Estimation in Kenya
System for Land-Based Emissions Estimation in KenyaSystem for Land-Based Emissions Estimation in Kenya
System for Land-Based Emissions Estimation in Kenya
 
flat_presentation_time_evolving_OD_matrix_estimation
flat_presentation_time_evolving_OD_matrix_estimationflat_presentation_time_evolving_OD_matrix_estimation
flat_presentation_time_evolving_OD_matrix_estimation
 
Open Land Use - the Current Status and Steps Forward
Open Land Use - the Current Status and Steps ForwardOpen Land Use - the Current Status and Steps Forward
Open Land Use - the Current Status and Steps Forward
 
Resume final upload pdf
Resume final upload pdfResume final upload pdf
Resume final upload pdf
 
Crowd sourcing gis for global urban area mapping
Crowd sourcing gis for global urban area mappingCrowd sourcing gis for global urban area mapping
Crowd sourcing gis for global urban area mapping
 

Plus de Abel Alejandro Coronado Iruegas

Analisis del Sentimiento en el Estado de Animo de los Tuiteros en Mexico
Analisis del Sentimiento en el Estado de Animo de los Tuiteros en MexicoAnalisis del Sentimiento en el Estado de Animo de los Tuiteros en Mexico
Analisis del Sentimiento en el Estado de Animo de los Tuiteros en MexicoAbel Alejandro Coronado Iruegas
 
Ejemplos de Proyectos de Ciencia de Datos y Big Data en el INEGI
Ejemplos de Proyectos de Ciencia de Datos y Big Data en el INEGIEjemplos de Proyectos de Ciencia de Datos y Big Data en el INEGI
Ejemplos de Proyectos de Ciencia de Datos y Big Data en el INEGIAbel Alejandro Coronado Iruegas
 

Plus de Abel Alejandro Coronado Iruegas (20)

Mobility Master Class.pdf
Mobility Master Class.pdfMobility Master Class.pdf
Mobility Master Class.pdf
 
Live UAEMex Cubo de Datos Geoespaciales de Mexico
Live UAEMex Cubo de Datos Geoespaciales de MexicoLive UAEMex Cubo de Datos Geoespaciales de Mexico
Live UAEMex Cubo de Datos Geoespaciales de Mexico
 
Cubo de datos uaemex
Cubo de datos uaemexCubo de datos uaemex
Cubo de datos uaemex
 
Geo Big Data 4 Datalab
Geo Big Data 4 DatalabGeo Big Data 4 Datalab
Geo Big Data 4 Datalab
 
Catedra INEGI Big Data en IBERO
Catedra INEGI Big Data en IBEROCatedra INEGI Big Data en IBERO
Catedra INEGI Big Data en IBERO
 
El Cubo de Datos Geoespaciales de Mexico
El Cubo de Datos Geoespaciales de MexicoEl Cubo de Datos Geoespaciales de Mexico
El Cubo de Datos Geoespaciales de Mexico
 
No Sql
No SqlNo Sql
No Sql
 
Cubo de Datos Geoespaciales de Mexico
Cubo de Datos Geoespaciales de MexicoCubo de Datos Geoespaciales de Mexico
Cubo de Datos Geoespaciales de Mexico
 
Congreso UAA 2018 Animo Tuitero 2 0
Congreso UAA 2018 Animo Tuitero 2 0Congreso UAA 2018 Animo Tuitero 2 0
Congreso UAA 2018 Animo Tuitero 2 0
 
Analisis del Sentimiento en el Estado de Animo de los Tuiteros en Mexico
Analisis del Sentimiento en el Estado de Animo de los Tuiteros en MexicoAnalisis del Sentimiento en el Estado de Animo de los Tuiteros en Mexico
Analisis del Sentimiento en el Estado de Animo de los Tuiteros en Mexico
 
Ejemplos de Proyectos de Ciencia de Datos y Big Data en el INEGI
Ejemplos de Proyectos de Ciencia de Datos y Big Data en el INEGIEjemplos de Proyectos de Ciencia de Datos y Big Data en el INEGI
Ejemplos de Proyectos de Ciencia de Datos y Big Data en el INEGI
 
Big data big opportunities
Big data big opportunitiesBig data big opportunities
Big data big opportunities
 
Big data taller inegi sedesol
Big data taller inegi sedesolBig data taller inegi sedesol
Big data taller inegi sedesol
 
INEGI ESS big data workshop
INEGI ESS big data workshopINEGI ESS big data workshop
INEGI ESS big data workshop
 
Taller de Big Data y Ciencia de Datos en COLMEX dia 2
Taller de Big Data y Ciencia de Datos en COLMEX dia 2Taller de Big Data y Ciencia de Datos en COLMEX dia 2
Taller de Big Data y Ciencia de Datos en COLMEX dia 2
 
Taller de Big Data y Ciencia de Datos en COLMEX dia 1
Taller de Big Data y Ciencia de Datos en COLMEX dia 1 Taller de Big Data y Ciencia de Datos en COLMEX dia 1
Taller de Big Data y Ciencia de Datos en COLMEX dia 1
 
Big data lead colmex
Big data lead colmexBig data lead colmex
Big data lead colmex
 
Explorando Big Data y Ciencia de Datos con GPUs
Explorando Big Data y Ciencia de Datos con GPUsExplorando Big Data y Ciencia de Datos con GPUs
Explorando Big Data y Ciencia de Datos con GPUs
 
Geo Big Data 2015
Geo Big Data 2015 Geo Big Data 2015
Geo Big Data 2015
 
Anatomía de un proyecto de Big Data
Anatomía de un proyecto de Big DataAnatomía de un proyecto de Big Data
Anatomía de un proyecto de Big Data
 

Dernier

Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 

Dernier (20)

Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 

Integrating eo with official statistics using machine learning in mexico geo week

  • 1. Integrating EO with Official Statistics using Machine Learning in Mexico (Work in progress) November 5 INEGI abel.coronado@inegi.org.mx
  • 2. UNECE Machine Learning Project UNECE Machine Learning Project This is one of the pilot projects in the Machine Learning Project of the UNECE High-Level Group on Modernization of Official Statistics.
  • 3. UNECE Machine Learning Project UNECE Machine Learning Project Some Objectives • Investigate and demonstrate the value added of ML in the production of official statistics, where "value added" is increase in relevance, better overall quality or reduction in costs. • Advance the capability of national statistical organisations to use ML in the production of official statistics. • Enhance collaboration between statistical organisations in the development and application of ML.
  • 5. Imagery Pilot Project Objective of this Imagery Pilot Project (Practical Application) Expand the use of imagery data in the production of official statistics through the further development of knowledge and sharing of ML solutions and practices. Training 2010 Census (Random Sample) Annual Satellite Image Summary Machine Learning Annual classifications in Non-Census years Urban-Rural Classes2010 Imagery
  • 6. Imagery Pilot Project Imagery Pilot Project • This pilot project proposes the use of Landsat satellite data for the mapping of urban and rural areas in non-census years, using as ground truth an urban and rural grid of 1km x 1km generated from the field data of the 2010 Population Census at block level and the updated Georeferenced Business Register to date 2010. • This pilot project seeks to take advantage of the knowledge, data, and technologies available to show an example of the use of satellite images in the generation of geographic information that can be used to monitor the change in urban extent and if the final product meets the appropriate quality, explore the path to incorporate it into the continuous urban-rural monitoring processes of INEGI. • Monitoring the growth of the urban and rural areas of a country would enable the adjustments of models and samples of the household and business surveys as well as to plan censuses. As the application matures and improves, it could produce new sets of official statistics on its own.
  • 7. Urban and Rural Locations in Mexico 4, 525 Urban Locations 187, 722 Rural Locations Locations in Mexico
  • 8. Draft of Imagery Pipeline
  • 9. Draft of Imagery Pipeline Imagery Pipeline In order to have a road map. We build an abstract machine learning pipeline for Imagery Based on: IBM CRISP-DM Cross Industry Standard Process for Data Mining & Microsoft Data Science lifecycle InKyung Choi; Refactoring
  • 12. Establish the problem to be solved Establish the problem to be solved Monitor the growth of locations of Mexico (Urban and Rural), which would generate a more timely input for : • Cartography update • Estimation of the population in non-census years • Related with: • SDG Indicator 11.3.1: Ratio of land consumption rate to population growth rate • SDG Indicator 15.3.1: Proportion of land that is degraded over total land area
  • 13. Identify satellite data to address the problem Identify satellite data to address the problem In our case, the data to monitor the growth of cities can be Landsat images (NASA & USGS): • They are open data. • There is a constant record since the 70s, although they are available from 1985 to date. • The spatial resolution of Landsat images is 30 meters. • Temporal resolution is 16 days and 8 days with combined satellites. • Spectral resolution, in this pilot project we use 6 spectral bands • All the data we use is Analysis Ready Data (ARD) • We take advantage that we have just built the Mexican Geospatial Data Cube, with all this information.
  • 14. Translate the problem into statistical problem Translate the problem into statistical problem We define that is a classification problem: • Unit of analysis: 1km x 1km squares covering the whole country: 1’ 975, 719 . • 3 classes were designated: • Urban • Rural • None of the above
  • 15. Obtain ground-truth data; 2010 Census Data Obtain ground-truth data Obtain layers of geographical information: • Georeferenced Population Census 2010 (Block Level Aggregation) • Georeferenced Economic Census (Economic Unit Level) • Visual Inspection
  • 16. Obtain ground-truth data; 1km x 1km Grid Obtain ground-truth data Build the polygons (1 km – 1 km) 1’ 975, 719 Polygons
  • 17. Obtain ground-truth data; Urban-Rural Grid Obtain ground-truth data Assign Labels
  • 18. Obtain satellite imagery data Obtain satellite imagery data • 32 TB of Images in external discs. • 90 TB decompressed • March 4, 2019 • The images are ARD, Analysis Ready Data. • In essence ARD means that the pixels of the entire time series are aligned and comparable.
  • 19. Obtain satellite imagery data Obtain satellite imagery data • 36 Years
  • 20. Obtain satellite imagery data Obtain satellite imagery data • 2010 same year of the Census
  • 21. Use the ODC to generate annual summaries Obtain satellite imagery data • 2010 same year of the Census 2,737,273,075 pixels – 35 Gb
  • 25. Integrate ground-truth data with satellite data Integrate ground-truth data with satellite data 1 km x 1 km tagged polygon 33 x 33 pixels 30 meters of resolution 33 Pixels 6Bands 6 swir 2------ 5 swir 1------ 4 nir--------- 3 red--------- 2 green------- 1 blue-------- Python Function Multidimensional Array NumPy
  • 26. Take a Random Sample Take a Random Sample
  • 27. Define features and generate a dataset to be used for analysis Define features and generate a dataset to be used for analysis • Feature Engineering 30,000 tagged regions 1km x 1km 33 pixels x 33 pixels x 6 Bands 30,000 4,305 30,000 rows in a data matrix ?
  • 28. Define features and generate a dataset to be used for analysis Define features and generate a dataset to be used for analysis • Feature Engineering 30,000 tagged regions 1km x 1km 33 pixels x 33 pixels x 6 Bands 30,000 4,305 30,000 rows in a data matrix
  • 30. Evaluate ML Random Sample 30,000 Elements Machine Learning Result of TPOT 2010 National Classification 78% Accuracy Training 2010 Geomedian Classification 30,000 4,305 30,000 rows in a data matrix
  • 31. Detailed results Detailed results; First Iteration & Second Iteration
  • 33. Finish Third Iteration At this moment finishing third iteration we seek to improve improve 2 or 3 %
  • 34. Next Steps Next Steps • Finish a third iteration, no later than one month. • The grid is also an important innovation and is evolving, it is likely that we will have a new version very soon and we should update the classification. • Apply the model to un-labelled years, for example 2019 first semester. • Validate a sample of the un-labelled year, requesting support for the area of visual interpretation of the geography division, to have a measure of product quality. • Hold more meetings with the area of sociodemographic statistics, involve them in the exercise to improve the potential benefit in the population estimate. • Hold more meetings with the area of statistical research, to receive feedback and observations that can improve the product. • Hold meetings with the cartographic update area, to receive feedback.