Generating Training Data from Noisy Measurements


This document discusses generating training data for machine learning models from noisy measurements, with land cover classification as the main example. It describes a workflow that uses Sentinel-2 satellite imagery and GlobeLand30 land cover labels to train a Random Forests model for land cover classification. Key points include:
- Sentinel-2 and GlobeLand30 data are used as input, with GlobeLand30 labels filtered and resampled to the Sentinel-2 grid to create reference labels.
- A Random Forests model is trained separately for each Sentinel-2 scene using stratified samples of pixels.
- Initial results show 88.75% average accuracy across scenes, with some classes like water predicted well and others like wetlands being more difficult.

Generating Training Data from Noisy
Measurements
HAMED ALEMOHAMMAD
LEAD GEOSPATIAL DATA SCIENTIST

ML Hub Earth
 Machine Learning commons for EO
 Training data
 Models
 Standards and best practices

Global Land Cover Training Dataset
 Human-verified training dataset
 Using open-source Sentinel-2 imagery
 10 m spatial resolution.
 Global and geo-diverse

Workflow
[Workflow diagram. Boxes: S2 L2A Reflectance; S2 L2A Classification; GlobeLand30 Labels (2010); Filtered Labels; Model Training; Class Predictions; Class Verification (Human).]

Data
 Input Data:
 10 Sentinel-2 bands: Red, Green, Blue, Red-Edge1-3, NIR, Narrow NIR, SWIR1-2
 20 m bands resampled to 10 m using bicubic interpolation
 Reference/Label Data:
 GlobeLand30 labels for 2010 used as a source
 Classes mapped to REF Land Cover Taxonomy
 Labels re-gridded to Sentinel-2 grid using nearest neighbor
 Labels filtered by agreement with classes from Sentinel-2's 20 m scene classification (produced as part of atmospheric correction)
 Filtered labels used as reference labels for training
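
The label preparation steps above can be sketched roughly as follows. This is a minimal illustration, not the workflow's actual code: it assumes the 30 m label grid and the 20 m scene classification are already aligned to the same extent, so nearest-neighbor regridding to 10 m reduces to repeating each cell 3 or 2 times. The label codes and the `COMPATIBLE` agreement table are hypothetical; only the scene classification values (4 vegetation, 5 not vegetated, 6 water, 11 snow) follow the Sentinel-2 Level-2A convention.

```python
import numpy as np

# Sentinel-2 L2A scene classification values used in this sketch.
SCL_VEGETATION, SCL_NOT_VEGETATED, SCL_WATER, SCL_SNOW = 4, 5, 6, 11

# Hypothetical land cover label codes (not the taxonomy used in the slides).
NODATA, WATER, SNOW_ICE, WOODY, CULTIVATED, BARE = 0, 1, 2, 3, 4, 5

# Hypothetical agreement table: which scene classes support each label.
COMPATIBLE = {
    WATER: [SCL_WATER],
    SNOW_ICE: [SCL_SNOW],
    WOODY: [SCL_VEGETATION],
    CULTIVATED: [SCL_VEGETATION, SCL_NOT_VEGETATED],
    BARE: [SCL_NOT_VEGETATED],
}


def upsample_nearest(grid, factor):
    """Nearest-neighbor upsampling of an aligned grid by an integer factor."""
    return np.repeat(np.repeat(grid, factor, axis=0), factor, axis=1)


def filter_labels(labels_30m, scl_20m):
    """Regrid both inputs to 10 m and keep labels that agree with the SCL."""
    labels_10m = upsample_nearest(labels_30m, 3)
    scl_10m = upsample_nearest(scl_20m, 2)
    if labels_10m.shape != scl_10m.shape:
        raise ValueError("inputs do not cover the same 10 m grid")
    keep = np.zeros(labels_10m.shape, dtype=bool)
    for label, scl_classes in COMPATIBLE.items():
        keep |= (labels_10m == label) & np.isin(scl_10m, scl_classes)
    # Labels that disagree with the scene classification become NODATA.
    return np.where(keep, labels_10m, NODATA)


labels_30m = np.array([[WATER, WOODY],
                       [BARE, SNOW_ICE]])
scl_20m = np.array([[6, 4, 4],
                    [5, 5, 4],
                    [5, 11, 11]])
filtered = filter_labels(labels_30m, scl_20m)
print(filtered)
```

Pixels set to `NODATA` here correspond to the unclassified regions of the reference data: they are left out of training but can still receive a prediction later.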

Methodology
 A pixel-based supervised Random Forests model trained for each scene.
 Pixels without valid reflectance are excluded from training.
 Training on class-stratified samples of half the pixels in a scene with one
Sentinel-2 pixel at 10 m for each label pixel at 30 m.
 Predictions are made on all pixels marked with usable classes during Level-2A
processing, including pixels labeled as unclassified.
 Annual labels will be generated by aggregating time series of predictions and
probabilities from the same tile throughout the year.
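
A per-scene training step along these lines might look like the sketch below. It uses scikit-learn's `RandomForestClassifier` and `train_test_split` as stand-ins, since the slides do not name a library, and it runs on synthetic reflectance rather than a real scene. The number of trees, the class separation, and the 5% invalid-pixel rate are illustrative assumptions; the sketch also does not model the "one 10 m pixel per 30 m label pixel" sampling rule.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for one scene: 3000 pixels x 10 bands, 3 classes.
n_pixels, n_bands, n_classes = 3000, 10, 3
labels = rng.integers(0, n_classes, size=n_pixels)
reflectance = rng.normal(loc=labels[:, None] * 1.0, scale=0.2,
                         size=(n_pixels, n_bands))

# Mark roughly 5% of pixels as lacking valid reflectance.
invalid = rng.random(n_pixels) < 0.05
reflectance[invalid] = np.nan

# Pixels without valid reflectance are excluded from training.
valid = np.isfinite(reflectance).all(axis=1)
X, y = reflectance[valid], labels[valid]

# Class-stratified sample of half the pixels in the scene.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, train_size=0.5, stratify=y, random_state=0)

# One model per scene (n_estimators=50 is an arbitrary choice here).
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# Predict a class and per-class probabilities for every valid pixel.
predicted = model.predict(X)
probabilities = model.predict_proba(X)
held_out_accuracy = model.score(X_rest, y_rest)
print(f"held-out accuracy: {held_out_accuracy:.3f}")
```

Keeping `probabilities` alongside `predicted` matters for the last bullet: per-date class probabilities from the same tile are what would be aggregated into annual labels.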

Results
 88.75% average model accuracy across 4 diverse scenes.
 Some classes, like water and snow/ice, predicted with high accuracy and high
confidence across all scenes.
 Other classes, like wetland and (semi) natural vegetation, are subtler and were
expected to be more difficult to classify.
 Woody vegetation and cultivated vegetation were predicted relatively
accurately and not confused with each other, as a result of including 20 m red
edge bands, resampled to 10 m.
 Artificial bare ground tended to be predicted in unclassified regions (in
reference data), taking over areas of natural bare ground and cultivated
vegetation and suggesting that traces of human activity would lead to pixels
classified as artificial bare ground in off-vegetation season.

What about non-categorical variables?
 True value of categorical variables vs true value of continuous variables:
 Crop Yield
 Soil Moisture
 Temperature
 Precipitation
 All measurements of continuous variables are prone to uncertainty (noise and
bias).
 How to reduce/eliminate these uncertainties in training data?

[Figure: in-situ, model, and satellite estimates shown as noisy and biased measurement systems around the truth. Slide courtesy of K. McColl.]

Generating Training Dataset
 Triple collocation (TC) is a technique for estimating the unknown error standard
deviations (or RMSEs) of three mutually independent measurement systems,
without treating any one system as zero-error “truth”.
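$$Q_{ij} \equiv \mathrm{Cov}(X_i, X_j), \qquad \sigma_{\varepsilon_i} = \sqrt{Q_{ii} - \frac{Q_{ij}\, Q_{ik}}{Q_{jk}}}$$
 TC-based RMSE estimates at each pixel are used to compute the a priori probability ($P_i$) of selecting a particular dataset:
$$P_i = \frac{1 / \sigma_{\varepsilon_i}^{2}}{\sum_{j=1}^{3} 1 / \sigma_{\varepsilon_j}^{2}}$$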
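
The two formulas can be checked numerically with a short sketch. The function name `triple_collocation`, the synthetic truth, and the bias and noise levels below are all assumptions made for illustration; the estimator itself is the covariance form given above, applied to one pixel's three time series.

```python
import numpy as np


def triple_collocation(x1, x2, x3):
    """Return TC error standard deviations and selection probabilities."""
    Q = np.cov(np.vstack([x1, x2, x3]))  # 3 x 3 sample covariance matrix
    # sigma_i^2 = Q_ii - Q_ij * Q_ik / Q_jk for each system i.
    err_var = np.array([
        Q[0, 0] - Q[0, 1] * Q[0, 2] / Q[1, 2],
        Q[1, 1] - Q[0, 1] * Q[1, 2] / Q[0, 2],
        Q[2, 2] - Q[0, 2] * Q[1, 2] / Q[0, 1],
    ])
    if np.any(err_var <= 0):
        # Can happen with short or strongly correlated-error series.
        raise ValueError("non-positive error variance estimate")
    sigma = np.sqrt(err_var)
    weights = 1.0 / err_var
    return sigma, weights / weights.sum()


rng = np.random.default_rng(42)
n = 50_000
truth = rng.normal(size=n)

# Three systems with different biases and independent noise (made-up values).
x1 = truth + rng.normal(scale=0.3, size=n)
x2 = 0.5 + 0.9 * truth + rng.normal(scale=0.5, size=n)
x3 = -0.2 + 1.1 * truth + rng.normal(scale=0.8, size=n)

sigma, P = triple_collocation(x1, x2, x3)
print("error std estimates:", sigma.round(3))
print("selection probabilities:", P.round(3))
```

No system is treated as truth here: the noise levels are recovered from the cross-covariances alone, and the least noisy system receives the largest probability. Each estimate is expressed in that system's own units, so systems with very different scaling would need rescaling before their probabilities are compared.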

Sample time series of a pixel
[Figure: three measurement series $X_1$, $X_2$, $X_3$ at times $t_1, t_2, t_3, \ldots, t_N$, and a series labeled $X_T$.]
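
One possible reading of this figure, stated here as an assumption rather than the author's confirmed procedure, is that the target series $X_T$ is built by picking one of the three systems at each time step with the a priori probabilities $P_i$. A minimal sketch of that reading, with a hypothetical helper `sample_target` and made-up values:

```python
import numpy as np


def sample_target(X, P, rng):
    """Pick one system per time step with probabilities P.

    X has shape (n_systems, n_times); returns the sampled series and
    the index of the system chosen at each time step.
    """
    X = np.asarray(X, dtype=float)
    n_systems, n_times = X.shape
    choice = rng.choice(n_systems, size=n_times, p=P)
    return X[choice, np.arange(n_times)], choice


rng = np.random.default_rng(7)
X = np.array([[0.21, 0.25, 0.30, 0.28],   # X1 at t1..t4
              [0.19, 0.27, 0.33, 0.26],   # X2
              [0.25, 0.22, 0.29, 0.31]])  # X3
P = np.array([0.6, 0.3, 0.1])             # illustrative probabilities

x_target, choice = sample_target(X, P, rng)
print(choice, x_target)
```

Under this reading, systems with smaller estimated error contribute more of the target values, while noisier systems are still sampled occasionally.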

Alemohammad, et al., Biogeosciences, 2017

Things to check
 Sentinel-2 L2A classes
 What are the usable classes there?
 Plot actual scene + artificial bare ground
