Generating Training Data from Noisy Measurements

A workshop on “Quantifying Error in Training Data for Mapping and Monitoring the Earth System” was held on January 8-9, 2019 at Clark University, with support from Omidyar Network’s Property Rights Initiative, now PlaceFund.

Published in: Technology

  1. Generating Training Data from Noisy Measurements
     Hamed Alemohammad, Lead Geospatial Data Scientist
  2. ML Hub Earth
       • Machine Learning commons for EO
       • Training data
       • Models
       • Standards and best practices
  3. Global Land Cover Training Dataset
       • Human-verified training dataset
       • Using open-source Sentinel-2 imagery
       • 10 m spatial resolution
       • Global and geo-diverse
  4. Workflow (flowchart; boxes as extracted): S2 L2A Reflectance; S2 L2A Classification; GlobeLand30 Labels (2010); Filtered Labels; Class Predictions; Class Verification (Human); Model Training
  5. Data
       • Input Data:
           • 10 Sentinel-2 bands: Red, Green, Blue, Red-Edge1-3, NIR, Narrow NIR, SWIR1-2
           • 20 m bands resampled to 10 m using bicubic interpolation
       • Reference/Label Data:
           • GlobeLand30 labels for 2010 used as a source
           • Classes mapped to REF Land Cover Taxonomy
           • Labels re-gridded to the Sentinel-2 grid using nearest neighbor
           • Labels filtered by agreement with classes from Sentinel-2’s 20 m scene classification (produced as part of atmospheric correction)
           • Filtered labels used as reference labels for training
  6. Methodology
       • A pixel-based supervised Random Forest model is trained for each scene.
       • Pixels without valid reflectance are excluded from training.
       • Training uses class-stratified samples of half the pixels in a scene, with one Sentinel-2 pixel at 10 m for each label pixel at 30 m.
       • Predictions are made on all pixels marked with usable classes during Level-2A processing, including pixels labeled as unclassified.
       • Annual labels will be generated by aggregating time series of predictions and probabilities from the same tile throughout the year.
  7. Results
       • 88.75% average model accuracy across 4 diverse scenes.
       • Some classes, like water and snow/ice, were predicted with high accuracy and high confidence across all scenes.
       • Other classes, like wetland and (semi) natural vegetation, are subtler and were expected to be more difficult to classify.
       • Woody vegetation and cultivated vegetation were predicted relatively accurately and not confused with each other, as a result of including the 20 m red edge bands, resampled to 10 m.
       • Artificial bare ground tended to be predicted in regions that are unclassified in the reference data, taking over areas of natural bare ground and cultivated vegetation. This suggests that traces of human activity lead to pixels being classified as artificial bare ground outside the vegetation season.
  8. Results
  9. What about non-categorical variables?
       • True value of categorical variables vs true value of continuous variables:
           • Crop Yield
           • Soil Moisture
           • Temperature
           • Precipitation
       • All measurements of continuous variables are prone to uncertainty (noise and bias).
       • How can these uncertainties be reduced or eliminated in training data?
  10. Noisy and biased measurement systems (figure labels: In-Situ, Model, Satellite, Truth). Slide courtesy of K. McColl.
  11. Generating Training Dataset
       • Triple collocation (TC) is a technique for estimating the unknown error standard deviations (or RMSEs) of three mutually independent measurement systems, without treating any one system as zero-error “truth”:
         $Q_{ij} \equiv \mathrm{Cov}(X_i, X_j)$
         $\sigma_{\varepsilon_i} = \sqrt{Q_{ii} - \frac{Q_{ij} Q_{ik}}{Q_{jk}}}$
       • TC-based RMSE estimates at each pixel are used to compute the a priori probability ($P_i$) of selecting a particular dataset:
         $P_i = \frac{1 / \sigma_{\varepsilon_i}^2}{\sum_{j=1}^{3} 1 / \sigma_{\varepsilon_j}^2}$
  12. Sample time series of a pixel (figure: series $X_1$, $X_2$, $X_3$ and $X_T$ at times $t_1, t_2, t_3, \dots, t_N$)
  13. Backup Slides
  14. Alemohammad et al., Biogeosciences, 2017
  15. Alemohammad et al., Biogeosciences, 2017
  16. Things to check
       • Sentinel-2 L2A classes
           • What are the usable classes there?
       • Plot actual scene + artificial bare ground
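
The label filtering on the Data slide and the class-stratified sampling on the Methodology slide can be illustrated with a small NumPy sketch. This is not the presenter's code. It assumes both label rasters have already been mapped to one taxonomy and re-gridded to the same pixel grid. The function names, the `nodata` value of 0, and the choice to sample half of each class (rather than half of the scene overall) are hypothetical choices made for illustration.

```python
import numpy as np

def filter_labels(ref_labels, scene_classes, nodata=0):
    """Keep a reference label only where it agrees with the scene classification.

    Assumption: both arrays use the same class codes and the same grid.
    Disagreeing pixels are set to `nodata` (hypothetical code 0).
    """
    agree = ref_labels == scene_classes
    return np.where(agree, ref_labels, nodata)

def stratified_half_sample(labels, nodata=0, seed=0):
    """Return flat indices of a sample holding about half the pixels of each class."""
    rng = np.random.default_rng(seed)
    flat = labels.ravel()
    picked = []
    for cls in np.unique(flat):
        if cls == nodata:
            continue  # filtered-out pixels are never used for training
        idx = np.flatnonzero(flat == cls)
        n_keep = max(1, idx.size // 2)
        picked.append(rng.choice(idx, size=n_keep, replace=False))
    if not picked:
        return np.empty(0, dtype=int)
    return np.concatenate(picked)

# Toy 4x4 scene with three classes (1, 2, 3); values are made up.
ref = np.array([[1, 1, 2, 2],
                [1, 3, 2, 2],
                [3, 3, 3, 1],
                [1, 1, 2, 3]])
scl = np.array([[1, 1, 2, 1],
                [1, 3, 2, 2],
                [3, 2, 3, 1],
                [1, 1, 2, 3]])

filtered = filter_labels(ref, scl)
train_idx = stratified_half_sample(filtered)
print(filtered)
print(np.sort(train_idx))
```

The indices in `train_idx` would select the rows of a pixel-by-band reflectance table passed to the classifier; sampling per class keeps rare classes represented in each scene's training set.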
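
The triple collocation estimate and the selection probabilities from the Generating Training Dataset slide can be sketched for a single pixel as follows. The data are synthetic, and the function names, the variance floor, and the final step (drawing one of the three datasets at each time step with probability $P_i$) are assumptions about how the probabilities might be used, not something stated on the slides.

```python
import numpy as np

def tc_error_std(x1, x2, x3, floor=1e-12):
    """Triple collocation error standard deviations in covariance notation.

    sigma_i = sqrt(Q_ii - Q_ij * Q_ik / Q_jk), with Q the 3x3 sample covariance.
    Assumption: small or negative variance estimates (sampling noise) are
    clipped to `floor`, a hypothetical safeguard not taken from the slides.
    """
    q = np.cov(np.vstack([x1, x2, x3]))
    var = np.array([
        q[0, 0] - q[0, 1] * q[0, 2] / q[1, 2],
        q[1, 1] - q[0, 1] * q[1, 2] / q[0, 2],
        q[2, 2] - q[0, 2] * q[1, 2] / q[0, 1],
    ])
    return np.sqrt(np.clip(var, floor, None))

def selection_probabilities(sigma):
    """P_i proportional to 1 / sigma_i^2, normalized to sum to one."""
    w = 1.0 / np.asarray(sigma, dtype=float) ** 2
    return w / w.sum()

# Synthetic pixel: one unknown truth seen by three systems, each with its own
# bias, gain, and independent noise (all numbers are made up).
rng = np.random.default_rng(0)
n = 20000
truth = rng.normal(0.0, 1.0, n)
series = np.vstack([
    0.1 + 1.0 * truth + rng.normal(0.0, 0.1, n),
    -0.2 + 0.8 * truth + rng.normal(0.0, 0.2, n),
    0.3 + 1.2 * truth + rng.normal(0.0, 0.4, n),
])

sigma = tc_error_std(*series)
p = selection_probabilities(sigma)

# Assumed usage: at each time step, pick one dataset with probability P_i.
choice = rng.choice(3, size=n, p=p)
x_train = series[choice, np.arange(n)]
print(sigma, p)
```

Each estimated error standard deviation is expressed in the units of its own measurement system, so the three series would need to be rescaled to a common reference before being mixed in practice.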