
Using Active Learning to Quantify How Training Data Errors Impact Classification Accuracy over Smallholder-Dominated Agricultural Systems

Presented at the workshop "Quantifying Error in Training Data for Mapping and Monitoring the Earth System," held January 8-9, 2019 at Clark University, with support from Omidyar Network's Property Rights Initiative, now PlaceFund.


1. Using active learning to quantify how training data errors impact classification accuracy over smallholder-dominated agricultural systems. Stephanie Debats, Lei Song, Su Ye, Sitian Xiong, Kaixi Zhang, Tammy Woodard, Ron Eastman, Ryan Avery, Kelly Caylor, Dennis McRitchie, Lyndon Estes. Clark University | Clark Labs; University of California Santa Barbara
2. AWS Cloud Credits for Research Program; IIASA
3. Stephanie Debats, Ryan Avery, Su Ye, Lei Song, Sitian Xiong
4. Problem 1: High spatial variability
5. Problem 2: High temporal variability (Bing base map vs. PlanetScope Analytic)
6. Problem 3: Interpretation errors in training data
7. High spatial & temporal resolution imagery
8. Active Learning: Train → Predict → Select → Re-label → Label (Debats et al., 2017); a minimal sketch of this loop appears after the transcript
9. Study Region
10. Probability map (prob, %)
11. Probability map (prob, %)
12. Growing Season
13. Dry Season
14. Labelling component: Crowdsourcing Platform
15. True positive (TP), false positive (FP), false negative (FN), true negative (TN)
16. score = in_accuracy * β0 + out_accuracy * β1 + fragmentation * β2 + edge_accuracy * β3 + categorical_accuracy * β4 (a small sketch of this score appears after the transcript)
17. Accuracy assessment and consensus labelling (probability map)
18. Bayesian Model Averaging, label collection: P(y | D) = Σ_{i=1}^{n} P(M_i | D) · P(y | D, M_i)
19. Bayesian Model Averaging, heat map: P(y | D) = Σ_{i=1}^{n} P(M_i | D) · P(y | D, M_i)
20. Consensus label (map)
21. Probability (map)
22. Machine learning component (Debats et al. 2016, "A generalized computer vision approach to mapping crop fields in heterogeneous agricultural landscapes", Remote Sensing of Environment 179): 1. on-the-fly feature extraction; 2. Spark ML RandomForest via GeoTrellis/GeoPySpark (a minimal classifier sketch appears after the transcript)
23. Does training data error impact classification performance?
24. Next Steps: 1. Errors in image atmospheric corrections; 2. Increase feature space for classifier; 3. Improve label quality; 4. Quantify the gap between worker labels and ground truth
25. Worker map vs. ground truth (y): where lies the truth?
26. Circle bias: many false positives are identified because of overreliance on circular features. https://github.com/ecohydro/CropMask_RCNN
27. A probability score above 0.7 is deemed a center pivot. Tested on never-before-seen 512x512 tiles. Some center pivots are missed because of a date mismatch between the imagery and the labels of the reference dataset.
28. BAYESIAN MODEL AVERAGING: P(y | D) = Σ_{i=1}^{n} P(M_i | D) · P(y | D, M_i). y: the ground truth, which will be either 'field' or 'no field'. D: the given data of crowdsourced opinions for labelling this pixel (e.g., D = {D_mapper_1 = field, D_mapper_2 = no field, …}). M_i: the mappers considered. (1) Mapper_i's opinion, P(y | D, M_i): how much probability the mapper assigns to y. (2) Weight (or evidence), P(M_i | D): the probability with which we weigh Mapper_i's opinion, based on the crowdsourced labels in their mapping history.
29. MAPPER OPINION. In our mapping project, mappers are only allowed to assign a crisp category to polygons (either 'field' or 'no field'), so P(y | D, M_i) is 0 or 1: (1) P(y = field | D_i = field, M_i) = 1; (2) P(y = no field | D_i = field, M_i) = 0; (3) P(y = no field | D_i = no field, M_i) = 1; (4) P(y = field | D_i = no field, M_i) = 0.
30. WEIGHT. Weight: P(M_i | D) ∝ P(D | M_i) · P(M_i). (1) P(M_i), the 'mapper prior', is our prior belief about mapper i; we use the mapper's average score (combining geometric and thematic accuracy) to represent it: P(M_i) ∝ (Σ_{k=1}^{n} score_k) / n. (2) P(D | M_i), the 'mapper likelihood': P(D | M_i) ∝ exp(−BIC_i / 2) [1][2], where BIC (Bayesian Information Criterion) = ln(n) · k − 2 ln L(D | θ̂, M). 'BIC simply reduces to maximum likelihood when the number of parameters is equal for the models of interest' [3], so BIC ≈ −2 ln L(D | θ̂, M); after this adjustment, P(D | M_i) ∝ L(D | θ̂, M_i) (the maximum mapper likelihood). (n is the sample number; k is the number of parameters to be estimated, only one in our case, i.e., θ; θ̂ is the label that maximizes the likelihood function.) A sketch of this weight appears after the transcript.
31. WEIGHT (CONT.). Weight: P(M_i | D) ∝ P(D | M_i) · P(M_i). Mapper likelihood: P(D | M_i) ∝ L(D | θ̂, M_i) (maximum mapper likelihood), which can be computed as: (1) P(θ = field | θ̂, M_i) = P(D = field | θ = field, M_i) = (Σ_{k=1}^{n} TP / (TP + FN)) / n; (2) P(θ = no field | θ̂, M_i) = P(D = no field | θ = no field, M_i) = (Σ_{k=1}^{n} TN / (TN + FP)) / n. (*) The maximum mapper likelihood is in fact the mapper's average producer's accuracy.
32. SUMMARY. The posterior probability of the pixel label y given the data of mappers' opinions D is P(y | D) = Σ_{i=1}^{n} P(M_i | D) · P(y | D, M_i), where M_i is mapper i, weight_i = score · producer's accuracy ∝ P(M_i | D), and P(y | D, M_i) = 0 or 1, so P(y | D) = (Σ_{i=1}^{n} weight_i · P(y | D, M_i)) / (Σ_{i=1}^{n} weight_i). Labelling: if P(y = field | D) > P(y = no field | D) (equivalently, P(y = field | D) > 0.5), we assign a consensus label of field; otherwise, no field. A sketch of this consensus step appears after the transcript.
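
The active-learning cycle on slide 8 (Train → Predict → Select → Re-label → Label) could look roughly like the sketch below. This is a minimal illustration assuming a scikit-learn-style random forest and an uncertainty-based selection rule; the classifier, batch size, and the label_fn stand-in for the crowdsourcing platform are assumptions, not details taken from the talk.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def active_learning_loop(X_labeled, y_labeled, X_pool, label_fn, n_rounds=5, batch=20):
    """One version of the Label -> Train -> Predict -> Select -> Re-label cycle.

    label_fn(X_selected) is a hypothetical stand-in for the crowdsourcing
    platform: given the selected samples, it returns their labels.
    """
    clf = RandomForestClassifier(n_estimators=100)
    for _ in range(n_rounds):
        clf.fit(X_labeled, y_labeled)                    # Train on current labels
        prob = clf.predict_proba(X_pool)[:, 1]           # Predict field probability for the pool
        uncertainty = np.abs(prob - 0.5)                 # Select the most ambiguous samples
        pick = np.argsort(uncertainty)[:batch]
        new_y = label_fn(X_pool[pick])                   # Re-label via the labelling component
        X_labeled = np.vstack([X_labeled, X_pool[pick]])
        y_labeled = np.concatenate([y_labeled, new_y])
        X_pool = np.delete(X_pool, pick, axis=0)         # Remove newly labelled samples from the pool
    return clf
```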
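
The worker score from slide 16 is a weighted sum of five accuracy components. The sketch below transcribes it directly; the β weights are placeholders, since their values are not given in the slides.

```python
def worker_score(in_accuracy, out_accuracy, fragmentation,
                 edge_accuracy, categorical_accuracy,
                 betas=(0.2, 0.2, 0.2, 0.2, 0.2)):
    """Weighted combination of per-site accuracy components (slide 16).

    The beta weights are placeholders; the slides do not state the values used.
    """
    b0, b1, b2, b3, b4 = betas
    return (in_accuracy * b0 + out_accuracy * b1 + fragmentation * b2 +
            edge_accuracy * b3 + categorical_accuracy * b4)
```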
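
Slide 22 lists on-the-fly feature extraction and a Spark ML RandomForest run through GeoTrellis/GeoPySpark. The sketch below shows only the generic Spark ML classifier step; the input path, column names, and tree count are assumptions, and the GeoTrellis/GeoPySpark raster handling and feature extraction are omitted.

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import RandomForestClassifier

spark = SparkSession.builder.appName("crop-field-rf").getOrCreate()

# Hypothetical input: one row per sample with extracted features and a 0/1 'label' column.
samples = spark.read.parquet("training_samples.parquet")  # assumed path

# Assemble assumed feature columns into the single vector column Spark ML expects.
assembler = VectorAssembler(inputCols=["b1", "b2", "b3", "b4"], outputCol="features")
rf = RandomForestClassifier(labelCol="label", featuresCol="features", numTrees=100)

train = assembler.transform(samples)
model = rf.fit(train)
predictions = model.transform(train)  # adds prediction and probability columns
```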
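
Slides 30-31 define a mapper's weight as a prior (the mapper's average score) times a likelihood that reduces to the mapper's average producer's accuracy. Below is a small sketch of that computation, assuming the mapper's history is available as per-task scores and (TP, FP, FN, TN) counts; the slides keep the 'field' and 'no field' producer's accuracies separate, whereas this sketch averages them for simplicity.

```python
def mapper_weight(scores, confusions):
    """Weight for one mapper: P(M_i | D) proportional to P(M_i) * P(D | M_i)  (slides 30-31).

    scores     : per-task scores from the mapper's history (the slide 16 score)
    confusions : per-task (TP, FP, FN, TN) counts from the same history
    """
    prior = sum(scores) / len(scores)          # 'mapper prior': average score over the history

    # Mapper likelihood: average producer's accuracy over the history.
    # The slides keep the 'field' (TP / (TP + FN)) and 'no field' (TN / (TN + FP))
    # terms separate; here they are averaged into one number for simplicity.
    accs = []
    for tp, fp, fn, tn in confusions:
        pa_field = tp / (tp + fn) if (tp + fn) else 0.0
        pa_nofield = tn / (tn + fp) if (tn + fp) else 0.0
        accs.append(0.5 * (pa_field + pa_nofield))
    likelihood = sum(accs) / len(accs)

    return prior * likelihood                  # weight proportional to score * producer's accuracy
```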
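
Putting slides 28, 29, and 32 together, the consensus label is a weight-normalized vote over the mappers' crisp opinions, since each P(y | D, M_i) is 0 or 1. A minimal sketch under that reading:

```python
def consensus_label(opinions, weights, threshold=0.5):
    """Consensus label for one pixel/polygon via weighted model averaging (slide 32).

    opinions : crisp mapper labels, each 'field' or 'no field' (P(y | D, M_i) is 0 or 1)
    weights  : mapper weights, e.g. from mapper_weight() above
    """
    total = sum(weights)
    # Posterior P(y = field | D): normalized sum of the weights of mappers who said 'field'.
    p_field = sum(w for o, w in zip(opinions, weights) if o == "field") / total
    label = "field" if p_field > threshold else "no field"
    return label, p_field

# Example: three mappers; the two heavier-weighted ones say 'field'.
label, p = consensus_label(["field", "no field", "field"], [0.8, 0.3, 0.6])
# p_field = (0.8 + 0.6) / 1.7 ~ 0.82, so the consensus label is 'field'.
```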
