Mapping and classification of spatial data using machine learning: algorithms and software tools Vadim Timonin – Institute of Geomatics and Risk Analysis (IGAR), University of Lausanne (Switzerland)
This document summarizes the results of a spatial interpolation comparison exercise conducted in 2004. Participants were asked to estimate pollution values at 1008 locations based on observations from 200 random monitoring locations. The best results in the emergency scenario came from methods using neural networks, with one participant achieving a mean absolute error of 14.85. Geostatistical methods also performed well overall, with many participants achieving errors less than 20. The results are presented in a table ranking methods by their performance in the emergency scenario.
1. Mapping and classification of spatial data using
Machine Learning Office
software tools
Vadim Timonin
Institute of Geomatics and Analysis of Risk,
University of Lausanne, Switzerland
Vadim.Timonin@UNIL.ch
2. Contents
• Short description of the Machine Learning Office
• SIC 2004:
Application to the automatic cartography of radioactivity
• Case study:
Wind fields mapping with neural network and
regularization technique.
6. Machine Learning Office
Unsupervised
Clustering & density estimation
• K-Means & EM algorithms
• Gaussian Mixture Model (GMM)
• Self-Organizing (Kohonen) Maps (SOM)
7. Machine Learning Office
Mixture of supervised and unsupervised
Joint density estimation
• Mixture Density Networks (MDN)
8. Automatic Mapping of Pollution Data
Procedure should be:
1. Simple, without difficult tuning of the models (usable
by non-experts in machine learning)
3. The result should be unique (it should not depend on the
training algorithm, initial values, etc.)
9. Automatic Mapping of Pollution Data
Good candidates:
1. KNN
2. GRNN / PNN
Not so good candidates (?):
1. MLP
2. RBFNN
3. SVM / SVR
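As an illustration of why GRNN fits the requirements listed above, here is a minimal sketch of a GRNN written as Nadaraya-Watson kernel regression, with its single parameter (the kernel width) chosen by leave-one-out cross-validation. The function names, the isotropic Gaussian kernel, the candidate widths and the toy data are all assumptions made for this example; they are not taken from the Machine Learning Office software.

```python
import math

def grnn_predict(train_xy, train_z, query, sigma, skip=None):
    """Kernel-weighted average of training values (isotropic Gaussian kernel)."""
    num = 0.0
    den = 0.0
    for i, ((x, y), z) in enumerate(zip(train_xy, train_z)):
        if i == skip:  # leave this training point out (used for cross-validation)
            continue
        d2 = (x - query[0]) ** 2 + (y - query[1]) ** 2
        w = math.exp(-d2 / (2.0 * sigma ** 2))
        num += w * z
        den += w
    if den == 0.0:  # all weights underflowed: fall back to the mean of all training values
        return sum(train_z) / len(train_z)
    return num / den

def loo_mae(train_xy, train_z, sigma):
    """Leave-one-out mean absolute error for a given kernel width."""
    errors = [abs(grnn_predict(train_xy, train_z, p, sigma, skip=i) - z)
              for i, (p, z) in enumerate(zip(train_xy, train_z))]
    return sum(errors) / len(errors)

def tune_sigma(train_xy, train_z, candidates):
    """Pick the candidate width with the smallest leave-one-out error."""
    return min(candidates, key=lambda s: loo_mae(train_xy, train_z, s))

# Hypothetical toy data on a 5 x 5 grid (not SIC 2004 data)
toy_xy = [(float(i), float(j)) for i in range(5) for j in range(5)]
toy_z = [x + 2.0 * y for x, y in toy_xy]
sigma_candidates = [0.25, 0.5, 1.0, 2.0, 4.0]
best_sigma = tune_sigma(toy_xy, toy_z, sigma_candidates)
estimate = grnn_predict(toy_xy, toy_z, (2.0, 2.0), best_sigma)
```

There is no random initialization and no iterative training here, so the same data and the same candidate widths always give the same map, which is the uniqueness property asked for in the requirements.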
10. Automatic Mapping
with Prior Knowledge
in situations of
Routine and Emergency
Spatial Interpolation Comparison 2004
http://www.ai-geostats.org/
Official report:
Automatic mapping algorithms for routine and
emergency monitoring data.
EUR 21595 EN EC.
Dubois G. (Ed.), Office for Official Publications of the European
Communities, Luxembourg, 150 p., November 2005.
11. Spatial Interpolation Comparison 2004
Introduction
Description of the concept of SIC 2004
Participants are invited to use 200 observations (left, circles) to estimate (predict)
values at 1008 locations (right, crosses).
12. Spatial Interpolation Comparison 2004
Introduction
Prior data sets
From these 1008 monitoring locations, a single sampling scheme of 200 monitoring stations
was selected randomly and extracted for each of the 10 datasets, in order to allow participants
to train and design their algorithms. These 200 sampling locations have a spatial distribution
that can be considered as nearly random. From the summary statistics, one can see that the
subsets of 200 points are representative of the whole set of 1008 points.
Note that it is up to each participant whether or not to use this prior information for modeling.
        Training sets (n = 200)                 | Full sets (n = 1008)
Set No  Min     Mean    Median  Max     Std.Dev | Min     Mean    Median  Max     Std.Dev
1       55.8    97.6    98.0    150.0   19.1    | 55.0    98.9    99.5    193.0   21.1
2       55.9    97.4    97.9    155.0   19.3    | 54.9    98.8    99.5    188.0   21.2
3       59.9    98.8    100.0   157.0   18.5    | 59.9    100.3   101.0   192.0   20.4
4       56.1    93.8    94.8    152.0   16.8    | 56.1    95.1    95.4    180.0   18.8
5       56.4    92.4    92.0    143.0   16.6    | 56.1    93.7    94.0    168.0   18.1
6       54.4    89.8    90.4    133.0   15.9    | 54.4    90.9    91.6    168.0   17.2
7       56.1    91.7    91.7    140.0   16.2    | 56.1    92.5    92.9    166.0   16.9
8       54.9    92.4    92.5    148.0   16.6    | 54.9    93.5    94.1    176.0   18.1
9       56.5    96.6    97.0    149.0   18.2    | 56.5    97.8    98.7    183.0   19.9
10      54.9    95.4    95.7    152.0   17.2    | 54.9    96.6    97.1    183.0   19.0
13. Results of the GRNN models
with cross-validation tuning
[Figure: maps of the GRNN results for the routine scenario and the emergency (joker) scenario, with a label for the epicentre of the accident (hot spot).]
14. Results
In the following table the participants’ results for the
two scenarios (routine and emergency) are presented.
The results have been sorted by the Mean Absolute Error
(MAE) obtained in the emergency scenario.
Other statistics shown in this table are the Mean Error (ME),
which allows the bias of the results to be assessed, the Root Mean
Squared Error (RMSE), and Pearson’s Correlation
Coefficient (Ro) between true and estimated values.
• GEOSTATS denotes Geostatistical techniques
• NN denotes Neural Networks
• SVM denotes Support Vector Machines
In each column, the best results have been bolded.
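The four statistics named above can be computed as in the following sketch. The function name and the sample values are hypothetical (they are not SIC 2004 results), and the sign convention for ME (estimate minus true value) is an assumption, since the slide does not state it.

```python
import math

def error_statistics(true_values, estimates):
    """Return ME, MAE, RMSE and Pearson's correlation (Ro) for paired values."""
    n = len(true_values)
    # Assumed sign convention: error = estimate - true value
    errors = [e - t for t, e in zip(true_values, estimates)]
    me = sum(errors) / n                            # bias
    mae = sum(abs(d) for d in errors) / n           # mean absolute error
    rmse = math.sqrt(sum(d * d for d in errors) / n)
    mean_t = sum(true_values) / n
    mean_e = sum(estimates) / n
    cov = sum((t - mean_t) * (e - mean_e) for t, e in zip(true_values, estimates))
    var_t = sum((t - mean_t) ** 2 for t in true_values)
    var_e = sum((e - mean_e) ** 2 for e in estimates)
    ro = cov / math.sqrt(var_t * var_e)
    return {"ME": me, "MAE": mae, "RMSE": rmse, "Ro": ro}

# Hypothetical values for illustration only
stats = error_statistics([100.0, 110.0, 90.0, 120.0], [102.0, 108.0, 93.0, 119.0])
```

ME can be close to zero while MAE and RMSE are large, because positive and negative errors cancel in ME; this is why it is read as a measure of bias rather than of accuracy.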
17. Modeling of wind fields with MLP
and regularization technique
(pp 168-172 of the book)
Monitoring network:
111 stations in Switzerland
(80 training + 31 for validation)
Mapping of daily:
• Mean speed
• Maximum gust
• Average direction
18. Modeling of wind fields with MLP
and regularization technique
Monitoring network:
111 stations in Switzerland (80 training + 31 for validation)
Mapping of daily:
• Mean speed
• Maximum gust
• Average direction
Input information:
X,Y geographical coordinates
DEM (resolution 500 m)
23 DEM-based “geo-features”
Total 26 features
Model:
MLP 26-20-20-3
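A quick count of the adjustable parameters in the 26-20-20-3 architecture helps explain the emphasis on regularization. The sketch below assumes a fully connected network with one bias per hidden and output neuron; the slides do not state this, and the function name is hypothetical.

```python
def mlp_parameter_count(layer_sizes):
    """Weights plus biases of a fully connected MLP (assumed architecture)."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

n_inputs = 2 + 1 + 23            # X, Y coordinates + DEM + 23 geo-features
n_params = mlp_parameter_count([n_inputs, 20, 20, 3])
n_training_stations = 80
params_per_station = n_params / n_training_stations
```

Under these assumptions the network has many more free parameters than there are training stations, so without some form of regularization it could fit the training data almost arbitrarily closely.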
19. Training of the MLP
Model:
MLP 26-20-20-3
Training:
• Random initialization
• 500 iterations of the
RPROP algorithm
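RPROP adapts a separate step size for every weight using only the sign of the gradient. The sketch below shows that update rule on a toy quadratic objective that stands in for the MLP training error. The variant shown (often called iRprop-), the increase and decrease factors 1.2 and 0.5, the step bounds, and the toy objective are assumptions for illustration; the slides specify only random initialization and 500 iterations.

```python
def rprop_minimize(grad, w0, iterations=500, step0=0.1,
                   eta_plus=1.2, eta_minus=0.5,
                   step_min=1e-6, step_max=50.0):
    """Minimize a function given its gradient, using sign-based RPROP updates."""
    w = list(w0)
    steps = [step0] * len(w)       # one step size per parameter
    prev_g = [0.0] * len(w)
    for _ in range(iterations):
        g = list(grad(w))
        for i in range(len(w)):
            sign_product = prev_g[i] * g[i]
            if sign_product > 0:       # same sign as before: accelerate
                steps[i] = min(steps[i] * eta_plus, step_max)
            elif sign_product < 0:     # sign flipped: minimum was overshot
                steps[i] = max(steps[i] * eta_minus, step_min)
                g[i] = 0.0             # skip the update for this iteration
            if g[i] > 0:
                w[i] -= steps[i]
            elif g[i] < 0:
                w[i] += steps[i]
            prev_g[i] = g[i]
    return w

# Toy objective f(w) = (w0 - 3)^2 + 10 * (w1 + 1)^2, minimum at (3, -1)
def toy_gradient(w):
    return [2.0 * (w[0] - 3.0), 20.0 * (w[1] + 1.0)]

w_final = rprop_minimize(toy_gradient, [0.0, 0.0], iterations=500)
```

Because only the gradient sign is used, the two coordinates converge at a similar rate even though their curvatures differ by a factor of ten. In the MLP case the result would still depend on the random initial weights, unlike the GRNN approach.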