Mapping and classification of spatial data using machine learning: algorithms and software tools Vadim Timonin – Institute of Geomatics and Risk Analysis (IGAR), University of Lausanne (Switzerland)

Mapping and classification of spatial data using
Machine Learning Office
software tools

Vadim Timonin
Institute of Geomatics and Analysis of Risk,
University of Lausanne, Switzerland

Vadim.Timonin@UNIL.ch

Contents

• Short description of the Machine Learning Office

• SIC 2004:
Application to the automatic cartography of radioactivity

• Case study:
Wind fields mapping with neural network and
regularization technique.


Part of the book:

EPFL press

June 2009

June 20

09:00 – 12:00

Room T120

Practical work session using
Machine Learning software

Supervised
Regression
• Multilayer Perceptron (MLP)
• General Regression Neural Networks (GRNN)
• Radial Basis Function Neural Networks (RBFNN)
• K-Nearest Neighbour (KNN)
• Support Vector Regression (SVR)

Classification
• Multilayer Perceptron (MLP)
• Probabilistic Neural Networks (PNN)
• K-Nearest Neighbour (KNN)
• Support Vector Machines (SVM)

Unsupervised

Clustering & density estimation

• K-Means & EM algorithms
• Gaussian Mixture Model (GMM)
• Self-Organizing (Kohonen) Maps (SOM)

Mixture of supervised and unsupervised

Joint density estimation

• Mixture Density Networks (MDN)

Automatic Mapping of Pollution Data

Procedure should be:

1. Simple, without difficult tuning of the models (usable by
“non-experts” in machine learning)

3. Result should be unique (does not depend on training
algorithms, initial values, etc.)

Automatic Mapping of Pollution Data

Good candidates:

1. KNN
2. GRNN / PNN

Not so good candidates (?):

1. MLP
2. RBFNN
3. SVM / SVR
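To illustrate why GRNN fits the two requirements above, here is a minimal sketch in plain Python. A GRNN is a Nadaraya-Watson kernel regression: the prediction is a Gaussian-weighted mean of the training values, with a single kernel width and no random initialization, so a given width always yields the same map. The function name `grnn_predict`, the isotropic kernel, and the toy data are assumptions made for illustration; this is not the Machine Learning Office implementation.

```python
import math

def grnn_predict(train_xy, train_z, query_xy, sigma):
    """Gaussian-weighted mean of the training values (Nadaraya-Watson form)."""
    num = 0.0
    den = 0.0
    for (x, y), z in zip(train_xy, train_z):
        d2 = (x - query_xy[0]) ** 2 + (y - query_xy[1]) ** 2
        w = math.exp(-d2 / (2.0 * sigma ** 2))
        num += w * z
        den += w
    if den == 0.0:
        # All weights underflowed to zero: fall back to the global mean.
        return sum(train_z) / len(train_z)
    return num / den

# Hypothetical toy data: four stations on the corners of a unit square.
train_xy = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
train_z = [10.0, 20.0, 30.0, 40.0]

# The centre is equidistant from all stations, so the estimate is their mean.
centre = grnn_predict(train_xy, train_z, (0.5, 0.5), sigma=0.5)

# With a narrow kernel the estimate at a station approaches that station's value.
corner = grnn_predict(train_xy, train_z, (0.0, 0.0), sigma=0.1)
```

The only quantity left to choose is `sigma`, which is what makes the method attractive for automatic mapping.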

Automatic Mapping
with Prior Knowledge
in situations of
Routine and Emergency
Spatial Interpolation Comparison 2004
http://www.ai-geostats.org/
Official report:
Automatic mapping algorithms for routine and
emergency monitoring data.
EUR 21595 EN EC.
Dubois G. (Ed.), Office for Official Publications of the European
Communities, Luxembourg, 150 p., November 2005.

Introduction
Description of the concept of SIC 2004
Participants are invited to use 200 observations (left, circles) to estimate (predict)
values at 1008 locations (right, crosses).

Introduction
Prior data sets
From these 1008 monitoring locations, a single sampling scheme of 200 monitoring stations
was selected randomly and extracted for each of the 10 datasets, in order to allow participants
to train and design their algorithms. These 200 sampling locations have a spatial distribution
that can be considered as nearly random. From the summary statistics, one can see that the
subsets of 200 points are representative of the whole set of 1008 points.

Note that it is each participant's choice whether or not to use this prior information for modeling.

          Training sets (n = 200)                 Full sets (n = 1008)
Set No    Min   Mean  Median    Max  Std.Dev       Min   Mean  Median    Max  Std.Dev
1        55.8   97.6    98.0  150.0     19.1      55.0   98.9    99.5  193.0     21.1
2        55.9   97.4    97.9  155.0     19.3      54.9   98.8    99.5  188.0     21.2
3        59.9   98.8   100.0  157.0     18.5      59.9  100.3   101.0  192.0     20.4
4        56.1   93.8    94.8  152.0     16.8      56.1   95.1    95.4  180.0     18.8
5        56.4   92.4    92.0  143.0     16.6      56.1   93.7    94.0  168.0     18.1
6        54.4   89.8    90.4  133.0     15.9      54.4   90.9    91.6  168.0     17.2
7        56.1   91.7    91.7  140.0     16.2      56.1   92.5    92.9  166.0     16.9
8        54.9   92.4    92.5  148.0     16.6      54.9   93.5    94.1  176.0     18.1
9        56.5   96.6    97.0  149.0     18.2      56.5   97.8    98.7  183.0     19.9
10       54.9   95.4    95.7  152.0     17.2      54.9   96.6    97.1  183.0     19.0
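The comparison in the table can be reproduced for any dataset with a few lines of standard-library Python. The sketch below draws 200 of 1008 values at random and computes the same five statistics for both sets. The data are synthetic (a Gaussian with arbitrary parameters), and the use of the sample standard deviation is an assumption; the slides do not say which estimator was used.

```python
import random
import statistics

def summary(values):
    """Min, mean, median, max and sample standard deviation of a list."""
    return {
        "min": min(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "max": max(values),
        "std": statistics.stdev(values),
    }

rng = random.Random(0)

# Synthetic stand-in for one dataset of 1008 measurements (not SIC 2004 data).
full_set = [rng.gauss(97.0, 19.0) for _ in range(1008)]

# Random selection of 200 training values, without replacement.
train_set = rng.sample(full_set, 200)

full_stats = summary(full_set)
train_stats = summary(train_set)
```

A subset can never exceed the range of the full set, which matches the table: the training-set maxima (133 to 157) are all lower than the full-set maxima (166 to 193), so the largest values are the least well represented in the training data.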

Results of the GRNN models
with cross-validation tuning

[Figure panels: Routine scenario; Emergency (joker) scenario. Annotation: epicentre of accident (hot spot).]
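Cross-validation tuning of a GRNN can be sketched as a leave-one-out search over candidate kernel widths: each training point is predicted from all the others, and the width with the lowest error is kept. The candidate grid, the mean squared error criterion, and the function names below are assumptions; the slides state only that cross-validation was used.

```python
import math

def grnn_predict(points, values, query, sigma, skip=None):
    """Gaussian-weighted mean of the values, optionally leaving one point out."""
    num = 0.0
    den = 0.0
    for i, (p, z) in enumerate(zip(points, values)):
        if i == skip:
            continue
        d2 = (p[0] - query[0]) ** 2 + (p[1] - query[1]) ** 2
        w = math.exp(-d2 / (2.0 * sigma ** 2))
        num += w * z
        den += w
    if den == 0.0:
        # All weights underflowed: fall back to the mean of the retained values.
        kept = [z for i, z in enumerate(values) if i != skip]
        return sum(kept) / len(kept)
    return num / den

def loo_mse(points, values, sigma):
    """Leave-one-out mean squared error for one kernel width."""
    errors = [
        (grnn_predict(points, values, p, sigma, skip=i) - values[i]) ** 2
        for i, p in enumerate(points)
    ]
    return sum(errors) / len(errors)

def tune_sigma(points, values, candidates):
    """Return the candidate width with the lowest leave-one-out error."""
    return min(candidates, key=lambda s: loo_mse(points, values, s))

# Hypothetical toy data: a 6 x 6 grid with a smooth trend z = x + y.
points = [(float(i), float(j)) for i in range(6) for j in range(6)]
values = [x + y for x, y in points]
candidates = [0.25, 0.5, 1.0, 2.0, 4.0, 8.0]

best_sigma = tune_sigma(points, values, candidates)
```

Because the search is deterministic, repeated runs on the same data return the same width, which is the "unique result" property asked of an automatic procedure.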

Results
The following table presents the participants’ results for both
scenarios (routine and emergency).

The results are sorted by the Mean Absolute Error (MAE)
obtained in the emergency scenario.
The other statistics shown are the Mean Error (ME),
which measures the bias of the results, the Root Mean
Squared Error (RMSE), and Pearson’s Correlation
Coefficient (Ro) between true and estimated values.

• GEOSTATS denotes Geostatistical techniques
• NN denotes Neural Networks
• SVM denotes Support Vector Machine

In each column, the best results have been bolded.
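For reference, the four statistics reported in the table (MAE, read as mean absolute error, ME, RMSE and Pearson's correlation) can be computed as in the sketch below. The sign convention for ME (estimated minus true) is an assumption, since the slides do not state it, and the numbers are a toy example rather than SIC 2004 data.

```python
import math

def error_statistics(true_vals, pred_vals):
    """Return (MAE, ME, RMSE, Pearson r) for paired true and estimated values."""
    n = len(true_vals)
    # Assumed convention: error = estimated - true, so ME > 0 means overestimation.
    errors = [p - t for t, p in zip(true_vals, pred_vals)]
    mae = sum(abs(e) for e in errors) / n
    me = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    mean_t = sum(true_vals) / n
    mean_p = sum(pred_vals) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(true_vals, pred_vals))
    var_t = sum((t - mean_t) ** 2 for t in true_vals)
    var_p = sum((p - mean_p) ** 2 for p in pred_vals)
    # Correlation is undefined when either series is constant.
    r = cov / math.sqrt(var_t * var_p) if var_t > 0 and var_p > 0 else float("nan")
    return mae, me, rmse, r

# Toy example (not SIC 2004 data).
true_vals = [1.0, 2.0, 3.0, 4.0]
pred_vals = [2.0, 2.0, 2.0, 6.0]
mae, me, rmse, r = error_statistics(true_vals, pred_vals)
```

RMSE squares the errors before averaging, so a few very large errors raise it far more than they raise MAE.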

Results of the SIC 2004 exercise
(authors are highlighted)

                            MAE              ME             RMSE             Ro
Participant  Method    routine   joker routine   joker routine   joker routine   joker
Timonin      NN           9.40   14.85   -1.25   -0.51   12.59   45.46    0.78    0.84
Fournier     GEOSTATS     9.06   16.22   -1.32   -8.58   12.43   81.44    0.79    0.27
Pozdnoukhov  SVM          9.22   16.25   -0.04   -6.70   12.47   81.00    0.79    0.28
Saveliev     SPLINES      9.60   17.00    3.00   10.40   13.00   82.20    0.77    0.23
Dutta        NN           9.92   17.50    0.20    5.10   13.10   80.60    0.76    0.29
Ingram       GEOSTATS     9.10   18.55   -1.27   -4.64   12.46   54.22    0.79    0.86
Hofierka     SPLINES      9.10   18.62   -1.30    0.41   12.51   73.68    0.79    0.50
Fournier     OTHERS       9.29   19.44   -1.12   -0.12   12.56   71.87    0.78    0.53
Savelieva    GEOSTATS     9.11   19.68   -1.39   -2.18   12.49   69.08    0.78    0.56
Palaseanu    GEOSTATS     9.05   19.76    1.40    2.33   12.46   74.54    0.79    0.50
Rigol S.     NN          12.10   20.30   -1.20   -9.40   15.80   84.10    0.67    0.12
Pebesma      GEOSTATS     9.11   20.83   -1.22    0.92   12.44   73.73    0.79    0.50
Pebesma      OTHERS       9.94   21.03   -1.35    4.50   13.32   72.12    0.78    0.51
Ingram       GEOSTATS     9.08   21.77   -1.44    0.72   12.47   79.57    0.79    0.35
Lophaven     GEOSTATS     9.70   22.20    1.20   -4.10   13.10   71.20    0.76    0.54
Saveliev     SPLINES      9.30   22.20    1.60    0.60   12.60   76.40    0.78    0.41
Ingram       GEOSTATS     9.47   22.53   -1.15    3.09   12.75   79.16    0.78    0.33
Rigol S.     NN          16.00   25.30   -1.70  -11.10   20.80   87.50    0.55    0.02
Dutta        NN           9.62   28.20    0.90   -0.22   12.70   80.10    0.78    0.31
Dutta        NN          12.20   28.90    1.50   -1.29   15.90   79.90    0.64    0.33
Rigol S.     NN          21.40   30.50    5.30    3.80   45.80   96.60    0.24    0.20
Ingram       NN           9.72   38.29   -1.54    8.38   13.00   84.24    0.76    0.30
Dutta        NN           9.93   38.50    2.18   17.98   13.30   87.30    0.76    0.27
Ingram       NN           9.48   48.41   -1.22   -3.01   12.73   90.89    0.78    0.38
Pebesma      GEOSTATS     9.11  146.36   -1.22   19.71   12.44  212.10    0.79   -0.27

Results of the SIC 2004 exercise

                            MAE              ME             RMSE             Ro
Participant  Method    routine   joker routine   joker routine   joker routine   joker
Timonin      NN           9.40   14.85   -1.25   -0.51   12.59   45.46    0.78    0.84
Pozdnoukhov  SVM          9.22   16.25   -0.04   -6.70   12.47   81.00    0.79    0.28
Dutta        NN           9.92   17.50    0.20    5.10   13.10   80.60    0.76    0.29
Ingram       GEOSTATS     9.10   18.55   -1.27   -4.64   12.46   54.22    0.79    0.86
Fournier     OTHERS       9.29   19.44   -1.12   -0.12   12.56   71.87    0.78    0.53
Savelieva    GEOSTATS     9.11   19.68   -1.39   -2.18   12.49   69.08    0.78    0.56
Palaseanu    GEOSTATS     9.05   19.76    1.40    2.33   12.46   74.54    0.79    0.50
Rigol S.     NN          12.10   20.30   -1.20   -9.40   15.80   84.10    0.67    0.12
Pebesma      OTHERS       9.94   21.03   -1.35    4.50   13.32   72.12    0.78    0.51
Ingram       GEOSTATS     9.08   21.77   -1.44    0.72   12.47   79.57    0.79    0.35
Lophaven     GEOSTATS     9.70   22.20    1.20   -4.10   13.10   71.20    0.76    0.54
Ingram       GEOSTATS     9.47   22.53   -1.15    3.09   12.75   79.16    0.78    0.33
Rigol S.     NN          16.00   25.30   -1.70  -11.10   20.80   87.50    0.55    0.02
Dutta        NN           9.62   28.20    0.90   -0.22   12.70   80.10    0.78    0.31
Dutta        NN          12.20   28.90    1.50   -1.29   15.90   79.90    0.64    0.33
Rigol S.     NN          21.40   30.50    5.30    3.80   45.80   96.60    0.24    0.20
Ingram       NN           9.72   38.29   -1.54    8.38   13.00   84.24    0.76    0.30
Dutta        NN           9.93   38.50    2.18   17.98   13.30   87.30    0.76    0.27
Ingram       NN           9.48   48.41   -1.22   -3.01   12.73   90.89    0.78    0.38
Pebesma      GEOSTATS     9.11  146.36   -1.22   19.71   12.44  212.10    0.79   -0.27

Modeling of wind fields with MLP
and regularization technique
(pp 168-172 of the book)
Monitoring network:
111 stations in Switzerland
(80 training + 31 for validation)

Mapping of daily:
• Mean speed
• Maximum gust
• Average direction

Input information:
X, Y geographical coordinates
DEM (resolution 500 m)
23 DEM-based "geo-features"
Total 26 features

Model:
MLP 26-20-20-3
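The notation 26-20-20-3 means 26 inputs, two hidden layers of 20 neurons each, and 3 outputs (one per mapped wind variable). The sketch below builds a network of that shape in plain Python and counts its free parameters. The tanh hidden activations, linear outputs, and initialization range are assumptions; the slides give only the layer sizes.

```python
import math
import random

LAYER_SIZES = [26, 20, 20, 3]  # inputs, hidden 1, hidden 2, outputs

def init_mlp(sizes, rng):
    """Create one (weights, biases) pair per layer with small random weights."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers

def forward(layers, features):
    """Propagate one feature vector: tanh hidden layers, linear output layer."""
    activ = list(features)
    for idx, (weights, biases) in enumerate(layers):
        pre = [sum(w * a for w, a in zip(row, activ)) + b
               for row, b in zip(weights, biases)]
        is_output = idx == len(layers) - 1
        activ = pre if is_output else [math.tanh(v) for v in pre]
    return activ

rng = random.Random(42)
mlp = init_mlp(LAYER_SIZES, rng)

# Weights plus biases over the three layers.
n_params = sum(len(w) * len(w[0]) + len(b) for w, b in mlp)

# An all-zero input with zero biases gives all-zero outputs.
outputs = forward(mlp, [0.0] * 26)
```

Such a network has 1023 adjustable parameters (540 + 420 + 63), far more than the 80 training stations, which is why overfitting is a concern for this model.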

Training of the MLP

Model:
MLP 26-20-20-3

Training:
• Random initialization
• 500 iterations of the
RPROP algorithm
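RPROP adapts a separate step size for every weight using only the sign of the gradient: the step grows while the sign stays the same and shrinks when it flips. The sketch below implements one common variant (the update without weight backtracking, often called iRprop-) and applies it for 500 iterations to a toy quadratic rather than to an MLP. The variant, the hyperparameter values, and the test function are all assumptions, as the slides do not specify them.

```python
def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def rprop_minimize(grad_fn, weights, n_iter=500, step_init=0.1,
                   step_min=1e-6, step_max=50.0, eta_plus=1.2, eta_minus=0.5):
    """Minimize a function from its gradient using sign-based per-weight steps."""
    w = list(weights)
    steps = [step_init] * len(w)
    prev_grad = [0.0] * len(w)
    for _ in range(n_iter):
        grad = grad_fn(w)
        for i in range(len(w)):
            prod = grad[i] * prev_grad[i]
            if prod > 0:
                # Same sign as the previous iteration: enlarge the step.
                steps[i] = min(steps[i] * eta_plus, step_max)
            elif prod < 0:
                # Sign flipped, so a minimum was overshot: shrink the step
                # and skip the update for this weight in this iteration.
                steps[i] = max(steps[i] * eta_minus, step_min)
                grad[i] = 0.0
            w[i] -= sign(grad[i]) * steps[i]
            prev_grad[i] = grad[i]
    return w

# Toy problem: minimize sum((w_i - target_i)^2), whose gradient is 2 * (w - target).
target = [3.0, -2.0]

def quadratic_grad(w):
    return [2.0 * (wi - ti) for wi, ti in zip(w, target)]

solution = rprop_minimize(quadratic_grad, [0.0, 0.0], n_iter=500)
```

Because only the gradient sign is used, the update is insensitive to the scale of the error surface, so no learning rate has to be tuned by hand.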

Results: Noise injection regularization

Results: summary
Noise injection regularization

Without regularization (overfitting)
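The regularization named on these slides is read here as noise injection (training with jitter): at every epoch the network sees a freshly perturbed copy of the training inputs, which discourages it from fitting individual stations exactly. The sketch below shows only the perturbation step. Adding Gaussian noise to the inputs (rather than to targets or weights), the noise level, and the placeholder data are assumptions; the slides do not describe the exact scheme.

```python
import random

def inject_noise(samples, noise_std, rng):
    """Return a copy of the samples with zero-mean Gaussian noise added to each feature."""
    return [[v + rng.gauss(0.0, noise_std) for v in row] for row in samples]

rng = random.Random(1)

# Placeholder for 80 training stations with 26 standardized features each.
train_inputs = [[0.0] * 26 for _ in range(80)]

for epoch in range(3):
    # A new noisy copy is drawn at every epoch; the originals stay untouched.
    noisy_inputs = inject_noise(train_inputs, noise_std=0.1, rng=rng)
    # One training pass on (noisy_inputs, targets) would go here.
```

The noise level acts as the regularization strength: zero noise corresponds to the unregularized case, and larger values give smoother maps.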

Thank you for your attention!

Next stop is:

June 20

09:00 – 12:00

Room T120

Practical work session using
Machine Learning software
