2. More data has been created
since 2005 than in the previous
40,000 years
3. Geospatial data timeline
1972: Landsat 1, 1st civilian Earth observation satellite
1980: First commercial vendors of Geographical Information Systems (GIS) software
1992: Internet explosion
1993: The 24th Navstar satellite is launched, completing the Global Positioning System
1997: Tropical Rainfall Measuring Mission (TRMM)
2000: Civilian demand for GPS products
2005: Google Earth
2006: GPS receiver built into cell phones
2010: Social networks, Geotag
4. These data are critical for
decision support, but their
value depends on our ability
to extract useful information
5. Challenges
• High-dimensional data
• Large quantity of data
• Unlabeled samples (labeling is an expensive and time-consuming process)
Data sources:
• NASA Earth Observatory (information from several missions, e.g. Terra, TRMM, SRTM)
• Worldclim (climate data from weather stations)
Derived variables: Elevation, Slope, Aspect, Moisture, Landscape Class, Exposure, Solar Radiation, Curvature
Mean annual temperature (ºC): -30.1 to 30.5
Annual precipitation (mm): 0 to 12084
6. Spatio-temporal challenges
• Spatio-temporal representations at several levels
• Variables and clusters evolve in a temporal context
• Time scales: hours, days, months, years
• Fuzzy boundaries in geographical space
• Visualization of clusters in geographical and feature space
7. Thesis
[Diagram: thesis overview]
Topics: Clustering; Visualization and projection; Spatio-temporal data
Methods: SOM, GHSOM, FGHSON, Tree-structured SOM component planes
Case studies: Colombia (agroecozones, ecoregions), Colombia (Ecoregions), South America (Ecoregions)
9. Visualization by using Self-organizing Maps
Data set → SOM training → Visualization
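The data set, SOM training, visualization pipeline named on this slide can be sketched roughly as follows. This is a minimal, illustrative SOM in plain Python, not the implementation used in the thesis: the function names (`train_som`, `best_matching_unit`, `component_plane`), the Gaussian neighborhood, the linear decay schedules, and the toy two-cluster data are all assumptions made for the example.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return (row, col) of the unit whose prototype is closest to x."""
    best, best_d = (0, 0), float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(x, w))
            if d < best_d:
                best, best_d = (r, c), d
    return best


def train_som(data, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Train a rows x cols SOM; weights[r][c] is the prototype of unit (r, c)."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = max(rows, cols) / 2.0  # assumed initial neighborhood radius
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        samples = list(data)
        rng.shuffle(samples)
        for x in samples:
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # linearly decaying rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighborhood
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                    w = weights[r][c]
                    for k in range(dim):
                        w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


def component_plane(weights, k):
    """Grid of the k-th prototype component, one value per map unit."""
    return [[w[k] for w in row] for row in weights]


# Toy data (invented for illustration): two well-separated 2-D clusters.
toy_rng = random.Random(1)
low = [[0.1 + toy_rng.uniform(-0.05, 0.05) for _ in range(2)] for _ in range(50)]
high = [[0.9 + toy_rng.uniform(-0.05, 0.05) for _ in range(2)] for _ in range(50)]
som = train_som(low + high, rows=3, cols=3)

print("BMU of a low sample: ", best_matching_unit(som, [0.1, 0.1]))
print("BMU of a high sample:", best_matching_unit(som, [0.9, 0.9]))
for row in component_plane(som, 0):
    print(["%.2f" % v for v in row])
```

The printed component plane is the text equivalent of one colored map per variable: each cell shows the value that variable takes in the corresponding unit's prototype.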
10. Visualization by using Self-organizing Maps
Exploration: correlation hunting
• Partial correlations
• Similar correlations
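One simple way to make correlation hunting concrete is to compare component planes pairwise and rank the pairs by the strength of their correlation. The sketch below assumes each plane has been flattened to a list with one value per map unit; the function names and the toy planes are hypothetical, and Pearson correlation is used here only as a convenient similarity measure, not as the measure used in the thesis.

```python
import math
from itertools import combinations


def pearson(u, v):
    """Pearson correlation of two equal-length lists (0.0 if one is constant)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    if su == 0.0 or sv == 0.0:
        return 0.0
    return cov / (su * sv)


def rank_plane_pairs(planes):
    """Rank all pairs of component planes by absolute correlation, strongest first.

    planes maps a variable name to its flattened component plane.
    """
    pairs = [(a, b, pearson(planes[a], planes[b]))
             for a, b in combinations(sorted(planes), 2)]
    return sorted(pairs, key=lambda t: abs(t[2]), reverse=True)


# Toy planes for a 2 x 2 map (values invented for illustration).
toy_planes = {
    "var_a": [0.1, 0.2, 0.3, 0.4],
    "var_b": [0.2, 0.4, 0.6, 0.8],   # rises with var_a
    "var_c": [0.9, 0.1, 0.8, 0.2],   # unrelated pattern
}
for a, b, r in rank_plane_pairs(toy_planes):
    print(a, b, round(r, 3))
```

Restricting the lists to a subset of map units before calling `pearson` would give a local comparison, which is one possible reading of the partial correlations mentioned on the slide.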
11. A real-world problem:
Classification of agro-ecological variables related to
productivity in sugar cane cultivation (54 variables in total).
Climate variables.
• Average Temperature (TempAvg)
• Average Relative Humidity (RHAvg)
• Radiation (Rad)
• Precipitation (Prec)
Soil variables.
• Order (Ord)
• Texture (Tex)
• Depth (Dee)
Topographic variables.
• Landscape (Ls)
• Slope (Sl)
Other variables.
• Water Balance (WB)
• Variety (Var)
Production
25. Hierarchical Self-organizing Structures
• The proposed approach combines the advantages of hierarchical
representation and soft competitive learning
• In the state of the art, all the methods are crisp
approaches
• In geospatial applications, crisp memberships are
not the optimal representation of clusters.
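The difference between crisp and soft memberships can be illustrated with a small sketch. The fuzzy rule below is the standard fuzzy c-means membership formula, used here as an assumption for illustration; the slides do not state the exact membership rule of the proposed method. The function names and the fuzzifier default `m=2.0` are likewise hypothetical.

```python
def squared_distance(x, p):
    """Squared Euclidean distance between a sample and a prototype."""
    return sum((a - b) ** 2 for a, b in zip(x, p))


def crisp_membership(x, prototypes):
    """Winner-takes-all: membership 1 for the nearest prototype, 0 elsewhere."""
    d2 = [squared_distance(x, p) for p in prototypes]
    winner = d2.index(min(d2))
    return [1.0 if i == winner else 0.0 for i in range(len(d2))]


def fuzzy_membership(x, prototypes, m=2.0):
    """Fuzzy c-means style memberships (they sum to 1); m > 1 is the fuzzifier."""
    d2 = [squared_distance(x, p) for p in prototypes]
    zeros = [i for i, d in enumerate(d2) if d == 0.0]
    if zeros:
        # The sample coincides with one or more prototypes: share full membership.
        return [1.0 / len(zeros) if i in zeros else 0.0 for i in range(len(d2))]
    inv = [(1.0 / d) ** (1.0 / (m - 1.0)) for d in d2]
    total = sum(inv)
    return [v / total for v in inv]


# A site lying between two cluster prototypes (toy coordinates).
prototypes = [[0.0, 0.0], [3.0, 0.0]]
site = [1.0, 0.0]
print(crisp_membership(site, prototypes))
print(fuzzy_membership(site, prototypes))
```

With the crisp rule the site belongs entirely to the first cluster, while the fuzzy rule keeps a partial membership in the second, which is the kind of gradual boundary the slide argues geospatial clusters need.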
36. Spatio-Temporal Clustering
Time – When
Space – Where
Homologous places for Colombian coffee
production:
Brazil, Ecuador, East Africa, and New Guinea.
37. Spatio-Temporal Clustering
Space and time – Where and when
Maize (Zea mays L.): Argentina, United States
38. Spatio-Temporal Clustering
Objective: to find similar environmental zones through time in South America.
In this experiment we look for regions with similar patterns in time
windows of three months.
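A possible way to prepare such three-month windows from monthly data is sketched below. The slide does not say whether the windows slide month by month, are disjoint quarters, or wrap around the year end, so `step` and `wrap` are exposed as parameters; the function names and the variable names in the example are hypothetical.

```python
def time_windows(series, size=3, step=1, wrap=False):
    """Split a monthly series into windows of `size` consecutive months.

    step=1 gives sliding windows, step=size gives disjoint ones.
    wrap=True lets windows run past December into January.
    """
    n = len(series)
    last_start = n if wrap else n - size + 1
    return [[series[(start + k) % n] for k in range(size)]
            for start in range(0, last_start, step)]


def window_feature_vectors(monthly_vars, size=3, step=1, wrap=False):
    """One feature vector per window, concatenating all variables of a site.

    monthly_vars maps a variable name to its list of monthly values;
    variables are concatenated in sorted name order.
    """
    names = sorted(monthly_vars)
    per_var = [time_windows(monthly_vars[name], size, step, wrap)
               for name in names]
    return [[v for window in windows for v in window]
            for windows in zip(*per_var)]


# Toy monthly values for one site (invented for illustration).
site = {
    "prec": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120],
    "temp": [21, 22, 23, 24, 25, 26, 25, 24, 23, 22, 21, 20],
}
for vector in window_feature_vectors(site, step=3):
    print(vector)
```

Each resulting vector describes one site in one time window, so sites and windows can be clustered together and a region can be matched to another region at a different time of year.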
42. Conclusions
1. Original contributions
FGHSON
• Capability to reflect the underlying structure of a dataset in a
hierarchical fuzzy way
• It does not require an a priori definition of the number of
clusters.
• The algorithm executes self-organizing processes in parallel.
• Only three parameters are needed to set up the
algorithm.
43. Conclusions
Tree-structured SOM component planes
• It creates structures that allow the visual exploratory data
analysis of large high-dimensional datasets.
• Similarities in variables’ behavior can be easily
detected (e.g. local correlations, maximal and minimal values
and outliers).
44. Conclusions
2. Test of methodologies for clustering and
visualization of georeferenced data
• GHSOM
• SOM
• FGHSON
3. Methodology contributions
• Clustering of spatio-temporal datasets through time by using
FGHSON.
45. Conclusions
4. Agroecological knowledge contribution
• In sugar cane productivity
• In sugar cane agroecoregionalization
• In Andean blackberry production
The COCH project