Since 1972 and the launch of Landsat 1, the first civilian Earth Observation satellite, millions of images have been acquired all over the Earth by a constantly growing fleet of increasingly sophisticated satellites. Generally, searching within this huge amount of Earth Observation (EO) images is limited to the description of the acquisition conditions stored in the related metadata files, i.e. Where (footprint), When (time of acquisition) and How (viewing angles, instrument, etc.). Thus the larger community of end users misses the What filter, i.e. a way to filter searches in terms of image content. RESTo [1] uses the iTag [2] footprint-based tagging system to enhance image metadata and thereby aims to provide a way to express semantic queries on image content in terms of land use. We investigated the performance of RESTo against a database of 12 million simulated Sentinel-2 granules, representative of the forthcoming French national mirror site of Sentinel products (PEPS).
Semantic search within Earth Observation products databases based on automatic tagging of image content
1. SEMANTIC SEARCH WITHIN EARTH OBSERVATION PRODUCTS DATABASES
BASED ON AUTOMATIC TAGGING OF IMAGE CONTENT
Jérôme Gasperi
2014 Conference on Big Data from Space
Frascati - Italy - November 12th, 2014
2. Big Data ?
The data deluge
The search paradigm
iTag
An EO tagging library
resto
An EO product search engine
What’s next ?
Conclusion and perspectives
3. The data deluge
Brett Ryder - http://www.economist.com/node/15579717
31. Example
« Images of urban area in Russia acquired in last year with less than 5 % of cloud cover »
keyword: urban area | location: Russia | date: last year | acquisition parameter: less than 5 % of cloud cover
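The decomposition of the example query into a keyword, a location, a date and an acquisition parameter can be sketched in a few lines of Python. This is only a toy illustration, not resto's actual query analyzer: the vocabularies `KEYWORDS` and `LOCATIONS`, the function `parse_query`, the output key names, and the reading of "last year" as the previous calendar year are all assumptions made for the example.

```python
import re
from datetime import date

# Hypothetical vocabularies; a real system would rely on a gazetteer
# and on a much larger keyword list.
KEYWORDS = ("urban area", "forest", "desert")
LOCATIONS = ("russia", "france", "italy")


def parse_query(query, today):
    """Split a free-text query into keyword, location, date and cloud cover filters."""
    text = query.lower()
    params = {}
    for keyword in KEYWORDS:
        if keyword in text:
            params["keyword"] = keyword
    for location in LOCATIONS:
        if re.search(rf"\b{re.escape(location)}\b", text):
            params["location"] = location
    # Assumption: "last year" means the previous calendar year.
    if "last year" in text:
        previous = today.year - 1
        params["date"] = (date(previous, 1, 1), date(previous, 12, 31))
    match = re.search(r"less than (\d+(?:\.\d+)?)\s*% of cloud cover", text)
    if match:
        params["cloud_cover_max"] = float(match.group(1))
    return params


result = parse_query(
    "Images of urban area in Russia acquired in last year "
    "with less than 5 % of cloud cover",
    today=date(2014, 11, 12),
)
print(result)
```

Each recognized fragment becomes one structured search filter, which is what allows a sentence to be turned into a regular catalog request.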
33.
1. Search parameters are derived from the natural language query
2. Each search result has a « human readable URL » that can be indexed by web crawlers (e.g. Google robots)
http://goo.gl/BCZ3z4
3. Keywords on resources are links to search requests: they can be indexed by web crawlers too, and so on
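The idea that every search maps to a readable, crawlable URL, and that every keyword is itself a link to a search, can be illustrated with a short Python sketch. The endpoint `BASE_URL`, the parameter names (`q`, `location`, `cloudCover`) and both helper functions are hypothetical; they do not describe resto's real URL scheme.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration.
BASE_URL = "http://example.org/search"


def search_url(params):
    """Build a stable URL: sorting the keys yields one URL per distinct search."""
    return BASE_URL + "?" + urlencode(sorted(params.items()))


def keyword_links(keywords):
    """Map each keyword attached to a resource to the search request it triggers."""
    return {keyword: search_url({"q": keyword}) for keyword in keywords}


url = search_url({"q": "urban area", "location": "russia", "cloudCover": "5"})
links = keyword_links(["urban area", "russia"])
print(url)
print(links)
```

Because the same parameters always produce the same URL, a crawler that follows keyword links ends up indexing further search result pages.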
37. SPOT DATABASE: 1 000 000
New products retrieved every 3 hours from ADS catalog
SEARCH: 0.2 s
Time period of 1 month within a 10 x 10 km² box
INGEST: 0.5 s
Per product for a ~5000 products ingestion
Orders of magnitude computed on a Dual Core 2.6 GHz | 4 GB RAM | 500 TB HDD
39. Need for « fresh » tagging reference databases
(e.g. GLC2000 replacement)
40. Enhance metadata with Twitter trending hashtags
Add tags #mh370, #plane, #malaysianairline
to resources acquired between March 8th, 2014 and April 14th, 2014
in the southern Indian Ocean
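A minimal Python sketch of this kind of event-driven tagging is given below. It assumes a simplified resource model (an acquisition date and a centre point instead of a full footprint), and the bounding box used for the southern Indian Ocean is a rough illustrative value; the function `add_trend_tags` and the sample resources are hypothetical.

```python
from datetime import date


def add_trend_tags(resources, tags, start, end, bbox):
    """Append tags to resources acquired in [start, end] whose centre lies in bbox."""
    min_lon, min_lat, max_lon, max_lat = bbox
    for resource in resources:
        lon, lat = resource["center"]
        in_time = start <= resource["acquired"] <= end
        in_area = min_lon <= lon <= max_lon and min_lat <= lat <= max_lat
        if in_time and in_area:
            existing = resource.setdefault("tags", [])
            for tag in tags:
                if tag not in existing:  # avoid duplicate tags
                    existing.append(tag)
    return resources


# Illustrative box only (min_lon, min_lat, max_lon, max_lat).
SOUTH_INDIAN_OCEAN = (70.0, -45.0, 115.0, -10.0)

# Made-up resources for the example.
resources = [
    {"id": "A", "acquired": date(2014, 3, 20), "center": (95.0, -30.0)},
    {"id": "B", "acquired": date(2014, 5, 1), "center": (95.0, -30.0)},
    {"id": "C", "acquired": date(2014, 3, 20), "center": (2.3, 48.9)},
]

add_trend_tags(
    resources,
    ["#mh370", "#plane", "#malaysianairline"],
    start=date(2014, 3, 8),
    end=date(2014, 4, 14),
    bbox=SOUTH_INDIAN_OCEAN,
)
print(resources)
```

Only resource A satisfies both the time window and the area, so it is the only one that receives the hashtags.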
41. « Linked data is the right way to do Semantic Web »
Tim Berners-Lee
43. Update iTag JSON model to follow JSON-LD format
{
"@context": "http://json-ld.org/contexts/person.jsonld",
"@id": "http://dbpedia.org/resource/John_Lennon",
"name": "John Lennon",
"born": "1940-10-09",
"spouse": "http://dbpedia.org/resource/Cynthia_Lennon"
}
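As a rough illustration of what such an update involves, the Python sketch below turns a plain JSON record into a JSON-LD document by adding the `@context` and `@id` keys, reusing the John Lennon example above. The helper `to_json_ld` is hypothetical, and the actual iTag model is not shown here, so nothing in this sketch should be read as iTag's real output format.

```python
import json


def to_json_ld(record, context, identifier):
    """Return a new dict with JSON-LD "@context" and "@id" ahead of the record's own keys."""
    return {"@context": context, "@id": identifier, **record}


# Plain JSON record, without any linked data keys.
plain = {
    "name": "John Lennon",
    "born": "1940-10-09",
    "spouse": "http://dbpedia.org/resource/Cynthia_Lennon",
}

linked = to_json_ld(
    plain,
    context="http://json-ld.org/contexts/person.jsonld",
    identifier="http://dbpedia.org/resource/John_Lennon",
)
print(json.dumps(linked, indent=2))
```

The record's own keys are untouched: the context tells a consumer how to interpret them, and the identifier makes the record addressable as a linked data resource.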