Artificial Intelligence and Big Data Techniques for Copernicus Data: the ExtremeEarth project
Manolis Koubarakis (Professor at the National and Kapodistrian University of Athens and Adjunct Researcher at the Institute of the Management of Information Systems)
2. ExtremeEarth
Big data technologies
and extreme scale analytics
This project has received funding from the European Union’s Horizon 2020
research and innovation programme under grant agreement No 825258.
4. 4
Background - Copernicus
• Satellite data is probably the most important digital resource available to mankind
today.
• Europe is a champion in the area of satellite data given its Copernicus program.
• Copernicus data is a paradigmatic case of big data. We have all the Vs:
• Volume (>191*103 users, >11*106 products, >106 PB of data in the Copernicus
Open Access Hub)
• Velocity (as of 2017, 10TB of data were generated and 93TB were disseminated
every day)
• Variety (many kinds of satellite images, many kinds of collateral data)
• Veracity (quality is important)
• Value (13.5 billion and 28.030 job years are projected for 2008-2020)
• Information and knowledge extracted from this wealth of EO data is also voluminous:
1 PB of Sentinel data may contain >750*103 products which will result in >450TB of
information and knowledge (e.g., classes of objects).
5. 5
Background - DIAS
• The five Data and Information Access Service (DIAS) platforms are now operational:
• CREODIAS (https://creodias.eu/)
• Mundi Web Services (https://mundiwebservices.com/)
• SOBLOO (https://sobloo.eu/)
• ONDA (https://www.onda-dias.eu)
• Wekeo (https://www.wekeo.eu/)
• The five DIASs offer computing power close to the data to facilitate the development
of EO applications.
6. 6
Background – ESA TEPs
• ESA thematic exploitation platforms (TEPs): virtual environments for user
communities so they can find relevant EO data and develop applications using this data
by taking advantage of available computing resources. There are 7 TEPs currently:
• Coastal
• Forestry
• Geohazards
• Hydrology
• Polar
• Urban
• Food Security
7. 7
Background – Previous
Projects
• The European Commission has so far funded the following four projects
which UoA led or participated that have dealt partially with the big data
issues arising for Copernicus data:
• TELEIOS
• LEO
• Melodies
• Copernicus App Lab
• These projects have dealt very successfully with the variety dimension
of Copernicus big data.
• This was done by doing original research and development in the areas of
linked geospatial data and ontology-based geospatial data access.
Essentially these projects defined these areas.
8. 8
Background – Projects
(cont’d)
• However, the contribution of the above projects to the volume dimension of Copernicus
data has been only partial.
• Example: the flagship geospatial RDF store Strabon of UoA can only handle up to
100GBs of point data and still be able to answer simple geospatial queries
(selections over a rectangular area) efficiently (in a few seconds).
• Competitor industrial systems like GraphDB perform similarly.
9. 9
Background – Deep Learning for
Satellite Images
• Contrary to multimedia images, for which highly scalable AI techniques based on deep
neural network architectures have been developed by big North American companies
such as Google and Facebook recently, similar architectures for satellite images, that
can manage the extreme scale and characteristics of Copernicus data, do not exist
in Europe or elsewhere today.
• The deep neural network architectures can classify effectively and efficiently multimedia
images because they have been trained using extremely large benchmark datasets
consisting of millions of images (e.g., ImageNet) and have utilized the power of Big
Data, Cloud and GPU technologies.
• Training datasets consisting of millions of data samples in the Copernicus context
do not exist today and published deep learning architectures for Copernicus satellite
images typically run using one GPU and do not take advantage of recent advances
like distributed scale-out deep learning.
10. 10
Example: The
EuroSAT archive
The EuroSAT archive contains Sentinel-2 images with13 spectral bands, 10
classes and 27000 labelled images.
P. Helber, B. Bischke, A. Dengel, and D. Borth. “EuroSAT:A novel dataset and deep learning benchmark for land use
and land cover classification”. IGARSS 2018.
11. 11
Example: The
BigEarthNet archive
The BigEarthNet archive contains 590326 Sentinel-2 image patches with
multiple land cover annotations.
Gender Sumbul, Begum Demir and Volker Markl. “A new large-scale Sentinel-2 benchmark archive and a three branch
CNN for classification of Sentinel-2 images”. BiDS 2019.
12. 12
Background – the Hops data and deep
learning platform
• The European data and deep learning platform Hops developed by
LogicalClocks and KTH.
13. 13
ExtremeEarth Main Objective
• To go beyond the above four projects, DIASs, TEPs and Hops by
developing Artificial Intelligence and Big Data techniques and technologies
that scale to the PBs of big Copernicus data, information and knowledge,
and applying these technologies in two of the ESA TEPs: Food Security and
Polar.
• The technologies to be developed will extend the Hops data platform of
partners LogicalClocks and KTH to offer unprecedented scalability to
extreme data volumes and scale-out distributed deep learning for
Copernicus data.
• The extended Hops data platform will run on CREODIAS and will be
available as open source to enable its adoption by the strong European Earth
Observation downstream services industry.
14. 14
Consortium (11 partners from 6 EU
countries)
1. National and Kapodistrian University of Athens (UoA)
2. VISTA Geowissenschaftliche Fernerkundung GmbH (VISTA)
3. UiT - The Arctic University of Norway (UiT)
4. University of Trento (UNITN)
5. Royal Institute of Technology (KTH)
6. National Center for Scientific Research - Demokritos (NCSR)
7. German Aerospace Center (DLR)
8. Polar View Earth Observation Ltd. (PolarView)
9. Meteorologisk Institutt (METNO)
10. LogicalClocks
11. British Antarctic Survey (UKRI-BAS)
15. 15
ExtremeEarth Detailed Objectives
• To develop scalable deep learning architectures for Sentinel images.
o The developed architectures will target determining crop type (in the Food
Security use case) and sea ice mapping (in the Polar use case).
o The developed architectures will run on the Hops data platform.
• To develop very large training datasets for the above deep learning
architectures.
16. 16
Detailed Objectives (cont’d)
• To develop techniques and tools for linked geospatial data
transformation, interlinking, querying and federation that scale to big
Copernicus data, information and knowledge.
o Reengineer the UoA systems GeoTriples, JedAI and Strabon so that they
scale to the PBs of Copernicus big data.
o Reengineer the NCSR system SemaGrow so that it scales to federations of
big linked geospatial data sources.
o The developed systems will run on the Hops data platform.
17. 17
Detailed Objectives (cont’d)
• To extend the capabilities for EO data discovery and access with
semantic catalogue services that scale to the big data, information and
knowledge of Copernicus.
o Allow the expression of sophisticated queries such as “How many
icebergs were embedded in the Norske Øer Ice Barrier at its maximum
extent in 2017?”
o Implement the catalogue on the Hops data platform and demonstrate it on
CREODIAS.
18. 18
Detailed Objectives (cont’d)
• To integrate the Big Data and Artificial Intelligence technologies of the
previous objectives in the Hops data platform and deploy them on
CREODIAS.
19. 19
Detailed Objectives (cont’d)
• The objective of the Food Security use case is to
provide water availability maps for selected
agricultural areas, allowing field level irrigation
support.
• Information is based on the catchment wide
assessment of the water – including seasonal storage
as snow - and will be made available to farmers and
decision makers in agriculture, using the Food
Security TEP.
• Big EO data processing, crop type information
derivation using deep learning and water-to-plant
modelling are applied.
• The focus will be the Danube and Duero river
catchments.
20. 20
Detailed Objectives (cont’d)
• To produce high resolution ice charts from massive volumes of
heterogeneous Copernicus data.
o The charts will be made available as linked data and will be combined with
other information such as sea surface temperature and wind information for
informing maritime users and Polar TEP users.