TERN's Siddeswara Guru presents on the Australian Ecosystem Science Cloud, which will provide the ecosystem science community with improved access to shared data, tools, platforms and computing resources.
Australian Ecosystems Science Cloud
1. TERN is supported by the Australian Government through the National Collaborative Research Infrastructure Strategy.
Australian Ecosystems Science Cloud overview
Presentation by Siddeswara Guru
Director, Data Science
2. Ecosystem science
• Inter-relationships among living organisms, physical features, biochemical
processes, natural phenomena and human activities in ecological communities¹
• Focusing on terrestrial ecosystems
– Terrestrial Ecosystem Research Network
– Atlas of Living Australia
• Data is heterogeneous: a wide variety of data from different domains
– Observation (human, in-situ sensors and satellite remote sensing)
– Variety of scale: spatial and temporal
– Different data formats used in the community
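One common way to cope with this heterogeneity is to map every source onto a shared record before analysis. The sketch below is a hypothetical illustration, not TERN's actual schema: the `Observation` fields, the two converter functions and the input field names (`date`, `epoch`, `var` and so on) are all assumptions. It shows a human field survey (dates as DD/MM/YYYY strings) and an in-situ sensor (Unix epoch seconds) ending up with directly comparable UTC timestamps.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Observation:
    """Hypothetical harmonised record shared by all observation sources."""
    variable: str   # what was measured, e.g. "soil_moisture"
    lat: float      # decimal degrees (WGS84 assumed)
    lon: float
    time: datetime  # timezone-aware UTC timestamp
    value: float
    unit: str
    method: str     # "human", "in-situ sensor" or "remote sensing"


def from_field_survey(row: dict) -> Observation:
    """Map a hypothetical field-survey CSV row (strings, date as DD/MM/YYYY)."""
    day = datetime.strptime(row["date"], "%d/%m/%Y").replace(tzinfo=timezone.utc)
    return Observation(row["attribute"], float(row["latitude"]), float(row["longitude"]),
                       day, float(row["value"]), row["unit"], "human")


def from_sensor(reading: dict) -> Observation:
    """Map a hypothetical in-situ sensor reading (numbers, time as Unix epoch seconds)."""
    ts = datetime.fromtimestamp(reading["epoch"], tz=timezone.utc)
    return Observation(reading["var"], reading["lat"], reading["lon"],
                       ts, reading["val"], reading["units"], "in-situ sensor")


survey = from_field_survey({"date": "05/03/2015", "attribute": "canopy_cover",
                            "latitude": "-35.3", "longitude": "149.1",
                            "value": "42", "unit": "%"})
sensor = from_sensor({"epoch": 1425513600, "var": "soil_moisture", "lat": -35.3,
                      "lon": 149.1, "val": 0.21, "units": "m3 m-3"})
print(survey.time == sensor.time)  # True: both are 2015-03-05 00:00 UTC
```

Keeping the source in a `method` field preserves provenance, so users can still filter by how an observation was made after harmonisation.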
3. Data Use
• Conventional data access
– Need to find data
– Access via services
– Copy data from source to destination for further analysis, which is
costly for large datasets
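Why copying is costly for large datasets can be shown with simple arithmetic: transfer time is roughly size divided by sustained bandwidth. The helper below is a minimal, idealised sketch (it ignores protocol overhead, latency and contention), and the 1 TB / 100 Mbit/s figures are hypothetical, chosen only for illustration.

```python
def transfer_time_hours(size_bytes: float, bandwidth_bits_per_s: float) -> float:
    """Idealised time to copy a dataset: size in bits divided by sustained bandwidth.

    Ignores protocol overhead, latency and contention, so real copies take longer.
    """
    return size_bytes * 8 / bandwidth_bits_per_s / 3600


# Hypothetical example: a 1 TB (10**12 bytes) dataset over a 100 Mbit/s link.
print(round(transfer_time_hours(10**12, 100e6), 1))  # 22.2 hours
# The same dataset over a 1 Gbit/s link.
print(round(transfer_time_hours(10**12, 1e9), 2))    # 2.22 hours
```

Even under these optimistic assumptions, a single terabyte-scale copy takes most of a day on a modest link, before any analysis starts.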
4. Storage and Compute
• Advent of NeCTAR and RDS
– Researchers are moving data and computation to
cloud.
– Building tools (Virtual labs, research tools and
platforms)
– However, easy access to data is still an issue
• Multiple interfaces to search for data
• No clear access mechanism from different nodes
5. Goal
• Offer open data platform: harmonised cloud-enabled data
infrastructure for data interoperability with simplified service
model
• Offer compute next to data to minimise data movement
• Make data accessible to different research platforms and virtual
labs from a common platform
• Offer scalable managed computing environment with access
to distributed and data-intensive computation technologies
• Develop a support system for cross-disciplinary use of data
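The idea behind offering compute next to data, and behind distributed data-intensive technologies such as Hadoop, can be sketched with partial aggregation: each node summarises its own partition and only small partial results travel. The example below is a minimal single-process illustration of that map/reduce pattern, computing a global mean from per-partition `(sum, count)` pairs; the function names are hypothetical, and a real deployment would run `map_partition` on the nodes that hold the data.

```python
from functools import reduce
from typing import Iterable, List, Tuple

Partial = Tuple[float, int]  # (sum, count): the only data that leaves a partition


def map_partition(values: Iterable[float]) -> Partial:
    """'Map' step: would run on the node holding this partition."""
    total, count = 0.0, 0
    for v in values:
        total += v
        count += 1
    return total, count


def combine(a: Partial, b: Partial) -> Partial:
    """'Reduce' step: associative merge of two partial results."""
    return a[0] + b[0], a[1] + b[1]


def distributed_mean(partitions: List[List[float]]) -> float:
    """Mean over all partitions; here run sequentially to illustrate the pattern."""
    total, count = reduce(combine, (map_partition(p) for p in partitions), (0.0, 0))
    if count == 0:
        raise ValueError("no values in any partition")
    return total / count


print(distributed_mean([[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]]))  # 3.5
```

Averaging per-partition means directly would be wrong when partitions differ in size; carrying `(sum, count)` keeps the combine step exact and order-independent.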
6. User Stories
• As a user of continental-scale gridded ecosystem science data, I want to query a dataset, perform
spatial and temporal sub-setting of the data, and access and use that data from a cloud platform as a local
file so that I can carry out further analyses.
• As an application developer, I need enough compute and storage for a short period of time to run a
distributed, large-scale, data-intensive application so that the output of the analyses is available in
a reasonable amount of time.
• As a regular ecology data user, I need an easily accessible cloud compute platform with common
tools (RStudio, Jupyter Python, NetCDF viewer, spatial data viewer, CSV file viewer) attached to
the TERN ecology and biophysical data collection so that I can build applications for analysis and
synthesis.
• As a data-intensive application developer, I need a flexible way to create and access a Hadoop
cluster so that I can distribute my computation.
• As a data user, I want easy access to reference datasets alongside compute resources so that I can use
them in my analysis and research work.
• As an ecosystem data user, I want a one-stop shop to search, query and access ecosystem data and
use it in my analysis so that I don't have to go through multiple portals to access and use data.
• As an application developer, I want a cloud platform to run my simulations with local access to data
so that I don't have to move data around or download it to my desktop.
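The first user story's spatial and temporal sub-setting amounts to turning a bounding box and a time window into index ranges on each coordinate axis and slicing the grid. The sketch below is a minimal pure-Python illustration that assumes a `(time, lat, lon)` grid stored as nested lists with all axes sorted ascending; the function names and sample grid are hypothetical. In practice a library such as xarray (`.sel` with `slice`) or a server-side service would do this so that only the subset leaves the platform.

```python
from bisect import bisect_left, bisect_right
from datetime import date
from typing import Sequence, Tuple


def index_range(axis: Sequence, lo, hi) -> Tuple[int, int]:
    """Half-open index range [start, stop) of an ascending axis with lo <= value <= hi."""
    return bisect_left(axis, lo), bisect_right(axis, hi)


def subset(data, times, lats, lons, t_range, lat_range, lon_range):
    """Cut a (time, lat, lon) nested-list grid down to a time window and bounding box.

    Assumes all three coordinate axes are sorted ascending.
    """
    t0, t1 = index_range(times, *t_range)
    y0, y1 = index_range(lats, *lat_range)
    x0, x1 = index_range(lons, *lon_range)
    cube = [[row[x0:x1] for row in grid[y0:y1]] for grid in data[t0:t1]]
    return cube, times[t0:t1], lats[y0:y1], lons[x0:x1]


# Hypothetical monthly grid: value encodes (time, lat, lon) indices as t*100 + y*10 + x.
times = [date(2015, m, 1) for m in range(1, 5)]
lats = [-40.0, -35.0, -30.0]
lons = [140.0, 145.0, 150.0]
data = [[[t * 100 + y * 10 + x for x in range(3)] for y in range(3)] for t in range(4)]

cube, t_sel, lat_sel, lon_sel = subset(data, times, lats, lons,
                                       (date(2015, 2, 1), date(2015, 3, 1)),
                                       (-36.0, -29.0), (144.0, 151.0))
print(cube)  # [[[111, 112], [121, 122]], [[211, 212], [221, 222]]]
```

Many gridded products store latitude in descending order; such an axis would need to be reversed (or the comparisons flipped) before using this helper.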
8. Current status
• Set up a Technical Advisory Group to advise on the scoping and
implementation of the project.
• In the first iteration, the following reference datasets will be made available:
– Remote sensing reference data (fractional cover)
– Long-term ecological monitoring data
– Climate variables
• Scoping the mediation layer and overall architecture
• Building a coalition of the willing for partnership and collaboration
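The mediation layer is still being scoped, so the following is only a hypothetical sketch of one possible shape for it: each data node or portal implements a common `search` interface, and a single federated call queries all of them and tags each hit with its source, giving users one entry point instead of multiple portals. The `DataNode` protocol, the in-memory stand-in nodes and the catalogue titles are all assumptions made for illustration.

```python
from typing import Dict, List, Protocol


class DataNode(Protocol):
    """Hypothetical interface each portal or node would implement behind the mediation layer."""
    name: str

    def search(self, keyword: str) -> List[Dict[str, str]]: ...


class InMemoryNode:
    """Stand-in node backed by a list of catalogue records (for illustration only)."""

    def __init__(self, name: str, records: List[Dict[str, str]]):
        self.name = name
        self.records = records

    def search(self, keyword: str) -> List[Dict[str, str]]:
        # Case-insensitive keyword match on the record title.
        kw = keyword.lower()
        return [r for r in self.records if kw in r["title"].lower()]


def federated_search(nodes: List[DataNode], keyword: str) -> List[Dict[str, str]]:
    """Query every node with one call and tag each hit with the node it came from."""
    hits = []
    for node in nodes:
        for record in node.search(keyword):
            hits.append({**record, "source": node.name})
    return hits


nodes = [
    InMemoryNode("remote-sensing", [{"title": "Fractional cover"}]),
    InMemoryNode("monitoring", [{"title": "Long-term monitoring: vegetation cover"},
                                {"title": "Climate variables"}]),
]
for hit in federated_search(nodes, "cover"):
    print(hit["source"], "-", hit["title"])
```

A production mediation layer would also need to reconcile differing metadata schemas and access protocols across nodes; this sketch only shows the single-entry-point idea.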