1. A PROPOSED EARTH SCIENCE COLLABORATORY
K-S Kuo1,2, Chris Lynnes1, Rahul Ramachandran3
1NASA Goddard Space Flight Center, USA
2Caelum Research Corporation, USA
3University of Alabama-Huntsville, USA
7/27/11 IGARSS 2011, Vancouver, Canada
2. Why ESC?
- Data-intensive science: many forms and sources of data (in situ measurements, remote sensing observations, model simulations) and large volumes of data
- Effectiveness as a scientist: an increasing proportion of effort goes into data management
- Threatening: reproducibility, correctness, productivity
3. What is an ESC?
Vision of a rich model development/simulation and data analysis environment that:
- Provides access to various Earth Science models
- Facilitates model and analysis software development
- Provides access across a wide spectrum of Earth Science data
- Provides a diverse set of science analysis services and tools
- Supports the application of services and tools to data
- Supports collaboration, i.e. sharing of data, tools and results
- Supports discovery and publication of all science artifacts
Basically, a new and natural place for Earth scientists to conduct their work and collaborate with others!
4. The Situation Today
Islands of data and services with selective connectivity
[Diagram: Data Centers A, B, and C as islands with only selective links between them]
5. High-Level View
[Diagram: Cyberinfrastructure linking Laboratory Notebook, Workflow Mediator, Tool Library, Data Library, and Data Centers]
12. Key Advantages of ESC
- Tool availability will be a force multiplier: more tools will be usable with more datasets, and more tools will be available to more users
- Knowledge sharing evolves from text on paper to a rich mixture of data, tools, workflows and articles: a "wikihow" for Earth Science data analysis, incorporating live data, services and workflows
- ESC maintains a record of the analysis process: share, repeat, and build upon analysis techniques; transparency of the process is built in
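The slide's "record of the analysis process" could be kept as a simple provenance log. The sketch below is one minimal, hypothetical way to capture and replay an analysis chain; the class, method names, and file names are illustrative assumptions, not part of any actual ESC design.

```python
import hashlib
import json
from datetime import datetime, timezone


class AnalysisRecord:
    """Minimal provenance log (hypothetical): each step records the
    tool, its parameters, and the inputs it consumed, so a collaborator
    can inspect or repeat the analysis."""

    def __init__(self, author):
        self.author = author
        self.steps = []

    def log_step(self, tool, params, inputs):
        entry = {
            "tool": tool,
            "params": params,
            "inputs": inputs,
            "time": datetime.now(timezone.utc).isoformat(),
        }
        # A content hash makes each step tamper-evident and citable.
        entry["id"] = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()
        ).hexdigest()[:12]
        self.steps.append(entry)
        return entry["id"]

    def replay_plan(self):
        """Return the ordered tool invocations needed to repeat the analysis."""
        return [(s["tool"], s["params"]) for s in self.steps]


# Hypothetical usage: two steps of a precipitation analysis.
record = AnalysisRecord(author="kuo")
record.log_step("regrid", {"res": "0.25deg"}, ["TRMM_2A25.HDF"])
record.log_step("composite", {"stat": "mean"}, ["regrid_output"])
print(record.replay_plan())
# [('regrid', {'res': '0.25deg'}), ('composite', {'stat': 'mean'})]
```

Sharing the record rather than a prose description is what makes the process transparent: a collaborator can rerun `replay_plan()` against the same catalogued inputs.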
13. Prior Art
- Talkoot, myExperiment.org – workflow sharing, virtual notebooks
- Earth System Grid – provisioned tools, format standards/checkers
- NASA Earth Exchange (NEX)
- Land Information System – OPeNDAP as access infrastructure
- Earth System Modeling Framework – programmatic approach to integration
- Giovanni, LAS – community services/tools
- Canadian Space Science Data Portal (EOS, Feb. 22, 2011)
- Nebula – cloud provisioning
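To make the "OPeNDAP as access infrastructure" point concrete: OPeNDAP lets a client subset a dataset server-side by appending a constraint expression to the dataset URL. The sketch below only builds such a URL using DAP2's [start:stride:stop] hyperslab notation; the server address, dataset, and variable name are hypothetical.

```python
def dap_subset_url(base_url, var, slices):
    """Build a DAP2 constraint-expression URL requesting a hyperslab
    of `var`. Each slice is a (start, stride, stop) tuple, rendered in
    OPeNDAP's [start:stride:stop] index notation (indices inclusive)."""
    hyperslab = "".join(f"[{a}:{b}:{c}]" for a, b, c in slices)
    return f"{base_url}?{var}{hyperslab}"


# Hypothetical request: first 10 time steps, every other latitude row,
# all longitudes of a gridded precipitation variable.
url = dap_subset_url(
    "http://example.gov/opendap/precip.nc",
    "precipitation",
    [(0, 1, 9), (0, 2, 179), (0, 1, 359)],
)
print(url)
# http://example.gov/opendap/precip.nc?precipitation[0:1:9][0:2:179][0:1:359]
```

Because the subsetting happens on the server, only the requested hyperslab crosses the network, which is what makes OPeNDAP attractive as shared access infrastructure for large remote datasets.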
14. A Use Case: GPM Precipitation Retrieval Algorithm Development
- GPM Core Satellite: Dual-Frequency Precipitation Radar (JAXA) and GPM Microwave Imager (NASA)
- GPM Constellation: international partner satellites, mostly carrying microwave radiometers
- Retrieval algorithms – 3 types: radar-only, radiometer-only, radar-radiometer-combined
- Participants in algorithm development are distributed across Japan, NASA centers (GSFC, MSFC, JPL), NCAR, and universities (FSU, UWisc, etc.)
15. A Use Case: GPM Algorithm Development – Current Situation
- Interdependence among the 3 types of algorithms
- Communication/coordination – narrow bandwidth: periodic workshop meetings and teleconferences
- Data access – duplicative: each location/group keeps its own copy or subset of the required data
- Sharing of data/tools – individual, not concerted; done through ftp/email
- Knowledge sharing – delayed
16. A Use Case: GPM Algorithm Development – with ESC
[Diagram: ESC clients A…Z, each with local tools, data, and a personal catalog (mySci Cat.), backed by cloud-hosted VM images and a shared Community Catalog within the ESC]
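One way to read the diagram's personal "mySci Cat." catalogs feeding a "Community Catalog" is as per-scientist indexes published into a shared view. A minimal sketch, with all class and artifact names hypothetical:

```python
class Catalog:
    """Hypothetical personal catalog: maps artifact names to metadata."""

    def __init__(self, owner):
        self.owner = owner
        self.entries = {}

    def publish(self, name, metadata):
        # Tag every published artifact with its owner for attribution.
        self.entries[name] = {"owner": self.owner, **metadata}


def build_community_catalog(personal_catalogs):
    """Merge personal catalogs into one community view. When two
    scientists publish the same artifact name, the later publication
    wins (a simple last-writer-wins shared index)."""
    community = {}
    for cat in personal_catalogs:
        community.update(cat.entries)
    return community


# Hypothetical usage: two GPM algorithm developers sharing artifacts.
alice = Catalog("alice")
alice.publish("dpr_l2_subset", {"type": "data", "format": "HDF5"})
bob = Catalog("bob")
bob.publish("combined_retrieval_v2", {"type": "tool", "lang": "Fortran"})

shared = build_community_catalog([alice, bob])
print(sorted(shared))  # ['combined_retrieval_v2', 'dpr_l2_subset']
```

The point of the merged view is discovery: a radar-only developer can find a radiometer colleague's tool or data subset without resorting to ftp/email exchange.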
17. A Use Case: GPM Algorithm Development – Multi-level Membership
[Diagram: member groups A–M distributed across overlapping Radar-Only, Radiometer-Only, and Combined algorithm teams within the GPM project]
18. A Use Case: GPM Algorithm Development – in ESC
- Enhanced communication/coordination – wide bandwidth
- Efficient data access – less duplication
- Improved sharing – more pervasive
- Effective knowledge sharing – immediate
20. Why now? Because we can do it (finally)!
- Advances in standards acceptance and implementation (OPeNDAP, autoconf)
- A consistent, loosely coupled architecture encapsulates complexity and maximizes flexibility
- Social networking has reached the mainstream
- Key lessons can be learned from prior efforts
The need is growing:
- Interest in working with multiple datasets is growing
- Calls for transparency and reproducibility are growing
21. What's New?
Macro view (forest-level):
- Systematic approach to making data available to services, and vice versa
- Integration of all major analysis components
- Consistent view of all architectural components
- Cyberinfrastructure services for all architectural components
Micro view (tree-level): Nothing!
22. How to move forward?
Option 1:
- RFC to the community on feasibility, challenges, and approach
- Followed by RFPs for components and integration
Option 2:
- Narrow end-to-end prototype
- Followed by refactoring and broadening