Acquired
      Analysed
        Archived

    Climate Data
    for Our Future


  Prof. Dr. Andreas Hense
  andreas.hense@h-brs.de

 "Open Access – Open Data"
Expert Conference in Cologne
    December 13, 2010
Project Partners



    Bonn-Rhine-Sieg University of Applied Sciences, Computer Science, Sankt Augustin
       Prof. Dr. Andreas V. Hense, Professor for Business Information Systems
       Project management & software development

    Bonn University, Meteorological Institute, Bonn
       Prof. Dr. Andreas N. Hense, Professor for Climate Dynamics
       Experimental data & routines for scientific QA

    Deutsches Klimarechenzentrum GmbH, Hamburg
       Dr. Michael Lautenschlager, Head of Data Management, Director WDC for Climate (WDCC)
       Process definitions, routines for technical QA, hosting

13.12.10                            Publication of Environmental Data                                2
Agenda

   Project Objectives
   Meteorological Background
   The World Data Center for Climate (WDCC)
   The Publication System Atarrabi




Project Objectives




Location in the Scientific Process

     Idea → Preparation → Execution → Analysis / local storage → Long-term archival → Publication (DOI/URN)

     Project focus: marked at the long-term archival and publication stages

Data Publication in Research

Problem:
  Publication and citation have always been common practice
   for scientific articles.
  Scientific articles are often based on data.
  To check the results of an article or to do further research,
   the data are necessary.

Solution:
  Publish the article AND the data.

Aspects of Data Publication

   Storage location – The volume of data can be huge (e.g. meteorological
    data). Who can reliably store the data and assure long-term availability and
    fast access?
   Formats – There can be various formats to represent data. Which (meta)
    data format is the most commonly used?
   Exposition/Registry – It is not sufficient to save data "somewhere" on the
    web. Scientists have to be able to discover that the data exist. What is the
    best way to expose data to search engines? Are there well-known
    (domain-specific) catalogues where data can be registered?
   Quality – Not all data qualify for publication. What are the minimum
    requirements? What are the (scientific and technical) quality assurance
    procedures?
   Stability – Can data be changed after publication? How are new versions
    published?
   Identifier – How can data be referenced uniformly? Are there any
    standards?

Storage Sites

   Activity           Experiment, Analyze,       Publish                  Expose
                      Collaborate
   Storage site       local, group               institutional            national, (global)
                      (research workbench)       (community portal)       (national repository)
   Data visibility    private, group             institution              public

A Common Scenario

   Activity           Experiment, Analyze,       Publish                  Expose
                      Collaborate
   Storage site       local, group               institutional            national, (global)
                      (research workbench)       (community portal)       (national repository)
   Data visibility    private, group             institution              public

   Diagram annotation: "copy" between storage sites.

The National Storage Solution

   Activity           Experiment, Analyze,       Publish                  Expose
                      Collaborate
   Storage site       local, group               institutional            national, (global)
                      (research workbench)       (community portal)       (national repository)
   Data visibility    private, group             institution              public

Project Objectives

   Definition of a standard procedure for the publication of
    observational data, including documentation of quality
    assurance actions.
   Development of a web-based software system that guides
    the researcher through metadata entry and helps the
    publication agent finalize the process.
   Integration of the software system into the existing central
    data repository for meteorology, the World Data Center for
    Climate (WDCC).
   Generalisation of the defined process for other
    environmental sciences.

Meteorological Data

Image: http://www.flickr.com/photos/nanagyei/4661459810/

Meteorological Data Sources

                 Climate simulations                         Experimental data
   Kind          Data from models                            Empirical data
   Structure     Grid-oriented: 2-3 spatial dimensions,      Various structures in time and space
                 1 time dimension, 1 variable dimension,
                 1 sampling/probability dimension
   Volume        Large amount, but simple structure          Smaller amount, but much more complex

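To make the "large amount, simple structure" side concrete, the sketch below counts the values in one gridded model output as the product of its dimension lengths. Only the list of dimensions follows the slide; every dimension length and the 4-byte value size are hypothetical illustration values, not figures from the talk.

```python
from math import prod

# Dimension lengths are hypothetical illustration values, not from the talk.
model_output_dims = {
    "longitude": 192,      # spatial dimension 1
    "latitude": 96,        # spatial dimension 2
    "level": 47,           # optional third spatial dimension
    "time": 365,           # time dimension (e.g. daily values for one year)
    "variable": 10,        # variable dimension (e.g. temperature, pressure)
    "ensemble_member": 5,  # sampling/probability dimension
}


def value_count(dims):
    """Number of values on a regular grid: the product of all dimension lengths."""
    return prod(dims.values())


n_values = value_count(model_output_dims)
size_gb = n_values * 4 / 1e9  # assumption: 4 bytes per value (32-bit float)
print(f"{n_values:,} values, about {size_gb:.1f} GB")
```

Because the structure is a plain product of dimensions, adding one more ensemble member or variable scales the volume linearly, which is why simulation output grows large while staying easy to describe.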
Weather Experiments




Meteorological Data

   We can distinguish
      experimental (observational) data (small amount, heterogeneous)
       and
      climate simulation data (huge amount, simple structure)

                             Experimental data                  Climate simulation data
      Storage location       WDCC (work in progress)            WDCC
      Formats                NetCDF (with restrictions,         NetCDF
                             work in progress)
      Exposition/Registry    WDCC (work in progress),           CERA catalogue, TIBORDER
                             TIBORDER (work in progress)
      Quality                Project focus                      QA more technical than scientific
      Stability              No changes to primary data allowed; changes to metadata are restricted (both kinds)
      Identifier             Digital Object Identifier (DOI), Uniform Resource Name (URN) (both kinds)

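The stability rule (no changes to published primary data) can be checked mechanically by comparing checksums. The sketch below is an illustration only: SHA-256 fixity checking is a common archival technique that the talk does not attribute to WDCC, and the record fields, the sample data, and the DOI value are all hypothetical.

```python
import hashlib
from dataclasses import dataclass


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class PublishedDataset:
    doi: str       # hypothetical identifier, for illustration only
    checksum: str  # SHA-256 of the primary data at publication time


def publish(doi: str, primary_data: bytes) -> PublishedDataset:
    """Record the checksum of the primary data at the moment of publication."""
    return PublishedDataset(doi=doi, checksum=sha256_hex(primary_data))


def is_unchanged(record: PublishedDataset, primary_data: bytes) -> bool:
    """True if the data still match the checksum stored at publication."""
    return sha256_hex(primary_data) == record.checksum


original = b"time,precip_mm\n2007-07-01T00:00,0.4\n"  # made-up sample data
record = publish("10.1234/example-dataset", original)
print(is_unchanged(record, original))          # data as published
print(is_unchanged(record, original + b"x"))   # data modified afterwards
```

A new version of the data would then be published as a new record with its own identifier rather than by altering the existing one.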
Relevant Meteorological Projects

Experimental data:
  As part of the "Quantitative Precipitation Forecast" priority
   programme (DFG SPP1167):
          Convective and Orographically-induced Precipitation
           Study (COPS), measurements in the Black Forest in
           2007, http://www.cops2007.de.
          General Observation Period (GOP), extended
           measurements in Central Europe in 2007,
           http://gop.meteo.uni-koeln.de/gop/doku.php.
          All participants have agreed to publish the data to
           support further research.

Relevant Meteorological Projects

Climate simulation data:
  Coupled Model Intercomparison Project Phase 5 (CMIP5):
     Standard experimental protocol for studying the output of
      coupled ocean-atmosphere general circulation models
      (GCMs).
     Provides a community-based infrastructure in support of
      climate model diagnosis, validation, intercomparison,
      documentation and data access.
     Addresses outstanding scientific questions that arose as
      part of the IPCC AR4 (the Intergovernmental Panel on
      Climate Change 4th Assessment Report) process.
     Provides estimates of future climate change that will be
      useful to those considering its possible consequences.

The World Data Center for Climate (WDCC)

Long-term archival

   The WDCC in Hamburg, Germany, operates large databases (60 PB)
    for the long-term archival of data from climate simulations and
    weather experiments.
   The WDCC is operated by the "Deutsches Klimarechenzentrum" (German
    Climate Computing Center).
   Data production: 50 PB/year
       Limit for mass storage archive: 10 PB/year
          (data with expiration date)
       Limit for long-term data archive: 1 PB/year
          (data without expiration date)
   Currently only a very small amount of data is published
    (approx. 1.5 TB); this is expected to grow significantly.

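The rates above imply that only a fraction of the produced data can be kept. The short calculation below works out those shares from the slide's figures; the decimal conversion of 1 PB = 1000 TB is an assumption.

```python
# Figures quoted on the slide.
PRODUCTION_PB_PER_YEAR = 50
MASS_STORAGE_LIMIT_PB_PER_YEAR = 10  # data with expiration date
LONG_TERM_LIMIT_PB_PER_YEAR = 1      # data without expiration date
PUBLISHED_TB = 1.5
TB_PER_PB = 1000                     # assumption: decimal units

# Share of one year's production that each archive tier can accept.
mass_storage_share = MASS_STORAGE_LIMIT_PB_PER_YEAR / PRODUCTION_PB_PER_YEAR
long_term_share = LONG_TERM_LIMIT_PB_PER_YEAR / PRODUCTION_PB_PER_YEAR

# Published data compared with one year's long-term archive intake.
published_share = PUBLISHED_TB / (LONG_TERM_LIMIT_PB_PER_YEAR * TB_PER_PB)

print(f"mass storage archive: {mass_storage_share:.0%} of yearly production")
print(f"long-term archive:    {long_term_share:.0%} of yearly production")
print(f"published so far:     {published_share:.2%} of one year's long-term intake")
```

Under these figures, roughly one fifth of the yearly production fits into the mass storage archive and one fiftieth into the long-term archive.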
WDCC equipment

   HPC Cluster ("blizzard")
          IBM p575 "Power6" cluster
          water cooled, 16 dual-core CPUs per node, total: 264 nodes, 8448 cores
          Total system peak performance: 158 TeraFlop/s
          Top500: Rank 27 in 06/09
          20 TeraByte memory
          3 PetaByte GPFS file system (additional 3 PetaByte in 2011)

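The cluster figures are consistent with each other, as a quick check shows. The calculation uses only the numbers on the slide; the per-core peak value is derived here and is not stated in the talk.

```python
# Figures quoted on the slide.
NODES = 264
CPUS_PER_NODE = 16
CORES_PER_CPU = 2      # dual-core CPUs
PEAK_TFLOPS = 158      # total system peak performance

total_cores = NODES * CPUS_PER_NODE * CORES_PER_CPU

# Derived value (not on the slide): peak performance per core in GFlop/s.
peak_gflops_per_core = PEAK_TFLOPS * 1000 / total_cores

print(f"total cores: {total_cores}")
print(f"peak per core: {peak_gflops_per_core:.1f} GFlop/s")
```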
HLRE2 Data Archive: HPSS

   6 Sun StorageTek SL8500 tape libraries
       10 000 media slots per library, 8 robots per library, 73 tape drives
       total capacity: 60 PetaByte.
   projected fill rate: 10 PetaByte/year




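From the archive figures one can derive the average capacity per media slot and how long the libraries last at the projected fill rate. Both results are derived here rather than quoted from the talk, and the conversion 1 PB = 1000 TB is an assumption.

```python
# Figures quoted on the slide.
LIBRARIES = 6
SLOTS_PER_LIBRARY = 10_000
TOTAL_CAPACITY_PB = 60
FILL_RATE_PB_PER_YEAR = 10
TB_PER_PB = 1000  # assumption: decimal units

total_slots = LIBRARIES * SLOTS_PER_LIBRARY

# Derived values (not on the slide).
tb_per_slot = TOTAL_CAPACITY_PB * TB_PER_PB / total_slots
years_to_fill = TOTAL_CAPACITY_PB / FILL_RATE_PB_PER_YEAR

print(f"{total_slots} slots, {tb_per_slot:.1f} TB per slot on average")
print(f"capacity reached after {years_to_fill:.0f} years at the projected fill rate")
```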
The Publication System Atarrabi

Image: http://www.flickr.com/photos/17258892@N05/2588347668/

Publication via DOI and URN

   Experiments of particular importance can be published with
    a DOI and a URN.
   The decision is made at WDCC.
   DOI and URN are registered by TIB Hannover.
   Data are double-checked before publication (scientific and
    technical quality assurance).
   Most important is a complete and correct metadata record.

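Once registered, a DOI can be turned into a citable link through the public DOI resolver at https://doi.org/. The sketch below builds such a link after a rough syntax check. The DOI value is hypothetical, and the pattern is a simplification (a prefix starting with "10.", a slash, and a non-empty suffix), not the full DOI syntax.

```python
import re
from urllib.parse import quote

# Simplified pattern: "10.", 4-9 digits, "/", then a suffix without whitespace.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def looks_like_doi(doi: str) -> bool:
    """Rough plausibility check, not a full validation."""
    return DOI_PATTERN.match(doi) is not None


def resolver_url(doi: str) -> str:
    """Build the resolver link for a DOI, percent-encoding unsafe characters."""
    if not looks_like_doi(doi):
        raise ValueError(f"not a DOI: {doi!r}")
    return "https://doi.org/" + quote(doi, safe="/")


example_doi = "10.1234/example-dataset"  # hypothetical, for illustration only
print(resolver_url(example_doi))
```

The link stays stable even if the data move to a different server, because only the resolver's record of the target location has to be updated.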
System Context

   [Diagram] Components:
      Atarrabi Publication System (centre of the diagram)
      CERA2 database: metadata database at WDCC Hamburg
      Researcher
      Publication agent at WDCC Hamburg
      TIBORDER catalogue at TIB Hannover (DOI, URN)
   Annotation: "Makes use of a workflow engine"
   The connections between the components are numbered 1 to 4;
   the labels 0%, 80% and 0% also appear in the diagram.

Wizard-based Metadata Entry

   Divide metadata fields into several logical units.
   The user can leave the wizard at any time and
    return later to continue.
   Wizard steps shown: General, Spatial and temporal coverage, Instruments

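One way to let a user leave the wizard and continue later is to store the answers per step and derive the next open step from what has been saved. The sketch below is not Atarrabi's implementation: the step names come from the slide, while the class, the field names, and the JSON storage are assumptions made for illustration.

```python
import json

# Step names as shown on the slide.
STEPS = ["General", "Spatial and temporal coverage", "Instruments"]


class MetadataWizard:
    """Hypothetical wizard state: answers are stored per step."""

    def __init__(self, answers=None):
        # Maps a step name to the fields entered for that step.
        self.answers = dict(answers or {})

    def current_step(self):
        """First step without saved answers, or None when all steps are done."""
        for step in STEPS:
            if step not in self.answers:
                return step
        return None

    def submit(self, step, fields):
        """Store the fields entered for one step."""
        if step not in STEPS:
            raise ValueError(f"unknown step: {step}")
        self.answers[step] = dict(fields)

    def save(self):
        """Serialize the progress so it can be stored between sessions."""
        return json.dumps(self.answers)

    @classmethod
    def resume(cls, saved):
        """Restore a wizard from previously saved progress."""
        return cls(json.loads(saved))


wizard = MetadataWizard()
wizard.submit("General", {"title": "Example precipitation measurements"})
saved = wizard.save()                 # the user leaves the wizard here

later = MetadataWizard.resume(saved)  # ... and returns later
print(later.current_step())
```

Splitting the fields into steps keeps each saved unit small, so partial progress is never lost when the user stops between steps.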
Technology Stack




Acquired
                                           Analysed
                                             Archived

                                         Climate Data
                                         for Our Future

                                     Prof. Dr. Andreas Hense
                                     andreas.hense@h-brs.de

                                             visit us:
                                        umwelt.wikidora.com





Andreas Hense: Climate data for our future – acquired, analysed, archived

  • 1. Acquired Analysed Archived Climate Data for Our Future Prof. Dr. Andreas Hense andreas.hense@h-brs.de "Open Access – Open Data“ Expert Conference in Cologne December 13, 2010
  • 2. Project Partners Bonn-Rhine-Sieg University Bonn University, Deutsches Klimarechen- oAS, Computer Science, Meteorological Institute zentrum GmbH Sankt Augustin Bonn Hamburg Prof. Dr. Andreas V. Hense Prof. Dr. Andreas N. Hense Dr. Michael Lautenschlager Professor for Business Professor for Climate Head of Data Mgmt, Information Systems dynamics Director WDC for Climate (WDCC) Process definitions, Project management & Experimental data & routines for technical QA, software development routines for scientific QA hosting 13.12.10 Publication of Environmental Data 2
  • 3. Agenda  Project Objectives  Meteorological Background  The World Data Center for Climate (WDCC)  The Publication System Atarrabi 13.12.10 Publication of Environmental Data 3
  • 4. Project Objectives 13.12.10 Publication of Environmental Data 4
  • 5. Location in the Scientific Process Project focus Analysis Long term Publication Idea Preparation Execution Local storage archival (DOI/URN) 13.12.10 Publication of Environmental Data 5
  • 6. Data Publication in Research
      Problem:
      - Publication and citation have always been common practice for scientific articles.
      - Scientific articles are often based on data.
      - To check the results of an article or to do further research, the data are necessary.
      Solution:
      - Publish the article AND the data.
  • 7. Aspects of Data Publication
      - Storage location: The volume of data can be huge (e.g. meteorological data). Who can reliably store the data and assure long term availability and fast access?
      - Formats: There can be various formats to represent data. Which (meta) data format is the most commonly used?
      - Exposition/Registry: It is not sufficient to save data "somewhere" on the web. Scientists have to notice the existence of data. What is the best way to expose data to search engines? Are there well-known (domain specific) catalogues where data can be registered?
      - Quality: Not all data are qualified for publication. What are the minimum requirements? What are (scientific and technical) quality assurance procedures?
      - Stability: Can data be changed after publication? How are new versions published?
      - Identifier: How can data be referenced uniformly? Are there any standards?
  • 8. Storage Sites
      [Diagram]
      - Workflow stages: Experiment, Analyze, Publish, Expose, Collaborate
      - Storage site: local, group, institutional, national (global); captions: research workbench, community portal, national repository
      - Data visibility: private, group, institution, public
  • 9. A Common Scenario
      [Same diagram as slide 8, with a "copy" step added between storage sites]
  • 10. The National Storage Solution
      [Same diagram as slide 8]
  • 11. Project Objectives
      - Definition of a standard procedure for the publication of observational data, including documentation of quality assurance actions.
      - Development of a web-based software system that leads the researcher through metadata entry and assists the publication agent in finalizing the process.
      - Integration of the software system into the existing central data repository for meteorology, the World Data Center for Climate (WDCC).
      - Generalisation of the defined process for other environmental sciences.
  • 12. Meteorological Data
      Image: http://www.flickr.com/photos/nanagyei/4661459810/
  • 13. Meteorological Data Sources
      - Climate simulations: data from models; grid-oriented, 2-3 spatial dimensions & 1 time dimension & 1 variable dimension & 1 sampling/probability dimension. Large amount, but simple structure.
      - Experimental data: empirical data; various structures in time and space. Smaller amount, but much more complex.
  • 14. Weather Experiments
  • 15. Meteorological Data
      We can distinguish
      - experimental (observational) data (small amount, heterogeneous) and
      - climate simulation data (huge amount, simple structure)
      Comparison by aspect:
      - Storage location: experimental data: WDCC (work in progress); climate simulation data: WDCC
      - Formats: experimental data: NetCDF (with restrictions, work in progress); climate simulation data: NetCDF
      - Exposition/Registry: experimental data: WDCC (work in progress), TIBORDER (work in progress); climate simulation data: CERA catalogue, TIBORDER
      - Quality: experimental data: project focus; climate simulation data: QA more technical than scientific
      - Stability (both): No changes to primary data allowed, changes to metadata are restricted
      - Identifier (both): Digital Object Identifier (DOI), Uniform Resource Name (URN)
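The stability rule on slide 15 (no changes to primary data after publication) could be checked mechanically by recording a checksum at publication time and comparing it later. The following is only a minimal sketch under that assumption: the slides do not say how WDCC enforces the rule, and the function names `sha256_digest` and `is_unchanged` are hypothetical.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Return the SHA-256 checksum of the primary data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_unchanged(data: bytes, recorded_digest: str) -> bool:
    """Compare the current data against the checksum stored at publication."""
    return sha256_digest(data) == recorded_digest


# Illustrative bytes standing in for a published data file.
published = b"time,temperature\n0,281.4\n"
recorded = sha256_digest(published)  # stored alongside the metadata record

print(is_unchanged(published, recorded))         # data as published
print(is_unchanged(published + b"x", recorded))  # data modified afterwards
```

A checksum only detects changes; a new version would still need its own record and identifier, which is the question raised under "Stability" on slide 7.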
  • 16. Relevant Meteorological Projects
      Experimental data:
      - As part of the "Quantitative Precipitation Forecast" (DFG SPP1167):
          - Convective and Orographically-induced Precipitation Study (COPS), measurements in the Black Forest in 2007, http://www.cops2007.de.
          - General Observation Period (GOP), extended measurements in Central Europe in 2007, http://gop.meteo.uni-koeln.de/gop/doku.php.
      - All participants have agreed to publish the data to support further research.
  • 17. Relevant Meteorological Projects
      Climate simulation data:
      - Coupled Model Intercomparison Project Phase 5 (CMIP5):
          - Standard experimental protocol for studying the output of coupled ocean-atmosphere general circulation models (GCMs)
          - Provides a community-based infrastructure in support of climate model diagnosis, validation, intercomparison, documentation and data access.
          - Addresses outstanding scientific questions that arose as part of the IPCC AR4 (the Intergovernmental Panel on Climate Change 4th Assessment Report) process.
          - Provides estimates of future climate change that will be useful to those considering its possible consequences.
  • 18. The World Data Center for Climate (WDCC)
  • 19. Long term archival
      - The WDCC in Hamburg, Germany, operates large databases (60 PB) for the long-term archival of data from climate simulations and weather experiments.
      - WDCC is run by the "Deutsches Klimarechenzentrum" (German Climate Computing Center).
      - Data production: 50 PB/year
      - Limit for mass storage archive: 10 PB/year (data with expiration date)
      - Limit for long-term data archive: 1 PB/year (data without expiration date)
      - Currently only a very small amount of data is published (approx. 1.5 TB); this is expected to grow significantly.
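The rates on slide 19 imply that only a fraction of the produced data can be kept. A small sketch of that arithmetic follows; the figures are the ones quoted on the slide, while the fill-time estimate rests on two assumptions not stated there: the 60 PB archive starts empty, and it fills at the full 10 PB/year limit.

```python
# Figures quoted on slide 19, in petabytes (PB) or PB per year.
ARCHIVE_CAPACITY_PB = 60
DATA_PRODUCTION_PB_PER_YEAR = 50
MASS_STORAGE_LIMIT_PB_PER_YEAR = 10  # data with expiration date
LONG_TERM_LIMIT_PB_PER_YEAR = 1      # data without expiration date

# Share of each year's production that fits under each limit.
mass_storage_share = MASS_STORAGE_LIMIT_PB_PER_YEAR / DATA_PRODUCTION_PB_PER_YEAR
long_term_share = LONG_TERM_LIMIT_PB_PER_YEAR / DATA_PRODUCTION_PB_PER_YEAR

# Assumption: empty archive, constant fill rate at the mass storage limit.
years_to_fill = ARCHIVE_CAPACITY_PB / MASS_STORAGE_LIMIT_PB_PER_YEAR

print(f"Mass storage archive keeps {mass_storage_share:.0%} of production")
print(f"Long-term archive keeps {long_term_share:.0%} of production")
print(f"Archive full after about {years_to_fill:.0f} years")
```

Under these assumptions one fifth of the yearly production reaches the mass storage archive, 2% reaches the long-term archive, and the 60 PB capacity lasts about six years.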
  • 20. WDCC equipment
      - HPC Cluster ("blizzard")
          - IBM p575 "Power6" cluster
          - water cooled, 16 dual core CPUs per node, total: 264 nodes, 8448 cores
          - Total system peak performance: 158 TeraFLOPS
          - Top500: Rank 27 in 06/09
          - 20 TeraByte memory
          - 3 PetaByte GPFS file system (additional 3 PetaByte in 2011)
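The core count on slide 20 follows directly from the node layout, and dividing the peak performance by it gives a rough per-core figure. This is a back-of-the-envelope sketch using only the numbers on the slide; the per-core value is derived here and is not stated in the talk.

```python
NODES = 264
CPUS_PER_NODE = 16
CORES_PER_CPU = 2             # "dual core"
PEAK_FLOPS = 158e12           # 158 TeraFLOPS system peak

total_cores = NODES * CPUS_PER_NODE * CORES_PER_CPU
peak_per_core_gflops = PEAK_FLOPS / total_cores / 1e9

print(f"Total cores: {total_cores}")
print(f"Peak per core: {peak_per_core_gflops:.1f} GFLOPS")
```

The product reproduces the 8448 cores quoted on the slide.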
  • 21. HLRE2 Data Archive: HPSS
      - 6 Sun StorageTek SL8500 tape libraries
      - 10,000 media slots per library, 8 robots per library, 73 tape drives
      - Total capacity: 60 PetaByte
      - Projected fill rate: 10 PetaByte/year
  • 22. The Publication System Atarrabi
      Image: http://www.flickr.com/photos/17258892@N05/2588347668/
  • 23. Publication via DOI and URN
      - Experiments of particular importance can be published with a DOI and a URN.
      - The decision will be made at WDCC.
      - DOI and URN registration is done by "TIB Hannover".
      - Data is double-checked before publication (scientific and technical quality assurance).
      - Most important is a complete and correct metadata record.
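Both identifiers on slide 23 have a fixed outer syntax: a DOI starts with the directory indicator `10.`, a registrant code and a slash, and a URN has the form `urn:<namespace>:<namespace-specific string>`. A minimal sketch of a plausibility check before a record is handed on for registration might look as follows. The patterns are deliberately loose approximations, the example identifiers are made up, and nothing here reflects how Atarrabi or TIB actually validate identifiers.

```python
import re
from urllib.parse import quote

# Loose approximations of the outer syntax, not full validators.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
URN_PATTERN = re.compile(r"^urn:[a-z0-9][a-z0-9-]{0,31}:\S+$", re.IGNORECASE)


def is_plausible_doi(doi: str) -> bool:
    """True if the string has the shape 10.<registrant>/<suffix>."""
    return DOI_PATTERN.match(doi) is not None


def is_plausible_urn(urn: str) -> bool:
    """True if the string has the shape urn:<nid>:<nss>."""
    return URN_PATTERN.match(urn) is not None


def doi_to_url(doi: str) -> str:
    """Build a resolver link for a DOI via the public doi.org proxy."""
    if not is_plausible_doi(doi):
        raise ValueError(f"not a plausible DOI: {doi!r}")
    return "https://doi.org/" + quote(doi, safe="/")


# Hypothetical identifiers, for illustration only.
example_doi = "10.1234/example.dataset"
example_urn = "urn:nbn:de:example-123"

print(is_plausible_doi(example_doi), is_plausible_urn(example_urn))
print(doi_to_url(example_doi))
```

A syntax check of this kind only catches typing errors; whether an identifier is actually registered can only be answered by the registration agency.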
  • 24. System Context
      [Diagram]
      - CERA2: metadata database at WDCC Hamburg
      - Atarrabi Publication System
      - TIBORDER catalogue at TIB Hannover (DOI, URN)
      - Actors: Researcher; Publication agent at WDCC Hamburg
      - Diagram note: "Makes use of a database workflow engine"
      - The interactions are numbered 1 to 4 in the diagram.
  • 25. Wizard-based Metadata Entering
      - Divide metadata fields into several logical units.
      - The user can leave the wizard at any time and return later to continue.
      - Wizard steps shown: General, Spatial and temporal coverage, Instruments
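The two ideas on slide 25, splitting the metadata into logical units and letting the user leave and resume, can be illustrated with a small session object that stores answers per step and can be serialised between visits. This is a hypothetical sketch, not Atarrabi's implementation: the class `WizardSession`, its methods and the field name `title` are assumptions; only the three step names come from the slide.

```python
import json
from dataclasses import dataclass, field

# Step names as shown on the slide.
STEPS = ("General", "Spatial and temporal coverage", "Instruments")


@dataclass
class WizardSession:
    """Holds the metadata entered so far, keyed by wizard step."""
    answers: dict = field(default_factory=dict)

    def save_step(self, step: str, values: dict) -> None:
        if step not in STEPS:
            raise ValueError(f"unknown step: {step!r}")
        self.answers[step] = dict(values)

    def next_open_step(self):
        """Return the first step without saved answers, or None if done."""
        for step in STEPS:
            if step not in self.answers:
                return step
        return None

    def to_json(self) -> str:
        return json.dumps({"answers": self.answers})

    @classmethod
    def from_json(cls, text: str) -> "WizardSession":
        return cls(answers=json.loads(text)["answers"])


# The user fills in the first step, leaves, and returns later.
session = WizardSession()
session.save_step("General", {"title": "Example campaign"})  # hypothetical field
stored = session.to_json()

resumed = WizardSession.from_json(stored)
print(resumed.next_open_step())
```

Keeping each step's answers separate means an incomplete record can be stored at any point, and the wizard can reopen at the first unfinished step.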
  • 26. Technology Stack
  • 27. Acquired, Analysed, Archived: Climate Data for Our Future
      Prof. Dr. Andreas Hense, andreas.hense@h-brs.de
      Visit us: umwelt.wikidora.com