3. Domain-Specific Data Stewardship
• Domain-specific guidelines, templates, software tools, and
user support/training that facilitate data submission
• including domain-specific tools for data management planning and
compliance reporting
• Development, maintenance, and promotion of domain-
specific, community-based standards for data and metadata
• Provenance documentation, uncertainties, semantics
(vocabularies, taxonomy), formats
• User interfaces optimized for science questions
• Harmonization & integration of data for advanced mining &
analysis
• Access to external data in relevant other …
• Mapping of data to standards-based interfaces for interoperability
4. Central Role of Discipline-specific Repositories
[Diagram: a domain-specific repository at the center, linked to the science community, libraries, archives, computer science, publishers and editors, funding agencies, data facilities, and registries.]
Functions shown in the diagram: metadata registration, software (tool) development, interoperability, data policies, persistent access, bibliometrics, data curation, data access & discovery, data products, data harmonization (standards), user support.
5. IEDA Foci
Data Discovery & Access
Data Preservation & Curation
Data Analysis
Investigator Support
• QA/QC, documentation
• Persistent identification (DOI)
• Long-term archiving
6. Marine Geoscience Data System
Data Collections and Custom Data Access:
• GeoPRISMS
• Ridge2000
• MARGINS
• Academic Seismic Portal (ASP)
• Antarctic and Southern Ocean (ASODS)
• Metadata Catalog and File repository
• Catalog inventory > 0.5 million files, 47 TB, 2,500 programs
7. EarthChem Library
• Repository for geochemical data
• analytical data sets
• syntheses
• models
• reports
• Online data submission
• Templates for data annotations
• Quality control following the Editors
Roundtable best practices
8. IGSN / SESAR
• IGSN: Unique, persistent, resolvable identifiers
• SESAR: registry of samples in the Earth Sciences
• Searchable catalog of samples across the Earth Sciences
• Preservation and persistent access of sample metadata
• Used across all Earth Science communities that deal with
samples
• User services for sample metadata management
• submission, editing, transfer of ownership, tracking of subsamples, etc.
• International governance by the IGSN e.V.
• non-profit organization, founded in 2011, registered in Germany
• currently 13 members (4 new members in 2013)
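The user services listed above (submission, transfer of ownership, tracking of subsamples) can be pictured with a small in-memory sketch. This is an illustration only: the class names, field names, and the `DEMO-` identifiers are hypothetical and do not reflect the actual SESAR data model or any IGSN syntax.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SampleRecord:
    """Hypothetical sample metadata record; field names are illustrative only."""
    identifier: str                      # placeholder identifier, not a real IGSN
    name: str
    owner: str
    parent: Optional[str] = None         # identifier of the parent sample, if any
    children: List[str] = field(default_factory=list)


class SampleRegistry:
    """Toy registry covering submission, ownership transfer, and subsample tracking."""

    def __init__(self) -> None:
        self._records: Dict[str, SampleRecord] = {}

    def submit(self, record: SampleRecord) -> None:
        if record.identifier in self._records:
            raise ValueError(f"identifier already registered: {record.identifier}")
        if record.parent is not None:
            # Link the subsample to its parent so the lineage can be traced later.
            self._records[record.parent].children.append(record.identifier)
        self._records[record.identifier] = record

    def transfer_ownership(self, identifier: str, new_owner: str) -> None:
        self._records[identifier].owner = new_owner

    def owner_of(self, identifier: str) -> str:
        return self._records[identifier].owner

    def lineage(self, identifier: str) -> List[str]:
        """Return identifiers from the given sample up to its root parent."""
        chain: List[str] = []
        current: Optional[str] = identifier
        while current is not None:
            chain.append(current)
            current = self._records[current].parent
        return chain


registry = SampleRegistry()
registry.submit(SampleRecord("DEMO-0001", "dredge haul", owner="lab-a"))
registry.submit(SampleRecord("DEMO-0002", "basalt chip", owner="lab-a", parent="DEMO-0001"))
registry.transfer_ownership("DEMO-0002", "lab-b")
print(registry.lineage("DEMO-0002"))
```

The point of the sketch is that a persistent identifier per sample, plus a parent link, is enough to reconstruct where a subsample came from even after ownership changes.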
9. IEDA Foci
Data Discovery & Access
Data Preservation & Curation
Data Analysis
Investigator Support
• Web-based user interfaces (specialists & non-specialists)
• Programmatic access interfaces (interoperability)
• GeoMapApp, Google Earth, etc.
• Links to the literature
11. IEDA Foci
Data Discovery & Access
Data Preservation & Curation
Data Analysis
Investigator Support
• Visualization tools (GeoMapApp, Virtual
Ocean, Earth Observer)
• Syntheses & Products
12. Global Multi-Resolution Topography Synthesis
Compilation of multi-beam sonar data collected by scientists and institutions worldwide, edited and merged into a single continuously updated compilation of high-resolution seafloor topography.
13. Global Synthesis of rock compositions (EarthChem, PetDB)
• Map of basalt samples from mid-ocean ridges
• Color scaled to the 87Sr/86Sr ratio measured on
these samples
14. IEDA Foci
Data Discovery & Access
Data Preservation & Curation
Data Analysis
Investigator Support
• Web-based data submission
• Data Management Plan tool
• Data Compliance Report tool
• Community
15. Use of DMP Tool
Target Program    Count
Other             27
OIA               3
OCE               134
OPP               11
SBE               1
BIO               5
EAR               94
Total             275
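As a quick check on the counts above, the following sketch recomputes the total and each target program's share of DMP Tool use. The numbers are taken from the slide; the variable names are illustrative.

```python
# Counts per target program, copied from the slide.
dmp_counts = {
    "other": 27,
    "OIA": 3,
    "OCE": 134,
    "OPP": 11,
    "SBE": 1,
    "BIO": 5,
    "EAR": 94,
}

total = sum(dmp_counts.values())

# Share of the total for each program, in percent.
shares = {program: 100.0 * count / total for program, count in dmp_counts.items()}

# Print programs from largest to smallest share.
for program, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{program:>5}: {dmp_counts[program]:>3} ({share:.1f}%)")
print(f"Total: {total}")
```

The recomputed total matches the 275 reported on the slide, and OCE and EAR together account for the large majority of uses.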
16. IEDA Infrastructure
• Cooperative Agreement with NSF
• Sustainable funding
• Formal community governance & guidance
• Professional data management policies & procedures
• Persistent identification of data & samples (DOI, IGSN)
• Standards-compliant metadata catalog
• Long-term archiving agreements with National Geophysical Data Center
& Columbia University Libraries
• Risk management
• “Accreditation” as member of the World Data System
• Disciplinary expertise
17. System Usage
• Number of unique visitors to the IEDA web site increased by 251%
• 7,998 unique visitors between Oct 2012 and Sept 2013
• Primary pages accessed: Data Management Plan & IEDA collections
Results of the user survey of the project “Stakeholder Alignment in the Geosciences: Assessing the Potential Impacts of EarthCube” show that IEDA ranks among the top 5-8 most cited data sources in the Earth Sciences.
J. Cutcher-Gershenfeld, presentation at the EarthCube Domain End-user Workshop for Paleogeoscience, February 2013
18. Downloads from IEDA Systems
Data Collection      Year 2     Year 3
PetDB                2,166      2,326
SedDB                52         200
EarthChem Library    95         401
EarthChem Portal     (1,153)    567
MGDS                 5,049      4,331
GMRT                 7,200      10,177
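The year-over-year change in downloads can be worked out directly from the figures above. The sketch below does so for each collection; it leaves out the EarthChem Portal because the slide gives its Year 2 value in parentheses without saying what that notation means. Function and variable names are illustrative.

```python
# (Year 2, Year 3) download counts, copied from the slide.
# EarthChem Portal is omitted: its Year 2 figure is parenthesized and unexplained.
downloads = {
    "PetDB": (2166, 2326),
    "SedDB": (52, 200),
    "EarthChem Library": (95, 401),
    "MGDS": (5049, 4331),
    "GMRT": (7200, 10177),
}


def percent_change(before: int, after: int) -> float:
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


changes = {name: percent_change(y2, y3) for name, (y2, y3) in downloads.items()}

for name, change in changes.items():
    print(f"{name:<18} {change:+.1f}%")
```

On these numbers every listed collection except MGDS saw more downloads in Year 3 than in Year 2.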
28. Editors Roundtable
• Based on the Editors Roundtable in Geochemistry (2007/8)
• policy recommendations for reporting of geochemical data
• Goal: Establish an ongoing forum for information exchange
between editors, publishers, professional societies, and data
facilities
• regular meetings at major conferences
• wiki (knowledge hub) for best practices, guidelines, and capabilities for data
publication and data citation
• focus on domain-specific requirements, practices, data facilities, etc.
• Will be international and independent of a specific institution
or society (ESIP?)
• Could serve as a role model for other disciplines
29. IEDA Data Rescue Initiative
• preserve valuable legacy data sets that are in danger because
of impending retirement or degradation
• augment data collections maintained by IEDA
• improve procedures and tools for user contributions
• 2013 International Data Rescue Award in the Geosciences
• IEDA Data Rescue Mini-Awards
• Data Rescue Process Study (collaboration with Elsevier Research Data
Services)
30. IEDA Data Rescue Mini-awards
• Delano J, Hauri E, Saal A, Shearer C: “Geochemistry of Lunar Glasses”
• Gill J: “Geochemical & geochronological data from Fiji, IBM, and Endeavor segments”
• Tivey M: “Near-bottom Magnetic Data Rescue”
31. Lessons Learned
• Investigator Lessons
• Take ownership of your own legacy
• Data curation by others may not be complete or correct
• Data rescue of an entire career does not need to be overwhelming
• Start with small steps
• Disciplinary repositories will help and guide you to what is needed
• Despite the time investment, data rescue is worth it
• Others will now be able to re-use the data
• Notes taken years ago actually explain anomalies
• Repository Lessons
• For Long Tail Data, every project is different
• A small incentive will motivate investigators
• Data Rescue missions help the repository determine next steps for
development of tools and services
32. 2013 International Data Rescue Award in the Geosciences
• $5,000 award (sponsored by Elsevier) plus trophy
• International jury
• 16 submissions
35. Collaborations
• New subawards
• to UTIG for ASP@UTIG
• to M. Ghiorso (OFM-Research) to migrate LEPR data system into IEDA
infrastructure (includes Trace KD database developed by Roger Nielsen)
• Industry collaborations
• Elsevier funds Data Rescue Process Study
• ESRI will help with GeoMapApp
• EarthCube projects
36. EarthCube Projects
• “Deploying Web Services Across Multiple Geoscience Domains”
• (BB, lead: T. Ahern, IRIS; IEDA co-PI: Carbotte): The project focuses on developing web
services that broaden access to the data collections of IRIS, IEDA, UNAVCO, UCAR, Caltech, and
SDSC for other disciplines. (main)
• “Community Inventory of EarthCube Resources for Geosciences
Interoperability (CINERGI)”
• (BB, lead: I. Zaslavsky, SDSC/UCSD; IEDA co-PI: Lehnert): The project focuses on developing
an inventory of EarthCube resources, including data systems, standards, services, etc.
• “Leveraging Semantics and Crowdsourcing in Data Sharing and Discovery”
• (BB, lead: T. Narock, University of Maryland; IEDA co-PI: R. Arko): The project focuses on
applying Semantic Web technologies, including Linked Data, to support sharing and
integration of ocean science data sets.
• “C4P: Collaboration and Cyberinfrastructure for Paleobiosciences”
• RCN, lead: K. Lehnert, IEDA; project focuses on advancing cyberinfrastructure for
paleobiosciences
• “Building a Sediment Experimentalist Network (SEN)”
• RCN, lead: W. Kim, UT Austin; IEDA co-PI: Hsu
• “EarthCube Test Enterprise Governance: An Agile Approach”
• Test Enterprise Governance, lead: L. Allison, University of Arizona; IEDA sub-awardee:
Lehnert
37. Council of Data Facilities
“The mission of the Council of Data Facilities is to serve in a
coordinating and facilitating role”
• Provide a collective voice on behalf of the member data facilities to the
NSF and other foundations and associations, as appropriate.
• Identify, endorse, and promote standards and best or exemplary
practices in the organization and operation of a data facility.
• Identify and support the development and utilization of shared
infrastructure services, including computing services, professional staff
development and training services, and related activities.
• Foster innovation through collaborative projects.
• Collaborate with standard-setting bodies with respect to standards for
data sharing and interoperability, metadata, and related matters.
38. Council of Data Facilities
• Definition: “A data facility is eligible for membership in the
Council if it acquires, curates, preserves, and/or disseminates
data, software, models and data services for one or more
defined communities in the geosciences.”
• Category A: NSF-funded not-for-profit or academic data facilities
• Category B: Federally Funded Research and Development Centers
(FFRDCs) and other federal, state, and local data facilities.
• Category C: International, private, and other not-for-profit or academic
data facilities.
• Category D: Associate members
• Membership categories A, B, and C are all voting members of
the Council, with each member sending one designated
representative to the General Assembly.
39. Council of Data Facilities
• provide advice and guidance to the NSF via the Council’s
Executive Committee on matters pertaining
• identify and develop opportunities for collaboration (shared
infrastructure, professional development of staff, etc.)
• contribute to the development of geoscience
cyberinfrastructure standards and identified best practices and
their implementation or adoption, and help ensure compliance
and integration into architectures and workflows in their
respective facilities
• educate other members of the Council on new developments
relevant to data centers in their respective
fields, disciplines, and domains (international, private
foundation, etc.).
40. IEDA: A Multi-Disciplinary Microcosm
www.iedadata.org
• geochemistry, marine geophysics, marine geology, geochronology, and more
• sensor data versus sample-based observations & experiments
• raw data (e.g. multi-beam), field data, lab data, derived data, samples
• gridded data, point data, time-series data, maps, photos, and more
• file sizes vary from a few kilobytes to terabytes
Editor's notes
• Development, governance, and promotion of domain-specific, community-based standards for data and metadata
• Provenance documentation, uncertainties, semantics (vocabularies, taxonomy), formats
• Data quality
• Quality assessment & control of ingested & served data
• Science-driven software tools for data discovery, access, and analysis
• Harmonization & integration of data for advanced data mining and analysis (data products)
• User support/training for data management
• Mapping of data to standards-based interfaces for interoperability
IEDA is a data facility that hosts observational solid earth data and tools from the marine, terrestrial, and polar environments.
• Multiple diverse data systems that were developed independently, serving both
• sensor data from large collaborative cruise programs
• sample-based measurements from unique analytical laboratories
• IEDA data systems enable the data to be discovered and reused.