Presentation at AGU Fall Meeting 2018: Large-scale, global geochemical data syntheses like EarthChem and GEOROC have, for nearly two decades, inspired and made possible a vast range of scientific studies and new discoveries, facilitating the analysis and mining of geochemical data and creating new paradigms in geochemical data analysis such as statistical geochemistry. These syntheses provide easy access to fully integrated compilations of thousands of datasets (‘data fusion’) with millions of geochemical measurements that are accompanied by comprehensive and harmonized metadata for context and provenance to search, filter, sort, and evaluate the data.
The syntheses have been assembled and maintained through manual labor by data managers, who extract data and metadata from the text, tables, and supplements of publications for inclusion in the databases. This is a time-consuming task because of the multitude of data formats, units, normalizations, and vocabularies, which reflects the lack of best practices for geochemical data reporting. To support and advance future science endeavors that rely on access to and analysis of large volumes of geochemical data, we need to develop and implement global standards that make geochemical data not only FAIR (Findable, Accessible, Interoperable, Re-usable) but also ready for data fusion. As more geochemical data systems emerge at national, programmatic, and subdomain levels in response to Open Access policies and science needs, standard protocols for exchanging geochemical data among these systems will need to be developed, implemented, and governed.
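As an illustration of the unit problem described in the paragraph above, the following is a minimal sketch of harmonizing concentrations reported in different units. The function name, the unit labels, and the sample values are assumptions made for this example; they are not taken from EarthChem or GEOROC.

```python
# Minimal sketch: harmonize concentrations reported in mixed units.
# All names and values below are hypothetical and for illustration only.

# Conversion factors to a common unit (parts per million by mass).
TO_PPM = {
    "wt%": 1.0e4,   # 1 wt% = 10,000 ppm
    "ppm": 1.0,
    "ppb": 1.0e-3,  # 1 ppb = 0.001 ppm
}


def to_ppm(value, unit):
    """Convert a mass concentration to ppm, rejecting unknown units."""
    if unit not in TO_PPM:
        raise ValueError(f"unknown unit: {unit!r}")
    return value * TO_PPM[unit]


# Made-up values as they might appear in three different publications.
reported = [("TiO2", 1.5, "wt%"), ("Sr", 320.0, "ppm"), ("Au", 4.0, "ppb")]

# Same analytes, now all expressed in ppm.
harmonized = [(name, to_ppm(value, unit), "ppm") for name, value, unit in reported]
```

This only rescales units. Converting between oxide and element concentrations, or between normalizations, would need additional, explicitly documented steps.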
Critical is the alignment with existing standards such as the Semantic Sensor Network (SSN) ontology, a recent joint W3C and OGC standard that standardizes description of sensors, observation, sampling, and actuation, with sufficient flexibility to allow details of these elements to be defined in different domains. New initiatives within the International Council for Science and CODATA are working towards coordinating the International Science Unions to identify and endorse the more authoritative standards (including vocabularies and ontologies). These initiatives present a timely opportunity for geochemical data to ensure that they are born ‘connected’ within and across disciplines.
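To make the alignment with SSN more concrete, here is a hedged sketch of how a single geochemical measurement might be described with terms from SOSA, the core vocabulary of the SSN ontology, and serialized as JSON-LD. The `ex:` identifiers, the sample name, the procedure, and the value are hypothetical placeholders, and this is not an endorsed geochemical profile.

```python
import json

# Sketch only: one measurement expressed with SOSA terms as JSON-LD.
# Everything under the "ex:" prefix is a hypothetical placeholder.
observation = {
    "@context": {
        "sosa": "http://www.w3.org/ns/sosa/",
        "xsd": "http://www.w3.org/2001/XMLSchema#",
        "ex": "http://example.org/geochem/",
    },
    "@id": "ex:observation/1",
    "@type": "sosa:Observation",
    # The physical sample that was analyzed.
    "sosa:hasFeatureOfInterest": {
        "@id": "ex:sample/HYPOTHETICAL-001",
        "@type": "sosa:Sample",
    },
    # What was measured, ideally a term from an endorsed vocabulary.
    "sosa:observedProperty": {"@id": "ex:property/SiO2_mass_fraction"},
    # How it was measured (analytical method).
    "sosa:usedProcedure": {"@id": "ex:procedure/XRF"},
    # A made-up value, given as a typed literal.
    "sosa:hasSimpleResult": {"@value": "49.8", "@type": "xsd:decimal"},
}

serialized = json.dumps(observation, indent=2)
```

A simple literal result carries no unit. A fuller description would likely use `sosa:hasResult` with a result node that references a unit vocabulary, which is a choice a community standard would have to make.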
Boosting Data Science in Geochemistry: We Need Global Geochemical Data Standards and Networking!
1. Boosting Data Science in Geochemistry: We Need Global Geochemical Data Standards and Networking!
Kerstin Lehnert, Lamont-Doherty Earth Observatory, Columbia University, USA
Lesley A Wyborn, Australian National University, Australia
Simon J D Cox, CSIRO Land and Water, Australia
Jens F Klump, CSIRO Earth Science Resource Engineering, Australia
Brent McInnes, Curtin University, Australia
2. Data Science is Happening in Geochemistry (and Mineralogy, Petrology, etc.)
V21A-08: Boosting Data Science in Geochemistry: We Need Global Geochemical Data Standards and Networking! 2
Goldschmidt 2018 Workshop
“Data Science in Geochemistry”
3. Just Reflecting on this Session ...
■ How much work went into assembling the data for the data-driven research in each of the talks?
■ What standards were followed to compile the data? What information about uncertainties or analytical procedures was included, and what terminology was used?
■ Can I integrate the data compiled for talk A with the data from talk B?
■ Can we use the tools presented in talk X with the data from talk Y?
4. Obstacles for Data Science
■ Surveys in recent years show that data scientists still spend 75-80% of their time ‘data wrangling’.
Source: CrowdFlower
• RDA EU survey 2013 (75%)
• Brodie 2015 (80%)
• CrowdFlower 2017 (80%)
Did you?
5. Example: Data Synthesis for DECADE
■ 15 scientists worked for 5 days.
■ Major progress was only made with the compilation of melt inclusion geochemistry because those data were discoverable in PetDB and GEOROC.
■ Another 2 months of effort by the EarthChem data manager were required to format & integrate data from different databases and unpublished data.
6. Urgency for a Geochemical Data Standard
■ We need to be able to share & integrate data globally from multiple databases, each with its own schema.
■ We need to integrate & link data across disciplines (transdisciplinary).
■ We need to ensure compliance with FAIR data principles.
■ We need the standard to
– Be more comprehensive with respect to data documentation,
– Be aligned with modern standards, e.g. RDF,
– Use, where possible, internationally endorsed vocabularies.
■ Above all, we need more formal approval and governance.
– We need to think about a standard specifically for both technical and 'social' reasons.
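The first point on this slide, integrating data from databases that each have their own schema, can be sketched as a field mapping onto one shared record layout. The database names, field names, and records below are invented for illustration and do not reflect the actual schemas of any existing system.

```python
# Sketch: map records from two differently structured sources onto one
# shared layout. All database and field names here are hypothetical.

COMMON_FIELDS = ("sample_id", "analyte", "value", "unit")

# Per-source mapping from local field names to the shared field names.
FIELD_MAPS = {
    "database_a": {"SampleID": "sample_id", "Element": "analyte",
                   "Conc": "value", "Units": "unit"},
    "database_b": {"igsn": "sample_id", "variable": "analyte",
                   "result": "value", "uom": "unit"},
}


def to_common(record, source):
    """Rename a source record's fields to the shared layout."""
    mapping = FIELD_MAPS[source]
    common = {mapping[key]: val for key, val in record.items() if key in mapping}
    missing = [field for field in COMMON_FIELDS if field not in common]
    if missing:
        raise ValueError(f"{source} record lacks fields: {missing}")
    common["source"] = source  # keep provenance of each record
    return common


# Made-up records as each source might deliver them.
record_a = {"SampleID": "A-1", "Element": "Sr", "Conc": 320.0, "Units": "ppm"}
record_b = {"igsn": "B-7", "variable": "Sr", "result": 295.0, "uom": "ppm"}

merged = [to_common(record_a, "database_a"), to_common(record_b, "database_b")]
```

Without an agreed standard, every pair of systems needs its own mapping table of this kind. A shared, governed standard would reduce that to one mapping per system.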
7. A Never-Ending Story?
IGC 2008
8. Can We “Standardize” Geochemical Data?
I believe that our failure to unite our voices as geochemists has a simple origin – it is the complexity of our subject.
9. We Made Some Progress
■ Editors Roundtable recommendations with geochemical journals and databases
■ EarthChem XML schema
■ Rise of the IGSN
Goldstein et al. 2014, published in the EarthChem Library, doi:10.1594/IEDA/100426
10. EarthChem Data Templates
11. EarthChemXML
• Developed for the EarthChem Portal (ECP) in 2006
• Locally developed XML schema for data exchange that partner data systems use to encode their database content for inclusion in the ECP database.
• Not comprehensive with respect to metadata; uses EarthChem vocabularies (so does not align with broader community vocabularies)
• XML format is voluminous, especially for databases with hundreds of thousands of records.
12. EarthChem Portal
• 22,074 publications
• 1,054,738 samples
• 30,059,995 analytical values
Global Federation of Geochemical Databases:
• PetDB
• SedDB
• GEOROC (Germany)
• USGS
• MetPetDB
• GANSEKI (Japan)
• Data exchange protocol: EarthChemXML
• APIs & web services (WMS, WFS)
• Interoperability with modeling tools
More formal & community-governed standards are needed for FAIR data
13. Interoperable EarthChem Data
DECADE Portal (beta)
http://decade.iedadata.org
18. Proliferation of Geochemical Databases
■ International
■ Science programs
■ Thematic
sponsored by the State Key Lab of the Geological Processes and Mineral Resources in the China University of Geosciences
19. ‘Long tail’ communities:
• Analogue modelling
• Rock physics/ mechanics
• Paleomagnetics
• Geochemistry
Slide contributed by Kirsten Elger, GFZ Potsdam
20. 27 analytical labs, 4 countries
• Spain: 11 labs (41%)
• Netherlands: 1 lab (4%)
• Portugal: 3 labs (11%)
• Italy: 12 labs (44%)
Barcelona Workshop (Nov 2018):
• Agreement to use EarthChem Library templates for data publications via GFZ Data Services
• Interest in collaboration for the development of global standards for geochemical data
Slide contributed by Kirsten Elger, GFZ Potsdam
21. Data Standards in Geochemistry Are No Longer an Option
■ Publishers & funders are demanding FAIR data.
■ In order to do Data Science, we need to have a global network of geochemical data that can be accessed in a standardized format.
■ No one can do it alone – no one organization, no one group, no one country has the required resources or expertise. We need to build a global geochemistry data platform together!
“We must, indeed, all hang together or, most assuredly, we shall all hang separately.”
Benjamin Franklin
22. AGU Town Hall
“Building a Global Network of Geochemical Data”
Tuesday, Dec 11, 6:15-7:00pm
Marriott Marquis, Independence E
Panel:
Roberta Rudnick (President, Geochemical Society)
Catherine Chauvel (Editor, Chemical Geology)
Maria Uhle (NSF Program Director for International Activities)
Lesley Wyborn, ANU