Knowledge evolves in geoscience, and that evolution is reflected in datasets. Where data sources are distributed, the evolution of knowledge can pose considerable challenges to data management and re-use. For example, a short news item published in 2009 (Mascarelli, 2009) reported the geoscience community’s concern that the International Commission on Stratigraphy’s change to the definition of the Quaternary might require extensive reworking of geologic maps. In the era of the World Wide Web, geoscience knowledge is increasingly modeled and encoded as ontologies and vocabularies using semantic technologies. Knowledge evolution therefore leads to what is called ontology dynamics. Flouris et al. (2008) summarized 10 topics of general ontology change, including ontology mapping, morphism, evolution, debugging, and versioning. Ontology dynamics affects several stages of a data life cycle and raises challenges such as requests to rework the extant data in a data center, semantic mismatch among data sources, differing understandings of the same dataset between data providers and data users, and error propagation in cross-discipline data discovery and re-use (Ma et al., 2014). This presentation analyzes best practices in the geoscience community so far and summarizes a few recommendations for reducing the negative impacts of ontology dynamics in a data life cycle, including communities of practice and collaboration on ontology and vocabulary building, linking data records to standardized terms, and methods for (semi-)automatic reworking of datasets using semantic technologies.
References:
Flouris, G., Manakanatas, D., Kondylakis, H., Plexousakis, D., Antoniou, G., 2008. Ontology change: classification and survey. The Knowledge Engineering Review 23 (2), 117-152.
Ma, X., Fox, P., Rozell, E., West, P., Zednik, S., 2014. Ontology dynamics in a data life cycle: Challenges and recommendations from a Geoscience Perspective. Journal of Earth Science 25 (2), 407-412.
Mascarelli, A.L., 2009. Quaternary geologists win timescale vote. Nature 459, 624.
Knowledge Evolution in Distributed Geoscience Datasets and the Role of Semantic Technologies
1. TWC
Knowledge Evolution in
Distributed Geoscience Datasets and
the Role of Semantic Technologies
Xiaogang (Marshall) Ma
Tetherless World Constellation
Rensselaer Polytechnic Institute
@MarshallXMa
max7@rpi.edu
x.marshall.ma
rpi.edu/~max7
0000-0002-9110-7369
MarshallXMa
2. William Smith's 1815 geologic map of England and Wales with part of Scotland
William Smith (1769-1839)
(Image source: Geological Society of London)
13. Distributed datasets: Mismatches of geological units across political boundaries
Italy/France near Cuneo/Colmar
Cambrian Carboniferous
(Asch et al., 2012)
(Base map courtesy: OneGeology-Europe and USGS)
14. Distributed datasets: Mismatches of geological units across political boundaries
Italy/France near Cuneo/Colmar
Cambrian Carboniferous
(Asch et al., 2012)
Wyoming/Colorado
Felsic and hornblendic gneisses
Granitic rocks
(Ma et al., 2014)
(Base map courtesy: OneGeology-Europe and USGS)
15.
• Data and models, vocabularies, and ontologies
– Have we ever had model-independent datasets?
• Ontology dynamics and a data life cycle
Data life cycle diagram, reproduced from (Spencer, 2012):
CONCEPT: Initial concepts; Questions and answers; Grant info
COLLECTION: Questionnaire; Coded instrument; CAI metadata; Paradata
PROCESSING: Data specs; Recodes; Summary descriptive info
DISTRIBUTION: Terms of use; Citation; Packaging info
DISCOVERY: Catalog record; Indexing; Related publications
ANALYSIS: Replication code; Publications
ARCHIVING: Preservation metadata; Confidentiality; Additional processing
REPURPOSING: Post-hoc harmonization; Data transformations
17. Potential challenges
• Reworking of the extant data in a data center
– e.g. caused by ontology/vocabulary versioning
• Semantic mismatch among data sources
– e.g. heterogeneity in ontologies of the same topic
• Differing understanding of the same dataset between data providers and data users
– e.g. a data provider understands the Quaternary as 1.806 Ma-present, and a data user understands it as 2.588 Ma-present
• Error propagation in cross-discipline data re-use
– e.g. heterogeneous datasets may cause misconceptions in subsequent work
(Ma et al., 2014)
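The Quaternary example can be made concrete with a minimal Python sketch. It assumes each record carries a numeric age in Ma; the version labels `older` and `revised`, the `Record` class, and the sample records are hypothetical and only illustrate how the two boundary values quoted above lead to different classifications of the same data.

```python
from dataclasses import dataclass

# Lower boundary of the Quaternary in Ma under the two definitions quoted
# on the slide. The version labels are hypothetical names for this sketch.
QUATERNARY_BASE_MA = {
    "older": 1.806,
    "revised": 2.588,
}


@dataclass
class Record:
    record_id: str
    age_ma: float  # numeric age of the sample, in Ma


def is_quaternary(age_ma: float, version: str) -> bool:
    """Return True if the age falls within the Quaternary under `version`."""
    return 0.0 <= age_ma <= QUATERNARY_BASE_MA[version]


def records_needing_rework(records, old_version, new_version):
    """List IDs of records whose classification differs between versions."""
    return [
        r.record_id
        for r in records
        if is_quaternary(r.age_ma, old_version)
        != is_quaternary(r.age_ma, new_version)
    ]


# Hypothetical sample records, for illustration only.
records = [Record("s1", 0.5), Record("s2", 2.0), Record("s3", 3.1)]
changed = records_needing_rework(records, "older", "revised")
print(changed)  # ['s2']
```

Only the record aged 2.0 Ma changes class, which suggests that recording the vocabulary version alongside each dataset could narrow reworking to the affected records.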
18. OneGeology-Europe
• 20 European nations providing national geologic maps at scale ~1:1M
• Harmonized geological terms and map legends
• Multilingual labels in 18 languages
• Central portal for data browsing/query among distributed data sources
A contribution to INSPIRE
http://www.onegeology-europe.org
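As a rough illustration of harmonized terms with multilingual labels, the Python sketch below maps national legend terms to a shared concept and then looks up a label in a requested language. The concept identifiers, the mapping table, and the function names are assumptions made for this example; they are not taken from the OneGeology-Europe vocabularies or services.

```python
# Hypothetical harmonized concepts with labels in several languages.
HARMONIZED = {
    "granite": {"en": "granite", "fr": "granite", "it": "granito", "de": "Granit"},
    "gneiss": {"en": "gneiss", "fr": "gneiss", "it": "gneiss", "de": "Gneis"},
}

# Hypothetical mapping from (country code, national legend term) to a
# harmonized concept identifier.
LOCAL_TO_HARMONIZED = {
    ("IT", "granito"): "granite",
    ("FR", "granite"): "granite",
    ("DE", "gneis"): "gneiss",
}


def harmonize(country, term):
    """Return the harmonized concept ID for a national term, or None."""
    return LOCAL_TO_HARMONIZED.get((country, term.strip().lower()))


def label(concept_id, lang):
    """Return the label in `lang`, falling back to English if missing."""
    labels = HARMONIZED[concept_id]
    return labels.get(lang, labels["en"])


concept = harmonize("IT", "Granito")
print(concept, label(concept, "de"))
```

Terms from different national legends resolve to one concept, so a portal query can run on the concept while each user sees a label in their own language.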
A few recent works of interest
20. Recently finished CGI vocabularies
• Construct a collection of vocabularies for populating information interchange documents and enabling interoperability
• Provide labels for concepts, scoped to various communities defined by language, science domain, or application domain
Earth Resource Form
Environmental Impact Value
Exploration Activity Type
Exploration Result
UNFC Value
Earth Resource Expression
Earth Resource Shape
Enduse Potential
Mineral Occurrence Type
Mining Activity Type
Processing Activity Type
Mining Waste Type Value
Commodity Code
Mineral Deposit Group
Mineral Deposit Type
Product Value
CGI Geoscience Terminology Workgroup
http://cgi-iugs.org/tech_collaboration/geoscience_terminology_working_group.html
21. USGS Online Geologic Maps
• Standardized vocabulary with detailed annotation
• Forward and backward queries between spatial data and attribute data
• Links to further data sources, e.g. aeromagnetic survey, mineral resources data, soils, geochemical samples, etc.
http://mrdata.usgs.gov/geology/state/map.html
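The idea of querying in both directions between spatial data and attribute data can be sketched in a few lines of Python. This is not how the USGS service is implemented: the `MapUnit` structure, the bounding-box footprints, and the sample units are hypothetical, and treating "forward" as location-to-attribute and "backward" as attribute-to-location is an assumption of this sketch.

```python
from typing import NamedTuple, Tuple


class MapUnit(NamedTuple):
    unit_id: str
    lithology: str  # term drawn from a standardized vocabulary
    bbox: Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


# Hypothetical map units with simplified rectangular footprints.
UNITS = [
    MapUnit("u1", "granitic rocks", (0, 0, 10, 10)),
    MapUnit("u2", "felsic gneiss", (10, 0, 20, 10)),
    MapUnit("u3", "granitic rocks", (0, 10, 10, 20)),
]


def units_at(x, y):
    """Location to attributes: return the units whose footprint contains (x, y)."""
    return [
        u for u in UNITS
        if u.bbox[0] <= x < u.bbox[2] and u.bbox[1] <= y < u.bbox[3]
    ]


def units_with(lithology):
    """Attribute to locations: return IDs of units with the given lithology."""
    return [u.unit_id for u in UNITS if u.lithology == lithology]


print([u.lithology for u in units_at(5, 5)])
print(units_with("granitic rocks"))
```

Because the lithology values come from one controlled vocabulary, the attribute-side query can rely on exact term matching.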
23. Recommendations
• Communities of practice on ontology and vocabulary
– Bottom-up, self-organized, and loose top-down control
• Formalize the ‘Concept’ step in a data life cycle
– Top-down, and adopt outputs from the bottom-up approach
• Make it a virtuous circle between the bottom-up and top-down approaches
Thanks for listening.
@MarshallXMa
max7@rpi.edu