1. Harmonization of vocabularies for water data
Jonathan Yu | Research engineer
HIC 2014, 17 August 2014
LAND AND WATER FLAGSHIP | OCEANS AND ATMOSPHERE FLAGSHIP
2. Outline
• Context and problem space – need formal mechanisms for
publishing vocabularies
• Use of semantic web tech to publish and harmonise vocabularies
• Challenges still exist
• conceptualisation as both classes and individuals – pragmatic but problematic
• URI patterns
• Versioning and keeping track
• Suggested paths forward?
3. Issues
• Formalization
• RDF SKOS OWL
• Collections
• Re-use/clone/leave alone
• URI Patterns
• Distribution
• UIs/APIs
• Versioning
• Mappings
• Search and discovery
Presentation title | Presenter name3 |
5. AGU Fall 2013 | IN52B-08 | Cox, Simons, Yu | Vocabulary re-use
Columns: cas_rn number | ANGDTS Code | ANGDTS Description | Units_used | WDTF Parameter | chemical name | ADWG name | IUPAC name | Group | Ion
Rows (non-empty cells, in column order):
EC | EC | ease at which conduction current can be caused to flow through material in microSiemens/centimetre | us/cm, ms/cm, mg/L | ElectricalConductivityAt25C_uScm | Electrical Conductivity | Conductivity
PH | pH | negative logarithm of hydrogen ion concentration in ph units | pH units | WaterpH_pH | pH | pH | pH, alkalinity, acidity
16887-00-6 | 16887-00-6 | concentration of chloride as Cl in milligrams/litre | mg/L, mg/kg | Chloride | Chloride | Chloride | Anion
TDS | TDS | the portion of total solids that passes through filter and deemed to have been dissolved in sample in milligrams/litre | mg/L | Total Dissolved Solids | Total Dissolved Solids | Salinity
TOTALALKALINITY | ALKT | concentration in milligrams/litre CaCO3 of titratable bases using a methyl-orange endpoint of about pH 4.3 | mg/L | Total Alkalinity (as CaCO3) | pH, alkalinity, acidity
HARDNESS_CACO3 | HARD | the ability of water to precipitate soap and is sum of calcium and magnesium concentrations as milligrams/litre CaCO3 | mg/L | Hardness (as CaCO3) | Hardness (as calcium carbonate) | Hardness (as calcium carbonate)
SAR | SAR | ratio of sodium to magnesium and calcium and used to assess risk of excess sodium in irrigation water | Ratio | Sodium Adsorption Ratio | Salinity
3812-32-6 | ALKC | alkalinity ascribed to carbonate in milligrams/litre CO3 | mg/L, %MOL | Carbonate Alkalinity (as CaCO3) | Carbonate | pH, alkalinity, acidity
NITRATE | 14797-55-8 | concentration of nitrate as N in milligrams/litre | mg/L, mg/kg | Nitrate | Nitrate and Nitrite | Nitrate and Nitrite | Anion
7439-89-6 | 7439-89-6 | concentration of iron as Fe in milligrams/litre | mg/L, mg/kg, ug/L | Iron | Iron | Metal | Cation
Formalization: table – structure + mappings
Healthy Headwater - NGIS Terms
7. Formalization: RDFS/OWL add rich predicates
• Water Quality Vocabulary
8. Formalization: alignment with existing vocabularies
(Water Quality extension to QUDT)
QUDT
OP
9. Formalization: link detailed model to SKOS
access using SKOS API
10. Other approaches: OWL Class per concept
• deep subsumption hierarchy:
SWEET, OBO
• intersecting constraints:
CGI Lithology
11. Formalization challenge
• Sometimes formalized as OWL - usually as SKOS
(example? SWEET / GEMET?)
• Class vs individuals
(Example from QUDT?)
• Hybrid approaches exist – vocabulary as individuals of classes from
an ontology but aligned with SKOS
(Example from OP?)
• https://www.seegrid.csiro.au/wiki/Siss/VocabularyFormalizationInSKOS
12. Collections
skos:Collection –skos:member skos:Concept|skos:Collection
• A new collection can claim existing concepts as members
• Nested collections
skos:Concept –skos:inScheme skos:ConceptScheme
• Concepts assert their own membership
• No nesting
owl:Ontology
• No membership predicate
– rdfs:member? dct:hasPart?
void:Dataset, ldp:Container, reg:Register
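The two membership directions on this slide can be sketched in plain Python. This is only an illustration: triples are held as tuples rather than in an RDF store, every ex: name is hypothetical, and a resource is assumed to be a collection simply because it appears as the subject of skos:member.

```python
# Triples as (subject, predicate, object) tuples; all ex: names are hypothetical.
TRIPLES = [
    ("ex:nutrients", "skos:member", "ex:nitrate"),       # collection claims a concept
    ("ex:waterQuality", "skos:member", "ex:nutrients"),  # nested collection
    ("ex:waterQuality", "skos:member", "ex:pH"),
    ("ex:nitrate", "skos:inScheme", "ex:wqScheme"),      # concept asserts its own scheme
    ("ex:pH", "skos:inScheme", "ex:wqScheme"),
]


def collection_members(collection, triples, _seen=None):
    """Concepts reachable via skos:member, descending into nested collections."""
    seen = set() if _seen is None else _seen
    if collection in seen:  # guard against cyclic nesting
        return set()
    seen.add(collection)
    # Assumption: anything that is the subject of skos:member is a collection.
    collections = {s for s, p, _ in triples if p == "skos:member"}
    concepts = set()
    for s, p, o in triples:
        if s != collection or p != "skos:member":
            continue
        if o in collections:
            concepts |= collection_members(o, triples, seen)
        else:
            concepts.add(o)
    return concepts


def scheme_members(scheme, triples):
    """Concepts that declare themselves in a scheme; there is no nesting to follow."""
    return {s for s, p, o in triples if p == "skos:inScheme" and o == scheme}
```

Note that the collection lookup starts from the container, while the scheme lookup has to scan the concepts, which is why a new collection can be added without touching the existing concept records.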
13. Re-use: new collections from old
– clone, or leave alone
• eReefs WQ vocabulary includes
a subset of 330+ chemicals
from 36000+ in ChEBI
• New resources in local
namespace
• SKOS *Match predicate gives
provenance, link to more detail
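A minimal sketch of the "new resource in local namespace plus SKOS match" pattern, written as a Python helper that emits one Turtle statement. The local namespace and the ChEBI identifier below are placeholders, not values from the eReefs vocabulary; only the purl.obolibrary.org URI prefix is the usual ChEBI form.

```python
LOCAL_NS = "http://example.org/def/wq/"          # hypothetical local namespace
CHEBI_NS = "http://purl.obolibrary.org/obo/"     # usual prefix for ChEBI term URIs


def match_statement(local_name, chebi_id, predicate="skos:exactMatch"):
    """Return a Turtle statement linking a local concept to its ChEBI source."""
    return f"<{LOCAL_NS}{local_name}> {predicate} <{CHEBI_NS}{chebi_id}> ."


# CHEBI_0000000 is a placeholder identifier, to be replaced by the real one.
statement = match_statement("chloride", "CHEBI_0000000")
print(statement)
```

The local URI stays under local management, while the match statement records where the definition came from and where more detail can be found.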
14. Clone or leave alone?
• Question of caching content vs federating queries/discovery of
content
• Consider ChEBI – it is big
• Cache or just link to its definitions?
• Tradeoff between performance and convenience vs. updating and synchronization
• LDR allows registration of external resources
• New register = subset or combination of terms already published elsewhere?
15. URI Patterns – opaque?
What does the URL path imply?
http://vocab.nerc.ac.uk/collection/G04/current/008/
G04 ISO RoleCode, 008 Principal Investigator
http://resource.geosciml.org/classifier/ics/ischart/Pliocene
= Pliocene, URI supplied by GeoSciML,
definition sourced from International Commission for Stratigraphy (ics),
in the collection known as ‘International Stratigraphic Chart’ (ischart)
Semantics? Management? Set-membership?
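To make the question concrete, the sketch below splits the two example URIs into path segments using the Python standard library. It shows only what a machine can see; the reading of each segment (collection code, source, set name) is a convention described on the slide, not something the URI itself asserts.

```python
from urllib.parse import urlsplit


def path_segments(uri):
    """Return the non-empty path segments of a URI, in order."""
    return [segment for segment in urlsplit(uri).path.split("/") if segment]


nerc = path_segments("http://vocab.nerc.ac.uk/collection/G04/current/008/")
geosciml = path_segments("http://resource.geosciml.org/classifier/ics/ischart/Pliocene")

print(nerc)      # collection code and item code appear as separate segments
print(geosciml)  # source (ics) and set (ischart) precede the concept name
```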
17. Versioning - 2
Are these the same thing? How can we tell? How can a machine tell?
http://sweet.jpl.nasa.gov/1.1/time.owl#PLEISTOCENE
http://sweet.jpl.nasa.gov/2.0/timeGeologic.owl#Pleistocene
http://sweet.jpl.nasa.gov/2.2/stateTimeGeologic.owl#Pleistocene
http://sweet.jpl.nasa.gov/2.3/stateTimeGeologic.owl#Pleistocene
Compare with
http://resource.geosciml.org/classifier/ics/ischart/Pliocene
– URI for the concept
http://def.seegrid.csiro.au/sissvoc/isc2014/resource.html?uri=http://resource.geosciml.org/classifier/ics/ischart/Pliocene
– URI for a description of the concept (i.e. record),
according to the 2014 version of the service
Care with version number in URI!
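A rough illustration of why a machine cannot tell: the four SWEET URIs are distinct strings, and the only way to relate them without explicit mappings is a naming heuristic. The grouping rule below (case-insensitive fragment, version taken from the first path segment) is an assumption made for this sketch and is not evidence that the resources are the same concept.

```python
from collections import defaultdict
from urllib.parse import urlsplit

SWEET_URIS = [
    "http://sweet.jpl.nasa.gov/1.1/time.owl#PLEISTOCENE",
    "http://sweet.jpl.nasa.gov/2.0/timeGeologic.owl#Pleistocene",
    "http://sweet.jpl.nasa.gov/2.2/stateTimeGeologic.owl#Pleistocene",
    "http://sweet.jpl.nasa.gov/2.3/stateTimeGeologic.owl#Pleistocene",
]


def group_by_fragment(uris):
    """Group URIs by case-folded fragment, keeping the version path segment."""
    groups = defaultdict(list)
    for uri in uris:
        parts = urlsplit(uri)
        version = parts.path.split("/")[1]  # e.g. "1.1" from "/1.1/time.owl"
        groups[parts.fragment.casefold()].append((version, uri))
    return dict(groups)


groups = group_by_fragment(SWEET_URIS)
distinct_identifiers = len(set(SWEET_URIS))  # plain string comparison sees four things
```

A version-free concept URI, as in the GeoSciML example, avoids the need for this kind of guesswork.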
18. Versioning - 3
• Version info in item?
<http://vocab.nerc.ac.uk/collection/G04/current/008/> a skos:Concept ;
    skos:prefLabel "principalInvestigator" ;
    owl:versionInfo "1" ;
    dc:date "2012-07-04 10:56:53.0" .
• Version info in registration record?
19. Versioning
• How do we manage versions of definitions?
• Do we version a definition of an abstract concept?
• Does the definition of the concept change or does our understanding
change?
• Version the set or individual items?
20. Distribution
• Vocabulary packaged in a file or page
http://resource.geosciml.org/vocabulary/timescale/isc2014.ttl
http://resource.geosciml.org/vocabulary/timescale/isc2014.html
• Dereference the URI for a resource in the vocabulary
http://resource.geosciml.org/classifier/ics/ischart/ (all)
http://resource.geosciml.org/classifier/ics/ischart/Cambrian
• SPARQL endpoint
http://resource.geosciml.org/sparql/isc2014
• Vocabulary service
http://def.seegrid.csiro.au/sissvoc/isc2014/collection
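As a sketch of the SPARQL route, the helper below builds a GET request URL for a DESCRIBE query, following the standard SPARQL protocol convention of passing the query text in a `query` parameter. The endpoint and concept URIs are the ones on the slide; whether the endpoint is still reachable is not assumed, so no request is sent.

```python
from urllib.parse import urlencode

ENDPOINT = "http://resource.geosciml.org/sparql/isc2014"
CONCEPT = "http://resource.geosciml.org/classifier/ics/ischart/Cambrian"


def describe_url(endpoint, concept_uri):
    """Build a SPARQL-protocol GET URL asking the endpoint to describe one concept."""
    query = f"DESCRIBE <{concept_uri}>"
    return endpoint + "?" + urlencode({"query": query})


url = describe_url(ENDPOINT, CONCEPT)
print(url)
```

The same concept URI could instead be dereferenced directly or looked up through the vocabulary service; the point of the slide is that all of these routes should coexist.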
21. Semantic web tech to publish vocabularies
• SISSVoc
22. Mappings
• Embed in vocabulary vs. store separately?
23. Mapping challenge
• Linking between ontologies – which to use? All or some?
• SKOS relations - exactMatch, closeMatch, narrowMatch, broadMatch
• OWL predicates - sameAs for individuals, equivalentClass for classes and
equivalentProperty for properties
• Dublin Core
• PROV-O
• VoID
• VOAF
• Linking between classes and individuals in OWL – logics-based reasoning
support
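The predicate choices listed above can be written down as a small lookup, shown here as a hedged Python sketch. The function and the "kind" labels are hypothetical; the predicate names are the standard OWL and SKOS ones named on the slide.

```python
# OWL equivalence predicates depend on what kind of resource is being linked.
OWL_EQUIVALENCE = {
    "individual": "owl:sameAs",
    "class": "owl:equivalentClass",
    "property": "owl:equivalentProperty",
}

# SKOS mapping relations apply between concepts, with graded strength.
SKOS_MATCHES = (
    "skos:exactMatch",
    "skos:closeMatch",
    "skos:narrowMatch",
    "skos:broadMatch",
)


def equivalence_predicate(kind):
    """Return the OWL equivalence predicate for 'individual', 'class' or 'property'."""
    try:
        return OWL_EQUIVALENCE[kind]
    except KeyError:
        raise ValueError(f"unknown resource kind: {kind!r}") from None
```

The lookup makes the difficulty visible: a vocabulary item that is treated as both a class and an individual has no single correct entry.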
26. Standards…
• The standard ISO 8601 concerns dates, a common type of
information used for data and documentation.
• March 5, 2014
• 2014-03-05
• 3/5/14
• 05/03/2014
• 5 Mar 2014
• Multiple representations but essentially one meaning
Source: http://dataabinitio.com/?p=449
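The "multiple representations, one meaning" point can be checked with a short Python sketch that normalizes each string on the slide to ISO 8601. The format string paired with each sample is an assumption: in particular, 3/5/14 is read as month/day/year and 05/03/2014 as day/month/year, since the slide presents them as the same date. English month names (the default C locale) are also assumed.

```python
from datetime import datetime

# Each sample is paired with the format it is assumed to follow.
SAMPLES = [
    ("March 5, 2014", "%B %d, %Y"),
    ("2014-03-05", "%Y-%m-%d"),
    ("3/5/14", "%m/%d/%y"),      # assumed month-first
    ("05/03/2014", "%d/%m/%Y"),  # assumed day-first
    ("5 Mar 2014", "%d %b %Y"),
]


def to_iso(text, fmt):
    """Parse a date string with a known format and return it as YYYY-MM-DD."""
    return datetime.strptime(text, fmt).date().isoformat()


normalized = {to_iso(text, fmt) for text, fmt in SAMPLES}
print(normalized)
```

Without the format hint, 3/5/14 and 05/03/2014 are ambiguous, which is the same problem an unmapped vocabulary term presents.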
27. Challenges still exist
• Variation of formalisation and publication
• conceptualisation as both classes and individuals – pragmatic but
problematic
• URI patterns
• Versioning and keeping track
29. Jonathan Yu
Research Software Engineer
Jonathan.Yu@csiro.au
Bruce Simons
SDI Modeller
Bruce.Simons@csiro.au
Thank you
Terms of use: Image sources from Wikipedia under CC2.0 licence
http://en.wikipedia.org/wiki/File:Amazing_Great_Barrier_Reef_1.jpg
Simon Cox
Research Scientist
Simon.Cox@csiro.au
http://ereefs.org.au/
Editor's Notes
This is the “think-piece”
Vocabulary formalization has become a lot more standardized and formal with the development of RDF/OWL/SKOS
After a long period of development, tooling has matured: Protégé, TopBraid, SPARQL, ELDA
But still a variety of patterns.
In this presentation we canvass some of the issues.
We provide some pointers, but few firm conclusions at this stage.
N.B. some interactions between these concerns: e.g.
Containers and URI patterns
Versioning and URI patterns
Re-use and containers and URI patterns
Distribution and URI patterns
Classic ‘glossary’ is a list of terms and (textual) definitions
A technical vocabulary often has multiple attributes for each entry.
In this example, most of the columns hold alternative identifiers, but the table also includes units of measure and groups (broader generalizations), i.e. the definition has structure and semantics.
For existing vocabularies, SKOS is an RDF vocabulary that provides a gentle on-ramp into semantic technologies.
Vocabulary entry is a ‘Concept’ denoted by a URI
‘Term’ is a label for the concept.
Concept may have multiple labels (multi-lingual, preferred vs. alternate)
Hierarchical relationships (broader, narrower) are supported directly
UI, TTL, broader-hierarchy
In a dataset, SKOS individual from vocabulary is the value of a specific ObjectProperty on a data item
If there is a richer model for a vocabulary, then this can be modelled as a specific ontology.
Vocabulary entries are then realised as instances or members of the classes from the ontology
In cases where there is a pre-existing accepted base ontology, the specific ontology can be aligned, so that its classes are in subsumption hierarchy, reusing the external classes
Specific classes (and properties) can be sub-classes of skos:Concept (or sub-properties of skos:related).
This allows a vocabulary to have specialized classes and predicates, but also be accessible through a SKOS API.
It is also possible to model a vocabulary as classes.
This supports slightly different use-cases.
To support DL reasoning, the link from a dataset to an item in a vocabulary of OWL classes is different to SKOS concepts:
In a dataset using a vocabulary modeled this way, the OWL Class from vocabulary is the value of rdf:type on a data item
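The difference between the two linking styles can be shown with two triples, here as a plain Python sketch. All ex: names are hypothetical, and treating every predicate other than rdf:type as the SKOS-style link is a simplification made for this illustration.

```python
# Hypothetical data items linked to vocabulary entries in the two styles.
DATA = [
    ("ex:obs1", "ex:observedProperty", "ex:nitrate"),  # SKOS concept as a property value
    ("ex:rock7", "rdf:type", "ex:Basalt"),             # OWL class as the item's type
]


def link_style(triple):
    """Classify how a data item refers to its vocabulary entry."""
    _, predicate, _ = triple
    return "owl-class" if predicate == "rdf:type" else "skos-concept"


styles = [link_style(triple) for triple in DATA]
```

Only the rdf:type style lets a DL reasoner infer further class memberships for the data item from the vocabulary's subsumption hierarchy.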
Various ways to create and maintain an RDF resource corresponding with a ‘vocabulary’.
SKOS provides Collection and Concept
Collection has members: new collections can be composed of old members
ConceptScheme has topConcepts: supports traversal of a hierarchy, but not really ‘membership’
Other container classes:
owl:Ontology – for classes and properties, but not defined very formally
void:Dataset – for triples
ldp:Container – from W3C Linked Data Platform
reg:Register – from UK Registry
Does a
Common use-case is partial re-use of a collection(s) already defined somewhere:
How to manage a subset?
clone/local copy
New collection with membership by-reference
URIs do not have to be opaque. But too much structure can tie you down:
URI structure is a path, so if semantics are bound to the URI, then:
Ontological commitment by choice of identifiers (too early?)
Privileges one classification hierarchy (path only supports monohierarchy)
Path should only be used to delegate management & ensure uniqueness
Path = register or set
Register has a specific delegation or ownership associated.
URI management arrangements/set membership when first registered
SWEET uses different URIs for the same thing in different versions;
There are no mappings inside SWEET data to link to prior versions!
Embedding versioning information (or any other lifecycle metadata) in the item itself is probably sub-optimal
LDR records lifecycle information (including version information) in a separate registration record.
Vocabulary should be distributed in multiple ways to suit different users.
URI patterns should be predictable.
URIs should be memorable if possible (words rather than numbers).