ISO 19157 provides a powerful framework for describing quality of Earth science datasets. As NASA migrates towards using that standard, it is important to understand whether and how existing data quality content fits into the ISO 19157 model. This talk demonstrates that fit and concludes that ISO 19157 can include all existing content and also includes new capabilities that can be very useful for all kinds of NASA data users.
Can ISO 19157 support current NASA data quality metadata?
1. ISO 19157 – A Unified Metadata Model for
Data Quality
Ted Habermann
Director of Earth Science
The HDF Group
thabermann@hdfgroup.org
1
The Global Change Master Directory
(GCMD) and the EOS Clearinghouse
(ECHO) both include data quality
information?
Can this information be mapped to
ISO 19157?
GCMD ECHO
ISO
?Presented May 21, 2014 at the Earth Science
Information Partnership Information Quality Cluster
2. Metadata in Multiple Dialects
Documentation
Repository
ISO 19115-1,
19115-2, 19119
19157 and
extensions
THREDDS
HDF, netCDF
(NcML)
FGDC,
Data.Gov
SensorML
WCS, WMS,
WFS, SOS
Open
Provenance
Model, PROV
DIF, ECS,
ECHO
KML
3. GCMD Data Quality
3
The <Quality> field allows the author to
provide information about the quality of
the data or any quality assurance
procedures followed in producing the
data described in the metadata.
4. GCMD Quality Examples
4
<Quality>
Due to the lack of high resolution data available over the region for
1993-94, it has been hard to validate the product. However the maps of
burnt areas correspond well with active fire maps for the
region. Where large [>3km] scars are found, the detection is more
reliable. In areas of small scars more problems are involved. It is
hoped that the 1994-95 data set will cover the whole of the study area
and be calibrated by high resolution data.
</Quality> <Quality>
QA performed by CDIAC One of the roles of the Carbon Dioxide
Information Analysis Center (CDIAC) is quality assurance (QA) of
data. The QA process is an important component of the value-added
concept of assuring accurate, usable information for researchers,
because data received by CDIAC are rarely in condition for immediate
distribution, regardless of source.
</Quality> <Quality>
The fire training-set may also have been biased against savanna and
savanna woodland fires since their detection is more difficult than in
humid, forst environments with cool background temperatures
[Malingreau, 1990]. There may, therefore, be an under-sampling of
fires in these warmer background environments.
</Quality> <Quality>
Note that Data File 12, Report #2, TASK 2 (Auclair et al., 1994a) is a
Quality Assurance and Quality Control chapter for the areas of Canada,
Alaska, United States (48 states), with range estimates of validation
and error, a listing of discussions with experts in the field and a
review of the draft of data files.
</Quality>
5. ISO Data Quality Results
DQ_ConformanceResult
+ specification : CI_Citation
+ explanation : CharacterString
+ pass : Boolean
DQ_CoverageResultDQ_DescriptiveResult
+ statement : CharacterString
<<Abstract>>
DQ_Result
+ dateTime [0..1] : DateTime
+ resultScope [0..1] : DQ_Scope
DQ_QuantitativeResult
+ valueType [0..1] : RecordType
+ valueUnit : UnitOfMeasure
+ value [1..*] : Record
ISO 19157 includes four kinds of quality results.
It includes a new kind of report (DQ_DescriptiveResult) that includes a simple text
description of the result of the quality test.
6. ISO Data Quality Results
DQ_ConformanceResult
+ specification : CI_Citation
+ explanation : CharacterString
+ pass : Boolean
DQ_CoverageResultDQ_DescriptiveResult
+ statement : CharacterString
<<Abstract>>
DQ_Result
+ dateTime [0..1] : DateTime
+ resultScope [0..1] : DQ_Scope
DQ_QuantitativeResult
+ valueType [0..1] : RecordType
+ valueUnit : UnitOfMeasure
+ value [1..*] : Record
<Quality>
QA performed by CDIAC One of the roles of the Carbon Dioxide
Information Analysis Center (CDIAC) is quality assurance (QA) of
data. The QA process is an important component of the value-added
concept of assuring accurate, usable information for researchers,
because data received by CDIAC are rarely in condition for immediate
distribution, regardless of source.
</Quality>
The DQ_DescriptiveResult
matches the GCMD quality
element exactly.
7. GCMD Quality Examples
7
<Quality>
Due to the lack of high resolution data available over the region for
1993-94, it has been hard to validate the product. However the maps of
burnt areas cor- respond well with active fire maps for the
region. Where large [>3km] scars are found, the detection is more
reliable. In areas of small scars more problems are involved. It is
hoped that the 1994-95 data set will cover the whole of the study area
and be calibrated by high resolution data.
</Quality> <Quality>
QA performed by CDIAC One of the roles of the Carbon Dioxide
Information Analysis Center (CDIAC) is quality assurance (QA) of
data. The QA process is an important component of the value-added
concept of assuring accurate, usable information for researchers,
because data received by CDIAC are rarely in condition for immediate
distribution, regardless of source.
</Quality> <Quality>
The fire training-set may also have been biased against savanna and
savanna woodland fires since their detection is more difficult than in
humid, forst environments with cool background temperatures
[Malingreau, 1990]. There may, therefore, be an under-sampling of
fires in these warmer background environments.
</Quality> <Quality>
Note that Data File 12, Report #2, TASK 2 (Auclair et al., 1994a) is a
Quality Assurance and Quality Control chapter for the areas of Canada,
Alaska, United States (48 states), with range estimates of validation
and error, a listing of discussions with experts in the field and a
review of the draft of data files.
</Quality>
8. Standalone Quality Reports
DQ_Element
+ standaloneQualityReportDetails [0..1] : CharacterString
DQ_DataQuality
+ scope : DQ_Scope
+ report 1..*
DQ_StandaloneQualityReportInformation
+ reportReference: CI_Citation
+ abstract : CharacterString
+ standaloneQualityReport 0..1
ISO 19157 acknowledges that important quality information can exist in standalone
reports that may not fit easily into the ISO conceptual model. These standalone
reports are cited with abstracts that summarize the results for the user without
neccessitating access to the cited report.
10. ECHO Quality Information
ECHO includes two kinds of quality information
QAStats – Standard measures for all products
QAPercentMissingData - Granule level % missing data. This attribute can be
repeated for individual parameters within a granule.
QAPercentOutOfBoundsData – Granule level % out of bounds data. This attribute
can be repeated for individual parameters within a granule.
QAPercentInterpolatedData – Granule level % interpolated data. This attribute
can be repeated for individual parameters within a granule.
QAPercentCloudCover – This attribute is used to characterize the cloud cover
amount of a granule. This attribute may be repeated for individual parameters
within a granule. (Note - there may be more than one way to define a cloud or it's
effects within a product containing several parameters; i.e. this attribute may be
parameter specific)
11. ECHO Quality Information
ECHO includes two kinds of quality information
QAFlags – Classes of quality measures with product specific implementations
AutomaticQualityFlag – The granule level flag applying generally to the granule and specifically to
parameters the granule level. When applied to parameter, the flag refers to the quality of that parameter
for the granule (as applicable). The parameters determining whether the flag is set are defined by the
developer and documented in the Quality Flag Explanation.
AutomaticQualityFlagExplanation – A text explanation of the criteria used to set automatic quality flag,
including thresholds or other criteria.
OperationalQualityFlag – The granule level flag applying both generally to a granule and specifically to
parameters at the granule level. When applied to parameter, the flag refers to the quality of that
parameter for the granule (as applicable). The parameters determining whether the flag is set are defined
by the developers and documented in the Operational Quality Flag Explanation.
OperationalQualityFlagExplanation – A text explanation of the criteria used to set operational quality
flag; including thresholds or other criteria.
ScienceQualityFlag – Granule level flag applying to a granule, and specifically to parameters. When
applied to parameter, the flag refers to the quality of that parameter for the granule (as applicable). The
parameters determining whether the flag is set are defined by the developers and documented in the
Science Quality Flag Explanation.
ScienceQualityFlagExplanation – A text explanation of the criteria used to set science quality flag;
including thresholds or other criteria.
12. ECHO Data Quality Measures
DQ_MeasureReference
+ measureIdentification [0..1] : MD_Identifier
+ nameOfMeasure [0..*] : CharacterString
+ measureDescription [0..1] : CharacterString
if measureIdentification is not provided,
then nameOfMeasure shall be provided
+ measure 0..1
ISO 19157 data quality measure references identify measures in several ways and
provides a brief description of the measure.
<<Abstract>>
DQ_Element
+ measure [0..*] : DQ_MeasureReference
+ evaluationMethod [0..1] : DQ_EvaluationMethod
+ result [0..1] : DQ_Result
QAPercentMissingData
QAPercentOutOfBoundsData
QAPercentInterpolatedData
QAPercentCloudCover
AutomaticQualityFlag
OperationalQualityFlag
ScienceQualityFlag
AutomaticQualityFlagExplanation
OperationalQualityFlagExplanation
ScienceQualityFlagExplanation
13. Data Quality Measures
DQM_Measure
+ measureIdentifier : MD_Identifier
+ name : CharacterString
+ alias [0..*] : CharacterString
+ sourceReference [0..*] : CI_Citation
+ elementName [1..*] : TypeName
+ definition : CharacterString
+ description [0..1] : DQM_Description
+ valueType : TypeName
+ valueStructure [0..1] : DQM_ValueStructure
+ example [0..*] : DQM_Description
DQM_BasicMeasure
+ name : CharacterString
+ definition : CharacterString
+ example : DQM_Description [0..1]
+ valueType : TypeName
<<CodeList>>
DQ_ValueStructure
+ bag
+ set
+ sequence
+ table
+ matrix
+ coverage
DQM_Parameter
+ name : CharacterString
+ definition : CharacterString
+ description: DQM_Description [0..1]
+ valueType : TypeName
+ valueStructure [0..1] : DQM_ValueStructure
DQM_Description
+ textDescription: CharacterString
+ extendedDescription [0..1] : MD_BrowseGraphic
DQ_MeasureReference
+ measureIdentification [0..1] : MD_Identifier
+ nameOfMeasure [0..*] : CharacterString
+ measureDescription [0..1] : CharacterString
if measureIdentification is not provided,
then nameOfMeasure shall be provided
+ measure 0..1
ISO 19157 data quality measure properties are in a
separate object and provide a much more complete
description of the measures.
<<Abstract>>
DQ_Element
+ measure [0..*] : DQ_MeasureReference
+ evaluationMethod [0..1] : DQ_EvaluationMethod
+ result [0..1] : DQ_Result
14. ECHO Quality Measure Descriptions
14
AutomaticQualityFlagExplanation
(2289)
66% parameter is produced correctly
7% No automatic quality assessment is performed in the PGE
5%
Based on percentage of product that is good. Suspect used where true
quality is not known.
4% Automatic quality determination software not yet implemented
2% QA flag explanation
ScienceQualityFlagExplanation
(346)
OperationalQualityFlagExplanation
(238)
35% Passed
14%
Passed,parameter passed the specified operational test. Inferred Pass,parameter
terminated with warnings. Failed parameter terminated with fatal errors.
11% Not Investigated
10%
Passed,parameter passed the specified science test. Inferred Pass,parameter
terminated with warnings for specified science test. Failed parameter terminated
with fatal errors for specified science test.
10%
See http://landweb.nascom.nasa.gov/cgi-
bin/QA_WWW/qaFlagPage.cgi?sat=aqua for the product Science Quality status.
8% Validated, see http://disc.gsfc.nasa.gov/Aura/MLS/ for quality document
6%
See http://landweb.nascom.nasa.gov/cgi-
bin/QA_WWW/qaFlagPage.cgi?sat=terra for the product Science Quality status.
5%
An updated science quality flag and explanation is put in the product .met file
when a granule has been evaluated. The flag value in this file, Not Investigated, is
an automatic default that is put into every granule during production.
15. ISO Data Quality Results
DQ_ConformanceResult
+ specification : CI_Citation
+ explanation : CharacterString
+ pass : Boolean
DQ_CoverageResultDQ_DescriptiveResult
+ statement : CharacterString
<<Abstract>>
DQ_Result
+ dateTime [0..1] : DateTime
+ resultScope [0..1] : DQ_Scope
DQ_QuantitativeResult
+ valueType [0..1] : RecordType
+ valueUnit : UnitOfMeasure
+ value [1..*] : Record
Passed
Suspect
Failed
Not Investigated
Inferred Passed
Being Investigated
18. Acknowledgements
This work was partially supported by contract number NNG10HP02C from NASA.
Any opinions, findings, conclusions, or recommendations expressed in this material are
those of the author and do not necessarily reflect the views of NASA or The HDF Group.