An introduction with high-level background information on the Scientific Data Stewardship Maturity Matrix (DSMM).
What's new in this version: updated reference list for maturity assessment models and applications.
As SlideShare has disabled the re-upload feature, the latest version will be maintained at:
https://figshare.com/articles/Scientific_Data_Stewardship_Maturity_Matrix/1150243
1. Introduction to Scientific Data Stewardship Maturity Matrix
Ge Peng
Cooperative Institute for Climate and Satellites – North Carolina (CICS-NC), NC State University
and NOAA’s National Centers for Environmental Information – NC (NCEI-NC)
(Formerly known as NOAA’s National Climatic Data Center (NCDC))
A Unified Framework for Measuring Stewardship Practices
Applied to Digital Environmental Datasets
In Collaboration with
Jeff Privette, Ed Kearns, Nancy Ritchey, and Steve Ansari
NCEI-NC/NOAA
Version: 09/15/2016 r2
2. In This Presentation
An overview of the scientific data stewardship maturity assessment model, with high-level background information on:
• What is scientific data stewardship? What does it mean?
• Why should we care?
• Why do we need a data stewardship maturity matrix (DSMM)?
• Where are we now?
• What is the NCEI/CICS-NC Scientific Data Stewardship Maturity Matrix?
• How did we get to where we are?
• Who could use the DSMM? What are the ways to use the DSMM?
• Putting maturity assessment into perspective
• What to do next?
3. What Is Scientific Data Stewardship?
Data Quality Screening / Assurance / Control / Evaluation / Assessment / Monitoring
• Activities to ensure or improve the quality and usability of geosciences data and products
• Activities to preserve or improve the information content, accessibility, and usability of environmental data and metadata (National Research Council, 2007)
To Ensure Data Are
• always meaningful
• trustworthy
• Common data format
• Spatial & temporal characteristics
• Uncertainty estimates
4. What Does Scientific Data Stewardship Mean?
Ensure your data are
preserved and secure
available, discoverable, and accessible
credible and understandable
usable and useful
sustainable and extendable
citable and traceable
Version: 20141017 Rev. 2.2 POC: gpeng@cicsnc.org
5. Why Should We Care?
The quality of data, and what is being done with and to the data, matter!
Knowing stewardship maturity is essential to making informed, actionable, and efficient data management decisions!
6. Why Do We Need a Data Stewardship Maturity Matrix?
The value and quality of a data set depend, in part, on the stewardship practices applied after its production.
Problem: Most data centers currently cannot readily convey, or even assess, the level of stewardship practices for their stakeholders or customers. No community scorecard exists.
This is a vulnerability – and an opportunity!
Hypothetical questions to a data center:
1. Congress: Are your datasets compliant with the U.S. Data Quality Act? If not, then what?
2. Business: Is your product credible? Readily accessible in a common data format? Sustainable?
3. Modelers: Is the quality of a routinely updated product being assessed?
Solution: Define a Stewardship Maturity Matrix to assess stewardship practices applied to individual data products.
7. Where Are We Now?
• A stewardship maturity matrix for individual digital
environmental datasets – baselined
• A paper – published in a peer-reviewed journal with free online access
(Peng et al., 2015: doi:10.2481/dsj.14-049)
8. What Is the NCEI/CICS-NC Scientific Data
Stewardship Maturity Matrix (DSMM)?
A Unified Framework for
Measuring Stewardship Practices Applied to
Individual Digital Earth Sciences Data Products
That Are Publicly Available Online
Leveraging Institutional Knowledge and Community Best Practices and Standards
9. DSMM Defines Measurable, Five-Level Progressive Practices in Nine Quasi-Independent Key Components
(Data system integrity is also very important but not included in the matrix due to potential security risks to the system.)
10. The Scope of Stewardship Practices
• Those applied to individual datasets – measurable and progressive
• Those associated with the functional entities of the Open Archival Information System (OAIS) (within the shaded box in the diagram below)
CCSDS (2012) Version: 650x0m2-2012
11. How Did We Get Here?
Pathway to Identify Key Components and Define Levels of the Stewardship Maturity Matrix
Policies | Processes | Procedures/Standards | Tasks
• U.S. laws
• Agencies’ guidelines
• Experts’ recommendations
• Research to operations
• Data/metadata management
• Data application
• Data preservation
• Data governance
• Data provenance
• Data quality assessment
• Evaluate product
• Verify file checksum
• Create metadata
• Monitor data quality
Non-Functional Requirements | Functional Core Areas | Community Practices
Key Matrix Components
• Relevant
• Measurable
• Progressive
• Quasi-Independent
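One of the tasks named on this slide, verifying a file checksum, can be illustrated with a minimal Python sketch. The slide does not say which hash algorithm or tooling is used; SHA-256 and the function names `file_sha256` and `verify_checksum` are assumptions made for illustration only.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks.

    SHA-256 is an assumption; the slide does not name an algorithm.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large data files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path, expected_hex):
    """Return True if the file's digest matches the recorded checksum."""
    return file_sha256(path) == expected_hex.strip().lower()
```

In practice the expected value would come from a checksum recorded when the file was archived, so a mismatch signals that the stored copy has changed.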
12. DSMM Follows CMMI Level Structure
Level 1: Ad Hoc (Not Managed)
Level 2: Minimal (Limit Managed)
Level 3: Intermediate/Managed (Community Good Practices)
Level 4: Advanced/Well Managed (Community Best Practices)
Level 5: Optimal/Well Managed (Measured, Controlled, Audit)
Reference Maturity Level Structure
• Capability Maturity Model Integration (CMMI)
• Levels of Maturity of Digital Repository
Recommended level for online operational products stewarded by National Data Centers
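The five-level scale above can be sketched as a small Python enumeration, together with a check that an assessment rates every component on that scale. This is an illustrative sketch, not the DSMM template itself: the names `MaturityLevel` and `validate_assessment` are hypothetical, and the component names in the usage example are placeholders borrowed from the focus-group names later in this deck rather than the authoritative list of nine key components.

```python
from enum import IntEnum


class MaturityLevel(IntEnum):
    """Five progressive levels, labeled as on the slide above."""
    AD_HOC = 1        # not managed
    MINIMAL = 2       # "limit managed" on the slide
    INTERMEDIATE = 3  # managed; community good practices
    ADVANCED = 4      # well managed; community best practices
    OPTIMAL = 5       # well managed; measured, controlled, audit


def validate_assessment(ratings, components):
    """Return {component: MaturityLevel}, or raise ValueError.

    `ratings` maps component names to integers 1-5; `components` is the
    list of names that must all be rated (hypothetical inputs).
    """
    missing = [name for name in components if name not in ratings]
    if missing:
        raise ValueError(f"unrated components: {missing}")
    unknown = [name for name in ratings if name not in components]
    if unknown:
        raise ValueError(f"unknown components: {unknown}")
    # MaturityLevel(n) itself raises ValueError for values outside 1-5.
    return {name: MaturityLevel(ratings[name]) for name in components}


# Usage with placeholder component names (assumptions, not the full matrix).
example = validate_assessment(
    {"Preservability": 3, "Accessibility": 4},
    ["Preservability", "Accessibility"],
)
```

Using an integer-backed enumeration keeps the levels ordered, so ratings can be compared directly while still carrying readable names.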
13. Overarching Goals
• General
• Simple
• Concise
Assess & Convey & Path Forward
Not to Reinvent the Wheel
Leveraging
• NCEI Subject Matter Experts (SMEs) (institutional knowledge)
• Community-accepted good and best practices and standards
• SMEs from national and international communities
14. Who Could Use the Matrix?
• Data providers and scientific stewards
to evaluate and improve the quality and usability of their products against community best practices
• Modelers, decision-support system users, and scientists
to improve their products and uncertainty estimates
to make investment and use decisions
• Data managers/stewards of data centers and repositories
to validate their compliance, or lack thereof, with community-accepted stewardship practices or standards
to assess the current state
to create a roadmap for improving the maturity of the stewardship practices applied to a certain product or to all of their holdings
• General data users
to make an educated choice when selecting or using a dataset
15. Ways to Utilize DSMM & Assessment Results
• To know the current state of your dataset(s) – maturity assessment (stewardship maturity scoreboard)
• To know where you want or need to be – stewardship requirements
• To know how to get there – roadmap forward (informed, actionable steps)
• A reference model for stewardship planning and resource allocation – informed decision-making support
• A consolidated, transparent source of information about stewardship practices – assessment with detailed justifications
• Content-rich quality metadata – enhanced discoverability and usability
[Figure: Stewardship Maturity Scoreboard and Roadmap Forward, comparing "Current" and "Need to Be" levels]
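The "current state", "need to be", and "roadmap forward" idea on this slide can be sketched as a simple gap analysis. This is a minimal illustration under stated assumptions: levels are plain integers from 1 to 5, the function name `roadmap` is hypothetical, the component names are placeholders, and treating an unassessed component as Level 1 is a choice made here, not something the DSMM prescribes.

```python
def roadmap(current, target):
    """List components that fall short of their target level.

    Both arguments map component names to maturity levels (1-5).
    Returns (component, current_level, target_level) tuples,
    largest gap first, ties broken alphabetically.
    """
    gaps = []
    for component, goal in target.items():
        # Assumption: a component with no assessment counts as Level 1.
        now = current.get(component, 1)
        if now < goal:
            gaps.append((component, now, goal))
    # now - goal is negative, so ascending order puts the widest gap first.
    return sorted(gaps, key=lambda gap: (gap[1] - gap[2], gap[0]))


# Usage with placeholder component names and a uniform target of Level 3.
current_state = {
    "Preservability": 4,
    "Accessibility": 2,
    "Transparency/Traceability": 1,
}
need_to_be = {name: 3 for name in current_state}
for component, now, goal in roadmap(current_state, need_to_be):
    print(f"{component}: Level {now} -> Level {goal}")
```

Sorting by gap size is only one possible prioritization; a data center might instead weight components by cost or by stakeholder requirements.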
19. Communities Are Interested In This Subject!
Introduction to Stewardship Maturity Matrix
on slideshare.net
(http://tinyurl.com/DSMMintro)
• 1598 views globally since 1st upload in July 2014
Data Stewardship Maturity Matrix
on slideshare.net
(http://tinyurl.com/DSMMslide)
• 976 views globally since 1st upload in July 2014
(Based on view metrics provided by slideshare.net as of 9/15/2016)
DSMM Self-Assessment Template
on figshare.com
(http://tinyurl.com/DSMMtemplate)
• 465 downloads since 1st upload in February 2015
(Based on download metrics provided by figshare.com as of 9/15/2016)
20. What To Do Next?
• ESIP (The Federation of Earth Science Information Partners) Data Stewardship
Committee – ensure consistent application and implementation of DSMM
across agencies and potentially get the committee endorsement (e.g., Downs
et al., 2015)
• EUMETSAT – provide a common stewardship assessment framework between
NOAA and EUMETSAT satellite Climate Data Records (CDRs)
• OMB A-16 NGDA Portfolio lifecycle maturity assessment model working group
– potentially integrate DSMM into their portfolio assessment model
• Use case studies (NCEI, ESIP, NSIDC, NCAR, DataOne, CSIRO, etc.) – application
and refinement of DSMM & defining roles and responsibilities for assessment
(e.g., Ritchey and Peng, 2015; Hou et al., 2015; Peng et al., 2016b,c)
• Decision-support tools (NOAA OSD & TRIO, CICS-NC, NCEI) – assess, display,
and integrate content-rich quality information in a more systematic way (e.g.,
Austin and Peng, 2015; Ritchey et al., 2016; Zinn et al., 2017).
21. What Is Good Scientific Data Stewardship?
Make it easier for users
to trust your data
to find your dataset(s)
to get your data files
to understand your data
to learn the quality of your data
to use your data
to integrate your data
Version: 20141017 Rev. 2.1 POC: gpeng@cicsnc.org
22. Acknowledgement
We benefited greatly from input and feedback from many people at or affiliated with NCEI-NC and other data centers and agencies.
We appreciate the support and guidance from NCEI-NC (formerly known as NCDC), CICS-NC, the CDR Program, RSAD, and Product Branch management.
23. *** NCEI-NC Informal Focus Groups ***
• Data Preservability
Nancy Ritchey
Ed Kearns
Drew Saunders
Jason Cooper
Ge Peng
• Data Accessibility/Usability
Steve Ansari
Drew Saunders
John Keck
John Stachniewicz
Philip Jones
Jay Morris
Louis Vasquez
Christina Lief
Jeff Privette
Ge Peng
• Data Integrity/Security
Scott Koger
Jason Symonds
David Bowman
Ryan Nelson
Steve Ansari
Ed Kearns
Ken Schmidt
Ge Peng
• Production Sustainability
Jeff Privette
Walter Jesse Glance
Ken Knapp
Tom Zhao
Ge Peng
• Data Quality
Jeff Privette
Richard Kauffold
Otis Brown
Ken Knapp
Bryant Cramer
Ed Kearns
Ge Peng
• Transparency/Traceability
Ana Privette
Drew Saunders
Ge Peng
• User Requirement
Sam McCown
Jeff Robel
Derek Arndt
Jenny Dissen
Ge Peng
24. We Would Like to Thank Them All!
Special THANKS to
Jeff Privette, Ed Kearns, Nancy Ritchey, Steve Ansari,
Ken Knapp, Drew Saunders, John Keck, Scott Koger,
John Bates, Otis Brown, Bryant Cramer, Richard Kauffold,
Linda Copley, Phil Jones, Daniel Wunder, Terry McPherson,
Dan Kowal, Ken Casey, Grace Peng, Ruth Duerr,
Donna Scott, Matthew Austin, Ana Privette,
NCEI – NC Metadata Working Group
25. Would you like to learn more? Could you contribute?
Contact us at gpeng@cicsnc.org or Maturity.Matrix@gmail.com
Register at http://goo.gl/kUW5Qq or http://tinyurl.com/DSMMregister
26. Reference
Austin, M. and G. Peng, 2015: A Prototype for content-rich decision-making support in NOAA using data as an asset.
Poster: IN21A-1676. 2015 AGU Fall meeting, 14 – 18 December 2015, San Francisco, CA, USA.
Bates, J. J. and J.L. Privette, 2012: A maturity model for assessing the completeness of climate data records. EOS,
Transactions of the AGU, 93(44), 441.
CCSDS (The Consultative Committee for Space Data Systems), 2012: Reference Model for an Open Archival Information
System (OAIS), Recommended Practices, Issue 2. Version: CCSDS 650.0-M-2. 135 pp.
DAMA International, 2010: Guide to the Data Management Body of Knowledge (DAMA-DMBOK). Eds. Mosley, M.,
Brackett, M., & Earley, S., Technics Publications, LLC, New Jersey, USA. 2nd Print Edition. 406 pp.
Downs, R.R., R. Duerr, D.J. Hills, and H.K. Ramapriyan, 2015: Data Stewardship in the Earth Sciences. D-Lib Magazine,
21, doi: 10.1045/july2015-downs
EUMETSAT, 2013: CORE-CLIMAX Climate Data Record Assessment Instruction Manual. Version 2, 25 November 2013.
EUMETSAT, 2015: GAIA-CLIM Measurement Maturity Matrix Guidance: Gap Analysis for Integrated Atmospheric ECV
Climate Monitoring: Report on system of systems approach adopted and rationale. Version: 27 Nov 2015.
FGDC, 2016: National Geospatial Data Asset (NGDA) Lifecycle Maturity Assessment (LMA) 2015 Report - Analysis and
Recommendations. Version: 8 December 2016.
Hou, C.-Y., M. Mayernik, G. Peng, R. Duerr, and A. Rosati, 2015: Assessing information quality: Use case studies for the
data stewardship maturity matrix. Poster: IN21A-1675. 2015 AGU Fall meeting, 14 – 18 December 2015, San
Francisco, CA, USA.
National Research Council, 2007: Environmental data management at NOAA: Archiving, stewardship, and access. 116
pp. The National Academies Press, Washington, D.C.
NCEI MM-Serv WG (Use/Service Maturity Matrix Working Group), 2017: A reference framework for assessing service
maturity of digital environmental datasets. Under development.
27. Reference – Cont.
Peng, G., J.L. Privette, E.J. Kearns, N.A. Ritchey, and S. Ansari, 2015: A unified framework for measuring stewardship
practices applied to digital environmental datasets. Data Science Journal, 13, 231 - 253. doi:
http://dx.doi.org/10.2481/dsj.14-049.
Peng, G., H. Ramapriyan, and D. F. Moroni, 2016a: The State of Building a Consistent Framework for Curation and
Presentation of Earth Science Data Quality. Poster: IN41C.1666, AGU 2016 Fall Meeting, 12 – 16 December 2016,
San Francisco, CA, USA.
Peng, G., N. A. Ritchey, K. S. Casey, E. J. Kearns, J. L. Privette, D. Saunders, P. Jones, T. Maycock, and S. Ansari, 2016b:
Scientific stewardship in the Open Data and Big Data era - Roles and responsibilities of stewards and other major
product stakeholders. D-Lib Magazine, 22, doi:10.1045/may2016-peng.
Peng, G., J. Lawrimore, V. Toner, C. Lief, R. Baldwin, N. Ritchey, and D. Bringar, 2016c: Assessment of Stewardship
Maturity of the Global Historical Climatology Network-Monthly (GHCN-M) Dataset and Lessons Learned. D-Lib
Magazine, 22, doi:10.1045/nov2016-peng.
Ritchey, N. and G. Peng, 2015: Assessing stewardship maturity: use case study results and lessons learned. IN14A-05,
2015 AGU Fall meeting, 14 – 18 December 2015, San Francisco, CA, USA.
Ritchey, N.A., G. Peng, A. Milan, P. Lemieux, R. Partee, R. Lonin, and K.S. Casey, 2016: Practical Application of the Data
Stewardship Maturity Model for NOAA’s OneStop Project. IN42D-08. AGU 2016 Fall Meeting, 12 – 16 December
2016, San Francisco, CA, USA.
Zhou, L. H., M. Divakarla, and X. P. Liu, 2016: An Overview of the Joint Polar Satellite System (JPSS) Science Data
Product Calibration and Validation. Remote Sensing, 8(2). doi:10.3390/rs8020139
Zinn, S., J. Relph, G. Peng, A. Milan, and A. Rosenberg, 2017: Design and implementation of automation tools for
DSMM diagrams and reports. Invited Talk. ESIP 2017 Winter Meeting, 11 – 13 January 2017, Bethesda, MD, USA.
28. A self-assessment template using the latest DSMM is available at:
http://dx.doi.org/10.6084/m9.figshare.1211954