SlideShare une entreprise Scribd logo
1  sur  30
Sharing Data Through Guided Metadata
Improvement
Lindsay Powers and Ted Habermann - The HDF Group
Matthew Jones – National Center for Ecological
Analysis and Synthesis, University of California Santa
Barbara
DataONE webinar 1
To help scientific communities:
•Improve data discovery and access
•Increase data use and re-use
•Enhance understanding, especially across
domains
… by improving metadata completeness and
consistency.
2
DataON
E
webinar
Goals
Terminology
3
DataON
E
webinar
Concept : General term for describing a documentation entity
(e.g. Title, Revision Date, Process Step, Spatial Extent).
Spiral: A set of concepts required to support a particular
documentation need or use case.
Recommendation: A set of concepts that a group believes is
required for achieving a documentation goal.
Dialect : A particular form of the documentation language that is
specific to a community (e.g. DIF, CSDGM, EML, ECHO, custom).
Collection: A group of metadata records ideally in a machine-
readable format, commonly organized by a data center,
organization or project and often stored in a database or web
accessible folder.
DataONE Member Nodes (communities) using EML*
• Ecological Society of America (ESA)
• Global Lake Ecological Observatory Network (GLEON)
• Alaska Ocean Observing System (GOA)
• Montana Institute on Ecosystems (IOE)
• Knowledge Network for Biocomplexity (KNB)
• University of Kansas Biodiversity Institute (KUBI)
• Long-term Ecological Research Network (LTER)
• European Long-term Ecosystem Research Network (LTER_EUROPE)
• University of California / DataONE (ONEShare)
• Terrestrial Environmental Research Network (TERN)
• Taiwan Forestry Research Institute (TFRI)
• National Phenology Network (USANPN)
4DataONE webinar
* Ecological Metadata Language
DataONE Member Nodes using CSDGM*
• California Digital Libraries (CDL)
• USGS Core Sciences Clearinghouse (USGSCSAS)
• Earth Data Analysis Center (EDACGSTORE)
• Environmental Data for the Oak Ridge Area (EDORA)
• Oak Ridge National Lab Distributed Active Archive Center
(ORNLDAAC)
• Regional and Global Biogeochemical Dynamics Data (RGD)
• Sustainable Environment Actionable Data (SEAD)
• New Mexico Experimental Program to Stimulate Comptetitive
Research (NMEPSCOR)
5DataONE webinar
* Content Standard for Digital Geospatial Metadata
Questions
• Can community developed metadata
recommendations help improve metadata content
within a particular community?
• Can community developed metadata
recommendations help improve metadata content
among different communities?
• Can metadata recommendations developed in a
specific dialect be used to help improve metadata in
other dialects? Can they facilitate communication?
DataONE webinar 6
Communities, dialects and recommendations
DataONE webinar 7
LTER
KNB
GLEON
GOA
LTER-EU
TERN
USANPN
ESA
KUBI
CommunicationBarriers
(Dialects)
CDL
USGSCSAS
EDAC
EDORA
ORNLDAAC
RGD
SEAD
NMEPSCoR
Recommendations
LTER
FGDC
etc.EML CSDGM
Access
Discovery Identification Evaluation
Integration
LTER Metadata Recommendations
LTER developed a suite of metadata recommendations
based on community requirements…
•Did these recommendations make metadata more
complete in comparison to other entities?
•Does LTER metadata practice provide a good example
for other communities?
DataONE webinar 8
LTER Recommendations (Spirals)
DataONE webinar 9
Discovery
Geographic Coverage
Taxonomic Coverage
Temporal Coverage
Maintenance
Discovery
Geographic Coverage
Taxonomic Coverage
Temporal Coverage
Maintenance
Identification
Contact name and organization
Creator name and organization
Abstract
Publication info
Keywords
Resource identifier and title
Identification
Contact name and organization
Creator name and organization
Abstract
Publication info
Keywords
Resource identifier and title
Evaluation
Attribute definition
Entity type
Process step
Project description
Resource use
Constraints
Evaluation
Attribute definition
Entity type
Process step
Project description
Resource use
Constraints
Methods
• Randomly sampled up to 250 records from each EML
or CSDGM metadata collection (Member Node)
• Mapped dialect to LTER Recommendation concepts
• Analyzed collections for completeness in relation to
recommendations
• Compared collections to identify shining examples
10DataONE webinar
11
LTER Recommendations and EML Collections
DataONE webinar
Can community developed metadata recommendations help improve metadata
content within a particular community? Among different communities using the
same dialect?
Improving MD across communities
DataONE webinar 12
EML Concepts/
Recommendation ESA GLEON GOA IOE KNB LTER LTER_EU
ONEShar
e TERN TFRI USANPN KUBI
Resource Identifier 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100%
Resource Title 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100%
Author / Originator 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100%
Metadata Contact 100% 58% 0% 0% 68% 98% 84% 0% 0% 0% 0% 0%
Contributor Name 100% 42% 95% 0% 74% 1% 0% 0% 0% 53% 100% 0%
Publisher 0% 25% 0% 0% 0% 100% 0% 94% 100% 0% 0% 0%
Publication Date 100% 50% 0% 0% 68% 100% 69% 100% 0% 0% 0% 0%
Resource Contact 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100%
Abstract 100% 92% 100% 100% 97% 100% 88% 98% 100% 100% 100% 0%
Keyword 80% 75% 100% 96% 87% 100% 100% 100% 100% 100% 100% 100%
Resource Distribution 100% 100% 97% 100% 87% 100% 100% 94% 100% 100% 100% 100%
Taxonomic Extent 100% 0% 77% 8% 35% 0% 21% 0% 100% 12% 0% 0%
Spatial Extent 100% 92% 94% 100% 90% 100% 48% 97% 100% 100% 100% 100%
Temporal Extent 100% 92% 94% 4% 87% 100% 98% 94% 100% 35% 100% 100%
Maintenance 0% 25% 0% 0% 0% 99% 0% 0% 0% 0% 0% 0%
Resource Use Constraints 100% 92% 100% 100% 94% 99% 89% 88% 0% 82% 100% 0%
Process Step 80% 67% 94% 0% 68% 100% 100% 0% 100% 88% 100% 0%
Project Description 0% 33% 95% 8% 13% 1% 0% 94% 100% 0% 0% 0%
Entity Type Definition 0% 75% 79% 8% 16% 2% 0% 95% 0% 24% 100% 0%
Attribute Definition 0% 83% 84% 29% 23% 3% 0% 95% 0% 100% 100% 0%
LTER recommendations and CSDGM
DataONE webinar 13
Can metadata recommendations developed in a specific dialect be used to help
improve metadata in other dialects?
Recommendations across dialects
DataONE webinar 14
CSDGM Concepts CDL USGSCSAS EDAC EDORA ORNLDAAC RGD SEAD NMEPSCOR
Resource Identifier -100% -100% -100% -100% -100% -100% -100% -100%
Resource Title 100% 100% 100% 100% 100% 100% 100% 100%
Author / Originator 100% 100% 100% 100% 100% 100% 100% 100%
Metadata Contact 100% 100% 100% 100% 100% 100% 100% 100%
Contributor Name 100% 100% 100% 100% 100% 100% 100% 100%
Publisher 100% 26% 1% 0% 0% 0% 67% 0%
Publication Date 100% 100% 100% 0% 0% 0% 100% 100%
Resource Contact 100% 80% 100% 100% 100% 100% 67% 100%
Abstract 100% 100% 100% 100% 100% 100% 100% 100%
Keyword 100% 100% 100% 100% 100% 100% 100% 100%
Resource Distribution 0% 100% 100% 100% 100% 100% 67% 100%
Taxonomic Extent -100% -100% -100% -100% -100% -100% -100% -100%
Spatial Extent 100% 100% 100% 100% 100% 100% 100% 100%
Temporal Extent 0% 36% 95% 100% 100% 100% 89% 57%
Maintenance 100% 100% 100% 100% 100% 100% 100% 100%
Resource Use
Constraints 100% 100% 100% 0% 0% 0% 100% 100%
Process Step 0% 0% 0% 0% 0% 0% 0% 0%
Project Description -100% -100% -100% -100% -100% -100% -100% -100%
Entity Type Definition 100% 98% 81% 0% 0% 0% 0% 100%
Attribute Definition 100% 98% 81% 0% 0% 0% 0% 100%
15DataONE webinar
Recommendations bridging communities
Can metadata recommendations developed in a specific dialect facilitate
Communication across dialects?
Some answers…
• Have the LTER recommendations improved LTER metadata completeness?
− LTER collections are above average in complete concepts
− Room for improvement in completeness of Evaluation concepts
− Many LTER concepts are nearly complete, small effort to complete
• Can LTER recommendations help other communities?
− We believe so, there seems to be a lot of alignment of MD priority
concepts.
− What strategies might be useful to communities to help improve
completeness?
• Can metadata recommendations developed in a specific dialect be used to
help improve metadata in other dialects?
− CSDGM dialect contains most of the concepts found in the LTER
recommendation spirals, and therefor these recommendations can be easily
applied to CSDGM collections
− CSDGM collections are complete with respect to most of the LTER recommended
concepts
DataONE webinar 16
Guidance Documentation
17
Documentation
Metadata
Sharable Metadata
data.ucar.edu
http://wiki.esipfed.org/index.php/Category:Documentation_Connections
DataONE webinar
DataONE webinar 18
EOL
CISL
ACOM
CGD
Unidata
HAO
RAL MMM IIS
The UCAR Labs and Dialects
DataCite
MODS
EOL
RDA-CISL
ISO 19115-1
CGD
ncML
Recommendations Comparison
What recommendation fits our science?
19DataONE webinar
DataCite
DataCite
ISO
ISO
DataONE Repositories
February 9, 2016 DataONE webinar 20
• Large
communities
• Diverse metadata
• Diverse data
MetaDIG Tools and services
• Metadata Improvement and Guidance (MetaDIG):
− Individual researchers (producers)
• At record level, during submission
− Data repositories
• At collection level
− Individual researchers (consumers)
• At record level, for re-use
February 9, 2016 DataONE webinar 21
Automation
• Automate:
− Metadata Completeness
• against recommendations
− Metadata and Data Congruency
− Metadata Effectiveness
• Semantics, therefore much harder
February 9, 2016 DataONE webinar 22
Metadata Quality Service
February 9, 2016 DataONE webinar 23
EML Congruency Checker
• Starting point:
− LTER tool for Ecological Metadata Language
− Standard, extensible report format
− Suite of developed checks
• Expand to support:
− Multiple metadata standards
− Multiple recommendations
February 9, 2016 DataONE webinar 24
valid
Extensible quality checks
Check# Check Name Check Type
M1 Descriptive Title Title exists, > 7 words Metadata
M2 Unique Attribute Names Attribute names unique Metadata
M3 Valid Units Units assigned from controlled
vocabulary
Metadata
M4 Schema valid Metadata validates Metadata
C1 Checksum matches Data checksums match
metadata
Congruency
C2 Data links live All URLs return data Congruency
D1 Duplicate data rows Count duplicate rows Data
…
February 9, 2016 DataONE webinar 25
• Checks in Java, R, Python
• Categorized by function (discovery, re-use, …)
• Operate across dialects (EML, CSDGM, ISO19139)
Recommendations
• Checks: like unit tests for recommendations
• Community Recommendations
− Group of quality checks
− Can be created by any community
− Can include standard or custom checks
− Checks: access both metadata and data
February 9, 2016 DataONE webinar 26
Recommendation Checks
LTER Best Practice M1, M2, C2, C3, D3, …
ACDD M2, M3, M4, C1, C2, D3, …
USGS Best Practice M3, M4, M5, C6, C8, D1, D2, D3, …
…
For Creators
• Integrated into data set landing pages
• Link to detailed issues page
• Downloadable in machine-parsable formats
February 9, 2016 DataONE webinar 27
Recommendations
LTER Best Practice ACDD
55% 40%
For Repositories
February 9, 2016 DataONE webinar 28
Recommendations
LTER Best Practice
ACDD
63%
52%
Metadata Completeness
Recommendation
Recap
• MetaDIG project plans
− Metadata evaluation and completeness
− Metadata completeness tools and services
− Communication, guidance, and outreach
02/10/16 29
Thanks
This work was supported by National Science
Foundation award ACI - 1443062.
February 9, 2016 DataONE webinar 30

Contenu connexe

Similaire à DataONE_Guided Metadata Improvement

Tools and Techniques for Creating, Maintaining, and Distributing Shareable Me...
Tools and Techniques for Creating, Maintaining, and Distributing Shareable Me...Tools and Techniques for Creating, Maintaining, and Distributing Shareable Me...
Tools and Techniques for Creating, Maintaining, and Distributing Shareable Me...Jenn Riley
 
Gettingstartedwithdigitalcollectionsweb[1]
Gettingstartedwithdigitalcollectionsweb[1]Gettingstartedwithdigitalcollectionsweb[1]
Gettingstartedwithdigitalcollectionsweb[1]guest410707c
 
TERENA OER portal, metadata extraction analysis, LAK, Leuven @9apr2013
TERENA OER portal, metadata extraction analysis, LAK, Leuven @9apr2013TERENA OER portal, metadata extraction analysis, LAK, Leuven @9apr2013
TERENA OER portal, metadata extraction analysis, LAK, Leuven @9apr2013Ilias Hatzakis
 
If You Tag it, Will They Come? Metadata Quality and Repository Management
If You Tag it, Will They Come? Metadata Quality and Repository ManagementIf You Tag it, Will They Come? Metadata Quality and Repository Management
If You Tag it, Will They Come? Metadata Quality and Repository ManagementSarah Currier
 
Recommendations Analysis Dashboard
Recommendations Analysis DashboardRecommendations Analysis Dashboard
Recommendations Analysis DashboardSean Gordon
 
A Data Citation Roadmap for Scholarly Data Repositories
A Data Citation Roadmap for Scholarly Data RepositoriesA Data Citation Roadmap for Scholarly Data Repositories
A Data Citation Roadmap for Scholarly Data RepositoriesLIBER Europe
 
Semantic Web Adoption
Semantic Web AdoptionSemantic Web Adoption
Semantic Web Adoptionguest262aaa
 
NIH Data Summit - The NIH Data Commons
NIH Data Summit - The NIH Data CommonsNIH Data Summit - The NIH Data Commons
NIH Data Summit - The NIH Data CommonsVivien Bonazzi
 
End-to-End : Open Access Process Review and Improvements
End-to-End: Open Access Process Review and ImprovementsEnd-to-End: Open Access Process Review and Improvements
End-to-End : Open Access Process Review and ImprovementsRepository Fringe
 
Semantic Web: Technolgies and Applications for Real-World
Semantic Web: Technolgies and Applications for Real-WorldSemantic Web: Technolgies and Applications for Real-World
Semantic Web: Technolgies and Applications for Real-WorldAmit Sheth
 
Analysing & Improving Learning Resources Markup on the Web
Analysing & Improving Learning Resources Markup on the WebAnalysing & Improving Learning Resources Markup on the Web
Analysing & Improving Learning Resources Markup on the WebStefan Dietze
 
The Importance of Metadata - EUDAT Summer School (Shaun de Witt, CCFE)
The Importance of Metadata - EUDAT Summer School (Shaun de Witt, CCFE)The Importance of Metadata - EUDAT Summer School (Shaun de Witt, CCFE)
The Importance of Metadata - EUDAT Summer School (Shaun de Witt, CCFE)EUDAT
 
Augmenting MySQL with NoSQL options - Data Lifecycles
Augmenting MySQL with NoSQL options - Data LifecyclesAugmenting MySQL with NoSQL options - Data Lifecycles
Augmenting MySQL with NoSQL options - Data LifecyclesDavid Murphy
 
Detecting Good Practices and Pitfalls when Publishing Vocabularies on the Web
Detecting Good Practices and Pitfalls when Publishing Vocabularies on the Web Detecting Good Practices and Pitfalls when Publishing Vocabularies on the Web
Detecting Good Practices and Pitfalls when Publishing Vocabularies on the Web María Poveda Villalón
 

Similaire à DataONE_Guided Metadata Improvement (20)

Tools and Techniques for Creating, Maintaining, and Distributing Shareable Me...
Tools and Techniques for Creating, Maintaining, and Distributing Shareable Me...Tools and Techniques for Creating, Maintaining, and Distributing Shareable Me...
Tools and Techniques for Creating, Maintaining, and Distributing Shareable Me...
 
Gettingstartedwithdigitalcollectionsweb[1]
Gettingstartedwithdigitalcollectionsweb[1]Gettingstartedwithdigitalcollectionsweb[1]
Gettingstartedwithdigitalcollectionsweb[1]
 
TERENA OER portal, metadata extraction analysis, LAK, Leuven @9apr2013
TERENA OER portal, metadata extraction analysis, LAK, Leuven @9apr2013TERENA OER portal, metadata extraction analysis, LAK, Leuven @9apr2013
TERENA OER portal, metadata extraction analysis, LAK, Leuven @9apr2013
 
If You Tag it, Will They Come? Metadata Quality and Repository Management
If You Tag it, Will They Come? Metadata Quality and Repository ManagementIf You Tag it, Will They Come? Metadata Quality and Repository Management
If You Tag it, Will They Come? Metadata Quality and Repository Management
 
Recommendations Analysis Dashboard
Recommendations Analysis DashboardRecommendations Analysis Dashboard
Recommendations Analysis Dashboard
 
Hdg geo discussion
Hdg geo discussionHdg geo discussion
Hdg geo discussion
 
A Data Citation Roadmap for Scholarly Data Repositories
A Data Citation Roadmap for Scholarly Data RepositoriesA Data Citation Roadmap for Scholarly Data Repositories
A Data Citation Roadmap for Scholarly Data Repositories
 
RFCs for HDF5 and HDF-EOS5 Status Update
RFCs for HDF5 and HDF-EOS5 Status UpdateRFCs for HDF5 and HDF-EOS5 Status Update
RFCs for HDF5 and HDF-EOS5 Status Update
 
Semantic Web Adoption
Semantic Web AdoptionSemantic Web Adoption
Semantic Web Adoption
 
Metadata : Concentrating on the data, not on the scheme
Metadata : Concentrating on the data, not on the schemeMetadata : Concentrating on the data, not on the scheme
Metadata : Concentrating on the data, not on the scheme
 
NIH Data Summit - The NIH Data Commons
NIH Data Summit - The NIH Data CommonsNIH Data Summit - The NIH Data Commons
NIH Data Summit - The NIH Data Commons
 
End-to-End : Open Access Process Review and Improvements
End-to-End: Open Access Process Review and ImprovementsEnd-to-End: Open Access Process Review and Improvements
End-to-End : Open Access Process Review and Improvements
 
SMART Seminar Series: SMART Data Management
SMART Seminar Series: SMART Data ManagementSMART Seminar Series: SMART Data Management
SMART Seminar Series: SMART Data Management
 
Semantic Web: Technolgies and Applications for Real-World
Semantic Web: Technolgies and Applications for Real-WorldSemantic Web: Technolgies and Applications for Real-World
Semantic Web: Technolgies and Applications for Real-World
 
Analysing & Improving Learning Resources Markup on the Web
Analysing & Improving Learning Resources Markup on the WebAnalysing & Improving Learning Resources Markup on the Web
Analysing & Improving Learning Resources Markup on the Web
 
BioSharing - Update - Feb2016
BioSharing - Update - Feb2016BioSharing - Update - Feb2016
BioSharing - Update - Feb2016
 
Metadata as Standard: improving Interoperability through the Research Data Al...
Metadata as Standard: improving Interoperability through the Research Data Al...Metadata as Standard: improving Interoperability through the Research Data Al...
Metadata as Standard: improving Interoperability through the Research Data Al...
 
The Importance of Metadata - EUDAT Summer School (Shaun de Witt, CCFE)
The Importance of Metadata - EUDAT Summer School (Shaun de Witt, CCFE)The Importance of Metadata - EUDAT Summer School (Shaun de Witt, CCFE)
The Importance of Metadata - EUDAT Summer School (Shaun de Witt, CCFE)
 
Augmenting MySQL with NoSQL options - Data Lifecycles
Augmenting MySQL with NoSQL options - Data LifecyclesAugmenting MySQL with NoSQL options - Data Lifecycles
Augmenting MySQL with NoSQL options - Data Lifecycles
 
Detecting Good Practices and Pitfalls when Publishing Vocabularies on the Web
Detecting Good Practices and Pitfalls when Publishing Vocabularies on the Web Detecting Good Practices and Pitfalls when Publishing Vocabularies on the Web
Detecting Good Practices and Pitfalls when Publishing Vocabularies on the Web
 

Dernier

Bioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptxBioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptx023NiWayanAnggiSriWa
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)Columbia Weather Systems
 
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdfBUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdfWildaNurAmalia2
 
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...D. B. S. College Kanpur
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)riyaescorts54
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxpriyankatabhane
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringPrajakta Shinde
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxJorenAcuavera1
 
User Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationColumbia Weather Systems
 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024innovationoecd
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPirithiRaju
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentationtahreemzahra82
 
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPirithiRaju
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensorsonawaneprad
 
Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPirithiRaju
 
Davis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologyDavis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologycaarthichand2003
 
Transposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptTransposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptArshadWarsi13
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPirithiRaju
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPirithiRaju
 

Dernier (20)

Bioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptxBioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptx
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
 
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdfBUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
 
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptx
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical Engineering
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptx
 
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort ServiceHot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
 
User Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather Station
 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentation
 
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensor
 
Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdf
 
Davis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologyDavis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technology
 
Transposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptTransposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.ppt
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdf
 

DataONE_Guided Metadata Improvement

  • 1. Sharing Data Through Guided Metadata Improvement Lindsay Powers and Ted Habermann - The HDF Group Matthew Jones – National Center for Ecological Analysis and Synthesis, University of California Santa Barbara DataONE webinar 1
  • 2. To help scientific communities: •Improve data discovery and access •Increase data use and re-use •Enhance understanding, especially across domains … by improving metadata completeness and consistency. 2 DataON E webinar Goals
  • 3. Terminology 3 DataON E webinar Concept : General term for describing a documentation entity (e.g. Title, Revision Date, Process Step, Spatial Extent). Spiral: A set of concepts required to support a particular documentation need or use case. Recommendation: A set of concepts that a group believes is required for achieving a documentation goal. Dialect : A particular form of the documentation language that is specific to a community (e.g. DIF, CSDGM, EML, ECHO, custom). Collection: A group of metadata records ideally in a machine- readable format, commonly organized by a data center, organization or project and often stored in a database or web accessible folder.
  • 4. DataONE Member Nodes (communities) using EML* • Ecological Society of America (ESA) • Global Lake Ecological Observatory Network (GLEON) • Alaska Ocean Observing System (GOA) • Montana Institute on Ecosystems (IOE) • Knowledge Network for Biocomplexity (KNB) • University of Kansas Biodiversity Institute (KUBI) • Long-term Ecological Research Network (LTER) • European Long-term Ecosystem Research Network (LTER_EUROPE) • University of California / DataONE (ONEShare) • Terrestrial Environmental Research Network (TERN) • Taiwan Forestry Research Institute (TFRI) • National Phenology Network (USANPN) 4DataONE webinar * Ecological Metadata Language
  • 5. DataONE Member Nodes using CSDGM* • California Digital Libraries (CDL) • USGS Core Sciences Clearinghouse (USGSCSAS) • Earth Data Analysis Center (EDACGSTORE) • Environmental Data for the Oak Ridge Area (EDORA) • Oak Ridge National Lab Distributed Active Archive Center (ORNLDAAC) • Regional and Global Biogeochemical Dynamics Data (RGD) • Sustainable Environment Actionable Data (SEAD) • New Mexico Experimental Program to Stimulate Comptetitive Research (NMEPSCOR) 5DataONE webinar * Content Standard for Digital Geospatial Metadata
  • 6. Questions • Can community developed metadata recommendations help improve metadata content within a particular community? • Can community developed metadata recommendations help improve metadata content among different communities? • Can metadata recommendations developed in a specific dialect be used to help improve metadata in other dialects? Can they facilitate communication? DataONE webinar 6
  • 7. Communities, dialects and recommendations DataONE webinar 7 LTER KNB GLEON GOA LTER-EU TERN USANPN ESA KUBI CommunicationBarriers (Dialects) CDL USGSCSAS EDAC EDORA ORNLDAAC RGD SEAD NMEPSCoR Recommendations LTER FGDC etc.EML CSDGM Access Discovery Identification Evaluation Integration
  • 8. LTER Metadata Recommendations LTER developed a suite of metadata recommendations based on community requirements… •Did these recommendations make metadata more complete in comparison to other entities? •Does LTER metadata practice provide a good example for other communities? DataONE webinar 8
  • 9. LTER Recommendations (Spirals) DataONE webinar 9 Discovery Geographic Coverage Taxonomic Coverage Temporal Coverage Maintenance Discovery Geographic Coverage Taxonomic Coverage Temporal Coverage Maintenance Identification Contact name and organization Creator name and organization Abstract Publication info Keywords Resource identifier and title Identification Contact name and organization Creator name and organization Abstract Publication info Keywords Resource identifier and title Evaluation Attribute definition Entity type Process step Project description Resource use Constraints Evaluation Attribute definition Entity type Process step Project description Resource use Constraints
  • 10. Methods • Randomly sampled up to 250 records from each EML or CSDGM metadata collection (Member Node) • Mapped dialect to LTER Recommendation concepts • Analyzed collections for completeness in relation to recommendations • Compared collections to identify shining examples 10DataONE webinar
  • 11. 11 LTER Recommendations and EML Collections DataONE webinar Can community developed metadata recommendations help improve metadata content within a particular community? Among different communities using the same dialect?
  • 12. Improving MD across communities DataONE webinar 12 EML Concepts/ Recommendation ESA GLEON GOA IOE KNB LTER LTER_EU ONEShar e TERN TFRI USANPN KUBI Resource Identifier 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% Resource Title 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% Author / Originator 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% Metadata Contact 100% 58% 0% 0% 68% 98% 84% 0% 0% 0% 0% 0% Contributor Name 100% 42% 95% 0% 74% 1% 0% 0% 0% 53% 100% 0% Publisher 0% 25% 0% 0% 0% 100% 0% 94% 100% 0% 0% 0% Publication Date 100% 50% 0% 0% 68% 100% 69% 100% 0% 0% 0% 0% Resource Contact 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% Abstract 100% 92% 100% 100% 97% 100% 88% 98% 100% 100% 100% 0% Keyword 80% 75% 100% 96% 87% 100% 100% 100% 100% 100% 100% 100% Resource Distribution 100% 100% 97% 100% 87% 100% 100% 94% 100% 100% 100% 100% Taxonomic Extent 100% 0% 77% 8% 35% 0% 21% 0% 100% 12% 0% 0% Spatial Extent 100% 92% 94% 100% 90% 100% 48% 97% 100% 100% 100% 100% Temporal Extent 100% 92% 94% 4% 87% 100% 98% 94% 100% 35% 100% 100% Maintenance 0% 25% 0% 0% 0% 99% 0% 0% 0% 0% 0% 0% Resource Use Constraints 100% 92% 100% 100% 94% 99% 89% 88% 0% 82% 100% 0% Process Step 80% 67% 94% 0% 68% 100% 100% 0% 100% 88% 100% 0% Project Description 0% 33% 95% 8% 13% 1% 0% 94% 100% 0% 0% 0% Entity Type Definition 0% 75% 79% 8% 16% 2% 0% 95% 0% 24% 100% 0% Attribute Definition 0% 83% 84% 29% 23% 3% 0% 95% 0% 100% 100% 0%
  • 13. LTER recommendations and CSDGM DataONE webinar 13 Can metadata recommendations developed in a specific dialect be used to help improve metadata in other dialects?
  • 14. Recommendations across dialects DataONE webinar 14 CSDGM Concepts CDL USGSCSAS EDAC EDORA ORNLDAAC RGD SEAD NMEPSCOR Resource Identifier -100% -100% -100% -100% -100% -100% -100% -100% Resource Title 100% 100% 100% 100% 100% 100% 100% 100% Author / Originator 100% 100% 100% 100% 100% 100% 100% 100% Metadata Contact 100% 100% 100% 100% 100% 100% 100% 100% Contributor Name 100% 100% 100% 100% 100% 100% 100% 100% Publisher 100% 26% 1% 0% 0% 0% 67% 0% Publication Date 100% 100% 100% 0% 0% 0% 100% 100% Resource Contact 100% 80% 100% 100% 100% 100% 67% 100% Abstract 100% 100% 100% 100% 100% 100% 100% 100% Keyword 100% 100% 100% 100% 100% 100% 100% 100% Resource Distribution 0% 100% 100% 100% 100% 100% 67% 100% Taxonomic Extent -100% -100% -100% -100% -100% -100% -100% -100% Spatial Extent 100% 100% 100% 100% 100% 100% 100% 100% Temporal Extent 0% 36% 95% 100% 100% 100% 89% 57% Maintenance 100% 100% 100% 100% 100% 100% 100% 100% Resource Use Constraints 100% 100% 100% 0% 0% 0% 100% 100% Process Step 0% 0% 0% 0% 0% 0% 0% 0% Project Description -100% -100% -100% -100% -100% -100% -100% -100% Entity Type Definition 100% 98% 81% 0% 0% 0% 0% 100% Attribute Definition 100% 98% 81% 0% 0% 0% 0% 100%
  • 15. 15DataONE webinar Recommendations bridging communities Can metadata recommendations developed in a specific dialect facilitate Communication across dialects?
  • 16. Some answers… • Have the LTER recommendations improved LTER metadata completeness? − LTER collections are above average in complete concepts − Room for improvement in completeness of Evaluation concepts − Many LTER concepts are nearly complete, small effort to complete • Can LTER recommendations help other communities? − We believe so, there seems to be a lot of alignment of MD priority concepts. − What strategies might be useful to communities to help improve completeness? • Can metadata recommendations developed in a specific dialect be used to help improve metadata in other dialects? − CSDGM dialect contains most of the concepts found in the LTER recommendation spirals, and therefor these recommendations can be easily applied to CSDGM collections − CSDGM collections are complete with respect to most of the LTER recommended concepts DataONE webinar 16
  • 18. DataONE webinar 18 EOL CISL ACOM CGD Unidata HAO RAL MMM IIS The UCAR Labs and Dialects DataCite MODS EOL RDA-CISL ISO 19115-1 CGD ncML
  • 19. Recommendations Comparison What recommendation fits our science? 19DataONE webinar DataCite DataCite ISO ISO
  • 20. DataONE Repositories February 9, 2016 DataONE webinar 20 • Large communities • Diverse metadata • Diverse data
  • 21. MetaDIG Tools and services • Metadata Improvement and Guidance (MetaDIG): − Individual researchers (producers) • At record level, during submission − Data repositories • At collection level − Individual researchers (consumers) • At record level, for re-use February 9, 2016 DataONE webinar 21
  • 22. Automation • Automate: − Metadata Completeness • against recommendations − Metadata and Data Congruency − Metadata Effectiveness • Semantics, therefore much harder February 9, 2016 DataONE webinar 22
  • 23. Metadata Quality Service February 9, 2016 DataONE webinar 23
  • 24. EML Congruency Checker • Starting point: − LTER tool for Ecological Metadata Language − Standard, extensible report format − Suite of developed checks • Expand to support: − Multiple metadata standards − Multiple recommendations February 9, 2016 DataONE webinar 24 valid
  • 25. Extensible quality checks Check# Check Name Check Type M1 Descriptive Title Title exists, > 7 words Metadata M2 Unique Attribute Names Attribute names unique Metadata M3 Valid Units Units assigned from controlled vocabulary Metadata M4 Schema valid Metadata validates Metadata C1 Checksum matches Data checksums match metadata Congruency C2 Data links live All URLs return data Congruency D1 Duplicate data rows Count duplicate rows Data … February 9, 2016 DataONE webinar 25 • Checks in Java, R, Python • Categorized by function (discovery, re-use, …) • Operate across dialects (EML, CSDGM, ISO19139)
  • 26. Recommendations • Checks: like unit tests for recommendations • Community Recommendations − Group of quality checks − Can be created by any community − Can include standard or custom checks − Checks: access both metadata and data February 9, 2016 DataONE webinar 26 Recommendation Checks LTER Best Practice M1, M2, C2, C3, D3, … ACDD M2, M3, M4, C1, C2, D3, … USGS Best Practice M3, M4, M5, C6, C8, D1, D2, D3, … …
  • 27. For Creators • Integrated into data set landing pages • Link to detailed issues page • Downloadable in machine-parsable formats February 9, 2016 DataONE webinar 27 Recommendations LTER Best Practice ACDD 55% 40%
  • 28. For Repositories February 9, 2016 DataONE webinar 28 Recommendations LTER Best Practice ACDD 63% 52% Metadata Completeness Recommendation
  • 29. Recap • MetaDIG project plans − Metadata evaluation and completeness − Metadata completeness tools and services − Communication, guidance, and outreach 02/10/16 29
  • 30. Thanks This work was supported by National Science Foundation award ACI - 1443062. February 9, 2016 DataONE webinar 30

Notes de l'éditeur

  1. Our approach is…. Value community defined modes of describing data and processes – communities know best what is important to them Currently exploring the similarities and difference among communities… what do they value?
  2. Dialect – examples include formalized metadata language standards such as EML (Ecological Community), Directory Interchange Format (NASA Global Change Master Directory), Federal Geographic Data Committee, Content Standard for Digital Geospatial Metadata –( US Federal metadata standard (dialect) ), Earth Observing System Clearinghouse (NASA Earth Science Data System Community) Recommendations often are associated with or developed by creators of dialects/standards
  3. Ecological Metadata Language is most common dialect among DataONE collections Was developed by Ecology community
  4. Developed by the Federal Geographic Data Committee, the historical Federal metadata standard for geospatial data
  5. The LTER recommendations are different because they were developed by a community to meet specific use cases and are not derived from a specific dialect. Compare LTER collections to others using same dialect without the awareness of the recommendations Is LTER more complete? Yes in some cases but not all. Can LTER be used as a shining example for other communities?
  6. 3 levels of recommendation Not specific to EML though also developed by the Ecology community and therefore highly relevant to EML collections
  7. Identification, Discovery, Evaluation Green represents 100% complete concepts = all records complete Yellow = the concept exists but is not being used % = % of complete records for that concept When we compare relative percentages LTER does much better but USANPN, TERN and ESA provide shining examples for other communities when it comes to MD completeness;
  8. Missing: Identification = Resource Identifier Discovery = Taxonomic Extent Evaluation = Project Description CSDGM has additional profiles e.g. biological profile… this is just standard CSDGM