This is Module 10 of the EDI Data Publishing training course. It introduces what a data package is, how DOIs are assigned to data packages, and the steps the repository takes to insert a data package.
Objectives
4. What is the EDI Data Repository?
● An Internet-accessible, open-access data repository
● Uses the PASTA+ data repository software stack
● Metadata-driven publication workflow
● Generates Digital Object Identifiers for all public data packages
● Supports two DataONE member nodes
● Contains about 44,000 unique data packages
● Stores about 11 TB of data
● Uses AWS Glacier for offline, off-site storage
5. History
[Figure: timeline (not to scale) of the repository's history, spanning the LTER Network era and the EDI era. Year markers: 2007, 2009, 2010, 2013, 2016, today. Milestones labeled on the timeline: DOIs minted; early NIS discussions; NIS/PASTA user testing and evaluation; LTER NIS production release; PASTA development begins; 2nd LTER MN; transition to the EDI Data Repository; EDI MN; 44,000 data packages; DataCite membership; 1st LTER MN.]
10. Data package
Data Package (noun): an assemblage of science metadata and one or more science
data objects; data packages include a quality report object and are described by
package metadata called a “resource map” (i.e., a manifest)
[Figure: Science Metadata + Science Data + Quality Report + Resource Map = Data Package. The resource map lists the package contents: 1. Science Metadata, 2. Science Data, 3. Quality Report. Callout: "YOU are responsible for this".]
11. Data package identifiers
Package Identifier (noun): a string value that uniquely identifies the data package
within the EDI Data Repository.
edi.10.1
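The example identifier above has three dot-separated parts, commonly read as scope, identifier number, and revision. A minimal sketch of splitting such a string apart, assuming that three-part pattern; `PackageId` and `parse_package_id` are hypothetical names used only for illustration and are not part of any EDI or PASTA+ library.

```python
from typing import NamedTuple


class PackageId(NamedTuple):
    """Hypothetical container for the three parts of a package identifier."""
    scope: str       # e.g. "edi"
    identifier: int  # e.g. 10
    revision: int    # e.g. 1


def parse_package_id(text: str) -> PackageId:
    """Split 'scope.identifier.revision' into its parts.

    Raises ValueError if the string does not have three parts or if the
    last two parts are not integers.
    """
    # Split from the right so only the last two dots act as separators.
    scope, identifier, revision = text.strip().rsplit(".", 2)
    if not scope:
        raise ValueError(f"missing scope in package identifier: {text!r}")
    return PackageId(scope, int(identifier), int(revision))


pid = parse_package_id("edi.10.1")
print(pid.scope, pid.identifier, pid.revision)
```

Returning a named tuple keeps the parts easy to compare and sort, which is convenient when looking for the latest revision of a package.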
16. Data package versioning
● PASTA+ enforces strong versioning: published data are immutable
● To add or modify metadata or data in a data package, you must upload a new
revision of your EML metadata
● Within the new EML metadata, you must increment the “revision” value of the
package identifier
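Incrementing the revision can be illustrated as a small string operation. This is a sketch only, assuming the identifier has the form scope.identifier.revision as in the example `edi.10.1`; `next_revision` is a hypothetical helper, and in practice the new value would be written into the package identifier inside the EML document itself.

```python
def next_revision(package_id: str) -> str:
    """Return the package identifier with its revision increased by one.

    Assumes the form 'scope.identifier.revision'; raises ValueError otherwise.
    """
    scope, identifier, revision = package_id.strip().rsplit(".", 2)
    return f"{scope}.{int(identifier)}.{int(revision) + 1}"


print(next_revision("edi.10.1"))  # prints edi.10.2
```

Because published revisions are immutable, the earlier revision stays in the repository unchanged; only the new identifier refers to the updated content.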
17. Data package quality evaluation
A series of quality checks for…
Metadata validation
● Well formed and schema valid
● Content validation (does content match best practices?)
Data validation
● Accessible (can data be downloaded?)
Congruence validation
● Metadata description of data matches physical structure of data (e.g., correct
number of columns, rows, datatype, delimiters)
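The idea behind a congruence check can be sketched in a few lines. This is not the repository's implementation; it is a simplified illustration that assumes the metadata declares a column count, a delimiter, a number of header lines, and optionally a row count. The function name `check_congruence` and its parameters are hypothetical.

```python
import csv
import io


def check_congruence(data_text, expected_columns, delimiter=",",
                     header_lines=1, expected_rows=None):
    """Compare a delimited text table against what the metadata declares.

    Returns a list of mismatch descriptions; an empty list means the
    data matched the declared structure.
    """
    problems = []
    rows = list(csv.reader(io.StringIO(data_text), delimiter=delimiter))
    data_rows = rows[header_lines:]  # skip the declared header lines

    # Every data row should have the declared number of fields.
    for line_no, row in enumerate(data_rows, start=header_lines + 1):
        if len(row) != expected_columns:
            problems.append(
                f"line {line_no}: expected {expected_columns} fields, "
                f"found {len(row)}"
            )

    # Optionally compare the declared number of records.
    if expected_rows is not None and len(data_rows) != expected_rows:
        problems.append(
            f"expected {expected_rows} data rows, found {len(data_rows)}"
        )
    return problems


sample = "site,date,temp_c\nA,2020-01-01,3.5\nB,2020-01-02\n"
for problem in check_congruence(sample, expected_columns=3, expected_rows=2):
    print(problem)
```

In the sample, the second data row is missing a field, so the check reports a mismatch for that line. A fuller check would also compare declared data types against the values in each column.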
19. Quality evaluation report
● Valid - quality check meets criteria
● Warn - quality check does not meet criteria, but does not fail upload
● Error - quality check does not meet criteria, results in failed upload
● Info - quality check only provides information
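The four statuses above imply a simple rule: an upload fails only if at least one check has the Error status. A minimal sketch of that rule, assuming the statuses are available as plain strings; `summarize_report` is a hypothetical helper and does not reflect the actual format of the quality report.

```python
from collections import Counter

KNOWN_STATUSES = {"valid", "warn", "error", "info"}


def summarize_report(statuses):
    """Count check statuses and decide whether the upload would succeed.

    Returns (counts, upload_ok), where upload_ok is False if any check
    has the 'error' status.
    """
    counts = Counter(status.lower() for status in statuses)
    unknown = set(counts) - KNOWN_STATUSES
    if unknown:
        raise ValueError(f"unknown status values: {sorted(unknown)}")
    upload_ok = counts["error"] == 0
    return dict(counts), upload_ok


counts, upload_ok = summarize_report(["valid", "valid", "warn", "info"])
print(counts, upload_ok)
```

Warnings do not block the upload, but they are worth reviewing because they often point to metadata that falls short of best practices.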