7. How?
• Key identifying elements
• Emerging recommendations
• Variation among the domains
• In common: Persistent identifier
8. DataCite
German National Library of Economics (ZBW)
German National Library of Science and Technology (TIB)
German National Library of Medicine (ZB MED)
GESIS - Leibniz Institute for the Social Sciences, Germany
Australian National Data Service (ANDS)
ETH Zurich, Switzerland
The Swedish National Data Service (SNDS)
The British Library, UK
California Digital Library (CDL), USA
Office of Scientific & Technical Information (OSTI), USA
Purdue University Library
Canada Institute for Scientific and Technical Information (CISTI)
Technical Information Center of Denmark
Institute for Scientific & Technical Information (INIST-CNRS), France
TU Delft Library, The Netherlands
9. What is an identifier?
What you see: alphanumeric string (never changes)
Associated with: location of object (such as a URL)
Optional: who, what, when, etc (i.e. metadata)
By Joelk75: http://www.flickr.com/photos/75001512@N00/2728233597/
10. Identifier example
string: doi:10.9999/FK40K2GTV
html version: http://dx.doi.org/10.9999/FK40K2GTV
location: http://www.bologna.edu/biology/xfg/123.xls
metadata
creator: Dr. Felix Kottor
title: Data for chromosomal study of catfish (Ictalurus punctatus)
publisher: University of Bologna
date: 8/31/2011
11. Identifier example
string: doi:10.9999/FK40K2GTV
html version: http://dx.doi.org/10.9999/FK40K2GTV
location: http://www.state.edu/ecology/783sdr/123.xls
metadata
creator: Dr. Felix Kottor
title: Data for chromosomal study of catfish (Ictalurus punctatus)
publisher: Dryad Data Repository
date: 10/01/2011
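The two example slides above can be sketched in code: the identifier string stays fixed while the location and some metadata change. This is a minimal illustration only; the class and function names (`IdentifierRecord`, `resolver_url`, `relocate`) are hypothetical and not part of any DataCite or EZID API, and the DOI is the fictional one from the slides.

```python
from dataclasses import dataclass, replace

RESOLVER = "http://dx.doi.org/"  # resolver prefix as shown on the slides


@dataclass(frozen=True)
class IdentifierRecord:
    """Hypothetical record: one persistent string plus changeable details."""
    identifier: str  # the persistent string; never changes
    location: str    # current target URL; may change over time
    creator: str
    title: str
    publisher: str
    date: str


def resolver_url(identifier: str) -> str:
    """Turn 'doi:10.9999/FK40K2GTV' into its resolver ("html version") URL."""
    if not identifier.startswith("doi:"):
        raise ValueError("expected an identifier of the form 'doi:<prefix>/<suffix>'")
    return RESOLVER + identifier[len("doi:"):]


def relocate(record: IdentifierRecord, location: str, publisher: str, date: str) -> IdentifierRecord:
    """Return an updated copy; the identifier field is deliberately not a parameter."""
    return replace(record, location=location, publisher=publisher, date=date)


original = IdentifierRecord(
    identifier="doi:10.9999/FK40K2GTV",
    location="http://www.bologna.edu/biology/xfg/123.xls",
    creator="Dr. Felix Kottor",
    title="Data for chromosomal study of catfish (Ictalurus punctatus)",
    publisher="University of Bologna",
    date="8/31/2011",
)
moved = relocate(
    original,
    location="http://www.state.edu/ecology/783sdr/123.xls",
    publisher="Dryad Data Repository",
    date="10/01/2011",
)
```

Anyone who cited `resolver_url(original.identifier)` still reaches the data after the move, because only the location behind the string was updated.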
12. EZID: long-term identifiers made easy
take control of the
management and
distribution of your research,
share and get credit for it,
and build your reputation
through its collection and
documentation
Primary Functions
1. Create persistent identifiers
2. Manage identifiers over time
3. Manage associated metadata over time
17. DataCite Metadata V. 2.2
• Small required set = citation elements
• Optional descriptive set:
– extendable lists
– can refer to other standards, schemes
– domain-neutral
– rich ability to describe relationships to other
digital objects
• Metadata Search (MDS) is full-text indexed
18. DataCite Metadata V. 2.2
Required properties
1. Identifier (with type attribute)
2. Creator (with name identifier attributes)
3. Title (with optional type attribute)
4. Publisher
5. PublicationYear
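As a rough sketch of what "required" means in practice, the check below reports which of the five required properties are missing from a record. The dictionary layout and the function name `missing_required` are assumptions made for illustration; real DataCite metadata is expressed in XML against the published schema, and the example values come from the fictional record on the earlier slides.

```python
REQUIRED_PROPERTIES = ("Identifier", "Creator", "Title", "Publisher", "PublicationYear")


def missing_required(record: dict) -> list:
    """Return the required property names that are absent or empty in record."""
    missing = []
    for name in REQUIRED_PROPERTIES:
        value = record.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# Fictional example record, reusing the values from the identifier slides.
example = {
    "Identifier": "10.9999/FK40K2GTV",
    "Creator": "Dr. Felix Kottor",
    "Title": "Data for chromosomal study of catfish (Ictalurus punctatus)",
    "Publisher": "University of Bologna",
    "PublicationYear": "2011",
}

# The same record with the publisher left out, to show a failing case.
incomplete = {key: value for key, value in example.items() if key != "Publisher"}
```

Here `missing_required(example)` returns an empty list, while `missing_required(incomplete)` names `Publisher` as the gap.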
19. DataCite Metadata V. 2.2
Optional properties
6. Subject (with schema attribute)
7. Contributor (with type & name identifier attributes)
8. Date (with type attribute)
9. Language
10. ResourceType (with description attribute)
11. AlternateIdentifier (with type attribute)
12. RelatedIdentifier (with type & relation type attributes)
13. Size
14. Format
15. Version
16. Rights
17. Description (with type attribute)
21. Data Management Planning
By NASA Goddard Photo and Video: http://www.flickr.com/photos/gsfc/3720663276/
22. A life cycle approach
CDL Curation and Publishing Services
http://www.cdlib.org
Create, edit, share, and save
data management plans
Open source add-in for Microsoft Excel
as a data collection tool
Create and manage
persistent identifiers
Curation repository:
store, manage, and share research data
Open access scholarly publishing services:
papers, journals, books, seminars & more
Data Publication: an infrastructure to publish and get credit
for sharing research data
23. Identifiers and data management
Track your results
Organize your data
Get more citations
Meet funder requirements
24. Next Steps
DataCite
• Dublin Core application profile
• Content Service
• Metadata v. 2.3
EZID
• UI redesign
• Automated link checking
• Exposure for metadata
By Nicola Whitaker http://www.flickr.com/photos/nicolawhitaker/111009156/
25. Next Steps
Library
• service center
• information center
• your ideas here
By Nicola Whitaker http://www.flickr.com/photos/nicolawhitaker/111009156/
26. For more information
EZID
EZID application: http://n2t.net/ezid/
EZID website:
http://www.cdlib.org/services/uc3/ezid/
DataCite
DataCite Home: http://datacite.org/
DataCite Metadata Schema:
http://schema.datacite.org/meta/kernel-2.2/index.html
DataCite Metadata Search: http://search.datacite.org
27. Questions?
by Horia Varlan
http://www.flickr.com/photos/horiavarlan/4273168957/in/photostream/
Joan Starr: uc3@ucop.edu
@joan_starr
Editor's notes
Thank you for this opportunity to speak with you today about Dataset Metadata. Let me give special thanks to Meghan for asking me to speak.
Image credits:
By MDB 28: http://www.flickr.com/photos/mdb28/3787828482/
By davecurlee: http://www.flickr.com/photos/davecurlee/4689603488/
By sabarishr: http://www.flickr.com/photos/sabarishr/5422105775/
By rkrichardson: http://www.flickr.com/photos/45126397@N06/4506403367/
By awsheffield: http://www.flickr.com/photos/awsheffield/5932294950/
By Scutter: http://www.flickr.com/photos/scutter/109698478/
By Amy the Nurse: http://www.flickr.com/photos/amyashcraft/4522601466/
By Anita & Greg: http://www.flickr.com/photos/anita__greg/2849453715/
My library:
• Serving the 10 UC campuses: 226,000 students; 134,000 faculty and staff
• Working collaboratively
– libraries
– data centers
– museums, archives
– faculty and researchers
CDL has historically provided strategic, integrated technical and program services in a broad portfolio, including:
• Groundbreaking licensing agreements
• Union bibliographic services
• Data curation & preservation tools
• Open access publishing services
CDL: http://www.cdlib.org/
My group: The UC Curation Center is a creative partnership between the CDL, the ten UC campuses, and peer institutions in the community.
• A community of shared concern and practice
• Provide solutions, services, resources for digital assets
• Pool & distribute diverse experience, expertise, & resources
Access: The researchers’ requirements, from ESIP (Earth Science Information Partners, http://wiki.esipfed.org/index.php/Interagency_Data_Stewardship/Citations/provider_guidelines), are:
• To provide fair credit to those responsible: exposure
• To aid scientific reproducibility: re-use
• To ensure scientific transparency and reasonable accountability: verification
• To aid in tracking the impact of the work: citation tracking
Preservation: Easy to maintain
The funders’ requirements are for data management.
And the library’s charge is to preserve our institutions’ scholarly assets.
How are we going to meet these needs? If we go back to what the domains are doing…
From ESIP (Earth Science Information Partners; same link):
• Author(s): the people or organizations responsible for the intellectual work to develop the data set. The data creators.
• Release Date: when the particular version of the data set was first made available for use (and potential citation) by others.
• Title: the formal title of the data set.
• Version: the precise version of the data used. Careful version tracking is critical to accurate citation.
• Archive and/or Distributor: the organization distributing or caring for the data, ideally over the long term.
• Locator/Identifier: this could be a URL but ideally it should be a persistent service, such as a DOI, Handle or ARK, that resolves to the current location of the data in question.
• Access Date and Time: because data can be dynamic and changeable in ways that are not always reflected in release dates and versions, it is important to indicate when on-line data were accessed.
From ICPSR (Inter-University Consortium for Political and Social Research), http://www.icpsr.umich.edu/icpsrweb/ICPSR/curation/citations.jsp:
• Title
• Author
• Date
• Version
• Persistent identifier (such as the Digital Object Identifier, Uniform Resource Name (URN), or Handle System)
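To make the ESIP element list concrete, here is a small sketch that assembles those elements into a single citation string. The ordering and punctuation are an assumption chosen for illustration, not a format prescribed by ESIP or ICPSR, and `format_citation` is a hypothetical helper. The sample values reuse the fictional record from the slides; the version number and access date are invented placeholders.

```python
def format_citation(author, release_date, title, version, archive, locator, accessed=None):
    """Join the ESIP citation elements into one line (illustrative layout only)."""
    citation = (
        f"{author} ({release_date}). {title}. "
        f"Version {version}. {archive}. {locator}"
    )
    if accessed:
        # Access date matters for data that can change between releases.
        citation += f" (accessed {accessed})"
    return citation


citation = format_citation(
    author="Kottor, F.",
    release_date="2011",
    title="Data for chromosomal study of catfish (Ictalurus punctatus)",
    version="1",                     # placeholder; the slides give no version
    archive="Dryad Data Repository",
    locator="http://dx.doi.org/10.9999/FK40K2GTV",
    accessed="2011-10-15",           # placeholder access date
)
```

Because the locator is the persistent identifier rather than the file's current URL, the citation stays valid when the data moves.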
What’s in common: the persistent identifier.
DataCite was formed in 2009 by 10 libraries and research centers with a mission: “Helping you find, access, and reuse data.” The number has now grown to 15. In addition there are 3 associate members, including the Korea Institute of Science and Technology Information, so there is a presence in Asia. California Digital Library was one of the founding members.
DataCite’s primary methodology for achieving this mission: issuing DOIs (Digital Object Identifiers) for datasets.
DOIs are one kind of persistent identifier. But what is an identifier?
An identifier is an alphanumeric string assigned to an object, and if that assignment is managed with some metadata and the object is made available over time, the identifier becomes a VERY reliable way of keeping track of that object.
Let’s take a look at one. So you can see that with just the identifier and a simple set of metadata, you get:
• Location for VERIFICATION
• EXPOSURE & CITATION TRACKING
(This is not an actual DOI, nor an actual study.)
And here’s that same DOI some time later. THE STRING NEVER CHANGES. This means it can be cited, tracked and associated with all kinds of metadata. More on that in a minute.
EZID is CDL’s application for offering DataCite DOIs as well as other identifiers.
If you go to the Home Page, you can use the UI to test EZID. CLICK for HELP TAB.
On the Help screen, you have the choice of creating a test ARK or DOI. [CLICK] Click the Create button.
ARKs and DOIs
ARKs:
• Flexible
• Case-sensitive
• Special features support granularity
• Can be deleted
• Inexpensive
DOIs:
• Established brand in publishing
• Indexed by major A&I citation databases
• DataCite policies apply
• Cannot be deleted
• More costly
DOIs should be assigned to objects that are under good long-term management, and where there is an intention to make the object persistently available. DOIs must be registered exclusively with metadata that is available to public view.
Can DOIs and ARKs work together? Yes. For example, researchers may choose to use ARKs for unpublished materials associated with an object that has been registered with a DOI. These two identifier schemes can work well together, and EZID offers them both, along with policy support consistent across both schemes.
EZID creates the identifier and sends you to the MANAGE tab where you have the opportunity to enter a target URL and other metadata.
UI support:
• Dublin Kernel
• Dublin Core
• DataCite Kernel
API support:
• All of the above
• Full DataCite Schema
When you hover over a field, it opens up for editing as you can see here. This is where you would go if you wanted to maintain the metadata or the target URL.
Now let’s take a look at the full DataCite Metadata set. MDS = Metadata Search.
Remember, we said that any solution needed to:
• ALLOW the submitter to accurately describe the object so that anyone accessing knows what they are getting.
• ALLOW the submitter to give credit where credit is due.
• PROVIDE support for *data management*: format, version, rights.
The 5 Required properties = basic citation elements
• Identifier = DOI now; in future may open up
• Creator is repeatable; Name can have a nameIdentifier and schema, as in an ORCID iD
• Title is repeatable and has an optional type attribute for Alternative Title, Subtitle, and TranslatedTitle
• Publisher: “In the case of datasets, "publish" is understood to mean making the data available to the community of researchers.”
IDENTIFIER = VERIFICATION
ALLOW the submitter to give credit where credit is due: EXPOSURE & CITATION TRACKING
If the Year field isn’t quite what you want, use the repeatable DATE field in the optional set.
Optional elements
Includes support for data management: FORMAT, VERSION, RIGHTS.
In addition, some of these offer expansion of the required set. Contributor expands Creator. Date expands PublicationYear.
But the distinctive strength comes from Number 12. [CLICK]
Optional elements
The Family Jewels = RelatedIdentifier, relationType
• IsCitedBy & Cites
• IsSupplementTo & IsSupplementedBy
• IsContinuedBy & Continues
• IsNewVersionOf & IsPreviousVersionOf
• IsPartOf & HasPart
• IsDocumentedBy & Documents
• IsCompiledBy & Compiles
• IsVariantFormOf & IsOriginalFormOf
COMING IN 2.3: IsIdenticalTo
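The relation types come in reciprocal pairs, which is what lets two records point at each other consistently. The sketch below builds a lookup of inverses from the pairs listed in these notes and derives the reverse link for a given statement. The function name `reciprocal` and the second DOI are hypothetical; both DOIs are fictional.

```python
# Reciprocal relationType pairs as listed for schema version 2.2.
RELATION_PAIRS = [
    ("IsCitedBy", "Cites"),
    ("IsSupplementTo", "IsSupplementedBy"),
    ("IsContinuedBy", "Continues"),
    ("IsNewVersionOf", "IsPreviousVersionOf"),
    ("IsPartOf", "HasPart"),
    ("IsDocumentedBy", "Documents"),
    ("IsCompiledBy", "Compiles"),
    ("IsVariantFormOf", "IsOriginalFormOf"),
]

# Map every relation type to its inverse, in both directions.
INVERSE = {}
for forward, backward in RELATION_PAIRS:
    INVERSE[forward] = backward
    INVERSE[backward] = forward


def reciprocal(source_id, relation_type, target_id):
    """Given 'source relation target', return the matching 'target inverse source'."""
    if relation_type not in INVERSE:
        raise KeyError(f"unknown relationType: {relation_type}")
    return (target_id, INVERSE[relation_type], source_id)


# Fictional identifiers: a revised dataset that supersedes an earlier one.
link = ("10.9999/FK40K2GTV", "IsNewVersionOf", "10.9999/FK4EXAMPLE")
reverse_link = reciprocal(*link)
```

In this example `reverse_link` says that the earlier dataset `IsPreviousVersionOf` the revised one, so a reader landing on either record can find the other.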
“Data Management Planning” is a popular phrase these days. As metadata and preservation librarians, I think you’ll find many of the concepts to be very familiar, if wearing new clothes.
Let me tell you a little story about the life of a dataset.
• You start out in a laptop (or a tablet) travelling around, or under a desk.
• Maybe then you get emailed across the country or around the world.
• Years can go by as you get updated and altered.
• Eventually, maybe you have a day in the sun: your researcher decides to write up the results and cite you.
• Then, perhaps, it’s back to a server in the dark. Or, you move from server to server. Will you be forgotten?
That’s why we at California Digital Library have taken a life cycle approach with an array of tools. CDL has developed an array of tools and services ranging from the first stage of developing a data management plan, through to formal publication. We encourage researchers to assign an ID early in the process:
• to provide a credible data management plan for funders;
• to make the later stages easier; and
• to manage situations where changes might occur during the course of the research (a researcher changes institutions or a research team changes the location of their data, for example).
DataCite:
• Dublin Core application profile available for the DataCite Metadata Schema; we’ll keep it up to date and in sync. From the DCMI: “A DCAP is designed to promote interoperability within the constraints of the Dublin Core model and to encourage harmonization of usage and convergence on "emerging semantics" around its edges.”
• Content Service exposes our metadata stored in the DataCite Metadata Store (MDS) using multiple formats. Alpha version: the service can be accessed at http://data.datacite.org
EZID:
• UI redesign
• Activity reporting
• Browse & search
• Enhanced persistence support
• Automated link checking in support of our new Tombstone pages (a web page returned for a resource no longer found at its target location of record; the tombstone may provide “last known” metadata, including the original owner)
• Exposure for metadata: evidence that citations will increase (Heather Piwowar’s work)
– Thomson-Reuters (Web of Knowledge)
– Elsevier (Scopus)
– OAI? RSS?
– Google Scholar
Library as a service center: Consulting, EZID, DMP, DCXL, IR
Information: pointing people to standards, tools
Helping make connections.
The next steps for you as individuals are to get more information and try things for yourselves.