SlideShare une entreprise Scribd logo
1  sur  46
Télécharger pour lire hors ligne
Unless otherwise noted, the slides in this presentation are licensed by Mark A. Parsons under a Creative Commons Attribution-Share Alike 3.0 License
Data Citation Then and Now
Mark A. Parsons with help from Ruth Duerr and Peter Fox
!
!
!
17 June 2014
GeoData 2014
Boulder, CO
The Evolution of Data Citation
• Data was part of the literature—tables, maps, monographs, etc.—and we cited
accordingly. (Some data were still hoarded).
• Digital data becomes the norm. It’s messier and we forget how to do cite it
routinely.
• Initial efforts to define digital data citation in the 90s - early 00s
• Right idea, little traction
• Partially conflated with the citing URLs issue
• A blossoming in the mid-late 00s.
• Multiple disciplines start developing approaches and guidelines
• DOI a big driver, especially for DataCite, but other identifiers used too
(Handles, LSIDs, UNFs, ARKs and good ol’ URI/Ls)
• A slightly competitive atmosphere
• GeoData 2011 a milestone for ESIP Citation Guidelines—most sophisticated to
date. (http://dx.doi.org/10.7269/P34F1NNJ)
Then
• Now a consensus phase
• Out of Cite, Out of Mind: The Current State of Practice, Policy, and
Technology for the Citation of Data. 2013.

http://dx.doi.org/10.2481/dsj.OSOM13-043
• Draft Global Joint Declaration of Data Citation Principles. 2013.

http://www.force11.org/datacitation
The Evolution of Data Citation
Now
• Implementation phase just begun
• ESIP Guidelines adopted by a variety of NASA and NOAA data centers and
internationally by GEOSS.
• AGU Publishing Committee is developing author guidelines based on ESIP.
• Other disciplines, notably social science, has relationships with publishers.
• Several data centers partnering with publishers, e.g. Elsevier’s “article of the
future”.
• New PLOS data policy.
• Joint engagement activity following on the joint principles.
• It happens locally and requires culture change so debates will continue.
The Evolution of Data Citation—Next
Next
Outline of 2011 Talk
• Purpose of Data Citation
• How it’s currently done
• Basic citation form and content
• Identifiers and locators
• Microcitation
Purpose of data citation
Purpose of Data Citation
• Credit for data creators and stewards

• Track impact of data set

• Accountability for creators and stewards

• Aid reproducibility through direct, unambiguous
connection to the precise data used

!
• A location/reference mechanism not a discovery
mechanism per se.
7
Then
Purpose of Data Citation
• Aid scientific reproducibility through direct, unambiguous
connection to the precise data used.
• Credit for data authors and stewards
• Accountability for creators and stewards
• Track impact of data set
• Help identify data use (e.g., trackbacks)
• Data authors can verify how their data are being used.
• Users can better understand the application of the data.
!
• A locator/reference mechanism not a discovery mechanism per se
Now
Joint	
  Declaration	
  of	
  Data	
  Citation	
  Principles	
  	
  
(Overview)	
  
The	
  Noble	
  Eight-­‐Fold	
  Path	
  to	
  Citing	
  Data
1. Importance	
  
2. Credit	
  and	
  attribution	
  	
  
3. Evidence	
  
4. Unique	
  Identification	
  	
  
5. Access	
  
6. Persistence	
  	
  
7. Specificity	
  and	
  verifiability	
  	
  
8. Interoperability	
  and	
  flexibility
Principles	
  are	
  supplemented	
  with	
  a	
  glossary,	
  references	
  and	
  examples	
  
http://force11.org/datacitation
Purpose of Data Citation
A locator/reference/linking mechanism!
This helps
• Aid scientific reproducibility through direct, unambiguous connection to the precise
data used
• Identify and track data use
!
It contributes but is NOT central to
• Credit for data authors and stewards
• Accountability for creators and stewards
• Tracking impact of data set
Next
Is data publication the right metaphor?
	 M. A. Parsons & P. A. Fox
	 Data Science Journal, 2013
	 http://dx.doi.org/10.2481/dsj.WDS-042
Our ordinary conceptual system,
in terms of which we both think
and act, is fundamentally
metaphorical in nature.
—Lakoff and Johnson, 1980
If we hear the same language over
and over, we will think more and
more in terms of the frames and
metaphors activated by that
language.
—Lakoff, 2008
How it’s currently done
How data citation is currently done
• Citation of traditional publication that actually contains the data,
e.g. a parameterization value.

• Not mentioned, just used, e.g., in tables or figures

• Reference to name or source of data in text

• URL in text (with variable degrees of specificity)

• Citation of related paper (e.g. CRU Temp. records recommend
citing two old journal articles which do not contain the actual data
or full description of methods)

• Citation of actual data set typically using recommended citation
given by data center

• Citation of data set including a persistent identifier/locator,
typically a DOI
13
Then
How data citation is currently done
• Citation of traditional publication that actually contains the data, e.g.
a parameterization value.
• Not mentioned, just used, e.g., in tables or figures
• Reference to name or source of data in text
• URL in text (with variable degrees of specificity)
• Citation of related paper (e.g. CRU Temp. records recommend citing
two old journal articles which do not contain the actual data or full
description of methods.)
• Citation of actual data set typically using recommended citation given
by data center
• Citation of data set including a persistent identifier/locator, typically a
DOI
Now
0
 100
 200
 300
 400
 500
 600
2002
2003
2004
2005
2006
2007
2008
2009
“MODIS Snow Cover Data” in Google Scholar
1.3%
1.0%
0.7%
0.7%
1.3%
0.9%
1.3%
1.7%
Formal Citation
Total Entries
Then
How data citation is currently done
NowNext
• Implementation phase just begun
• ESIP Guidelines adopted by a variety of NASA and NOAA data centers and
internationally by GEOSS.
• AGU Publishing Committee is developing author guidelines based on ESIP.
• Other disciplines, notably social science, has relationships with publishers.
• Several data centers partnering with publishers, e.g. Elsevier’s “article of the
future”.
• New PLOS data policy.
• Joint engagement activity following on the joint principles.
• It happens locally and requires culture change so debates will continue.
Basic content
Basic data citation form and content
Per DataCite:
Creator. PublicationYear. Title. [Version]. Publisher. [ResourceType].
Identifier.
!
Per ESIP:
Author(s). ReleaseDate. Title, [version]. [editor(s)]. Archive and/or
Distributor. Locator. [date/time accessed]. [subset used].
!
NextThen Now
An Example Citation
Cline, D., R. Armstrong, R. Davis, K. Elder, and G. Liston.
2002, Updated 2003. CLPX-Ground: ISA snow depth
transects and related measurements ver. 2.0. Edited by M.
Parsons and M. J. Brodzik. Boulder, CO: National Snow
and Ice Data Center. Data set accessed 2008-05-14 at
http://dx.doi.org/10.5060/D4H41PBP.
NextThen Now
Identifiers and locators
ID Scheme Data Set Item Data Set Item Data Set Item Data Set Item
URL/N/I
PURL
XRI
Handle
DOI
ARK
LSID
OID
UUID
An assessment of identification schemes
for digital Earth science data
Unique
Identifier
Unique
Locator
Citable
Locator
Scientifically
Unique ID
Good
Fair
Poor
Adapted from Duerr, R. E., et al.. 2011. On the utility of identification schemes for digital Earth science data: An
assessment and recommendations. Earth Science Informatics. 4:139-160.

http://dx.doi.org/10.1007/s12145-011-0083-6
Then Now
ID Scheme Data Set Item Data Set Item Data Set Item Data Set Item
URL/N/I
PURL
XRI
Handle
DOI
ARK
LSID
OID
UUID
An assessment of identification schemes
for digital Earth science data
Unique
Identifier
Unique
Locator
Citable
Locator
Scientifically
Unique ID
Locators
Identifiers
Good
Fair
Poor
Adapted from Duerr, R. E., et al.. 2011. On the utility of identification schemes for digital Earth science data: An
assessment and recommendations. Earth Science Informatics. 4:139-160.

http://dx.doi.org/10.1007/s12145-011-0083-6
Then Now
• Everything needs an identifier. Most things need locators. Intellectual content
needs citation.
• Different versions of things may need different identifiers/locators
• Subsets may need identifiers or clear reference to sub-setting process (e.g.
space and time).
• Different representations (conceptual models) may need different identifiers/
locators. E.g Maps.
What needs an identifier/locator?
What needs to be cited?
Why the DOI?
• Not perfect but well understood by publishers

• Thomson Reuters collaborating with DataCite to get data citations in
their index.

!
But...

• What is the citable unit?

• How do we handle different versions?

• What about “retired” data?

• When is a DOI assigned?
Then Now
Versioning approach recommended by DCC
• “As DOIs are used to cite data as evidence, the dataset to which a DOI
points should also remain unchanged, with any new version receiving a
new DOI.”

• “There are two possible approaches the data repository can take:

time slices and snapshots.”
Now
When to assign a DOI?
• First principle: Data should be citable as soon as they are available for use by
anyone other than the original authors.
• But...
• Most people (falsely) believe that a DOI implies permanence so how do we
cite transient data?
• Some believe that a DOI should not be assigned until the data has
undergone some level of review (e.g. Lawrence et al. 2010). So how do we
cite data used before the review?
• Data are often used by friends and collaborators in a raw, “unpublished”
state. Should this use be cited with a DOI?
• Near real time or preliminary data may only be available for a short un-
curated period. There may not be a good match between the submission
package and the distribution package. What gets the DOI? When?
Now
Versioning and locators:
some suggestions from NSIDC
• major version.minor version.[archive version]

• Individual stewards need to determine which are major vs. minor versions and describe the
nature and file/record range of every version.

• Assign DOIs to major versions. 

• Old DOIs should be maintained and point to some appropriate page that explains what
happened to the old data if they were not archived.

• A new major version leads to the creation of a new collection-level metadata record that is
distributed to appropriate registries. The older metadata record should remain with a
pointer to the new version and with explanation of the status of the older version data.

• Major and minor version (after the first version) should be exposed with the data set title
and recommended citation.

• Minor versions should be explained in documentation, ideally in file-level metadata.

• Applying UUIDs or ARKs to individual files upon ingest aids in tracking minor versions and
historical citations.
Microcitation
Basic data citation form and content
!
Author(s). ReleaseDate. Title, Version. [editor(s)]. Archive and/or
Distributor. Locator. [date/time accessed]. [subset used].

!
!
!
The best solution is to have unique identifiers or query IDs for
subsets, but that won’t be available for most data sets for a long
time, so we need alternative solutions...
NextThen Now
!
February 8, 2011, 4:45 PM
Page Numbers for Kindle Books an Imperfect Solution
Amazon’s Kindle will have
page numbers that
correspond to real books
and locations by passage.
Neither solution is perfect—‘locations’ or
page numbers—because the problem is
unsolvable. The best we can hope for is a
choice...
http://pogue.blogs.nytimes.com/2011/02/08/page-numbers-for-kindle-books-an-imperfect-solution/
• Bible

• Koran

• Bhagavad-Gita and
Ramayana 

• other sacred texts

!
• A “structural index”
Chapter and Verse
NextThen Now
Doing it as best we can...?
• Hall, Dorothy K., George A. Riggs, and Vincent V. Salomonson. 2007, updated
daily. MODIS/Aqua Snow Cover Daily L3 Global 500m Grid V005.3, Oct. 2007-
Sep. 2008, 84°N, 75°W; 44°N, 10°W. Boulder, Colorado USA: National Snow
and Ice Data Center. Data set accessed 2008-11-01 at http://dx.doi.org/
10.1234/xxx.
• Hall, Dorothy K., George A. Riggs, and Vincent V. Salomonson. 2007, updated
daily. MODIS/Aqua Snow Cover Daily L3 Global 500m Grid V005.3, Oct. 2007-
Sep. 2008, Tiles (15,2;16,0;16,1;16,2;17,0;17,1). Boulder, Colorado USA:
National Snow and Ice Data Center. Data set accessed 2008-11-01 at http://
dx.doi.org/10.1234/xxx.
• Cline, D., R. Armstrong, R. Davis, K. Elder, and G. Liston. 2002, Updated 2003.
CLPX-Ground: ISA snow depth transects and related measurements, Version
2.0, shapefiles. Edited by M. Parsons and M. J. Brodzik. Boulder, CO: National
Snow and Ice Data Center. Data set accessed 2008-05-14 at http://dx.doi.org/
10.5060/D4H41PBP.
NextThen Now
“Just-in-time” citation
Approach being developed by an RDA Working Group
• Ensure data is time-stamped and versioned
• Assign PID to time-stamped query/selection expression
!
https://rd-alliance.org/group/data-citation-wg.html
Next
Content negotiation—the details of identifier
resolution
So,	
  you	
  have	
  a	
  DOI,	
  or	
  a	
  handle,	
  …	
  
• Resolution	
  service	
  or	
  URI?	
  
• Landing	
  pages	
  (for	
  metadata,	
  citation	
  
recommendations,	
  …)	
  
• Content-­‐negotiation	
  (conneg)	
  for	
  embedding	
  in	
  
– Other	
  pages	
  
– Applications	
  
– Ingest,	
  object	
  type	
  repositories,	
  communities
“Landing Pages” For humans and machines
Landing	
  page	
  –	
  a	
  short	
  form
	
  http://data.rpi.edu/repository/handle/10833/24
Long	
  form
http://data.rpi.edu/repository/handle/10833/24?show=full
Content	
  Negotiation—Conneg
• Many	
  examples,	
  but	
  what	
  follows	
  is	
  ~	
  from:	
  
http://www.crosscite.org/cn/	
  
• What	
  is	
  it?	
  
– Es	
  ce	
  que	
  vous	
  parlez	
  Français?	
  
– Do	
  you	
  speak	
  html	
  or	
  JSON	
  or	
  RDF?	
  
• For	
  embedding	
  in…	
  
– Other	
  pages	
  
– Applications	
  
– Ingest,	
  object	
  type	
  repositories,	
  communities
Conneg
Supported	
  content	
  types..
Metadata
• Title
• Author
• Author Email
• Licence
• Subject
• Keyword
• Data Type
Dataset
CDF
DCO Object Deposit DCO Research Network
DCO-ID Request DCO-ID Request
Share
Knowledge
Join
Network
Allocate a universal accessible DCO-ID
Register Metadata
Upload Raw Data
DCO Object Registration
and Deposit
DCO Research
Community Network
Further	
  integration..
Get involved!
• RDA Working Group on citing dynamic data.
• https://rd-alliance.org/group/data-citation-wg.html
• RDA WG on identifier “types” and PID Interest Group
• https://rd-alliance.org/working-groups/pid-information-types-wg.html
• https://rd-alliance.org/internal-groups/pid-interest-group.html
• ESIP Preservation and Stewardship Committee
• http://wiki.esipfed.org/index.php/Preservation_and_Stewardship
• Implementation Team for the Joint Citation Principles
• http://www.force11.org/node/4849
Update of 2011 Talk
• Purpose of Data Citation—evolving into more specific concerns (slowly)
• How it’s currently done—consensus on needs and approach, but no
substantial progress on implementation
• Basic citation form and content—basics are solid and generally consistent
• Identifiers and locators—consensus emerging, now looking at identifiers for
everything and what to resolve.
• Micro-citation—good technical progress hindered by social implementation
and concept of citation vs. linking.
Overall Summary
• Use many metaphors and be cautious of them all.
• We know how to cite data (for the most part), we just need to make it a cultural
practice. Just do it.
• Separate concerns
• Reference/linking is different than credit
• Location and identity are different but can be the same.
• Data citation is not literature citation. Micro-citation is more important and the
citation needs to be machine interpretable and operable.
• Everything in data stewardship needs an identifier, but…

due diligence is the underlying requirement.
• These sort of socio-technical problems require collaborative, networked
solutions. Participate.

Contenu connexe

Tendances

W13 libr250 databases___sources1
W13 libr250 databases___sources1W13 libr250 databases___sources1
W13 libr250 databases___sources1lterrones
 
W13 libr250 databases_scholarlyvs_popular
W13 libr250 databases_scholarlyvs_popularW13 libr250 databases_scholarlyvs_popular
W13 libr250 databases_scholarlyvs_popularlterrones
 
W13 libr250 evaluating and citing websites1
W13 libr250 evaluating and citing websites1W13 libr250 evaluating and citing websites1
W13 libr250 evaluating and citing websites1lterrones
 
W13 libr250 do_iv_urlciations
W13 libr250 do_iv_urlciationsW13 libr250 do_iv_urlciations
W13 libr250 do_iv_urlciationslterrones
 
Consuming Linked Data SemTech2010
Consuming Linked Data SemTech2010Consuming Linked Data SemTech2010
Consuming Linked Data SemTech2010Juan Sequeda
 
UCSD Library Presentation 10182010
UCSD Library Presentation 10182010UCSD Library Presentation 10182010
UCSD Library Presentation 10182010Philip Bourne
 
Year of the Monkey: Lessons from the first year of SearchMonkey
Year of the Monkey: Lessons from the first year of SearchMonkeyYear of the Monkey: Lessons from the first year of SearchMonkey
Year of the Monkey: Lessons from the first year of SearchMonkeyPeter Mika
 
Publishing data on the Semantic Web
Publishing data on the Semantic WebPublishing data on the Semantic Web
Publishing data on the Semantic WebPeter Mika
 
Semantic Search Summer School2009
Semantic Search Summer School2009Semantic Search Summer School2009
Semantic Search Summer School2009Peter Mika
 
Ws anspaugh fall2014_final
Ws anspaugh fall2014_finalWs anspaugh fall2014_final
Ws anspaugh fall2014_finalk-baril
 
WSPuttFall2014
WSPuttFall2014WSPuttFall2014
WSPuttFall2014k-kobiela
 
W13 libr250 why_keywords
W13 libr250 why_keywordsW13 libr250 why_keywords
W13 libr250 why_keywordslterrones
 
Hide the Stack: Toward Usable Linked Data
Hide the Stack:Toward Usable Linked DataHide the Stack:Toward Usable Linked Data
Hide the Stack: Toward Usable Linked Dataaba-sah
 
ICPSR Data Exploration Tools
ICPSR Data Exploration ToolsICPSR Data Exploration Tools
ICPSR Data Exploration ToolsICPSR
 
Honors Writing Seminar - Surface
Honors Writing Seminar - SurfaceHonors Writing Seminar - Surface
Honors Writing Seminar - SurfaceJenny Donley
 
Honors Writing Seminar 2015
Honors Writing Seminar 2015Honors Writing Seminar 2015
Honors Writing Seminar 2015Jenny Donley
 
Finding and Managing Information
Finding and Managing InformationFinding and Managing Information
Finding and Managing InformationNeny Isharyanti
 
The Semantic Web in Digital Libraries: A Literature Review
The Semantic Web in Digital Libraries: A Literature ReviewThe Semantic Web in Digital Libraries: A Literature Review
The Semantic Web in Digital Libraries: A Literature Reviewsstose
 
Documentation and Metdata - VA DM Bootcamp
Documentation and Metdata - VA DM BootcampDocumentation and Metdata - VA DM Bootcamp
Documentation and Metdata - VA DM BootcampSherry Lake
 

Tendances (20)

W13 libr250 databases___sources1
W13 libr250 databases___sources1W13 libr250 databases___sources1
W13 libr250 databases___sources1
 
W13 libr250 databases_scholarlyvs_popular
W13 libr250 databases_scholarlyvs_popularW13 libr250 databases_scholarlyvs_popular
W13 libr250 databases_scholarlyvs_popular
 
W13 libr250 evaluating and citing websites1
W13 libr250 evaluating and citing websites1W13 libr250 evaluating and citing websites1
W13 libr250 evaluating and citing websites1
 
W13 libr250 do_iv_urlciations
W13 libr250 do_iv_urlciationsW13 libr250 do_iv_urlciations
W13 libr250 do_iv_urlciations
 
Consuming Linked Data SemTech2010
Consuming Linked Data SemTech2010Consuming Linked Data SemTech2010
Consuming Linked Data SemTech2010
 
UCSD Library Presentation 10182010
UCSD Library Presentation 10182010UCSD Library Presentation 10182010
UCSD Library Presentation 10182010
 
Year of the Monkey: Lessons from the first year of SearchMonkey
Year of the Monkey: Lessons from the first year of SearchMonkeyYear of the Monkey: Lessons from the first year of SearchMonkey
Year of the Monkey: Lessons from the first year of SearchMonkey
 
Publishing data on the Semantic Web
Publishing data on the Semantic WebPublishing data on the Semantic Web
Publishing data on the Semantic Web
 
Semantic Search Summer School2009
Semantic Search Summer School2009Semantic Search Summer School2009
Semantic Search Summer School2009
 
Ws anspaugh fall2014_final
Ws anspaugh fall2014_finalWs anspaugh fall2014_final
Ws anspaugh fall2014_final
 
WSPuttFall2014
WSPuttFall2014WSPuttFall2014
WSPuttFall2014
 
W13 libr250 why_keywords
W13 libr250 why_keywordsW13 libr250 why_keywords
W13 libr250 why_keywords
 
Hide the Stack: Toward Usable Linked Data
Hide the Stack:Toward Usable Linked DataHide the Stack:Toward Usable Linked Data
Hide the Stack: Toward Usable Linked Data
 
ICPSR Data Exploration Tools
ICPSR Data Exploration ToolsICPSR Data Exploration Tools
ICPSR Data Exploration Tools
 
Honors Writing Seminar - Surface
Honors Writing Seminar - SurfaceHonors Writing Seminar - Surface
Honors Writing Seminar - Surface
 
Honors Writing Seminar 2015
Honors Writing Seminar 2015Honors Writing Seminar 2015
Honors Writing Seminar 2015
 
Finding and Managing Information
Finding and Managing InformationFinding and Managing Information
Finding and Managing Information
 
Chap1
Chap1Chap1
Chap1
 
The Semantic Web in Digital Libraries: A Literature Review
The Semantic Web in Digital Libraries: A Literature ReviewThe Semantic Web in Digital Libraries: A Literature Review
The Semantic Web in Digital Libraries: A Literature Review
 
Documentation and Metdata - VA DM Bootcamp
Documentation and Metdata - VA DM BootcampDocumentation and Metdata - VA DM Bootcamp
Documentation and Metdata - VA DM Bootcamp
 

En vedette

Parsons scidatacon2016
Parsons scidatacon2016Parsons scidatacon2016
Parsons scidatacon2016Mark Parsons
 
CoBRA guideline : a tool to facilitate sharing, reuse, and reproducibility of...
CoBRA guideline : a tool to facilitate sharing, reuse, and reproducibility of...CoBRA guideline : a tool to facilitate sharing, reuse, and reproducibility of...
CoBRA guideline : a tool to facilitate sharing, reuse, and reproducibility of...Research Data Alliance
 
Data Policy for Open Science
Data Policy for Open ScienceData Policy for Open Science
Data Policy for Open ScienceMark Parsons
 
Why Data Citation Currently Misses the Point
Why Data Citation Currently Misses the PointWhy Data Citation Currently Misses the Point
Why Data Citation Currently Misses the PointMark Parsons
 
Open Data is Not Enough: Making Data Sharing Work
Open Data is Not Enough: Making Data Sharing WorkOpen Data is Not Enough: Making Data Sharing Work
Open Data is Not Enough: Making Data Sharing WorkResearch Data Alliance
 
Cultural Heritage: when data are much worst than one can believe
Cultural Heritage: when data are much worst than one can believe Cultural Heritage: when data are much worst than one can believe
Cultural Heritage: when data are much worst than one can believe Research Data Alliance
 
Stories of “Glocality"—Nations in a Global Infrastructure
Stories of “Glocality"—Nations in a Global InfrastructureStories of “Glocality"—Nations in a Global Infrastructure
Stories of “Glocality"—Nations in a Global InfrastructureResearch Data Alliance
 
Removing Barriers to Data Sharing: the Research Data Alliance
Removing Barriers to Data Sharing: the Research Data AllianceRemoving Barriers to Data Sharing: the Research Data Alliance
Removing Barriers to Data Sharing: the Research Data AllianceResearch Data Alliance
 
Efficient and effective: can we combine both to realize high-value, open, sca...
Efficient and effective: can we combine both to realize high-value, open, sca...Efficient and effective: can we combine both to realize high-value, open, sca...
Efficient and effective: can we combine both to realize high-value, open, sca...Research Data Alliance
 
SoBigData. European Research Infrastructure for Big Data and Social Mining
SoBigData. European Research Infrastructure for Big Data and Social MiningSoBigData. European Research Infrastructure for Big Data and Social Mining
SoBigData. European Research Infrastructure for Big Data and Social MiningResearch Data Alliance
 
OpenAIRE and Eudat services and tools to support FAIR DMP implementation
OpenAIRE and Eudat services and tools to support FAIR DMP implementation OpenAIRE and Eudat services and tools to support FAIR DMP implementation
OpenAIRE and Eudat services and tools to support FAIR DMP implementation Research Data Alliance
 
Rda in a_nutshell_february_2017_updated
Rda in a_nutshell_february_2017_updatedRda in a_nutshell_february_2017_updated
Rda in a_nutshell_february_2017_updatedResearch Data Alliance
 

En vedette (17)

Parsons scidatacon2016
Parsons scidatacon2016Parsons scidatacon2016
Parsons scidatacon2016
 
CoBRA guideline : a tool to facilitate sharing, reuse, and reproducibility of...
CoBRA guideline : a tool to facilitate sharing, reuse, and reproducibility of...CoBRA guideline : a tool to facilitate sharing, reuse, and reproducibility of...
CoBRA guideline : a tool to facilitate sharing, reuse, and reproducibility of...
 
Data Policy for Open Science
Data Policy for Open ScienceData Policy for Open Science
Data Policy for Open Science
 
Why Data Citation Currently Misses the Point
Why Data Citation Currently Misses the PointWhy Data Citation Currently Misses the Point
Why Data Citation Currently Misses the Point
 
Open Data is Not Enough: Making Data Sharing Work
Open Data is Not Enough: Making Data Sharing WorkOpen Data is Not Enough: Making Data Sharing Work
Open Data is Not Enough: Making Data Sharing Work
 
Cultural Heritage: when data are much worst than one can believe
Cultural Heritage: when data are much worst than one can believe Cultural Heritage: when data are much worst than one can believe
Cultural Heritage: when data are much worst than one can believe
 
Stories of “Glocality"—Nations in a Global Infrastructure
Stories of “Glocality"—Nations in a Global InfrastructureStories of “Glocality"—Nations in a Global Infrastructure
Stories of “Glocality"—Nations in a Global Infrastructure
 
Research Data Alliance Overview
Research Data Alliance OverviewResearch Data Alliance Overview
Research Data Alliance Overview
 
Removing Barriers to Data Sharing: the Research Data Alliance
Removing Barriers to Data Sharing: the Research Data AllianceRemoving Barriers to Data Sharing: the Research Data Alliance
Removing Barriers to Data Sharing: the Research Data Alliance
 
Efficient and effective: can we combine both to realize high-value, open, sca...
Efficient and effective: can we combine both to realize high-value, open, sca...Efficient and effective: can we combine both to realize high-value, open, sca...
Efficient and effective: can we combine both to realize high-value, open, sca...
 
Research Data Alliance Overview
Research Data Alliance OverviewResearch Data Alliance Overview
Research Data Alliance Overview
 
SoBigData. European Research Infrastructure for Big Data and Social Mining
SoBigData. European Research Infrastructure for Big Data and Social MiningSoBigData. European Research Infrastructure for Big Data and Social Mining
SoBigData. European Research Infrastructure for Big Data and Social Mining
 
"Cool" metadata for FAIR data
"Cool" metadata for FAIR data"Cool" metadata for FAIR data
"Cool" metadata for FAIR data
 
OpenAIRE and Eudat services and tools to support FAIR DMP implementation
OpenAIRE and Eudat services and tools to support FAIR DMP implementation OpenAIRE and Eudat services and tools to support FAIR DMP implementation
OpenAIRE and Eudat services and tools to support FAIR DMP implementation
 
Whole brain optical imaging
Whole brain optical imagingWhole brain optical imaging
Whole brain optical imaging
 
Rda in a_nutshell_february_2017_updated
Rda in a_nutshell_february_2017_updatedRda in a_nutshell_february_2017_updated
Rda in a_nutshell_february_2017_updated
 
Data curator: who is s/he?
Data curator: who is s/he?Data curator: who is s/he?
Data curator: who is s/he?
 

Similaire à Parsons citation geodata2014

DataONE Education Module 08: Data Citation
DataONE Education Module 08: Data CitationDataONE Education Module 08: Data Citation
DataONE Education Module 08: Data CitationDataONE
 
Data Citation and DOIs
Data Citation and DOIsData Citation and DOIs
Data Citation and DOIsARDC
 
Landing Pages - Joe Hourcle - RDAP12
Landing Pages - Joe Hourcle - RDAP12Landing Pages - Joe Hourcle - RDAP12
Landing Pages - Joe Hourcle - RDAP12ASIS&T
 
RDA, Data Citation, and PIDs for DataOne
RDA, Data Citation, and PIDs for DataOneRDA, Data Citation, and PIDs for DataOne
RDA, Data Citation, and PIDs for DataOneResearch Data Alliance
 
How and Why to Share Your Data
How and Why to Share Your DataHow and Why to Share Your Data
How and Why to Share Your Datakfear
 
Research Data Sharing and Re-Use: Practical Implications for Data Citation Pr...
Research Data Sharing and Re-Use: Practical Implications for Data Citation Pr...Research Data Sharing and Re-Use: Practical Implications for Data Citation Pr...
Research Data Sharing and Re-Use: Practical Implications for Data Citation Pr...SC CTSI at USC and CHLA
 
IASSIST40: Data management & curation workshop
IASSIST40: Data management & curation workshopIASSIST40: Data management & curation workshop
IASSIST40: Data management & curation workshopRobin Rice
 
Data Citation: A Critical Role for Publishers
Data Citation: A Critical Role for PublishersData Citation: A Critical Role for Publishers
Data Citation: A Critical Role for PublishersBrian Hole
 
V.3 poster current citations and a future with linked data
V.3 poster current citations and a future with linked dataV.3 poster current citations and a future with linked data
V.3 poster current citations and a future with linked dataIliadis Dimitrios
 
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...Carole Goble
 
Love Your Data Locally
Love Your Data LocallyLove Your Data Locally
Love Your Data LocallyErin D. Foster
 
Linked Open Data in Romania
Linked Open Data in RomaniaLinked Open Data in Romania
Linked Open Data in RomaniaVlad Posea
 
Learning from past infrastructure to embrace friction and create the Research...
Learning from past infrastructure to embrace friction and create the Research...Learning from past infrastructure to embrace friction and create the Research...
Learning from past infrastructure to embrace friction and create the Research...Research Data Alliance
 
Towards Research Engines: Supporting Search Stages in Web Archives (2015)
Towards Research Engines: Supporting Search Stages in Web Archives (2015)Towards Research Engines: Supporting Search Stages in Web Archives (2015)
Towards Research Engines: Supporting Search Stages in Web Archives (2015)TimelessFuture
 
Metadata, Open Access and More: Crossref presentation
Metadata, Open Access and More: Crossref presentationMetadata, Open Access and More: Crossref presentation
Metadata, Open Access and More: Crossref presentationCrossref
 
NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014
NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014
NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014Susanna-Assunta Sansone
 

Similaire à Parsons citation geodata2014 (20)

DataONE Education Module 08: Data Citation
DataONE Education Module 08: Data CitationDataONE Education Module 08: Data Citation
DataONE Education Module 08: Data Citation
 
Data Citation and DOIs
Data Citation and DOIsData Citation and DOIs
Data Citation and DOIs
 
Landing Pages - Joe Hourcle - RDAP12
Landing Pages - Joe Hourcle - RDAP12Landing Pages - Joe Hourcle - RDAP12
Landing Pages - Joe Hourcle - RDAP12
 
data citation
data citationdata citation
data citation
 
RDA, Data Citation, and PIDs for DataOne
RDA, Data Citation, and PIDs for DataOneRDA, Data Citation, and PIDs for DataOne
RDA, Data Citation, and PIDs for DataOne
 
How and Why to Share Your Data
How and Why to Share Your DataHow and Why to Share Your Data
How and Why to Share Your Data
 
Research Data Sharing and Re-Use: Practical Implications for Data Citation Pr...
Research Data Sharing and Re-Use: Practical Implications for Data Citation Pr...Research Data Sharing and Re-Use: Practical Implications for Data Citation Pr...
Research Data Sharing and Re-Use: Practical Implications for Data Citation Pr...
 
IASSIST40: Data management & curation workshop
IASSIST40: Data management & curation workshopIASSIST40: Data management & curation workshop
IASSIST40: Data management & curation workshop
 
Data Citation: A Critical Role for Publishers
Data Citation: A Critical Role for PublishersData Citation: A Critical Role for Publishers
Data Citation: A Critical Role for Publishers
 
V.3 poster current citations and a future with linked data
V.3 poster current citations and a future with linked dataV.3 poster current citations and a future with linked data
V.3 poster current citations and a future with linked data
 
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
 
Love Your Data Locally
Love Your Data LocallyLove Your Data Locally
Love Your Data Locally
 
Linked Open Data in Romania
Linked Open Data in RomaniaLinked Open Data in Romania
Linked Open Data in Romania
 
Learning from past infrastructure to embrace friction and create the Research...
Learning from past infrastructure to embrace friction and create the Research...Learning from past infrastructure to embrace friction and create the Research...
Learning from past infrastructure to embrace friction and create the Research...
 
L07 metadata
L07 metadataL07 metadata
L07 metadata
 
Metadata
MetadataMetadata
Metadata
 
Towards Research Engines: Supporting Search Stages in Web Archives (2015)
Towards Research Engines: Supporting Search Stages in Web Archives (2015)Towards Research Engines: Supporting Search Stages in Web Archives (2015)
Towards Research Engines: Supporting Search Stages in Web Archives (2015)
 
Metadata, Open Access and More: Crossref presentation
Metadata, Open Access and More: Crossref presentationMetadata, Open Access and More: Crossref presentation
Metadata, Open Access and More: Crossref presentation
 
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
 
NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014
NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014
NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014
 

Dernier

ChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics InfrastructureChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics Infrastructuresonikadigital1
 
Master's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationMaster's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationGiorgio Carbone
 
AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)Data & Analytics Magazin
 
Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Vladislav Solodkiy
 
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxTINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxDwiAyuSitiHartinah
 
YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.JasonViviers2
 
5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best PracticesDataArchiva
 
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityStrategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityAggregage
 
How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?sonikadigital1
 
The Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerThe Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerPavel Šabatka
 
Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...PrithaVashisht1
 
MEASURES OF DISPERSION I BSc Botany .ppt
MEASURES OF DISPERSION I BSc Botany .pptMEASURES OF DISPERSION I BSc Botany .ppt
MEASURES OF DISPERSION I BSc Botany .pptaigil2
 
Mapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxMapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxVenkatasubramani13
 
SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024Becky Burwell
 
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Guido X Jansen
 
CI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionCI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionajayrajaganeshkayala
 
Virtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product IntroductionVirtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product Introductionsanjaymuralee1
 

Dernier (17)

ChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics InfrastructureChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics Infrastructure
 
Master's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationMaster's Thesis - Data Science - Presentation
Master's Thesis - Data Science - Presentation
 
AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)
 
Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023
 
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxTINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
 
YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.
 
5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices
 
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityStrategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
 
How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?
 
The Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerThe Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayer
 
Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...
 
MEASURES OF DISPERSION I BSc Botany .ppt
MEASURES OF DISPERSION I BSc Botany .pptMEASURES OF DISPERSION I BSc Botany .ppt
MEASURES OF DISPERSION I BSc Botany .ppt
 
Mapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxMapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptx
 
SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024
 
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
 
CI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionCI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual intervention
 
Virtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product IntroductionVirtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product Introduction
 

Parsons citation geodata2014

  • 1. Unless otherwise noted, the slides in this presentation are licensed by Mark A. Parsons under a Creative Commons Attribution-Share Alike 3.0 License Data Citation Then and Now Mark A. Parsons with help from Ruth Duerr and Peter Fox ! ! ! 17 June 2014 GeoData 2014 Boulder, CO
  • 2. The Evolution of Data Citation • Data was part of the literature—tables, maps, monographs, etc.—and we cited accordingly. (Some data were still hoarded). • Digital data becomes the norm. It’s messier and we forget how to do cite it routinely. • Initial efforts to define digital data citation in the 90s - early 00s • Right idea, little traction • Partially conflated with the citing URLs issue • A blossoming in the mid-late 00s. • Multiple disciplines start developing approaches and guidelines • DOI a big driver, especially for DataCite, but other identifiers used too (Handles, LSIDs, UNFs, ARKs and good ol’ URI/Ls) • A slightly competitive atmosphere • GeoData 2011 a milestone for ESIP Citation Guidelines—most sophisticated to date. (http://dx.doi.org/10.7269/P34F1NNJ) Then
  • 3. • Now a consensus phase • Out of Cite, Out of Mind: The Current State of Practice, Policy, and Technology for the Citation of Data. 2013.
 http://dx.doi.org/10.2481/dsj.OSOM13-043 • Draft Global Joint Declaration of Data Citation Principles. 2013.
 http://www.force11.org/datacitation The Evolution of Data Citation Now
  • 4. • Implementation phase just begun • ESIP Guidelines adopted by a variety of NASA and NOAA data centers and internationally by GEOSS. • AGU Publishing Committee is developing author guidelines based on ESIP. • Other disciplines, notably social science, has relationships with publishers. • Several data centers partnering with publishers, e.g. Elsevier’s “article of the future”. • New PLOS data policy. • Joint engagement activity following on the joint principles. • It happens locally and requires culture change so debates will continue. The Evolution of Data Citation—Next Next
  • 5. Outline of 2011 Talk • Purpose of Data Citation • How it’s currently done • Basic citation form and content • Identifiers and locators • Microcitation
  • 6. Purpose of data citation
  • 7. Purpose of Data Citation • Credit for data creators and stewards • Track impact of data set • Accountability for creators and stewards • Aid reproducibility through direct, unambiguous connection to the precise data used ! • A location/reference mechanism not a discovery mechanism per se. 7 Then
  • 8. Purpose of Data Citation • Aid scientific reproducibility through direct, unambiguous connection to the precise data used. • Credit for data authors and stewards • Accountability for creators and stewards • Track impact of data set • Help identify data use (e.g., trackbacks) • Data authors can verify how their data are being used. • Users can better understand the application of the data. ! • A locator/reference mechanism not a discovery mechanism per se Now
  • 9. Joint  Declaration  of  Data  Citation  Principles     (Overview)   The  Noble  Eight-­‐Fold  Path  to  Citing  Data 1. Importance   2. Credit  and  attribution     3. Evidence   4. Unique  Identification     5. Access   6. Persistence     7. Specificity  and  verifiability     8. Interoperability  and  flexibility Principles  are  supplemented  with  a  glossary,  references  and  examples   http://force11.org/datacitation
  • 10. Purpose of Data Citation A locator/reference/linking mechanism! This helps • Aid scientific reproducibility through direct, unambiguous connection to the precise data used • Identify and track data use ! It contributes but is NOT central to • Credit for data authors and stewards • Accountability for creators and stewards • Tracking impact of data set Next
  • 11. Is data publication the right metaphor? M. A. Parsons & P. A. Fox Data Science Journal, 2013 http://dx.doi.org/10.2481/dsj.WDS-042 Our ordinary conceptual system, in terms of which we both think and act, is fundamentally metaphorical in nature. —Lakoff and Johnson, 1980 If we hear the same language over and over, we will think more and more in terms of the frames and metaphors activated by that language. —Lakoff, 2008
  • 13. How data citation is currently done • Citation of traditional publication that actually contains the data, e.g. a parameterization value. • Not mentioned, just used, e.g., in tables or figures • Reference to name or source of data in text • URL in text (with variable degrees of specificity) • Citation of related paper (e.g. CRU Temp. records recommend citing two old journal articles which do not contain the actual data or full description of methods) • Citation of actual data set typically using recommended citation given by data center • Citation of data set including a persistent identifier/locator, typically a DOI 13 Then
  • 14. How data citation is currently done • Citation of traditional publication that actually contains the data, e.g. a parameterization value. • Not mentioned, just used, e.g., in tables or figures • Reference to name or source of data in text • URL in text (with variable degrees of specificity) • Citation of related paper (e.g. CRU Temp. records recommend citing two old journal articles which do not contain the actual data or full description of methods.) • Citation of actual data set typically using recommended citation given by data center • Citation of data set including a persistent identifier/locator, typically a DOI Now
  • 15. 0 100 200 300 400 500 600 2002 2003 2004 2005 2006 2007 2008 2009 “MODIS Snow Cover Data” in Google Scholar 1.3% 1.0% 0.7% 0.7% 1.3% 0.9% 1.3% 1.7% Formal Citation Total Entries Then
  • 16. How data citation is currently done NowNext • Implementation phase just begun • ESIP Guidelines adopted by a variety of NASA and NOAA data centers and internationally by GEOSS. • AGU Publishing Committee is developing author guidelines based on ESIP. • Other disciplines, notably social science, has relationships with publishers. • Several data centers partnering with publishers, e.g. Elsevier’s “article of the future”. • New PLOS data policy. • Joint engagement activity following on the joint principles. • It happens locally and requires culture change so debates will continue.
  • 18. Basic data citation form and content Per DataCite: Creator. PublicationYear. Title. [Version]. Publisher. [ResourceType]. Identifier. ! Per ESIP: Author(s). ReleaseDate. Title, [version]. [editor(s)]. Archive and/or Distributor. Locator. [date/time accessed]. [subset used]. ! NextThen Now
  • 19. An Example Citation Cline, D., R. Armstrong, R. Davis, K. Elder, and G. Liston. 2002, Updated 2003. CLPX-Ground: ISA snow depth transects and related measurements ver. 2.0. Edited by M. Parsons and M. J. Brodzik. Boulder, CO: National Snow and Ice Data Center. Data set accessed 2008-05-14 at http://dx.doi.org/10.5060/D4H41PBP. NextThen Now
  • 21. ID Scheme Data Set Item Data Set Item Data Set Item Data Set Item URL/N/I PURL XRI Handle DOI ARK LSID OID UUID An assessment of identification schemes for digital Earth science data Unique Identifier Unique Locator Citable Locator Scientifically Unique ID Good Fair Poor Adapted from Duerr, R. E., et al.. 2011. On the utility of identification schemes for digital Earth science data: An assessment and recommendations. Earth Science Informatics. 4:139-160.
 http://dx.doi.org/10.1007/s12145-011-0083-6 Then Now
  • 22. ID Scheme Data Set Item Data Set Item Data Set Item Data Set Item URL/N/I PURL XRI Handle DOI ARK LSID OID UUID An assessment of identification schemes for digital Earth science data Unique Identifier Unique Locator Citable Locator Scientifically Unique ID Locators Identifiers Good Fair Poor Adapted from Duerr, R. E., et al.. 2011. On the utility of identification schemes for digital Earth science data: An assessment and recommendations. Earth Science Informatics. 4:139-160.
 http://dx.doi.org/10.1007/s12145-011-0083-6 Then Now
  • 23. • Everything needs an identifier. Most things need locators. Intellectual content needs citation. • Different versions of things may need different identifiers/locators • Subsets may need identifiers or clear reference to sub-setting process (e.g. space and time). • Different representations (conceptual models) may need different identifiers/ locators. E.g Maps. What needs an identifier/locator? What needs to be cited?
  • 24. Why the DOI? • Not perfect but well understood by publishers • Thomson Reuters collaborating with DataCite to get data citations in their index. ! But... • What is the citable unit? • How do we handle different versions? • What about “retired” data? • When is a DOI assigned? Then Now
  • 25. Versioning approach recommended by DCC • “As DOIs are used to cite data as evidence, the dataset to which a DOI points should also remain unchanged, with any new version receiving a new DOI.” • “There are two possible approaches the data repository can take:
 time slices and snapshots.” Now
  • 26. When to assign a DOI? • First principle: Data should be citable as soon as they are available for use by anyone other than the original authors. • But... • Most people (falsely) believe that a DOI implies permanence so how do we cite transient data? • Some believe that a DOI should not be assigned until the data has undergone some level of review (e.g. Lawrence et al. 2010). So how do we cite data used before the review? • Data are often used by friends and collaborators in a raw, “unpublished” state. Should this use be cited with a DOI? • Near real time or preliminary data may only be available for a short un- curated period. There may not be a good match between the submission package and the distribution package. What gets the DOI? When? Now
  • 27. Versioning and locators: some suggestions from NSIDC • major version.minor version.[archive version] • Individual stewards need to determine which are major vs. minor versions and describe the nature and file/record range of every version. • Assign DOIs to major versions. • Old DOIs should be maintained and point to some appropriate page that explains what happened to the old data if they were not archived. • A new major version leads to the creation of a new collection-level metadata record that is distributed to appropriate registries. The older metadata record should remain with a pointer to the new version and with explanation of the status of the older version data. • Major and minor version (after the first version) should be exposed with the data set title and recommended citation. • Minor versions should be explained in documentation, ideally in file-level metadata. • Applying UUIDs or ARKs to individual files upon ingest aids in tracking minor versions and historical citations.
  • 29. Basic data citation form and content ! Author(s). ReleaseDate. Title, Version. [editor(s)]. Archive and/or Distributor. Locator. [date/time accessed]. [subset used]. ! ! ! The best solution is to have unique identifiers or query IDs for subsets, but that won’t be available for most data sets for a long time, so we need alternative solutions... NextThen Now
  • 30. ! February 8, 2011, 4:45 PM Page Numbers for Kindle Books an Imperfect Solution Amazon’s Kindle will have page numbers that correspond to real books and locations by passage. Neither solution is perfect—‘locations’ or page numbers—because the problem is unsolvable. The best we can hope for is a choice... http://pogue.blogs.nytimes.com/2011/02/08/page-numbers-for-kindle-books-an-imperfect-solution/
  • 31. • Bible • Koran • Bhagavad-Gita and Ramayana • other sacred texts ! • A “structural index” Chapter and Verse NextThen Now
  • 32. Doing it as best we can...? • Hall, Dorothy K., George A. Riggs, and Vincent V. Salomonson. 2007, updated daily. MODIS/Aqua Snow Cover Daily L3 Global 500m Grid V005.3, Oct. 2007- Sep. 2008, 84°N, 75°W; 44°N, 10°W. Boulder, Colorado USA: National Snow and Ice Data Center. Data set accessed 2008-11-01 at http://dx.doi.org/ 10.1234/xxx. • Hall, Dorothy K., George A. Riggs, and Vincent V. Salomonson. 2007, updated daily. MODIS/Aqua Snow Cover Daily L3 Global 500m Grid V005.3, Oct. 2007- Sep. 2008, Tiles (15,2;16,0;16,1;16,2;17,0;17,1). Boulder, Colorado USA: National Snow and Ice Data Center. Data set accessed 2008-11-01 at http:// dx.doi.org/10.1234/xxx. • Cline, D., R. Armstrong, R. Davis, K. Elder, and G. Liston. 2002, Updated 2003. CLPX-Ground: ISA snow depth transects and related measurements, Version 2.0, shapefiles. Edited by M. Parsons and M. J. Brodzik. Boulder, CO: National Snow and Ice Data Center. Data set accessed 2008-05-14 at http://dx.doi.org/ 10.5060/D4H41PBP. NextThen Now
  • 33. “Just-in-time” citation Approach being developed by an RDA Working Group • Ensure data is time-stamped and versioned • Assign PID to time-stamped query/selection expression ! https://rd-alliance.org/group/data-citation-wg.html Next
  • 34. Content negotiation—the details of identifier resolution
  • 35. So,  you  have  a  DOI,  or  a  handle,  …   • Resolution  service  or  URI?   • Landing  pages  (for  metadata,  citation   recommendations,  …)   • Content-­‐negotiation  (conneg)  for  embedding  in   – Other  pages   – Applications   – Ingest,  object  type  repositories,  communities
  • 36. “Landing Pages” For humans and machines
  • 37. Landing  page  –  a  short  form  http://data.rpi.edu/repository/handle/10833/24
  • 39. Content  Negotiation—Conneg • Many  examples,  but  what  follows  is  ~  from:   http://www.crosscite.org/cn/   • What  is  it?   – Es  ce  que  vous  parlez  Français?   – Do  you  speak  html  or  JSON  or  RDF?   • For  embedding  in…   – Other  pages   – Applications   – Ingest,  object  type  repositories,  communities
  • 42. Metadata • Title • Author • Author Email • Licence • Subject • Keyword • Data Type Dataset CDF DCO Object Deposit DCO Research Network DCO-ID Request DCO-ID Request Share Knowledge Join Network Allocate a universal accessible DCO-ID Register Metadata Upload Raw Data DCO Object Registration and Deposit DCO Research Community Network
  • 44. Get involved! • RDA Working Group on citing dynamic data. • https://rd-alliance.org/group/data-citation-wg.html • RDA WG on identifier “types” and PID Interest Group • https://rd-alliance.org/working-groups/pid-information-types-wg.html • https://rd-alliance.org/internal-groups/pid-interest-group.html • ESIP Preservation and Stewardship Committee • http://wiki.esipfed.org/index.php/Preservation_and_Stewardship • Implementation Team for the Joint Citation Principles • http://www.force11.org/node/4849
  • 45. Update of 2011 Talk • Purpose of Data Citation—evolving into more specific concerns (slowly) • How it’s currently done—consensus on needs and approach, but no substantial progress on implementation • Basic citation form and content—basics are solid and generally consistent • Identifiers and locators—consensus emerging, now looking at identifiers for everything and what to resolve. • Micro-citation—good technical progress hindered by social implementation and concept of citation vs. linking.
  • 46. Overall Summary • Use many metaphors and be cautious of them all. • We know how to cite data (for the most part), we just need to make it a cultural practice. Just do it. • Separate concerns • Reference/linking is different than credit • Location and identity are different but can be the same. • Data citation is not literature citation. Micro-citation is more important and the citation needs to be machine interpretable and operable. • Everything in data stewardship needs an identifier, but…
 due diligence is the underlying requirement. • These sort of socio-technical problems require collaborative, networked solutions. Participate.