Prepared for
MIT Libraries Informatics Program Brown Bag Talk
August 2013
Emerging Data Citation Infrastructure
Dr. Micah Altman
<escience@mit.edu>
Director of Research, MIT Libraries
DISCLAIMER
These opinions are my own, they are not the opinions
of MIT, Brookings, any of the project funders, nor (with
the exception of co-authored previously published
work) my collaborators
Secondary disclaimer:
“It’s tough to make predictions, especially about the
future!”
-- Attributed to Woody Allen, Yogi Berra, Niels Bohr, Vint Cerf, Winston Churchill,
Confucius, Disreali [sic], Freeman Dyson, Cecil B. Demille, Albert Einstein, Enrico Fermi,
Edgar R. Fiedler, Bob Fourer, Sam Goldwyn, Allan Lamport, Groucho Marx, Dan Quayle,
George Bernard Shaw, Casey Stengel, Will Rogers, M. Taub, Mark Twain, Kerr L. White,
etc.
Emerging Data Citation Practices
Collaborators & Co-Conspirators
• Merce Crosas, IQSS, Harvard U.
• Data-PASS Steering Committee
<data-pass.org>
• CODATA-ICSTI Task Group on Data Citation
Standards and Practices
<www.codata.org/taskgroups/TGdatacitation/>
• Research Support
– Thanks to the National Academies BRDI
Sponsors: Department of Energy (DOE), Institute
of Museum and Library Services (IMLS), The
Library of Congress (LOC), Microsoft Research,
National Institute of Standards and Technology
(NIST), National Institutes of Health
(NIH), National Oceanic and Atmospheric
Administration (NOAA), National Science
Foundation (NSF), U.S. Geological Survey
(USGS) & the Massachusetts Institute of
Technology.
Related Work
• CODATA-ICSTI Task Group on Data Citation Standards and Practices, 2013, “Out
of Cite, Out of Mind: The Current State of Practice, Policy, and Technology for
the Citation of Data”, Data Science Journal. Forthcoming.
• P. F. Uhlir (Ed.), Developing Data Attribution and Citation Practices and
Standards: Report from an International Workshop (forthcoming).
National Academies Press.
• M. Altman, 2008, "A Fingerprint Method for Verification of Scientific
Data", in Advances in Systems, Computing Sciences and Software
Engineering (Proceedings of the International Conference on Systems,
Computing Sciences and Software Engineering 2007), Springer Verlag.
• Altman, M., & King, G. 2007. A Proposed Standard for the Scholarly
Citation of Quantitative Data. D-Lib Magazine, 13(3/4).
Most reprints available from:
informatics.mit.edu
This Talk
• What is data citation? Why Cite?
• Emerging Principles
• On the horizon
What’s Wrong with this Picture?
“To test Benet’s (1998) theory of “politically-induced
intelligence” (Benet 1999, pg 8), use a hierarchical
corrected contingency model (see Altman & Smith 2010;
Edgeworth 1863). We apply this model to a snowball
sample (Glass 1973) of eligible voters14, to which the
standard Stanford-Binet (Stanford & Binet 1766) has
been applied. Our results show that adoption of
Pastafarrianism can be expected to yield an increase
mean intelligence by 10.3 points. ”
13 We thank Jon Sample, Director of the institute of the Pastaffarian
institute for supplying this dataset, which is available upon request.
“How much slower would scientific progress be if
the near universal standards for scholarly citation of
articles and books had never been
developed? Suppose shortly after publication only
some printed works could be reliably found by
other scholars; or if researchers were only
permitted to read an article if they first committed
not to criticize it, or were required to coauthor with
the original author any work that built on the
original … [If] printed works existed in different
libraries under different titles; if researchers
routinely redistributed modified versions of other
authors' works without changing the title or author
listed; or if publishing new editions of books meant
that earlier editions were destroyed?...” – Altman &
King 2007
“Citations to unpublished data and personal
communications cannot be used to support
claims in a published paper”
“All data necessary to understand, assess, and
extend the conclusions of the manuscript must
be available to any reader of Science.”
Ideal
Helping Journals Manage Data
Reality
• Compliance is low even in
best examples of journals
• Checking compliance
manually is tedious, hard
to scale
Attribution
• Cite data as first class work
• Identify contributors to data
Discovery
• Associate a persistent id with a
work
• Locate data via identifier
• Locate data integral to article
• Locate works related to data –
articles, derivatives, sources
Persistence
• Reference exists as long as referring object
• Evidence persists as long as assertions
based on evidence?
• Durability of data transparent?
Access
• Citation provides for mediated
access
• Access to surrogate
• On-line access to object
• Machine understandability
• Long-term human
understandability
Provenance
• Associate work with version of
evidence used
• Verify fixity of information
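One way to read "verify fixity" is to record a fingerprint of the data alongside the citation and recompute it later. The sketch below uses a plain SHA-256 digest of the raw bytes as a stand-in. It is not the UNF algorithm that appears in the citation example in this talk, and the function names and sample data are illustrative assumptions.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return a hex SHA-256 digest of the raw bytes (a stand-in verifier, not UNF)."""
    return hashlib.sha256(data).hexdigest()


def verify_fixity(data: bytes, recorded: str) -> bool:
    """True if the data still matches the fingerprint recorded at citation time."""
    return fingerprint(data) == recorded


# Made-up example data, used only to exercise the functions.
original = b"district,population\n1,1000\n2,1200\n"
recorded = fingerprint(original)

print(verify_fixity(original, recorded))               # unchanged copy
print(verify_fixity(original + b"3,900\n", recorded))  # modified copy
```

A byte-level digest changes whenever the file format changes, even if the values do not, which is one reason a citation verifier may want a format-independent fingerprint instead.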
Principles for Data Citation
Theory: Use Cases Operational Constraints?
-Syntax
-Interoperability
-Technical contexts of use
Reference
• Formal syntax used within the text of a publication to denote a relationship
to an external object. May contain additional information about the
portion/subset of external object implicated. Also known as “in-text
reference”, “pin-cite”.
“We applied contingency analysis to the greatest data ever. [Altman 2005]”
Citation
•Formal description of external object, used for location and attribution.
Micah Altman; Karin MacDonald; Michael P. McDonald, 2005, "Computer
Use in Redistricting", hdl:1902.1/AMXGCNKCLU
UNF:3:J0PkMygLPfIyT1E/8xO/EA==
http://id.thedata.org/hdl%3A1902.1%2FAMXGCNKCLU
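The last line of the citation above is the handle from the second line, percent-encoded and appended to a resolver address. A minimal sketch of that relationship, assuming the resolver simply takes the encoded identifier as its path; the function names are illustrative and not part of any Dataverse API.

```python
from urllib.parse import quote


def resolver_url(identifier: str, base: str = "http://id.thedata.org/") -> str:
    """Percent-encode the whole identifier (including ':' and '/') and append it to the base."""
    return base + quote(identifier, safe="")


def format_citation(authors, year, title, identifier, unf):
    """Assemble the citation elements in the order shown on the slide."""
    lines = [
        f'{"; ".join(authors)}, {year}, "{title}", {identifier}',
        unf,
        resolver_url(identifier),
    ]
    return "\n".join(lines)


citation = format_citation(
    authors=["Micah Altman", "Karin MacDonald", "Michael P. McDonald"],
    year=2005,
    title="Computer Use in Redistricting",
    identifier="hdl:1902.1/AMXGCNKCLU",
    unf="UNF:3:J0PkMygLPfIyT1E/8xO/EA==",
)
print(citation)
```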
Citation Metadata
•Metadata that is systematically associated with citation through well-
known public service, catalog, or protocol.
<component_list> <component parent_relation="isPartOf">
<description><b>Figure 1:</b> This is the caption of the first
figure...</description>
<format mime_type="image/jpeg">Web resolution image</format>
External Service
•Applications and services that consume, enhance, aggregate citation
information.
Practice
Analysis Method
2 Workshops
(70+ participants)
+ 1 Literature Review
(400+ resources)
+ 2 Task Groups
NAS & CODATA
(25+ members)
+ 60 Interviews
+ 7 authors
Out of Cite, Out of Mind: The
Current State of Practice, Policy,
and Technology for the Citation of
Data
- Separate
- scientific principles
- use cases
- requirements
- Distinguish
- syntax
- semantics
- presentation
- Design for
- Ecosystem
- Lifecycle
- Stakeholders
- Implement
- Incremental value for incremental effort
- Think globally, act locally
Analysis Approach
1. Status of Data: Data citations should be accorded the same importance in
the scholarly record as the citation of other objects.
2. Attribution: Citations should facilitate giving scholarly credit and legal
attribution to all parties responsible for those data.
3. Persistence: Citations should be as durable as the cited objects.
4. Access: Citations should facilitate access to data by humans and by machines.
5. Discovery: Citations should support the discovery of data and their
documentation.
6. Provenance: Citations should facilitate the establishment of provenance of
data.
7. Granularity: Citations should support the finest grained description
necessary to identify the data.
8. Verifiability: Citations should contain information sufficient to identify the
data unambiguously.
9. Metadata Standards: Citations should employ widely accepted metadata
standards.
10. Flexibility: Citation methods should be sufficiently flexible to accommodate
the variant practices among communities.
Data Citation Principles
• Author.
– The creator of the data set.
• Title.
– As well as the name of the cited resource itself, this may also include the name of a facility and the titles of the top collection and main
parent subcollection (if any) of which the data set is a part.
• Publisher.
– The organization (or repository) either hosting the data or performing quality assurance.
• Publication date.
– Whichever is later: the date the data set was made available, the date all quality assurance procedures were completed, or the date
the embargo period (if applicable) expired. In other standards an “Access Date” field is used to document the date the data set was
successfully accessed.
• Resource type.
– Examples: “database” or “data set.”
• Edition.
– The level or stage of processing of the data, indicating how raw or refined the data set is.
• Version.
– A number increased when the data changes, as the result of adding more data points or rerunning a derivation process, for example.
• Feature name and URI.
– The name of an ISO 19101:2002 “feature” (e.g., GridSeries, ProfileSeries) and the URI identifying its standard definition, used to pick
out a subset of the data.
• Verifier
– to verify the identity of the content.
• Identifier.
– A resolvable web identifier for the data, according to a persistent scheme. There are several types of persistent identifiers, but the
scheme that is gaining the most traction is the Digital Object Identifier (DOI).
• Location.
– A persistent URL or UNF from which the data set is available. Some identifier schemes provide these via an identifier resolver service.
Citation Metadata Elements
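The elements listed above map naturally onto a small record type. The sketch below is one possible layout; the field names, the choice of which fields are optional, and the `unset_elements` helper are assumptions made for illustration, not a published schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataCitation:
    # Elements treated here as always present (an assumption of this sketch).
    authors: List[str]
    title: str
    publisher: str
    publication_date: str
    identifier: str
    # Elements treated here as optional.
    resource_type: Optional[str] = None
    edition: Optional[str] = None
    version: Optional[str] = None
    feature: Optional[str] = None
    verifier: Optional[str] = None
    location: Optional[str] = None

    def unset_elements(self) -> List[str]:
        """Names of the optional elements that were left empty, in declaration order."""
        return [name for name, value in vars(self).items() if value is None]


example = DataCitation(
    authors=["Micah Altman", "Karin MacDonald", "Michael P. McDonald"],
    title="Computer Use in Redistricting",
    publisher="(publisher not stated on the slide)",  # placeholder, not a real value
    publication_date="2005",
    identifier="hdl:1902.1/AMXGCNKCLU",
    verifier="UNF:3:J0PkMygLPfIyT1E/8xO/EA==",
)
print(example.unset_elements())
```

Listing the unset elements makes gaps such as a missing version or location visible before a citation is published.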
Gaps
• Metadata/Structural
– Granularity
– Version Control
– Microattribution
– Contributor ID
– Facilitation of reuse
• Practice
– Author: use of citations to data
– Journals: ad-hoc syntax and location
– Infrastructure: failure to index citations and references to
data, even when associated with DOI’s
– Tools: support for datasets in reference managers, etc.
Harmonizing Principles & Requirements
DataCite
• DOI
• Creator
• Title
• Publisher
• Publication
Year
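The five properties listed above can be checked mechanically before a record is registered. A hedged sketch, assuming records arrive as plain dictionaries; the key names (`doi`, `creator`, and so on) are chosen here for illustration and are not the DataCite schema's own element names.

```python
# The five properties named on the slide, under hypothetical key names.
REQUIRED = ("doi", "creator", "title", "publisher", "publication_year")


def missing_required(record: dict) -> list:
    """Return the required keys that are absent or blank in the record."""
    return [key for key in REQUIRED if not str(record.get(key) or "").strip()]


# Deliberately incomplete record: no DOI and no publisher.
draft = {
    "creator": "Micah Altman; Karin MacDonald; Michael P. McDonald",
    "title": "Computer Use in Redistricting",
    "publication_year": 2005,
}
print(missing_required(draft))
```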
Digital Curation Center
1. The citation itself must be able to identify
uniquely the object cited, though
different citations might use
different methods or schemes to do
so.
2. It must be able to identify subsets of
the data as well as the whole
dataset.
3.
a. It must provide the reader with
enough information to access the
dataset;
b. indeed, when expressed digitally
it should provide a mechanism for
accessing the dataset through the
Web infrastructure.
4.
a. It must be usable not only by
humans but also by software tools,
so that additional services may be
built using these citations.
b. In particular, there need to be
services that use the citations in
metrics to support the academic
reward system, and services that can
generate complete citations.
Force 11
• Data should be considered citable
products of research.
• Such data should be held in persistent
public repositories.
• If a publication is based on data not
included with the article, those data
should be cited in the publication.
• A data citation in a publication should
resemble a bibliographic citation and be
located in the publication’s reference list.
• Such a data citation should include a
unique persistent identifier (a DataCite
DOI recommended, or other persistent
identifiers already in use within the
community).
• The identifier should resolve to a page
that either provides direct access to the
data or information concerning its
accessibility. Ideally, that landing page
should be machine-actionable to
promote interoperability of the data.
• If the data are available in different
versions, the identifier should provide a
method to access the previous or related
versions.
• Data citation should facilitate attribution
of credit to all contributors
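The Force 11 items on resolvable identifiers and machine-actionable landing pages can be illustrated with DOI content negotiation: the identifier that sends a browser to a landing page can return structured metadata when the request carries a different `Accept` header. The sketch below only builds such a request and does not send it. The DOI is the one given for Starr & Gastl in the bibliography of this talk; whether a particular registrar honours a particular media type should be checked rather than assumed.

```python
from urllib.parse import quote
from urllib.request import Request

# Media type commonly used to ask a DOI resolver for citation metadata.
CSL_JSON = "application/vnd.citationstyles.csl+json"


def metadata_request(doi: str, media_type: str = CSL_JSON) -> Request:
    """Build (but do not send) a request asking the DOI resolver for metadata."""
    url = "https://doi.org/" + quote(doi, safe="/")
    return Request(url, headers={"Accept": media_type})


req = metadata_request("10.1045/january2011-starr")
print(req.full_url)
print(req.get_header("Accept"))
# To fetch for real, pass `req` to urllib.request.urlopen (requires network access).
```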
Current Infrastructure
FigShare
• Closed source
• No charge
• Archives data
• Supports DOI’s, ORCIDS
• Preserved in CLOCKSS
Data Citation Index
• Commercial Service
(Thomson Reuters)
• Indexes many large
repositories
(e.g. Data-PASS)
• Beginning to extract
citations from TR
publications
Dataverse Network
• Open Source System
• Hubs run at Harvard and
other universities
• Archives data
• Generates persistent
identifiers (handles, DOI’s
forthcoming)
• Generates resolvable
citations
• Versioned
• Harvard Library Dataverse
now part of DataCite,
Data-PASS preservation
network
DataCite
• DOI registry service
(DOI provider)
• Data DOI metadata
indexing service
(parallel to CrossRef)
• Not-for-profit
membership
Organization
• Collaborating with
ORCID-EU to embed
ORCIDs
Emerging Developments
Open Journal Data
Publication
• Open source integration
of PKP-OJS and Dataverse
Network
• Uses SWORD
• Integrated data
submission/citation/publication
workflow for OJS
open journals
Journal Developments
• NISO Recommendations on
Supplementary Materials
• Sloan/ICPSR Data Citation Project
• Data-PASS Journal Outreach
• New journal types:
– Registered Replication journals
– Null results journals
– Data journals/data papers
Research Questions for Data Citation
and Management
Research Areas Building on Richer Citations
Brightening the “Dark Matter” of Scholarly
Communications
Researcher Identifiers: Developments, Opportunities &
Challenges
Research & Node Layout: Kevin Boyack and Dick
Klavans (mapofscience.com); Data: Thomson ISI;
Graphics & Typography: W. Bradford Paley
(didi.com/brad); Commissioned by Katy Börner
(scimaps.org)
Seed Magazine, Mar 7, 2007
http://seedmagazine.com/content/article/scientific_method_relationships_among_scientific_paradigms/
• Bibliometric and network analysis are
the “telescopes” for exploring the
structure of science
• Researcher ID’s allow us to see more
connections, more reliably
• Identifiers for datasets, etc. reveal the
“dark matter” of science
Some potential questions:
• Are fields linked through evidence that are
not linked through publications?
• How is the practice of science changing – are
data scientists, statisticians, etc. making
bigger contributions?
• What would be the results of:
– Catalyzing new research collaborations among individuals,
organizations?
– Strengthening support for specific areas of
interdisciplinary research?
– Growing the evidence base in particular areas?
• Questions about how network of contributors and outputs
evolves over time
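The first question above, whether fields are linked through evidence but not through publications, becomes a set difference once datasets carry identifiers. The sketch below uses entirely made-up toy data and hypothetical identifiers; it only shows the shape of the computation, not a result.

```python
from itertools import combinations


def field_links(usage: dict) -> set:
    """Map {object id: fields that cite it} to the set of field pairs sharing an object."""
    links = set()
    for fields in usage.values():
        links.update(frozenset(pair) for pair in combinations(sorted(fields), 2))
    return links


# Toy data, invented for illustration only.
publication_citations = {
    "paper-1": {"political science", "statistics"},
    "paper-2": {"statistics", "computer science"},
}
dataset_citations = {
    "dataset-A": {"political science", "geography"},
    "dataset-B": {"statistics", "political science"},
}

# Field pairs connected by a shared dataset but by no shared publication.
evidence_only = field_links(dataset_citations) - field_links(publication_citations)
for pair in sorted(sorted(p) for p in evidence_only):
    print(" <-> ".join(pair))
```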
Additional Bibliography (Selected)
• Starr, J., & Gastl, A. (2011). IsCitedBy: A metadata scheme for DataCite. D-Lib
Magazine, 17(1/2). doi:10.1045/january2011-starr
• Piwowar, H., Vision, T.J. (2013). Data reuse and the open data citation
advantage. PeerJ PrePrints. 1:e1v1. doi: 10.7287/peerj.preprints.1
• Cronin, B. (1984). The citation process: The role and significance of citations
in scientific publication. London, United Kingdom: Taylor Graham.
• Van Leunen, M. (1992). A handbook for scholars. New York, NY: Oxford
University Press.
Questions?
E-mail: escience@mit.edu
Web: micahaltman.com
Twitter: @drmaltman

 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 

Emerging Data Citation Infrastructure

  • 1. Prepared for MIT Libraries Informatics Program Brown Bag Talk August 2013 Emerging Data Citation Infrastructure Dr. Micah Altman <escience@mit.edu> Director of Research, MIT Libraries
  • 2. DISCLAIMER These opinions are my own; they are not the opinions of MIT, Brookings, any of the project funders, nor (with the exception of co-authored previously published work) my collaborators. Secondary disclaimer: “It’s tough to make predictions, especially about the future!” -- Attributed to Woody Allen, Yogi Berra, Niels Bohr, Vint Cerf, Winston Churchill, Confucius, Disreali [sic], Freeman Dyson, Cecil B. Demille, Albert Einstein, Enrico Fermi, Edgar R. Fiedler, Bob Fourer, Sam Goldwyn, Allan Lamport, Groucho Marx, Dan Quayle, George Bernard Shaw, Casey Stengel, Will Rogers, M. Taub, Mark Twain, Kerr L. White, etc.
  • 3. Collaborators & Co-Conspirators • Merce Crosas, IQSS, Harvard U. • Data-PASS Steering Committee <data-pass.org> • CODATA-ICSTI Task Group on Data Citation Standards and Practices <www.codata.org/taskgroups/TGdatacitation/> • Research Support – Thanks to the National Academies BRDI Sponsors: Department of Energy (DOE), Institute of Museum and Library Services (IMLS), The Library of Congress (LOC), Microsoft Research, National Institute of Standards and Technology (NIST), National Institutes of Health (NIH), National Oceanic and Atmospheric Administration (NOAA), National Science Foundation (NSF), U.S. Geological Survey (USGS) & the Massachusetts Institute of Technology.
  • 4. Related Work • CODATA-ICSTI Task Group on Data Citation Standards and Practices, 2013, “Out of Cite, Out of Mind: The Current State of Practice, Policy, and Technology for the Citation of Data”, Data Science Journal. Forthcoming. • P. F. Uhlir (Ed.), Developing Data Attribution and Citation Practices and Standards: Report from an International Workshop (p. Forthcoming). National Academies Press. • M. Altman, 2008, "A Fingerprint Method for Verification of Scientific Data" in Advances in Systems, Computing Sciences and Software Engineering (Proceedings of the International Conference on Systems, Computing Sciences and Software Engineering 2007), Springer Verlag. • Altman, M., & King, G. 2007. A Proposed Standard for the Scholarly Citation of Quantitative Data. D-Lib Magazine, 13(3/4). Most reprints available from: informatics.mit.edu
  • 5. This Talk • What is data citation? Why Cite? • Emerging Principles • On the horizon
  • 6. What’s Wrong with this Picture? “To test Benet’s (1998) theory of “politically-induced intelligence” (Benet 1999, pg 8), use a hierarchical corrected contingency model (see Altman & Smith 2010; Edgeworth 1863). We apply this model to a snowball sample (Glass 1973) of eligible voters14, to which the standard Stanford-Binet (Stanford & Binet 1766) has been applied. Our results show that adoption of Pastafarrianism can be expected to yield an increase mean intelligence by 10.3 points. ” 13 We thank Jon Sample, Director of the institute of the Pastaffarian institute for supplying this dataset, which is available upon request.
  • 7. “How much slower would scientific progress be if the near universal standards for scholarly citation of articles and books had never been developed? Suppose shortly after publication only some printed works could be reliably found by other scholars; or if researchers were only permitted to read an article if they first committed not to criticize it, or were required to coauthor with the original author any work that built on the original … [If] printed works existed in different libraries under different titles; if researchers routinely redistributed modified versions of other authors' works without changing the title or author listed; or if publishing new editions of books meant that earlier editions were destroyed?...” – Altman & King 2007
  • 8. “Citations to unpublished data and personal communications cannot be used to support claims in a published paper” “All data necessary to understand, assess, and extend the conclusions of the manuscript must be available to any reader of Science.” Ideal Helping Journals Manage Data
  • 9. Reality Helping Journals Manage Data • Compliance is low even in best examples of journals • Checking compliance manually is tedious, hard to scale
  • 10. Attribution • Cite data as first class work • Identify contributors to data Discovery • Associate a persistent id with a work • Locate data via identifier • Locate data integral to article • Locate works related to data – articles, derivatives, sources Persistence • Reference exists as long as referring object • Evidence persists as long as assertions based on evidence? • Durability of data transparent? Access • Citation provides for mediated access • Access to surrogate • On-line access to object • Machine understandability • Long-term human understandability Provenance • Associate work with version of evidence used • Verify fixity of information Principles for Data Citation Theory: Use Cases Operational Constraints? -Syntax -Interoperability -Technical contexts of use
  • 11. Reference • Formal syntax used within the text of a publication to denote a relationship to an external object. May contain additional information about the portion/subset of external object implicated. Also known as “in-text reference”, “pin-cite”. “We applied contingency analysis to the greatest data ever. [Altman 2005]” Citation • Formal description of external object, used for location and attribution. Micah Altman; Karin MacDonald; Michael P. McDonald, 2005, "Computer Use in Redistricting", hdl:1902.1/AMXGCNKCLU UNF:3:J0PkMygLPfIyT1E/8xO/EA== http://id.thedata.org/hdl%3A1902.1%2FAMXGCNKCLU Citation Metadata • Metadata that is systematically associated with citation through well-known public service, catalog, or protocol. <component_list> <component parent_relation="isPartOf"> <description><b>Figure 1:</b> This is the caption of the first figure...</description> <format mime_type="image/jpeg">Web resolution image</format> External Service • Applications and services that consume, enhance, aggregate citation information. Practice
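The citation on slide 11 packs a handle and a UNF fingerprint into one string. As a minimal sketch of how software might pull those apart, the Python below extracts both and rebuilds the resolver URL shown on the slide. The function names and regular expressions are illustrative assumptions, not part of any Dataverse or Handle System API, and the sketch does not compute or validate the UNF itself.

```python
import re
from urllib.parse import quote

# Citation text as it appears on slide 11.
CITATION = (
    'Micah Altman; Karin MacDonald; Michael P. McDonald, 2005, '
    '"Computer Use in Redistricting", hdl:1902.1/AMXGCNKCLU '
    'UNF:3:J0PkMygLPfIyT1E/8xO/EA=='
)


def extract_identifiers(citation):
    """Return the handle and UNF parts found in a citation string (hypothetical helper)."""
    handle = re.search(r'\bhdl:(\S+)', citation)
    unf = re.search(r'\bUNF:(\d+):(\S+)', citation)
    return {
        'handle': handle.group(1) if handle else None,
        'unf_version': int(unf.group(1)) if unf else None,
        'unf_digest': unf.group(2) if unf else None,
    }


def resolver_url(handle, base='http://id.thedata.org/'):
    """Percent-encode 'hdl:<handle>' and append it to the resolver base URL from the slide."""
    return base + quote('hdl:' + handle, safe='')


ids = extract_identifiers(CITATION)
url = resolver_url(ids['handle'])
```

Keeping the identifier (where to find the data) separate from the verifier (what the content should be) mirrors the discovery and provenance use cases listed on slide 10.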
  • 12. Analysis Method 2 Workshops (70+ participants) + 1 Literature Review (400+ resources) + 2 Task Groups NAS & CODATA (25+ members) + 60 Interviews + 7 authors Out of Cite, Out of Mind: The Current State of Practice, Policy, and Technology for the Citation of Data
  • 13. Principles for Data Citation - Separate - scientific principles - use cases - requirements - Distinguish - syntax - semantics - presentation - Design for - Ecosystem - Lifecycle - Stakeholders - Implement - Incremental value for incremental effort - Think globally, act locally Analysis Approach
  • 14. Principles for Data Citation 1. Status of Data: Data citations should be accorded the same importance in the scholarly record as the citation of other objects. 2. Attribution: Citations should facilitate giving scholarly credit and legal attribution to all parties responsible for those data. 3. Persistence: Citations should be as durable as the cited objects. 4. Access: Citations should facilitate access to data by humans and by machines. 5. Discovery: Citations should support the discovery of data and their documentation. 6. Provenance: Citations should facilitate the establishment of provenance of data. 7. Granularity: Citations should support the finest grained description necessary to identify the data. 8. Verifiability: Citations should contain information sufficient to identify the data unambiguously. 9. Metadata Standards: Citations should employ widely accepted metadata standards. 10. Flexibility: Citation methods should be sufficiently flexible to accommodate the variant practices among communities. Data Citation Principles
  • 15. Principles for Data Citation • Author. – The creator of the data set. • Title. – As well as the name of the cited resource itself, this may also include the name of a facility and the titles of the top collection and main parent subcollection (if any) of which the data set is a part. • Publisher. – The organization (or repository) either hosting the data or performing quality assurance. • Publication date. – Whichever is later: the date the data set was made available, the date all quality assurance procedures were completed, or the date the embargo period (if applicable) expired. In other standards an “Access Date” field is used to document the date the data set was successfully accessed. • Resource type. – Examples: “database” or “data set.” • Edition. – The level or stage of processing of the data, indicating how raw or refined the data set is. • Version. – A number increased when the data changes, as the result of adding more data points or rerunning a derivation process, for example. • Feature name and URI. – The name of an ISO 19101:2002 “feature” (e.g., GridSeries, ProfileSeries) and the URI identifying its standard definition, used to pick out a subset of the data. • Verifier – to verify the identity of the content. • Identifier. – A resolvable web identifier for the data, according to a persistent scheme. There are several types of persistent identifiers, but the scheme that is gaining the most traction is the Digital Object Identifier (DOI). • Location. – A persistent URL or UNF from which the data set is available. Some identifier schemes provide these via an identifier resolver service. Citation Metadata Elements
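The elements on slide 15 map naturally onto a small record type. The sketch below is one possible arrangement, assuming a hypothetical `DataCitation` class and a comma-separated rendering order modeled on the slide 11 example; it is not a published citation standard. The publisher is left unset because the slide example does not state one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataCitation:
    """Hypothetical container for a subset of the slide 15 citation elements."""
    authors: List[str]
    title: str
    publication_year: int
    identifier: str                      # persistent identifier, e.g. a handle or DOI
    publisher: Optional[str] = None
    resource_type: Optional[str] = None
    version: Optional[str] = None
    verifier: Optional[str] = None       # e.g. a UNF fingerprint

    def render(self) -> str:
        """Join the populated elements into one human-readable citation string."""
        parts = ["; ".join(self.authors), str(self.publication_year), f'"{self.title}"']
        if self.version:
            parts.append(f"Version {self.version}")
        if self.publisher:
            parts.append(self.publisher)
        parts.append(self.identifier)
        if self.verifier:
            parts.append(self.verifier)
        return ", ".join(parts)


# Values taken from the example citation on slide 11.
example = DataCitation(
    authors=["Micah Altman", "Karin MacDonald", "Michael P. McDonald"],
    title="Computer Use in Redistricting",
    publication_year=2005,
    identifier="hdl:1902.1/AMXGCNKCLU",
    verifier="UNF:3:J0PkMygLPfIyT1E/8xO/EA==",
)
rendered = example.render()
```

Storing the elements separately and rendering on demand follows the slide 13 advice to distinguish syntax, semantics, and presentation.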
  • 16. Gaps • Metadata/Structural – Granularity – Version Control – Microattribution – Contributor ID – Facilitation of reuse • Practice – Author: use of citations to data – Journals: ad-hoc syntax and location – Infrastructure: failure to index citations and references to data, even when associated with DOIs – Tools: support for datasets in reference managers, etc.
  • 17. Harmonizing Principles & Requirements DataCite • DOI • Creator • Title • Publisher • Publication Year Digital Curation Centre 1. The citation itself must be able to identify uniquely the object cited, though different citations might use different methods or schemes to do so. 2. It must be able to identify subsets of the data as well as the whole dataset. 3. a. It must provide the reader with enough information to access the dataset; b. indeed, when expressed digitally it should provide a mechanism for accessing the dataset through the Web infrastructure. 4. a. It must be usable not only by humans but also by software tools, so that additional services may be built using these citations. b. In particular, there need to be services that use the citations in metrics to support the academic reward system, and services that can generate complete citations. Force 11 • Data should be considered citable products of research. • Such data should be held in persistent public repositories. • If a publication is based on data not included with the article, those data should be cited in the publication. • A data citation in a publication should resemble a bibliographic citation and be located in the publication’s reference list. • Such a data citation should include a unique persistent identifier (a DataCite DOI recommended, or other persistent identifiers already in use within the community). • The identifier should resolve to a page that either provides direct access to the data or information concerning its accessibility. Ideally, that landing page should be machine-actionable to promote interoperability of the data. • If the data are available in different versions, the identifier should provide a method to access the previous or related versions. • Data citation should facilitate attribution of credit to all contributors.
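Slide 9 notes that checking compliance manually is tedious and hard to scale. As a hedged illustration, the sketch below checks a citation record against the five DataCite elements listed on slide 17. The dictionary keys and the `missing_elements` helper are assumptions made for this example; they are not DataCite schema field names or a DataCite API.

```python
# The five elements listed under "DataCite" on slide 17, using hypothetical key names.
REQUIRED_ELEMENTS = ("doi", "creator", "title", "publisher", "publication_year")


def missing_elements(record):
    """Return the required elements that are absent or blank in a citation record."""
    return [
        key for key in REQUIRED_ELEMENTS
        if not str(record.get(key) or "").strip()
    ]


# A made-up record for illustration only; the DOI prefix is a placeholder.
record = {
    "doi": "10.0000/example",
    "creator": "Example, A.",
    "title": "Example Data Set",
    "publisher": "",
    "publication_year": 2013,
}
gaps = missing_elements(record)
```

A journal workflow could run a check like this at submission time and ask authors to fill in whatever `gaps` reports.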
  • 18. Current Infrastructure FigShare • Closed source • No charge • Archives data • Supports DOIs, ORCIDs • Preserved in CLOCKSS Data Citation Index • Commercial Service (Thomson Reuters) • Indexes many large repositories (e.g. Data-PASS) • Beginning to extract citations from TR publications Dataverse Network • Open Source System • Hubs run at Harvard and other universities • Archives data • Generates persistent identifiers (handles, DOIs forthcoming) • Generates resolvable citations • Versioned • Harvard Library Dataverse now part of DataCite, Data-PASS preservation network DataCite • DOI registry service (DOI provider) • Data DOI metadata indexing service (parallel to CrossRef) • Not-for-profit membership Organization • Collaborating with ORCID-EU to embed ORCIDs
  • 19. Emerging Developments Open Journal Data Publication • Open source integration of PKP-OJS and Dataverse Network • Uses SWORD • Integrated data submission/citation/publication workflow for OJS open journals Journal Developments • NISO Recommendations on Supplementary Materials • Sloan/ICPSR Data Citation Project • Data-PASS Journal Outreach • New journal types: – Registered Replication journals – Null results journals – Data journals/data papers
  • 20. Research Questions for Data Citation and Management
  • 21. Research Areas Building on Richer Citations
  • 22. Brightening the “Dark Matter” of Scholarly Communications Researcher Identifiers: Developments, Opportunities & Challenges Research & Node Layout: Kevin Boyack and Dick Klavans (mapofscience.com); Data: Thomson ISI; Graphics & Typography: W. Bradford Paley (didi.com/brad); Commissioned by Katy Börner (scimaps.org) Seed Magazine, Mar 7, 2007 http://seedmagazine.com/content/article/scientific_method_relationships_among_scientific_paradigms/ • Bibliometric and network analysis are the “telescopes” for exploring the structure of science • Researcher IDs allow us to see more connections, more reliably • Identifiers for datasets, etc. reveal the “dark matter” of science Some potential questions: • Are fields linked through evidence that are not linked through publications? • How is the practice of science changing – are data scientists, statisticians, etc. making bigger contributions? • What would be the results of: – Catalyzing new research collaborations among individuals, organizations? – Strengthening support for specific areas of interdisciplinary research? – Growing the evidence base in particular areas? • Questions about how the network of contributors and outputs evolves over time
  • 23. Additional Bibliography (Selected) • Starr, J., & Gastl, A. (2011). IsCitedBy: A metadata scheme for DataCite. D-Lib Magazine, 17(1/2). doi:10.1045/january2011-starr • Piwowar, H., Vision, T.J. (2013). Data reuse and the open data citation advantage. PeerJ PrePrints. 1:e1v1. doi: 10.7287/peerj.preprints.1 • Cronin, B. (1984). The citation process: The role and significance of citations in scientific publication. London, United Kingdom: Taylor Graham. • Van Leunen, M. (1992). A handbook for scholars. New York, NY: Oxford University Press.
  • 24. Questions? E-mail: escience@mit.edu Web: micahaltman.com Twitter: @drmaltman

Editor's notes

  1. This work, by Micah Altman (http://micahaltman.com), is licensed under the Creative Commons Attribution-Share Alike 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
  2. The structure and design of digital storage systems is a cornerstone of digital preservation. To better understand ongoing storage practices of organizations committed to digital preservation, the National Digital Stewardship Alliance conducted a survey of member organizations. This talk discusses findings from this survey, common gaps, and trends in this area. (I also have a little fun highlighting the hidden assumptions underlying Amazon Glacier's reliability claims. For more on that see this earlier post: http://drmaltman.wordpress.com/2012/11/15/amazons-creeping-glacier-and-digital-preservation )
  3. Data citation supports attribution, provenance, discovery, and persistence. It is not (and should not be) sufficient for all of these things, but it is an important component. In the last 2 years, there have been several major efforts to standardize data citation practices, build citation infrastructure, and analyze data citation practices. This session, presented as part of the Program on Information Science seminar series, examines data citation from an information lifecycle approach: what are the use cases, requirements, and research opportunities? The session will also discuss emerging infrastructure and standardization efforts around data citation. A number of principles have emerged for citation; the most central is that data citations should be treated consistently with citations to other objects. Data citations should at least provide the minimal core elements expected in other modern citations, should be included in the references section along with citations to other elements, and should be indexed in the same way. Adoption of data citation by journals can provide positive and sustainable incentives for more reproducible science and more complete attribution. This would act to brighten the dark matter of science, revealing connections among evidence bases that are not now visible through citations of articles.