1. Panel 2: Attribution: Managing Provenance, Ethics, and Metrics
Problems of attribution
ORCID-Dryad Symposium, 23 May 2013
Christine L. Borgman
Oliver Smithies Visiting Fellow and Lecturer,
Balliol College, Oxford
Visiting Fellow, Oxford eResearch Centre
Visiting Fellow, Oxford Internet Institute
Professor and Presidential Chair in Information Studies,
University of California, Los Angeles
1990, Sage Publications
2. Data Attribution and Citation
CODATA-ICSTI Task Group on Developing Data
Attribution and Citation Practices and Standards,
2010-?
International Council of Scientific Unions
Committee on Data: CODATA
International Council for Scientific and
Technical Information
Task Group Co-Chairs
Christine Borgman, UCLA – US
Sarah Callaghan, BADC – UK
Jan Brase, DataCite – Germany
3. CODATA-ICSTI Task Group on Data
Citation Standards and Practices
April 2013 DRAFT
Citation of Data: The Current State of Practice, Policy, and Technology
2012
4. Driving questions for symposium
1. What are the major technical issues that need to be considered in developing and
implementing scientific data citation standards and practices?
2. What are the major scientific issues that need to be considered in developing and
implementing scientific data citation standards and practices? Which ones are
universal for all types of research and which ones are field- or context-specific?
3. What are the major institutional, financial, legal, and socio-cultural issues that need
to be considered in developing and implementing scientific data citation standards
and practices? Which ones are universal for all types of research and which ones
are field- or context-specific?
4. What is the status of data attribution and citation practices in individual fields in the
natural and social (economic and political) sciences in the United States and
internationally? Case Studies.
5. Institutional Roles and Perspectives: What are the respective roles and
approaches of the main actors in the research enterprise and what are the
similarities and differences in disciplines and countries? The roles of research
funders, universities, data centers, libraries, scientific societies, and publishers will
be explored.
5. Infrastructure for digital objects
Social practice
Usability
Identity
Persistence
Discoverability
Provenance
Relationships
Intellectual property
Policy
http://datalib.ed.ac.uk/GRAPHICS/blue_data.gif
Borgman, C. L. (2012). Why Are the Attribution and Citation of Scientific Data
Important? In For Attribution -- Developing Data Attribution and Citation
Practices and Standards: Summary of an International Workshop (pp. 1–10).
Washington, D.C.: The National Academies Press.
6. Scholarly practice
Why cite data?
Document evidence
Support discovery
Assign credit
Why attribute data?
Social expectation
Legal responsibility
How to cite data?
Bibliographic reference
Identifier
Link
http://inventionmachine.com/the-Invention-Machine-Blog/bid/51703/Three-Key-Challenges-to-Entering-New-Markets
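As a rough illustration of the three forms listed above (bibliographic reference, identifier, link), the following Python sketch assembles all three from a single record. The class name, field names, element order, and sample values (including the DOI-style identifier) are assumptions made for illustration; none come from the talk, and no particular citation standard is implied.

```python
from dataclasses import dataclass


@dataclass
class DataCitation:
    """Minimal descriptive fields for citing a dataset (hypothetical schema)."""
    creators: list[str]
    year: int
    title: str
    repository: str
    identifier: str  # assumed to be a DOI; the value used below is a placeholder

    def link(self) -> str:
        # Assumption: the identifier is a DOI, resolvable through doi.org.
        return f"https://doi.org/{self.identifier}"

    def reference(self) -> str:
        # Bibliographic reference that ends with the resolvable link.
        authors = "; ".join(self.creators)
        return f"{authors} ({self.year}). {self.title}. {self.repository}. {self.link()}"


# Hypothetical record, invented for illustration only.
citation = DataCitation(
    creators=["Example, A.", "Sample, B."],
    year=2013,
    title="Placeholder dataset title",
    repository="Example Repository",
    identifier="10.1234/example",
)
print(citation.reference())
```

Keeping the identifier separate from the formatted reference lets the same record serve as evidence (the reference), as a stable name (the identifier), and as a route to the data (the link).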
7. Attribution of data
Legal responsibility
Licensed data
Specific attribution required
Scholarly credit: contributorship
Author of data
Contributor of data to this publication
Colleague who shared data
Software developer
Data collector
Instrument builder
Data curator
Data manager
Data scientist
Field site staff
Data calibration
Data analysis, visualization
Funding source
Data repository
Lab director
Principal investigator
University research office
Research subjects
Research workers, e.g., citizen science…
8. Finding and following digital objects
Discoverability
Identify existence
Locate
Retrieve
Provenance
Chain of custody
Transformations from original state
Relationships
Units identified
Links between units
Actions on relationships
http://chicagoist.com/2008/10/09/a_gourmet_oasis_provenance_food_and.php
9. Metrics
What to count?
How to count them?
What do the counts mean?
Theory of relationships?
Hypotheses?
Model of scholarly attribution?
Is the method scientifically rigorous?
Is it valid?
Is it reliable?
Emily Haines and Jimmy Shaw of Metric
Photo Credit: Claire Lorenzo. Flickr
10. Managing Provenance, Ethics, and Metrics
Problems of attribution
Attribution
What are contributor roles?
Who deserves credit?
What are legal obligations?
Citation functions
Evidentiary
Credit, attribution
Discovery, linking
Provenance
Workflows, transformations
Software to interpret, analyze
Metrics
What, how, and why to count
Need a theory of citation behavior
http://www.dlorg.eu/index.php/publications/ec-policy/ec-publications-policy
Editor’s notes
Much current work as well! This group was established about 3 years ago, after about 3 years of discussion about the need for it, coordinating with RDA and others. Note that these are old, international organizations, dating from the 1960s. Sarah and I are here; Simon Hodson is a senior player in CODATA.
What do we do? We write REALLY LONG reports!
These are very wordy; you have them in your agenda, so I won’t parse them further here. Key phrases are highlighted in blue. Origins: months of discussion within BRDI, CODATA, the Task Group, and the steering committee to arrive at these. The steering committee, in consultation with the task group, assembled the distinguished group of speakers from around the world. Data management, use, reuse, and incentives to share and reuse are hot topics. We cannot address all of them in two days, so we have tried to keep our attention narrowly focused on these questions. We are not ignoring the other ones; we are just taking this rare opportunity to identify answers to key questions that will help the community move forward on standards, practices, and policy.
An infrastructure for digital objects has many features. At this meeting we’re concerned with how they apply to data, attribution, and citation, but we must remember that they are part of a larger internet architecture of digital objects. This is the framing for the workshop, and the one in which today’s issues exist.
Let me spend my few minutes here: Why cite data in the first place? Citation and attribution are not the same thing, as was very clear in the 2011 symposium. The means to cite also vary widely, and the technical infrastructure is developing rapidly. Explain each of these briefly.
We’re here to talk about attribution. In CC terms, that means giving credit, but credit for what? If you have obtained data through a license, it may require that the data be cited in a certain way. The broader issue is: attribution for what? We heard yesterday that data citation is an incentive for data release. That’s an untested hypothesis, and it needs to be tested. What we found at the symposium was that everyone down the line had their hand out! The mechanism for citation will vary by who is getting credit and the reason for making the reference. These are but a few of the many stakeholders that might deserve credit in some situations.
But data citation serves many other purposes as well. Explain these briefly. Old notes below.
The ability to discover the existence of data is a critical requirement for a data-sharing infrastructure. We can define discovery as the ability to determine the existence of a set of data objects with specified attributes or characteristics. The attributes of interest include aspects such as the producer of the data, the date of production, the method of production, a description of its contents, and its representation. Discovery may also include aspects such as levels of quality, certification, or validation by third parties. Discoverability depends both on the description and representation of data and on tools and services to search for data objects.
Data rarely are self-describing. Description and representation usually take the form of metadata, some of which may be automated if data are generated by instruments such as sensor networks or telescopes. Much metadata creation requires human intervention, making it an expensive process that is often avoided by researchers. The lack of standards and practices for citing data, akin to citing publications, is a barrier to discoverability.
A variety of approaches to discovery are possible. Web search engines that walk the visible internet are one possibility, assuming that data descriptions are reachable via standard web protocols. With the introduction of semantic web technologies and associated crawlers and search engines, locating datasets of interest based on semantic content becomes possible. Alternatively, more discipline-specific and structured catalogs can be created.
Provenance, which will be discussed in the first panel, has different meanings in the archival and computer science communities.
Following objects requires that they be in some units, that we can make relations between those units, and often that we can take some actions on those relations.
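The definition of discovery in the notes above (determining the existence of data objects with specified attributes) can be sketched as an attribute match over metadata records. This is only a minimal Python sketch: the record fields mirror the attributes named in the notes, while the class name, the `discover` function, and the catalog entries are hypothetical and invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataRecord:
    """Descriptive metadata for one data object (hypothetical schema)."""
    producer: str
    produced: str        # date of production, as an ISO 8601 string
    method: str          # method of production
    description: str     # description of contents
    representation: str  # e.g. file format


def discover(catalog, **wanted):
    """Return the records whose attributes equal every requested value."""
    return [
        record for record in catalog
        if all(getattr(record, name) == value for name, value in wanted.items())
    ]


# Hypothetical catalog entries, invented for illustration only.
catalog = [
    DataRecord("Lab A", "2012-06-01", "sensor network", "soil moisture", "CSV"),
    DataRecord("Lab B", "2012-09-15", "telescope", "sky survey images", "FITS"),
    DataRecord("Lab A", "2013-01-20", "sensor network", "air temperature", "CSV"),
]

matches = discover(catalog, producer="Lab A", method="sensor network")
print([m.description for m in matches])
```

The sketch only works because every record already carries metadata, which is the expensive, often human-produced part the notes describe.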
This is a huge concern with regard to using citations of any kind for evaluating people. We need to apply more scientific rigor to the metrics process if we are going to design metrics appropriately. (The band, Metric, is the first set of images for “metrics” on Flickr, for CC use.)
To conclude with the problems of attribution for this panel. Summarize each quickly.