Presented at ASIST & ISSI Pre-Conference
Symposium on Informetrics and Scientometrics on Nov 7, 2009
http://www.sois.uwm.edu/MetricsPreCon/program.html
Public Sharing of Research Datasets: A Pilot Study of Associations
1. Public sharing of research datasets: a pilot study of associations
Heather Piwowar and Wendy Chapman
Department of Biomedical Informatics
University of Pittsburgh
8. Prior work has focused on surveys and studies of intention.
Our aim: measure associations between observed data sharing behaviour and environmental variables.
aim
9. [Diagram: Funder, Journal, Investigator, Institution, Study]
Is research data shared after publication?
aim
11. http://en.wikipedia.org/wiki/DNA_microarray
http://en.wikipedia.org/wiki/Image:Heatmap.png
http://commons.wikimedia.org/wiki/File:DNA_double_helix_vertikal.PNG
microarray
data
13. Ochsner et al. (2008). Much room for improvement in
deposition rates of expression microarray datasets. Nature
Methods, 5(12), 991.
Manually reviewed 20 journals for 2007:
400 studies
200 shared their microarray data
data sample
14. [Diagram: Funder mandates, Journal mandates, Journal impact factor, Investigator “experience”]
Is research data shared after publication?
variables
16. Funder mandates
NIH 2003 Data Sharing Requirement
Requires a data sharing plan for studies funded after October 2003 that receive more than $500 000 in direct funding per year
variables
17. Funder mandates
Assumed the data sharing requirement was applicable if:
the NIH grant numbers associated with the PubMed entry had $750 000 in total funding in any year since 2004
plus
an NIH grant number with a leading “1” or “2” since 2004
variables
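The applicability rule on slide 17 can be sketched in code. This is a minimal sketch under stated assumptions, not the authors' implementation: the `GrantYear` record, its field names, and the sample grant numbers are hypothetical, and "had $750 000 in total funding" is read here as "at least $750 000 for a single grant in a single fiscal year". The leading digit of an NIH grant number is the application type code (1 for a new award, 2 for a competing renewal).

```python
from dataclasses import dataclass


@dataclass
class GrantYear:
    """One fiscal year of funding for one NIH grant (hypothetical record)."""
    grant_number: str     # e.g. "1R01XX000001-01" (made-up example)
    fiscal_year: int
    total_funding: float  # total dollars awarded in that fiscal year


def sharing_plan_assumed(grant_years, threshold=750_000, since=2004):
    """Return True if the slide's two conditions both hold.

    Assumption: the funding threshold is checked per grant per year,
    and both conditions are evaluated over years >= `since`.
    """
    recent = [g for g in grant_years if g.fiscal_year >= since]
    # Condition 1: some grant-year reaches the total funding threshold.
    high_funding = any(g.total_funding >= threshold for g in recent)
    # Condition 2: some grant number starts with type code "1" or "2".
    new_or_renewal = any(g.grant_number[:1] in ("1", "2") for g in recent)
    return high_funding and new_or_renewal
```

For example, `sharing_plan_assumed([GrantYear("1R01XX000001-01", 2005, 800_000)])` returns `True`, while the same funding on a grant number starting with "5" (a non-competing continuation) returns `False`.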
19. Journal mandates
Piwowar and Chapman. A review of journal policies for sharing research data. International Conference on Electronic Publishing (ELPUB) 2008
Journal Policy Strength: Strong, Weak, or None
variables
22. Author experience
“experience and impact” proxy:
• years since first publication
• h-index estimate
• a-index estimate
Scriptable, to allow scaling up to thousands of authors?
variables
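The two bibliometric indices named on slide 22 are simple to script. The slides do not define them, so this sketch assumes the usual definitions: the h-index is the largest h such that h papers each have at least h citations, and the a-index is the mean citation count of those h papers. The function names and the input format (a plain list of per-paper citation counts for one author) are illustrative.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def a_index(citations):
    """Mean citation count of the papers in the h-core (0.0 if h is 0)."""
    h = h_index(citations)
    if h == 0:
        return 0.0
    core = sorted(citations, reverse=True)[:h]
    return sum(core) / h
```

With citation counts `[10, 8, 5, 4, 3]`, `h_index` gives 4 and `a_index` gives 6.75, the mean of 10, 8, 5 and 4.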
25. Author experience
Author name disambiguation
Author-ity web service:
Torvik & Smalheiser. (2009). Author Name Disambiguation in MEDLINE. ACM Transactions on Knowledge Discovery from Data, 3(3):11.
variables
26. Author experience
PubMed + PubMed Central + Author-ity to compute pubmedi citation estimates
➡ not a comprehensive account of publication accomplishments
➡ for aggregate analysis: free, open, scriptable, flexible, reproducible.
variables
27. Author experience
For each first and last author, we used the first principal component of:
• years since first publication
• pubmedi h-index estimate
• pubmedi a-index estimate
variables
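Combining the three measures on slide 27 into one score via their first principal component might look like the sketch below. It is an illustration only: the slides do not say whether the variables were standardized or which software was used, so this version assumes z-scored columns and finds the leading eigenvector of their correlation matrix by power iteration, using only the standard library. The function names and the row layout (one row per author: years, h-index, a-index) are hypothetical.

```python
from statistics import mean, stdev


def standardize(column):
    """Z-score a column (assumes it has non-zero variance)."""
    m, s = mean(column), stdev(column)
    return [(v - m) / s for v in column]


def first_pc_scores(rows, iterations=500):
    """Return (scores, loadings) for the first principal component.

    rows: one sequence per author, e.g. (years, h_index, a_index).
    """
    cols = [standardize(list(c)) for c in zip(*rows)]
    n, k = len(rows), len(cols)
    # Correlation matrix of the standardized columns.
    corr = [[sum(a * b for a, b in zip(cols[i], cols[j])) / (n - 1)
             for j in range(k)] for i in range(k)]
    # Power iteration converges to the leading eigenvector.
    w = [1.0] * k
    for _ in range(iterations):
        w = [sum(corr[i][j] * w[j] for j in range(k)) for i in range(k)]
        norm = sum(x * x for x in w) ** 0.5
        w = [x / norm for x in w]
    # Fix the sign so that larger inputs give larger scores.
    if sum(w) < 0:
        w = [-x for x in w]
    scores = [sum(w[j] * cols[j][r] for j in range(k)) for r in range(n)]
    return scores, w
```

Calling `first_pc_scores([(2, 1, 1.0), (5, 3, 4.0), (10, 6, 9.0), (20, 12, 15.0)])` yields one score per author, centered on zero, with the most experienced author scoring highest.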
31. [Diagram: Funder mandates, Journal mandates, Journal impact factor, Investigator “experience”, each marked as “Not statistically significant” or “Statistically significant”]
Is research data shared after publication?
results
40. http://www.flickr.com/photos/vlastula/300102949/
• Association does not imply causation
• Only one datatype
• Small sample, limited variables
• Dataset contains disproportionate number of high-impact studies
limitations
41. • NIH data sharing plan applies to a minority of NIH microarray studies
• NIH data sharing plan does not seem to increase frequency of data sharing
• More experienced investigators are more likely to share data
prelim conclusions
44. Dept of Biomedical Informatics at U of Pittsburgh
NLM for training grant funding
Open science online community and those who release their
articles, datasets and photos openly
Dr Wendy Chapman for her support and feedback
thanks
47. Journal mandates
Policy strength categorization:
None: No applicable mention of data sharing
Weak: Request or unenforceable requirement
Strong: Require data deposit accession number as a condition of publication
variables