2. What is the Data Services Center?
• Numeric and statistical data services
• Finding and providing access to datasets
• Planned: statistical consulting
• Spatial data services
• Creating and acquiring GIS data
• Research data services
5. Open Data citation advantage
• Papers that make data available are cited 9–69% more often
(Dorch, 2012; Sears, 2011; Henneken and Accomazzi, 2011; Pienta et al.,
2010; Piwowar et al., 2007)
• Why? (Piwowar and Vision, 2013)
• Data reuse
• Credibility signaling
• Increased visibility
• Early view
• Selection bias
7. You don’t have to share all your data
with anyone who wants it
“at no more than incremental cost and
within a reasonable time” (NSF)
“indicate the criteria for deciding who
can receive your data” (NIH)
“All data necessary to understand, assess, and
extend the conclusions of the manuscript must be
available to any reader.” (Science)
8. Consider granularity:
• What would someone need to reproduce your results?
• Data: raw, processed, final
• Scripts, code libraries, etc.
• Metadata
9. Consider timing:
• Before publication? At the time of publication?
• Consider restrictions, embargo, etc. for data that can’t be
immediately shared freely
• Check with UR Ventures if you have concerns about protecting
patent interests
• Staggered release: metadata, then data later
10. Consider usability:
• Could someone with comparable expertise look at your
data and understand how to use it?
• Is it clear how different files relate to each other?
• Are your variable names meaningful? File names descriptive?
• Include README.txt file or codebook in top level of directory
• Are special tools or software needed to use your data?
• Are your files in a proprietary format? Will future users be able to
open them?
• Include the necessary tools, or make the data available in open
formats
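One way to act on the README advice above is to generate a file listing automatically and then fill in the descriptions by hand. The following is only a minimal sketch: the function name `write_readme`, the layout of the README, and the "TODO" placeholder are illustrative assumptions, not a required format.

```python
from pathlib import Path


def write_readme(data_dir, title, descriptions):
    """Write README.txt at the top level of data_dir.

    `descriptions` maps a relative file path to a one-line description.
    Files without an entry are flagged so the gap stays visible.
    (Hypothetical helper; the README layout is an assumption.)
    """
    root = Path(data_dir)
    lines = [title, "", "Files:"]
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        if rel == "README.txt":
            continue  # do not list the README inside itself
        note = descriptions.get(rel, "TODO: describe this file")
        lines.append(f"  {rel}: {note}")
    readme = root / "README.txt"
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme
```

Calling `write_readme("my_dataset", "Survey data, 2014", {"raw/responses.csv": "Raw survey responses"})` (hypothetical paths) would list every file under `my_dataset`, with undescribed files marked as TODO for you to complete.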
12. Why can’t I keep it on my computer?
• Poor success rates for data sharing requests (Vines et al.,
2013; Savage and Vickers, 2009; Wicherts et al., 2006)
• The older the article, the harder to get the data (Vines et al.,
2014):
• Odds of a dataset being reported as extant decline by 17% per
year
• Odds of finding a working email for first, last, or corresponding
author decline by 7% a year
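To get a feel for what a 17% yearly decline in odds means over time, here is a rough sketch. It assumes the decline compounds by the same proportion every year, which is a simplification for illustration; the function names and the starting probability in the second helper are hypothetical and not taken from Vines et al.

```python
def remaining_odds_fraction(annual_decline, years):
    """Fraction of the initial odds left after `years`, assuming the
    same proportional decline compounds every year."""
    return (1.0 - annual_decline) ** years


def probability_after(p_start, annual_decline, years):
    """Convert a starting probability (0 <= p_start < 1) to odds,
    apply the compounded decline, and convert back to a probability."""
    odds = p_start / (1.0 - p_start)
    odds *= remaining_odds_fraction(annual_decline, years)
    return odds / (1.0 + odds)


if __name__ == "__main__":
    for years in (1, 5, 10, 20):
        frac = remaining_odds_fraction(0.17, years)
        print(f"after {years:2d} years: {frac:.3f} of the initial odds")
```

Under this assumption, the odds after ten years are about 0.16 of their starting value (0.83 raised to the tenth power). Note that this is a statement about odds, not probability; `probability_after` shows the conversion for a starting probability you supply.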
13. Why can’t I keep it on my computer?
“Sure I will send you those data, but it's like seven
computers ago, and so please allow me some time to hunt
them down” (Wicherts and Bakker, 2012)
• Most refusals are not to protect ongoing work, but
because (Vines et al., 2014):
• The data are on a computer that got stolen…
• The data are in my parents’ attic…
• The data are definitely on one of these zip disks…
• …and it will take hours for me to get them, if I can get them at all.
14. Set it and forget it: put your data in a
repository
• Long-term commitment to data preservation
• Reuse tracking and usage statistics
• Permanent URL / DOI enables data citation
18. Set it and forget it: put your data in a
repository
1. Find a disciplinary repository or database
• Repository directories: re3data.org; biosharing.org
• Typically managed by specialists in the field
2. Use a general-purpose repository
• UR Research: https://urresearch.rochester.edu/home.action
• Dryad: http://datadryad.org
20. • Integration with journal submission processes
(http://datadryad.org/pages/integratedJournals)
• Not free: $80/submission. But we provide vouchers!
21. How to get a voucher
• Proposal should include:
• A description of the project to which the data is related;
• A description of the data to be archived, including the format(s)
and approximate total size. The RCL will fully fund datasets up to
10GB, with larger data considered on a case-by-case basis.
• Send proposal to kathleen.fear@rochester.edu
22. But my data’s bigger than that…
• An upcoming option: REACTUR (Research data Archiving
and Curation at the University of Rochester)
• River Campus Libraries + CIRC = easy data sharing for
large datasets
• $200 / TB / year
• Piloting now; we hope to make it available to all in Spring 2015
23. Set it and forget it: put your data in a
repository
1. Find a disciplinary repository or database
• Repository directories: re3data.org; biosharing.org
• Typically managed by specialists in the field
2. Use a general-purpose repository
• UR Research: https://urresearch.rochester.edu/home.action
• Dryad: http://datadryad.org
• REACTUR
25. A little help:
• Call me! (Or email, or drop by.)
5-6882
Carlson 313E
kathleen.fear@rochester.edu
• At URMC, contact:
Donna Berryman
5-6877
Donna_Berryman@urmc.rochester.edu
Linda Hasman
5-3399
Linda_Hasman@urmc.rochester.edu
26. Data Workshops
• 1st and 3rd Thursdays @ noon, Carlson Library Rm. 310
Fall 2014
• September: Writing a successful data management plan; Intro to GIS I
• October: Sharing your data; Intro to GIS II
• November: Finding and using data from ICPSR; Intro to GIS III
• December: Data visualization
Spring 2015
• January: R 101; Intro to R Spatial
• February: Using the DMPTool; Georeferencing maps
• March: Basic database design; Web mapping: Google Refine, Open Layers
• April: Tools for qualitative research; Mapping real-world data
27. References
• Dorch, B. (2012). On the Citation Advantage of linking to data. Retrieved from http://hprints.org/hprints-00714715
• Henneken, E. A., & Accomazzi, A. (2011). Linking to Data - Effect on Citation Rates in Astronomy.
arXiv:1111.3618 [astro-ph]. Retrieved from http://arxiv.org/abs/1111.3618
• Pienta, A. M., Alter, G. C., & Lyle, J. A. (2010). The Enduring Value of Social Science Research: The Use
and Reuse of Primary Research Data. Retrieved from http://deepblue.lib.umich.edu/handle/2027.42/78307
• Piwowar, H. A., Day, R. S., & Fridsma, D. B. (2007). Sharing Detailed Research Data Is Associated with
Increased Citation Rate. PLoS ONE, 2(3). doi:10.1371/journal.pone.0000308
• Piwowar, H. A., & Vision, T. J. (2013). Data reuse and the open data citation advantage. PeerJ, 1.
doi:10.7717/peerj.175
• Savage, C. J., & Vickers, A. J. (2009). Empirical Study of Data Sharing by Authors Publishing in PLoS
Journals. PLoS ONE, 4(9), e7078. doi:10.1371/journal.pone.0007078
• Sears, J. R. (2011). Data Sharing Effect on Article Citation Rate in Paleoceanography. AGU Fall Meeting
Abstracts, 53, 1628.
• Vines, T. H., Albert, A. Y. K., Andrew, R. L., Debarre, F., Bock, D. G., Franklin, M. T., … Rennison, D. J.
(2014). The Availability of Research Data Declines Rapidly with Article Age. Current Biology, 24(1), 94–97.
doi:10.1016/j.cub.2013.11.014
• Vines, T. H., Andrew, R. L., Bock, D. G., Franklin, M. T., Gilbert, K. J., Kane, N. C., … Yeaman, S. (2013).
Mandated data archiving greatly improves access to research data. The FASEB Journal, 27(4), 1304–1308.
doi:10.1096/fj.12-218164
• Wicherts, J. M., & Bakker, M. (2012). Publish (your data) or (let the data) perish! Why not publish your data
too? Intelligence, 40(2), 73–76. doi:10.1016/j.intell.2012.01.004
• Wicherts, J. M., Borsboom, D., Kats, J., & Molenaar, D. (2006). The poor availability of psychological
research data for reanalysis. The American Psychologist, 61(7), 726–728. doi:10.1037/0003-066X.61.7.726