Presented at the American Institute of Biological Sciences council meeting, 8 December 2015. I focus on anecdotes from multiple domains about the kinds of skills and trajectories that empower scientists at multiple levels to become engaged in data-intensive science as data wranglers or tool-builders, even if they don't have lots of funding from NSF or NIH.
Preparing for data-intensive science across domains.
1. Cynthia Parr @cydparr
US Department of Agriculture
National Agricultural Library
8 December 2015
3. Real-time
Automated
Modified from Peter Wittenberg, Research Data Alliance
https://rd-alliance.org/group/data-fabric-ig.html
[Diagram labels: raw data collection; exploration, cleaning, enrichment, analysis; registration, preservation; temporary data; referable data; citable data; citable publication]
15. What: Data & metadata standards
From Fig. 2 of Wieczorek et al. (2012). "Darwin Core: An Evolving Community-developed Biodiversity Data Standard." PLoS ONE 7 (1). doi:10.1371/journal.pone.0029715.
21. How: Be Agile
“The tools needed for doing data-intensive science are in a constant state of flux – it’s hard for practitioners to keep up, let alone keep a curriculum current.”
– Leslie Ries
• Undergraduate exercises in re-analysis
• Graduate curriculum as scaffolded exploration
24. Images are my own unless otherwise credited
cynthia.parr@ars.usda.gov
Ag Data Commons
data.nal.usda.gov
eol.org/traitbank
tdwg.org
@cydparr
Editor's notes
Preparing for data-intensive science across domains
I will outline a series of progressive training needs, gleaned through my interactions with long-tail and big data scientists and data managers from a variety of domains. Some needs, such as basic data management and intellectual property concepts, are being addressed by forward-thinking libraries. Other needs, such as basic programming, data standards, and semantics, typically are met through participation in short-term research sprints, workshops, or courses. Currently, successful researchers expect change and learn informatics informally via their own projects, a perspective that can permeate all training and scientific practice.
Will hold most of the important URLs until the end
Level 1: What everyone doing research should know, whatever the domain
Basic data life cycle stuff
The need to have data management plans
Best practices …even the fact that best practices exist
Level 2: What many researchers should know: how to get away from spreadsheets, some basic programming, some understanding of database principles, maybe even some application programming interfaces. These are very general skills.
Not everyone needs to learn this, but for those in many environmental domains some basic GIS understanding is essential; for evolutionary biologists or molecular biologists there are other specialist tools
While these can be pretty intense, they, like basic programming, can increasingly be taught at the high school level
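As a rough illustration of what "getting away from spreadsheets" can look like, here is a minimal Python sketch that loads a small table, as it might be exported from a spreadsheet, into a database and lets a query do the summarizing. The observation data, the table name `observations`, and the helper functions are all hypothetical, made up for this example; only the standard `csv` and `sqlite3` modules are used.

```python
import csv
import io
import sqlite3

# Hypothetical field observations, as they might be exported from a spreadsheet.
CSV_TEXT = """site,species,individuals
A,Danaus plexippus,12
A,Vanessa cardui,3
B,Danaus plexippus,7
"""


def load_observations(csv_text):
    """Load CSV rows into an in-memory SQLite table and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE observations (site TEXT, species TEXT, individuals INTEGER)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(r["site"], r["species"], int(r["individuals"])) for r in reader]
    conn.executemany("INSERT INTO observations VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def totals_by_species(conn):
    """Return {species: total individuals}, computed by the database."""
    cursor = conn.execute(
        "SELECT species, SUM(individuals) FROM observations "
        "GROUP BY species ORDER BY species"
    )
    return dict(cursor.fetchall())


conn = load_observations(CSV_TEXT)
print(totals_by_species(conn))
```

The point is not the specific tools: the same summary could be written in R or another database. What matters is that the steps are recorded as a script that can be rerun when the data change.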
Useful for data integration and machine learning and text or data mining
And high performance computing
Coursera and other courses
Titus Brown
Training in Data intensive biology
Like a hackathon
New data
New techniques
New tools
Level 1: New acronyms
Enough to make your head explode
But we have gotten used to new phones every couple of years; this isn’t so bad
I don’t really mean that you should use agile software development, though that is the method du jour.
Rather, learning programming or learning statistics really isn’t about the specific language or algorithm, but about learning the general skill of how to learn, with enough scaffolding. Leslie Ries also teaches classes with modules where students revisit some published result. To do that, they have to learn enough about the question, the data, the analysis methods, etc., to redo them. They may or may not take the same approach as the paper. Leslie’s plan for a graduate seminar is to have each student research a new statistical or analysis technique and present it to the class.
Not to be afraid of using the wrong words
Go to meetings (tell story of Harry Hochheiser)
Pick up the lingo
Get to know the sys admin
Pair programming