Slides for a discussion of a brief Nature comment on bioinformatics cores and an older PLOS ONE perspective that suggests best practices for bioinformatics core facilities.
1. Bioinformatics Core Facilities discussion:
Presentation by Jennifer M. Shelton (2015)
"Core services: Reward bioinformaticians" (2015)
Jeffrey Chang
2. Issue: Most analysis is unique
• Biological data are accumulating faster than people's capacity to analyze them.
• We documented the projects that we took on over 18 months (1 full-time and 3 part-time staff, with the author counted as 1 part-time).
• Forty-six projects required 151 data analysis tasks.
• 79% of techniques were applied to fewer than 20% of the projects.
• Most researchers came to the bioinformatics core seeking customized analysis, not a standardized package.
3. Issue: Analysis is not treated as unique
• Applied bioinformaticians do not lead their own research projects.
• They have fewer opportunities for advancement and leadership.
• Their output is not captured by the standard metrics of achievement.
• This is why it can take more than six months to fill positions at a core, why many of biology's brightest are leaving science for technology companies, and why conventional biologists wait nine months to get help to dissect their data.
4. Questions: "Core services: Reward bioinformaticians" (2015)
• To what degree can small cores (3-4 people) provide analysis for all of the big datasets being generated in our universities?
• How often does your core work on a project that requires custom analysis?
• Can the researchers we run analyses for do anything with their results once we are done?
• Do they still need some applied bioinformatics skills to query or make use of the results?
• If there is a real limit on the time and number of projects we can process, then what about the rest of the research community?
5. Bioinformatics Core Facilities discussion
"Establishing a Successful Bioinformatics Core Facility Team" (2009)
Fran Lewitter, Michael Rebhan
6. Outline of best practices
a) allowing the bioinformaticians to spend 20%–40% of their time developing mid-term focus areas that combine certain types of biological questions with related bioinformatics approaches;
b) encouraging regular discussion of best practice within the unit, in particular the pros and cons of different approaches, and related resources;
c) careful selection of the most relevant datasets and methods for a given problem (which can take some time if there isn't sufficient overlap with previous projects);
d) designing solutions that, if possible, combine independent lines of evidence to make results as reliable and informative as possible;
e) meaningful communication with the experimentalists on the scientific goals and their context (in many cases the formulation of the original request is the starting point of a discussion that results in solutions that address the main underlying problems more effectively), and on what can be expected from the facility (to avoid disappointments due to unrealistic expectations, which can be a major problem);
f) communicating the results to the experimentalists in a way that works for the target audience (often requiring many iterations of analyses and lab work).
7. Issue raised:
"Many Bioinformatics Core Facilities, however, do not reach that mature
stage, and are caught, in extreme cases, in a ‘‘firefighting mode’’, a vicious
cycle between highly diverse, mostly urgent and hardly prioritized requests,
and insufficient resources for developing high-quality solutions that make
significant contributions to the output of the institution. "
This quote sounds familiar, and I think this is one of the things that makes bioinformatics analysis an exciting and challenging job.
I do think that if we discuss this issue frankly within and between our respective cores, we could find more incremental improvements to our current firefighting-related issues.
Thoughts? Do you agree or disagree?
8. Survey: biggest bioinformatics difficulty, and the most useful thing BRAEMBL could do
Survey by Bioinformatics Resource Australia – EMBL (BRAEMBL)
http://braembl.org.au/news/braembl-community-survey-report-2013
Researchers see a lack of expertise in how to develop software and analyze
data effectively and efficiently as a significant constraint on their work.
This can only be addressed through high quality, widely available training,
which is the resource most highly desired by researchers.
Question: Is training part of the answer?
9. Teaches basic lab skills for scientific computing so that researchers can do more, in less time and with less pain.
Teaches basic concepts, skills, and tools for working more effectively with data.
Workshops are designed for people with little to no prior computational experience.
Question: Is training part of the answer?
10. Since January 2012
- Over 270 two-day workshops
- For over 10,000 learners
- Taught by over 250 volunteers
- In over 20 countries
Curriculum (all CC-BY licensed)
- Unix shell (task automation)
- Python or R (modular programming)
- Git/GitHub (reproducibility & collaboration)
- SQL (data management)
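As a small illustration of the kind of shell task automation these workshops teach (a sketch only; the `*.fasta` file names are hypothetical), here is a loop that counts the sequences in every FASTA file in a directory:

```shell
#!/bin/sh
# Count the sequences in every FASTA file in the current directory.
# A FASTA record starts with a ">" header line, so counting those
# lines gives the number of sequences per file.
for f in *.fasta; do
    [ -e "$f" ] || continue              # no matches: skip the literal "*.fasta"
    n=$(grep -c '^>' "$f")
    echo "$f: $n sequence(s)"
done
```

Running the same check by hand on dozens of files is exactly the repetitive work the Unix shell lesson aims to automate.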
North American Workshops 2012-2014
Outcomes
1. Save half a day a week (or more) for the rest of their careers
2. Prepare them for reproducible research, HPC, and open science
3. Enable them to tackle entirely new kinds of problems
http://software-carpentry.org
Question: Is training part of the answer?
11. http://datacarpentry.org
- Sister organization of Software Carpentry
- Officially started November 2014
- Developed and ran workshops prior to November with NSF support
Curriculum
• Focused on data - teaches how to manage and analyze data in an effective and reproducible way.
• Initial focus is on workshops for novices - there are no prerequisites, and no prior computational experience is assumed.
• Domain-specific by design - currently have lessons in biology and are developing lessons for genomics, geosciences, and social sciences.
Planning 24 workshops in 2015 and the development of materials in more domains and for more advanced data analysis topics.
Question: Is training part of the answer?