Presentation slides from a professional development session on computational biology at the 2014 Society for Advancement of Chicanos and Native Americans in Science (SACNAS) Conference.
Panelists included Tracy Heath, Emilia Huerta-Sanchez, Conner Sandefur, and Felipe Zapata.
The website associated with this panel: http://crackingthebiocode.github.io/
Cracking the (bio)code: Professional Development Session at SACNAS 2014
1. Cracking the (bio)code
Resources for research careers in computational biology & bioinformatics
Felipe Zapata, PhD
Brown University
@zapata_f
Conner Sandefur, PhD
Univ. North Carolina
@oshehoma
Emilia Huerta-Sanchez, PhD
Univ. California, Merced
@emiliahsc
Tracy Heath, PhD
Iowa State Univ.
@trayc7
Visit our website: crackingthebiocode.github.io
● Information about the session
● Resources for learning to program: workshops, online courses, tutorials, etc.
● Links to many degree programs in the U.S. for studying computational biology/bioinformatics
● Profiles of computational biologists and bioinformaticians
2. How small changes can make a big difference
Bioinformatics @UNC-Pembroke
Investigating how changes in gene expression drive system-wide behavior
Computational Biology @UNC-Chapel Hill
Predicting therapies to improve mucus clearance in cystic fibrosis (CF) and chronic obstructive pulmonary disease (COPD)
[Figure: panels at 1 hr and 24 hrs, scale -4 to 4]
Tools I use:
3. Dr. Conner I. Sandefur
SPIRE Postdoctoral Scholar at UNC-CH
Visiting Assistant Professor at UNCP
PhD Bioinformatics
University of Michigan
Ann Arbor, Michigan
BA Computer Science
George Washington University
Washington, DC
email: sandefur@email.unc.edu
web: http://www.unc.edu/~sandefur
twitter: @oshehoma
4. What is the evolutionary history of species?
Using transcriptomes and genomes to
resolve ancient animal radiations
Phylogeny of snails, slugs, and relatives
What genes are homologous?
Using graph-based approaches to infer homology
Gene clusters inferred to be the “same” gene family
across multiple species
AGALMA: https://bitbucket.org/caseywdunn/agalma (BitBucket, Git)
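A minimal sketch of the graph-based idea on this slide, assuming made-up similarity hits in place of a real all-by-all sequence search: sequences are nodes, hits at or above a score threshold are edges, and each connected component is treated as one putative gene family. The gene IDs, scores, and the `cluster_homologs` helper are hypothetical. Connected components are a deliberately simple stand-in; production pipelines typically use more refined graph clustering (for example, Markov clustering), so this is an illustration, not AGALMA's method.

```python
"""Toy graph-based homology clustering (illustration only)."""


def cluster_homologs(hits, min_score):
    """Group sequence IDs into putative gene families.

    hits: iterable of (query_id, subject_id, score) tuples from a similarity
    search. Pairs scoring at least `min_score` become edges; each connected
    component of the resulting graph is returned as one sorted cluster.
    """
    graph = {}
    for query, subject, score in hits:
        # Record every sequence as a node, even if none of its hits pass.
        graph.setdefault(query, set())
        graph.setdefault(subject, set())
        if score >= min_score and query != subject:
            graph[query].add(subject)
            graph[subject].add(query)

    seen = set()
    clusters = []
    for start in sorted(graph):
        if start in seen:
            continue
        # Iterative depth-first search collects one connected component.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        clusters.append(sorted(component))
    return clusters


# Made-up hits between genes from three hypothetical snail/slug/limpet transcriptomes.
hits = [
    ("snail_g1", "slug_g7", 95.0),
    ("slug_g7", "limpet_g3", 88.0),
    ("snail_g2", "limpet_g9", 91.0),
    ("snail_g1", "snail_g2", 20.0),  # weak hit: below the threshold used here
]

for family in cluster_homologs(hits, min_score=50.0):
    print(family)
```

With `min_score=50.0` the weak hit is ignored and two families result; lowering the threshold to 10 lets that hit merge them into one, which is why the threshold choice matters in practice.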
5. Dr. Felipe Zapata
Postdoctoral Research Associate
Brown University
COLOMBIA
email: felipe_zapata@brown.edu
web: http://felipezapata.me
twitter: @zapata_f
PhD Ecology, Evolution & Systematics
University of Missouri-St. Louis
St. Louis, Missouri
BSc Biology
Universidad de Los Andes
Bogotá, Colombia
7. Dr. Emilia Huerta-Sanchez
Assistant Professor
UC Merced
email: ehuerta-sanchez@ucmerced.edu
web: http://www.stat.berkeley.edu/~emiliahs
twitter: @emiliahsc
Postdoc in Integrative Biology and Statistics, UC Berkeley, Berkeley, CA
PhD Applied Mathematics
Cornell University, Ithaca, NY
BA Mathematics & French
Mills College, Oakland, CA
8. Modeling macro- & molecular evolutionary processes to infer phylogenetic relationships
● How have rates of molecular and morphological evolution changed across the tree of life?
● How do patterns of fossilization, preservation, and recovery change across different taxa?
● Can we detect relationships between geological events and species diversification?
● What are the evolutionary processes acting on different regions of the genome, and how have those factors shaped the evolution of different genes?
Tools: C++, RevBayes, probabilistic graphical models
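The slide lists research questions rather than a method. To make "modeling molecular evolutionary processes" concrete, here is a hedged sketch of the Jukes-Cantor (1969) model, the simplest standard nucleotide substitution model. It is not claimed to be the model used in this research or the way RevBayes implements it. Branch length is rate × time (expected substitutions per site), so lineages evolving at different rates, the subject of the first question above, leave different signatures in their sequences. The rates and elapsed time below are made-up numbers.

```python
"""Jukes-Cantor (JC69) substitution model: a textbook example of a
probabilistic model of molecular evolution (illustration only)."""
import math


def jc69_probability(same_base, branch_length):
    """Probability of observing the same nucleotide (same_base=True) or one
    particular different nucleotide (same_base=False) at the end of a branch
    of `branch_length` expected substitutions per site."""
    decay = math.exp(-4.0 * branch_length / 3.0)
    if same_base:
        return 0.25 + 0.75 * decay
    return 0.25 - 0.25 * decay


# Hypothetical lineages evolving for the same 10 time units at different rates
# (made-up rates, in substitutions per site per time unit).
elapsed_time = 10.0
for rate in (0.01, 0.05):
    branch_length = rate * elapsed_time  # expected substitutions per site
    p_same = jc69_probability(True, branch_length)
    print(f"rate {rate}: P(same base at both ends of branch) = {p_same:.3f}")
```

The faster lineage is less likely to show the same base at both ends of the branch; as branch length grows, every probability approaches 1/4 and the sequence carries no information about its ancestor.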
9. Dr. Tracy A. Heath
Assistant Professor (Jan. 2015)
Iowa State University
email: trayc7@gmail.com
web: phyloworks.org
twitter: @trayc7
Postdoctoral Fellow
U. Kansas & U.C. Berkeley
PhD Ecology, Evolution & Behavior
University of Texas at Austin
BA Biology
Boston University
10. What is Computational Biology?
What is Bioinformatics?
http://crackingthebiocode.github.io/
12. Compartmental models are one type of mathematical model used to investigate the spread of infectious disease
Susceptible (S) → Infected (I) → Recovered (R)
(S → I occurs at the rate of infection, β; I → R occurs at the rate of recovery)
Change in proportion of Susceptible (S) people over time: $\frac{dS}{dt} = -\beta S I$
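The slide gives only the equation for the susceptible class. Assuming the standard SIR form, and writing γ (a symbol not on the slide) for the rate of recovery, the full system is sketched below. The three derivatives sum to zero, so S + I + R stays constant (equal to 1 when working in proportions), and when nearly everyone starts out susceptible the basic reproduction number is β/γ.

```latex
\begin{aligned}
\frac{dS}{dt} &= -\beta S I \\
\frac{dI}{dt} &= \beta S I - \gamma I \\
\frac{dR}{dt} &= \gamma I
\end{aligned}
\qquad
R_0 = \frac{\beta}{\gamma}
```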
13. Infection dynamics for different diseases can be simulated by selecting appropriate parameters
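A minimal sketch of "selecting appropriate parameters", not the presenters' code (the slides say only that the simulations were run in Python 3.4): forward-Euler integration of the standard SIR equations on population proportions. The β and γ values, the initial infected fraction, and the step size are hypothetical and are not estimates for any real disease; the two "diseases" differ only in transmission rate.

```python
"""Minimal SIR sketch: forward-Euler integration on population proportions.

All parameter values are hypothetical illustrations, not estimates for any
real disease.
"""


def simulate_sir(beta, gamma, days, i0=0.001, dt=0.1):
    """Return a list of (t, S, I, R) tuples, one per time step of size dt."""
    s, i, r = 1.0 - i0, i0, 0.0
    trajectory = [(0.0, s, i, r)]
    for step in range(1, int(round(days / dt)) + 1):
        new_infections = beta * s * i * dt  # flow S -> I
        recoveries = gamma * i * dt         # flow I -> R
        s -= new_infections
        i += new_infections - recoveries
        r += recoveries
        trajectory.append((step * dt, s, i, r))
    return trajectory


# Two hypothetical "diseases" with the same recovery rate but different beta.
diseases = {"slower_spread": (0.3, 0.1), "faster_spread": (0.6, 0.1)}

peaks = {}
for name, (beta, gamma) in diseases.items():
    trajectory = simulate_sir(beta, gamma, days=160)
    peak_t, _, peak_i, _ = max(trajectory, key=lambda row: row[2])
    peaks[name] = (peak_t, peak_i)
    print(f"{name}: R0 = {beta / gamma:.1f}, peak I = {peak_i:.3f} on day {peak_t:.0f}")
```

The faster-spreading disease peaks earlier and higher. Forward Euler with a 0.1-day step is crude but adequate for a sketch; `scipy.integrate.solve_ivp` (included with Anaconda) is a more accurate choice for real work.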
14. We can use models to predict how interventions change disease transmission dynamics
Infection dynamics with R0 = 2
Infection dynamics after an intervention at day 10, which reduced R0 to 0.8
R0 > 1: infection peaks, then disappears
R0 < 1: infection dies out
Simulations run in Python 3.4 (downloaded as part of the Anaconda package: http://continuum.io/downloads)
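The intervention scenario on this slide can be sketched as follows, assuming the standard SIR model and assuming the intervention acts by lowering the transmission rate β while the recovery rate γ stays fixed. Only R0 = 2 before and R0 = 0.8 after day 10 come from the slide; γ = 0.2 per day, the initial infected fraction, the step size, and the `simulate_intervention` helper are hypothetical choices, not the presenters' code.

```python
"""SIR sketch of an intervention that lowers R0 partway through an outbreak.

gamma, i0 and dt are hypothetical values; only the R0 values and the
intervention day mirror the slide.
"""


def simulate_intervention(r0_before, r0_after, t_intervention,
                          gamma=0.2, days=100, i0=0.001, dt=0.1):
    """Forward-Euler SIR run in which R0 (and hence beta = R0 * gamma)
    drops from r0_before to r0_after at time t_intervention.

    Returns a list of (t, I) pairs, where I is the infected proportion.
    """
    s, i = 1.0 - i0, i0
    history = [(0.0, i)]
    for step in range(int(round(days / dt))):
        t = step * dt
        r0 = r0_before if t < t_intervention else r0_after
        beta = r0 * gamma
        new_infections = beta * s * i * dt  # flow S -> I
        recoveries = gamma * i * dt         # flow I -> R
        s -= new_infections
        i += new_infections - recoveries
        history.append(((step + 1) * dt, i))
    return history


baseline = simulate_intervention(2.0, 2.0, t_intervention=10)    # no change
controlled = simulate_intervention(2.0, 0.8, t_intervention=10)  # intervention

for label, run in (("R0 = 2 throughout", baseline), ("R0 -> 0.8 at day 10", controlled)):
    peak_t, peak_i = max(run, key=lambda point: point[1])
    print(f"{label}: peak I = {peak_i:.4f} on day {peak_t:.1f}")
```

After the intervention the effective reproduction number is 0.8 × S, which is below 1, so the infected proportion falls from day 10 onward instead of building to the much larger peak of the baseline run.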
16. From…
a few key genes (e.g., 16S rRNA, mitochondrial and chloroplast genes) across many species
To…
High-throughput sequencing of 1000s of genes across many species
[Figure: genes × species data matrices]
Phylogenetics
17. Challenges to phylogenetics
• Many steps
• Many programs must be used together
• Computationally intensive
• Difficult to reproduce
Automate!
19. Why automate?
• Results are reproducible
• Results can be easily explored and extended
• Methods can be compared in a controlled setting
• Facilitates method development without reinventing everything
23. For each transcriptome:
• Quality control
• Assemble transcriptome
• Translate and annotate genes
• Quantify gene expression
• Put sequences in database
Can also:
• Import DNA sequences from national databases (e.g., NCBI)
• Process externally produced assemblies
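The per-transcriptome steps above are the kind of sequence an automated pipeline chains together. Below is a minimal sketch of the idea, not AGALMA's code or API: each step is a (name, function, parameters) entry, and the driver records the parameters and a checksum of each step's output so a run can be inspected and repeated exactly. The two step functions and the sample ID `snail_01` are hypothetical toy stand-ins; real quality control, assembly, translation, and quantification would call dedicated tools.

```python
"""Toy reproducible-pipeline driver (illustration only)."""
import hashlib
import json


# --- Hypothetical placeholder steps (stand-ins for real tools) ------------

def quality_control(reads, min_length):
    """Drop reads that are too short or contain ambiguous bases (N)."""
    return [read for read in reads if len(read) >= min_length and "N" not in read]


def collapse_reads(reads):
    """Toy stand-in for assembly + quantification: count identical reads."""
    counts = {}
    for read in reads:
        counts[read] = counts.get(read, 0) + 1
    return counts


# --- Generic driver ---------------------------------------------------------

def run_pipeline(sample_id, data, steps):
    """Apply each (name, func, params) step in order and log provenance."""
    log = []
    for name, func, params in steps:
        data = func(data, **params)
        # Checksum of the step's output lets two runs be compared exactly.
        serialized = json.dumps(data, sort_keys=True).encode("utf-8")
        log.append({
            "sample": sample_id,
            "step": name,
            "params": params,
            "output_sha256": hashlib.sha256(serialized).hexdigest()[:12],
        })
    return data, log


STEPS = [
    ("quality_control", quality_control, {"min_length": 6}),
    ("collapse_reads", collapse_reads, {}),
]

reads = ["ACGTACGT", "ACGTACGT", "ACGN", "TTGACCA", "ACG"]
result, log = run_pipeline("snail_01", reads, STEPS)
print(result)
print(json.dumps(log, indent=2))
```

Because the steps and their parameters live in one list, rerunning with the same input reproduces the same log, and swapping one step (say, a different quality filter) changes only that entry, which is what makes methods easy to compare.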
24. Across transcriptomes (many species):
• Identify homologous genes
• Build phylogenies using all genes!
Silhouette images from http://phylopic.org/
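"Build phylogenies using all genes" can be done in several ways, and the slides do not say which approach the presenters used. One common approach is to concatenate the per-gene alignments into a single supermatrix. A hedged sketch, assuming each gene has already been aligned (equal-length sequences within a gene) and using made-up species and sequences: species missing a gene are padded with gaps, and partition boundaries are kept so each gene could later receive its own substitution model. The `concatenate_alignments` helper is hypothetical.

```python
"""Toy supermatrix construction from per-gene alignments (illustration only)."""


def concatenate_alignments(gene_alignments, gap="-"):
    """gene_alignments: dict gene -> dict species -> aligned sequence.

    Returns (supermatrix, partitions): supermatrix maps species -> concatenated
    sequence; partitions lists (gene, start, end) as 1-based inclusive columns.
    """
    species = sorted({sp for aln in gene_alignments.values() for sp in aln})
    pieces = {sp: [] for sp in species}
    partitions = []
    start = 1
    for gene in sorted(gene_alignments):
        aln = gene_alignments[gene]
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: expected a non-empty alignment of equal-length sequences")
        length = lengths.pop()
        for sp in species:
            # Species without this gene get a block of gap characters.
            pieces[sp].append(aln.get(sp, gap * length))
        partitions.append((gene, start, start + length - 1))
        start += length
    supermatrix = {sp: "".join(parts) for sp, parts in pieces.items()}
    return supermatrix, partitions


# Made-up alignments for two genes; "slug" lacks gene2.
alignments = {
    "gene1": {"snail": "ATG-C", "slug": "ATGAC", "limpet": "TTGAC"},
    "gene2": {"snail": "GGA", "limpet": "GGT"},
}

matrix, partitions = concatenate_alignments(alignments)
for sp, seq in matrix.items():
    print(f">{sp}\n{seq}")
print(partitions)
```

The resulting FASTA-style matrix and partition list are the usual inputs to a phylogenetic inference program; an alternative is to infer one tree per gene and then combine the gene trees.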
25. What tools do you need?
A biological question
programming skills
statistical modeling
C++
a mathematical model
26. Questions?
• What programming language should I learn?
• How do I get started learning a programming language?
• What is the best way to become proficient in a programming language?
• What is the difference between C++ and Python and Java and R and MATLAB and Ruby and ...?
• What is version control? Do I need to know it?
• Do I need a GitHub account?
• Where are jobs or degree programs in computational biology/bioinformatics listed?
• What does it mean to be open source? Why is it important?
• and ...?
27. Take-Home Messages
• You don’t have to be an expert programmer to do computational biology.
• Anyone can learn to program; it’s just a matter of getting started.
• Computational skills are extremely helpful for streamlining biology research.
• The skills you need to learn depend heavily on your background and your research interests.
• Quantitative skills (a firm understanding of math and statistics) are important for any research field.
• Don’t be overwhelmed by all there is to know; these skills grow over time. If you consistently seek to improve them and use them in your work, you will be amazed at how your expertise will develop.