#ievobio Keynote - June 26, 2013

Visualizing biodiversity in the
era of high-throughput
sequencing
Holly Bik, UC Davis
@Dr_Bik

Our ability to visualize high-
throughput sequencing data is as
bad as my title slide

$250k, 1 year
“A Research-Driven Data
Visualization Framework for High-
Throughput Environmental
Sequence Data”

http://pitchinteractive.com @pitchinc

“Pitch Interactive dissects large
data sets in search of meaningful
and often hidden patterns that
serve to determine the shape and
form that best tells a story.”

Diverse marine community!
EASY!
EASY!
EASY!
VERY
Difficult!!

Mark Rothko,
No. 14, 1960
rectangles of orange and
purple with soft edges

http://pippascabinet.blogspot.com/2012/11/on-true-love.html

Challenge 1: Environmental data is
terrible at revealing fine-scale
taxonomic patterns

Overarching Community Patterns
[Figure: three-axis ordination plot of samples. Axes: PC1 (13.03%), PC2 (12.21%), PC3 (10.54%).]
Samples: ShallowGulf, ShallowCalif, Atlantic22#1, Atlantic25#2, Atlantic29, Atlantic43, Atlantic45, Pacific128, Pacific528, Pacific422, Pacific321, Pacific237
Bik et al. 2012, Molecular Ecology, 21(5):1048-59

[Figure: chart with an axis running from 0 to 1 in steps of 0.1.]
Pre-spill: Nematode Dominance
Post-spill: Fungal Dominance
Bik et al. 2012, PLoS ONE, 7(6):e38550

[Figure: taxonomic composition chart for Grand Isle, Louisiana, with Fungi labeled.]
Legend: Algae; Environmental; Fungi; Metazoa: Annelida; Metazoa: Arthropoda; Metazoa: Gastrotricha; Metazoa: Nematoda; Metazoa: Platyhelminthes; No Match; Stramenopiles; Unicellular Eukaryotes; Metazoa: Acanthocephala; Metazoa: Brachiopoda; Metazoa: Bryozoa; Metazoa: Chordata; Metazoa: Cnidaria; Metazoa: Echiura; Metazoa: Entoprocta; Metazoa: Mollusca
Bik et al. 2012, PLoS ONE, 7(6):e38550

Exploring Trees
Ecologically,
what are these
reference taxa
doing??!

Pertinent info for biological
interpretations of DNA data!!!

Challenge 2: Taxonomic, phylogenetic,
and ecological knowledge is imperative for
making meaningful interpretations of
high-throughput sequence datasets

Enoplus spp.
Daptonema spp.
Robbea spp.
Caenorhabditis elegans
Actinomyces spp.
Clostridium spp.
Listeria spp.

Synechococcus spp.

Challenge 3: Extreme
bioinformatics bottleneck for
microbial eukaryote data

rDNA copy number & genome size in eukaryotes
Prokopowich CD, Gregory TR, Crease TJ. (2003) Genome, 46(1):48–50.

Bik et al., in revision
…and in ONE genus of nematodes
Caenorhabditis brenneri ~323 rRNA gene copies

Caenorhabditis briggsae ~56 rRNA gene copies
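
A rough sketch of why the copy-number gap above matters for read counts. It assumes, hypothetically, that reads scale linearly with rRNA gene copies per genome and that each species contributes one individual; the function name and the sample composition are illustrative, not from the talk.

```python
# Approximate rRNA gene copy numbers quoted on the slide.
COPIES = {"Caenorhabditis brenneri": 323, "Caenorhabditis briggsae": 56}


def expected_read_share(copies, individuals):
    """Expected fraction of rRNA reads per species.

    Assumption (not from the talk): reads are proportional to
    individuals x rRNA copies per genome, with no PCR or extraction bias.
    """
    weights = {sp: individuals[sp] * copies[sp] for sp in copies}
    total = sum(weights.values())
    return {sp: w / total for sp, w in weights.items()}


# Hypothetical sample: one individual of each species.
shares = expected_read_share(COPIES, {sp: 1 for sp in COPIES})
fold = COPIES["Caenorhabditis brenneri"] / COPIES["Caenorhabditis briggsae"]

for species, share in shares.items():
    print(f"{species}: {share:.1%} of reads")
print(f"fold difference: {fold:.1f}x")
```

Under these assumptions, two equally abundant congeners would look very unequal in a read-count table, which is one reason read abundance is hard to map onto organism abundance.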

OCTU  Reads  OCTU Length  Bit Score  E-Value   Match bp  Total bp  % Similarity  Chimera  DB match
27    63     266          525        e-146     265       265       100           -1       B. seani 175
12    9      265          500        e-138     261       264       98.86         -1       B. seani 175
170   8      264          496        e-137     261       264       98.86         0        B. seani 175
513   1      264          494        e-136     259       262       98.85         -2       B. seani 175
579   2      263          492        e-136     258       261       98.85         -2       B. seani 175
570   1      262          492        e-136     258       261       98.85         -1       B. seani 175
394   1      263          490        e-135     260       264       98.48         1        B. seani 175
19    2      269          488        e-135     264       269       98.14         0        B. seani 175
658   1      266          486        e-134     260       265       98.11         -1       B. seani 175
412   2      264          480        e-132     260       265       98.11         1        B. seani 175
465   9      254          478        e-132     251       254       98.82         0        B. seani 175
1164  1      268          478        e-132     261       267       97.75         -1       B. seani 175
304   1      261          474        e-130     255       260       98.08         -1       B. seani 175
868   1      244          460        e-126     242       245       98.78         1        B. seani 175
514   2      274          458        e-126     263       272       96.69         -2       B. seani 175
683   1      250          426        e-116     241       249       96.79         -1       B. seani 175
627   1      230          422        e-115     223       226       98.67         -4       B. seani 175
171   3      212          400        e-108     209       211       99.05         -1       B. seani 175
1223  1      202          355        5.00E-95  198       204       97.06         2        B. seani 175
Porazinska et al. 2010 Zootaxa
Intragenomic variation in Eukaryotic rRNA
Tail!
Head!
Artificial control community containing known nematode
species, all with corresponding full-length reference 18S
Head-Tail Pattern in Nematode OTUs

99% cutoff
OTUs as ‘Clouds’
97% cutoff
How to correlate OTUs
with biological species?
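
A toy illustration of how the cutoff changes the number of OTUs recovered from the same reads. This is a simplified greedy scheme on pre-aligned, equal-length sequences, not the clustering method used for the datasets in the talk; the sequences and the helper names (`identity`, `greedy_otus`, `mutate`) are made up for the example.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sketch assumes pre-aligned, equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs, cutoff):
    """Assign each sequence to the first OTU seed it matches at >= cutoff.

    A sequence that matches no existing seed starts a new OTU.
    """
    seeds = []      # one representative sequence per OTU
    clusters = []   # members of each OTU, parallel to seeds
    for seq in seqs:
        for i, seed in enumerate(seeds):
            if identity(seq, seed) >= cutoff:
                clusters[i].append(seq)
                break
        else:
            seeds.append(seq)
            clusters.append([seq])
    return clusters


def mutate(seq, positions):
    """Return seq with a substitution at each listed position."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    chars = list(seq)
    for p in positions:
        chars[p] = swap[chars[p]]
    return "".join(chars)


# Hypothetical 100 bp "reads" derived from one sequence: 0, 1, 2, 3 and 5 mismatches.
base = "ACGT" * 25
reads = [base, mutate(base, [0]), mutate(base, [10, 20]),
         mutate(base, [30, 40, 50]), mutate(base, [1, 2, 3, 4, 5])]

for cutoff in (0.97, 0.99):
    print(f"{cutoff:.0%} cutoff: {len(greedy_otus(reads, cutoff))} OTUs")
```

The same five reads collapse into fewer clusters at the looser cutoff and split into more at the stricter one, which is the "cloud" problem: neither count is obviously the number of biological species.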

Sparse Databases for Eukaryotes
SILVA 108 Ref rRNA Database (16S/18S)
Bacteria: 530,197
Archaea: 25,658
Eukaryotes: 62,587

Ambiguous Taxonomy
Taxa                    Region 1 95%  Region 2 95%  Region 1 99%  Region 2 99%
Metazoa (20 Phyla)      1360          1461          43255         25668
Nematoda                765           879           27020         15518
Annelida                217           197           7073          3869
Arthropoda              128           178           2280          2323
Unicellular eukaryotes  738           1257          15198         22020
Environmental isolates  774           686           12687         9775
No match                480           354           11345         1868
Fungi                   225           163           9984          2445
Stramenopiles           137           146           1771          1583
Algae                   111           96            975           861
Total (all taxa)        3825          4163          95215         64220

Deep sea and shallow water marine sediment
1.2 million reads, 454 GS FLX Titanium
Bik et al. 2012, Molecular Ecology, 21(5):1048-59

Goal 1: A web-based, scalable
visualization framework for
standard data formats

Tier One
Standard outputs from bioinformatic pipelines

•  BIOM (json) files – OTU tables, metagenome datasets
•  Tab-delimited metadata files
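
A minimal sketch of reading the two Tier One inputs with only the Python standard library. It assumes the sparse JSON layout of BIOM 1.0 (`rows`, `columns`, and `data` as `[row, column, value]` triples) and a metadata file whose first column is the sample ID; the table contents, sample names, and the `Depth` field are invented for illustration, and this is not the framework's actual loader.

```python
import csv
import io
import json

# Tiny hypothetical BIOM 1.0-style table (sparse JSON); IDs and counts are made up.
BIOM_TEXT = json.dumps({
    "format": "Biological Observation Matrix 1.0.0",
    "type": "OTU table",
    "matrix_type": "sparse",
    "shape": [2, 3],
    "rows": [{"id": "OTU_1", "metadata": None},
             {"id": "OTU_2", "metadata": None}],
    "columns": [{"id": "S1", "metadata": None},
                {"id": "S2", "metadata": None},
                {"id": "S3", "metadata": None}],
    "data": [[0, 0, 5], [0, 2, 1], [1, 1, 7]],
})

# Hypothetical tab-delimited metadata; first column is the sample ID.
METADATA_TEXT = "#SampleID\tDepth\nS1\tshallow\nS2\tdeep\nS3\tdeep\n"


def load_counts(biom_text):
    """Return {sample_id: {otu_id: count}} from sparse BIOM 1.0-style JSON."""
    table = json.loads(biom_text)
    if table["matrix_type"] != "sparse":
        raise ValueError("sketch handles only the sparse layout")
    otus = [row["id"] for row in table["rows"]]
    samples = [col["id"] for col in table["columns"]]
    counts = {s: {} for s in samples}
    for row_idx, col_idx, value in table["data"]:
        counts[samples[col_idx]][otus[row_idx]] = value
    return counts


def load_metadata(tsv_text):
    """Return {sample_id: {field: value}} from a tab-delimited metadata file."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    id_field = reader.fieldnames[0]
    return {r[id_field]: {k: v for k, v in r.items() if k != id_field}
            for r in reader}


counts = load_counts(BIOM_TEXT)
metadata = load_metadata(METADATA_TEXT)
for sample, otu_counts in counts.items():
    print(sample, metadata[sample]["Depth"], otu_counts)
```

Keying both structures by sample ID is what lets a visualization join abundance to metadata without any pipeline-specific code.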

Goal 2: Destroy biologists’
addiction to pie charts

A pie chart is not the most
informative way to interpret
biodiversity data!

[Mockup: one circle per taxon. Labels: Bacteria, Archaea, Nematodes, Ciliates, Crustaceans.]
Circle size = species abundance
Circle color = metadata (sample, temperature, pH, etc.)
Mockup example taken from http://www.wefeelfine.org/
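
One way the size and color encodings in the mockup could be computed, sketched under stated assumptions: circle area (not radius) is made proportional to abundance so large taxa are not visually exaggerated, and color comes from a lookup on a single metadata value. The palette, abundances, and function names are hypothetical.

```python
import math

# Hypothetical color lookup for one metadata field (values are made up).
PALETTE = {"shallow": "#1f77b4", "deep": "#d62728"}


def circle_radius(abundance, max_abundance, max_radius=40.0):
    """Radius whose circle AREA is proportional to abundance."""
    if abundance < 0 or max_abundance <= 0:
        raise ValueError("abundance must be >= 0 and max_abundance > 0")
    return max_radius * math.sqrt(abundance / max_abundance)


def to_circles(abundances, metadata_value):
    """Return one (taxon, radius, color) tuple per taxon for a single sample."""
    largest = max(abundances.values())
    color = PALETTE.get(metadata_value, "#7f7f7f")  # gray for unknown values
    return [(taxon, circle_radius(n, largest), color)
            for taxon, n in abundances.items()]


# Hypothetical abundances for one deep-water sample.
circles = to_circles({"Bacteria": 400, "Archaea": 100, "Nematodes": 25}, "deep")
for taxon, radius, color in circles:
    print(f"{taxon}: radius={radius:.1f}, color={color}")
```

The resulting tuples can be handed to any drawing layer; the scaling choice is the part that affects how honestly abundance is read off the figure.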

Goal 4: Find intuitive ways to
visualize new data outputs

Explicitly Phylogenetic Approaches!
Aligned environmental sequences
Guide Tree
Evolutionary placement of short reads

http://phylosift.wordpress.com

[Figure: PhyloSift workflow diagram. Each input sequence is scanned against both workflows.]
Input Sequences
rRNA workflow and protein workflow (parallel option)
LAST: fast candidate search; search input against references
hmmalign: multiple alignment (profile HMMs used to align candidates to reference alignment)
Infernal: multiple alignment
Branches labeled <600 bp and >600 bp
pplacer: phylogenetic placement
Taxonomic Summaries: Krona plots, number of reads placed for each marker gene
Sample Analysis & Comparison: Edge PCA, tree visualization, Bayes factor tests

Probability Distributions:
when a pie chart is not a pie chart

Great!
Not Bad
Getting Tricky…

Marine Metagenome Tree Placement
Sing Tree - Guppy

Goal 5: Pester other people
Solicit case study participants

Goal 6: (Phase 2) Build a user and
developer community

Acknowledgements
Jonathan Eisen, Aaron Darling, Guillaume Jospin, Dongying Wu, David Coil

Further Information

•  hbik@ucdavis.edu
•  @Dr_Bik – updates posted to Twitter
•  Grant proposal now posted on Figshare!
