NON-INTENSIVE BIOLOGY: 
OPPORTUNITIES AND 
CHALLENGES OF NEXT-GEN 
SEQUENCING 
C. Titus Brown 
Assistant Professor 
MMG / CSE
Lansing, Michigan -> Davis, California
We practice open science! 
Everything discussed here: 
• Code: github.com/ged-lab/ ; BSD license 
• Blog: http://ivory.idyll.org/blog (‘titus brown blog’) 
• Twitter: @ctitusbrown 
• Grants on Lab Web site: http://ged.msu.edu/research.html 
• Papers available as preprints. 
• All my talks are available at slideshare.net/c.titus.brown/
Sequencing! 
• Sequencing of DNA and RNA. 
• Single genomes 
• Transcriptomes 
• Natural populations (tags) 
• Environmental samples/microbial populations (metagenomics) 
• Cheap and massively scalable sequencing of DNA and 
RNA.
Sequencing technology 
• Major, dramatic changes in our ability to sequence DNA 
and RNA quickly and cheaply. 
• Majority of deployed techniques depend on (variations of) 
a single trick: “polony” sequencing. No cloning. 
• Single-molecule sequencing coming along fast, but not 
yet ready for prime time.
Two specific concepts: 
• First, sequencing everything at random is very much 
easier than sequencing a specific gene region. (For 
example, it will soon be easier and cheaper to shotgun-sequence 
all of E. coli than it is to get a single good 
plasmid sequence.) 
• Second, if you are sequencing on a 2-D substrate (wells, 
or surfaces, or whatnot) then any increase in density 
(smaller wells, or better imaging) leads to a squared 
increase in the number of sequences.
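A rough illustration of the second point, as a minimal sketch. The substrate size and spot pitch below are made-up numbers for the example, not the specification of any real instrument, and `spots_on_substrate` is a hypothetical helper name.

```python
def spots_on_substrate(side_um, pitch_um):
    """Count sequencing spots on a square substrate laid out on a regular grid.

    side_um  -- side length of the square substrate, in micrometers (illustrative)
    pitch_um -- center-to-center spacing between spots, in micrometers (illustrative)
    """
    per_side = side_um // pitch_um  # spots that fit along one edge
    return per_side * per_side      # a 2-D grid, so the count goes as the square


coarse = spots_on_substrate(10_000, 2)  # 2 um pitch
fine = spots_on_substrate(10_000, 1)    # 1 um pitch: twice the linear density
ratio = fine // coarse                  # 4: doubling density quadruples the spots
```

Halving the pitch doubles the number of spots along each edge, so the total number of sequences goes up by a factor of four.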
Novel genome sequencing
Some numbers 
• For under $1,000 per sample, the Illumina HiSeq machine 
will generate: 
• 200,000,000 reads 
• Each of length ~150 
• In under a week. 
• x 16 samples/run. 
• That’s almost 500 Gbp of sequence, or just over 160x 
human genome…
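The arithmetic behind these numbers can be checked in a few lines. This is a sketch using the figures from the slide; the ~3 Gbp human genome size is an assumption added here for the comparison.

```python
reads_per_sample = 200_000_000
read_length_bp = 150
samples_per_run = 16
human_genome_bp = 3_000_000_000  # assumption: ~3 Gbp haploid human genome

bp_per_sample = reads_per_sample * read_length_bp   # 30 Gbp per sample
bp_per_run = bp_per_sample * samples_per_run        # 480 Gbp per run
human_equivalents = bp_per_run / human_genome_bp    # about 160x

print(f"{bp_per_run / 1e9:.0f} Gbp per run, {human_equivalents:.0f}x human genome")
```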
Shotgun sequencing 
• Collect samples; 
• Extract DNA or RNA; 
• Feed into sequencer; 
• Computationally analyze. 
“Sequence it all and let the bioinformaticians sort it out” 
Image: Wikipedia, Environmental shotgun sequencing.png
The challenges of non-model sequencing 
• Missing or low quality genome reference. 
• Evolutionarily distant. 
• Most extant computational tools focus on model 
organisms – 
• Assume low polymorphism (internal variation) 
• Assume reference genome 
• Assume somewhat reliable functional annotation 
• More significant compute infrastructure 
…and cannot easily or directly be used on critters of interest.
Shotgun sequencing analysis goals: 
• Assembly (what is the text?) 
• Produces new genomes & transcriptomes. 
• Gene discovery for enzymes, drug targets, etc. 
• Counting (how many copies of each book?) 
• Measure gene expression levels, protein-DNA 
interactions 
• Variant calling (how does each edition vary?) 
• Discover genetic variation: genotyping, linkage 
studies… 
• Allele-specific expression analysis.
Shotgun sequencing & assembly 
http://eofdreams.com/library.html; 
http://www.theshreddingservices.com/2011/11/paper-shredding-services-small-business/; 
http://schoolworkhelper.net/charles-dickens%E2%80%99-tale-of-two-cities-summary-analysis/
Assembly 
It was the best of times, it was the wor 
, it was the worst of times, it was the 
isdom, it was the age of foolishness 
mes, it was the age of wisdom, it was th 
It was the best of times, it was the worst of times, it was the 
age of wisdom, it was the age of foolishness 
…but for lots and lots of fragments!
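The toy example above can be reproduced with a naive greedy overlap merge. This is only a sketch of the idea: it is not how production assemblers work (they use overlap or de Bruijn graphs and must cope with sequencing errors and repeats), and the function names and the `min_len` parameter are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is also a prefix of b (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the largest overlap."""
    frags = list(fragments)
    while len(frags) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing left that overlaps; stop with several contigs
        merged = frags[best_i] + frags[best_j][best_n:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags


fragments = [
    "It was the best of times, it was the wor",
    ", it was the worst of times, it was the ",
    "isdom, it was the age of foolishness",
    "mes, it was the age of wisdom, it was th",
]
assembled = greedy_assemble(fragments)
print(assembled[0])
```

Note that the repeated phrase "it was the" already creates a spurious shorter overlap between two of the fragments; the greedy choice of the longest overlap happens to avoid it here, but repeats like this are exactly what makes real assembly hard.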
Mapping: locate reads in reference 
http://en.wikipedia.org/wiki/File:Mapping_Reads.png
Variant detection after mapping 
http://www.kenkraaijeveld.nl/genomics/bioinformatics/
Looking forward 5 years… 
Navin et al., 2011
Some basic math: 
• 1000 single cells from a tumor… 
• …sequenced to 40x haploid coverage with Illumina… 
• …yields 120 Gbp each cell… 
• …or 120 Tbp of data. 
• HiSeq X10 can do the sequencing in ~3 weeks. 
• The variant calling will require 2,000 CPU weeks… 
• …so, given ~2,000 computers, can do this all in one 
month.
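The same back-of-the-envelope math, written out as a sketch. The ~3 Gbp haploid genome size and the assumption that variant calling parallelizes perfectly across machines are added here; the other numbers come from the slide.

```python
cells = 1000
coverage = 40
haploid_genome_gbp = 3   # assumption: ~3 Gbp haploid human genome
sequencing_weeks = 3     # HiSeq X10 estimate from the slide
cpu_weeks = 2000         # variant calling cost from the slide
computers = 2000

gbp_per_cell = coverage * haploid_genome_gbp   # 120 Gbp per cell
total_tbp = cells * gbp_per_cell / 1000        # 120 Tbp in total
compute_weeks = cpu_weeks / computers          # 1 week, assuming perfect parallelism
total_weeks = sequencing_weeks + compute_weeks # about one month end to end

print(f"{gbp_per_cell} Gbp/cell, {total_tbp:.0f} Tbp total, {total_weeks:.0f} weeks")
```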
Similar math applies: 
• Pathogen detection in blood; 
• Environmental sequencing; 
• Sequencing rare DNA from circulating blood. 
• Two issues: 
• Volume of data & compute infrastructure; 
• Latency for clinical applications.
The Data Deluge 
(a traditional requirement for these talks)
Lab approach: Lossy compression 
(Reduce volume of data & compute infrastructure 
requirements) 
[Figure: workflow diagram. Labels: Raw data (~10-100 GB); Compression (~2 GB); Analysis; "Information" (~1 GB); Database & integration.] 
Lossy compression can substantially 
reduce data size while retaining 
information needed for later (re)analysis.
http://en.wikipedia.org/wiki/JPEG 
Lossy compression
Outline 
• The Molgulid story: investigating non-model 
ascidians (this is the biology) 
• Meditations on data analysis. 
• Methods, methods, methods. 
• Training, training, training. 
• Concluding thoughts
The Molgula Story – an int’l collaboration 
Elijah Lowe 
(MSU; Naples?) 
Billie Swalla (UW, BEACON) 
Lionel Christiaen (NYU); 
Claudia Racioppi (Naples; NYU)
…to the urochordates we go! 
Putnam et al., 2008, Nature. 
Modified from Swalla 2001.
Filter feeding adults 
Molgula oculata 
Molgula occulta 
Molgula oculata Ciona intestinalis 
Elijah Lowe; collaboration w/Billie Swalla
Challenging organisms to work on! 
Molgula occulta & M. oculata: 
• Only spawn ~1 month out of the year 
• Located off the northern coast of France 
• Hybrids not found outside of lab conditions 
• Species cannot be cultured 
• Wet lab techniques are not fully developed for 
these species 
• No genomic resources (as of 2008).
Billie Swalla, Nadine Peyriéras, Alberto Stolfi
Tail loss and notochord 
a) M. oculata b) hybrid (occulta egg x oculata sperm) c) M. occulta 
Notochord cells in orange. 
Swalla, B. et al., Science, Vol 274, Issue 5290, 1205-1208, 15 November 1996
Molgula clades – tail loss is derived
Solitary ascidians 
have determinate 
and invariant cleavage. 
Some species have 
colored cytoplasms. 
(Boltenia villosa) 
The cell lineage is very 
similar in Ciona, Phallusia, 
Halocynthia roretzi & 
Molgula oculata.
Molgula occidentalis 
Ciona intestinalis
Notochord formation (convergence & 
extension) in ascidians is highly 
conserved. 
Ciona savignyi Jiang and Smith, 2007
Notochord Formation 
in Molgulids 
Molgula oculata notochord 
(40 cells, converged & extended) 
Molgula occulta no notochord 
(20 cells, not converged & extended) 
Hybrid notochord 
(20 cells, converged & extended) 
Swalla and Jeffery, 1996
First we applied mRNAseq… 
Lowe et al., in review (PeerJ). https://peerj.com/preprints/505/
…which gave us entire transcriptomes… 
Lowe et al., in review (PeerJ). https://peerj.com/preprints/505/
…then we sequenced their genomes... 
• 3 species: 
Molgula occidentalis (tailed) – “MOXI” 
Molgula oculata (tailed) – “MOCU” 
Molgula occulta (tail-less) – “MOCC” 
• 3 lanes: 300-400 bp; 650-750 bp; 900-1000 bp 
• ≥ 200X coverage each genome 
De novo assembly by Elijah Lowe (MSU) 
Stolfi et al., eLife, 2014; http://dx.doi.org/10.7554/eLife.03728
…which gave us most of their genes (and 
regulatory elements?) 
Genome assembly statistics: 
Stolfi et al., eLife, 2014; http://dx.doi.org/10.7554/eLife.03728
Shift in differentially expressed genes from 
gastrulation to neurulation 
M. ocu vs. M. occ gastrula M. ocu vs. M. occ neurula 
Differentially expressed during neurulation in M. ocu vs M. occ 
Elijah Lowe
Notochord gene expression similar to tailed 
species 
[Figure: expression difference, hybrid vs. parent species. Axes: log2(hybrid)-log2(oculata) and log2(hybrid)-log2(occulta), each ranging from -10 to 15.] 
Elijah Lowe
Heterochronic Shift in Molgulidae Development 
*79 genes examined 
across six species
Transgenics of reporter constructs 
(“Mutual intelligibility” across ~350 my) 
Stolfi et al., eLife, 2014; http://dx.doi.org/10.7554/eLife.03728
Prickle is a key part of the notochord program. 
Veeman, M., et al., 2007 
• Planar cell polarity (PCP) pathway 
• Involved in convergence and extension
Prickle expressed in notochord cells of 
tailless ascidians. 
Mita et al., Zool. Sci., 2010 
M. occulta gastrulation 
Ciona intestinalis 
Satoh, Nature Reviews Genetics 4, 2003 
FGF Bra Pk 
Elijah Lowe
(Re)booting the Molgula -- 
• Determined conservation of cardiopharyngeal 
developmental program, despite shifts in cis-regulatory 
sequences (Stolfi et al., eLife, 2014). 
• Examining heterochronic shifts in developmental timing 
(tail loss) (Maliska et al., in preparation). 
• Connecting evolutionary shifts in developmental gene 
regulatory networks with conserved molecular profiles 
(Lowe et al., submitted; Lowe et al., in preparation).
More thoughts on Molgula 
• One grad student, two transcriptomes, three genomes, 
four years… 
• Genomic resources are enabling a sprawling international 
collaboration (UW/BEACON, MSU/BEACON, NYU, 
Naples, Paris) 
• Methods development is key!
How Science Works 
Data 
Analysis 
Data generation
Luckily, data analysis is cheap and easy!
Err, well, actually… 
Data generation 
Data Analysis 
http://www.pixelpog.com/ftpimages/GnomesAttack.jpg
It is now easy to generate sequencing 
data sets of such a size and scale that 
the first round analysis cannot even be 
completed.
My research: 
theoretical => applied solutions to scale. 
Theoretical advances 
in data structures and 
algorithms 
Practically useful & usable 
implementations, at scale. 
Demonstrated 
effectiveness on real data.
My research: three methods. 
1. Adaptation of a suite of probabilistic data structures for 
representing set membership and counting (Bloom filters 
and CountMin Sketch). (Zhang et al., PLoS One, 2014.) 
2. An online streaming approach to lossy compression of 
sequencing data. (Brown et al., arXiv, 2012; Howe et al., PNAS, 2014.) 
3. Compressible de Bruijn graph representation for 
assembly. (Pell et al., PNAS, 2012.)
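To make the first method more concrete, here is a minimal Count-Min Sketch used to count k-mers. It is a teaching sketch under stated assumptions, not the lab's implementation: the class and function names, the table sizes, and the use of SHA-256 as the hash are all choices made for this example.

```python
import hashlib


class CountMinSketch:
    """Approximate counter: may overcount on hash collisions, never undercounts."""

    def __init__(self, width=1000, depth=4):
        self.width = width  # number of counters per row (illustrative size)
        self.depth = depth  # number of independent hash rows
        self.tables = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        # One (row, column) cell per row, using the row number as a hash salt.
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, item):
        for row, col in self._cells(item):
            self.tables[row][col] += 1

    def count(self, item):
        # The minimum across rows is the estimate least inflated by collisions.
        return min(self.tables[row][col] for row, col in self._cells(item))


def kmers(seq, k):
    """Yield every overlapping substring of length k."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


sketch = CountMinSketch()
for kmer in kmers("ATGGCGATGGCG", 4):
    sketch.add(kmer)

print(sketch.count("ATGG"))  # at least 2: "ATGG" occurs twice in the sequence
```

The memory used is fixed by `width * depth` regardless of how many distinct k-mers are seen, which is the property that makes this kind of structure attractive for very large data sets; the price is a one-sided counting error.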
Method #2 - Digital normalization 
(a computational version of library normalization) 
Suppose you have a 
dilution factor of A (10) to 
B (1). To get 10x coverage of B you 
need to get 100x of A! 
Overkill!! 
This 100x will consume 
disk space and, because 
of errors, memory. 
We can discard it for 
you…
Digital normalization
Digital normalization retains information, while 
discarding data and errors
Digital normalization approach 
A digital analog to cDNA library normalization, diginorm: 
• Streaming & single pass: looks at each read at most 
once; 
• Does not “collect” the majority of errors; 
• Keeps all low-coverage reads; 
• Smooths out coverage of sequencing. 
=> 
Enables analyses that are otherwise completely 
impossible.
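The approach above can be sketched in a few lines: keep a read only if the median count of its k-mers, among the reads kept so far, is still below a coverage cutoff. This is a simplified illustration, assuming an exact in-memory counter in place of the probabilistic counting structure, and tiny values of `k` and `cutoff` chosen only so the toy example is readable.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return every overlapping substring of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def digital_normalization(reads, k=4, cutoff=2):
    """Stream over reads once, yielding only those from regions not yet well covered."""
    counts = Counter()  # exact counts here; an assumption made for simplicity
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k: nothing to estimate coverage from
        # Median k-mer count estimates how well this read is already covered.
        if median(counts[km] for km in read_kmers) < cutoff:
            counts.update(read_kmers)  # only kept reads contribute to the counts
            yield read


# Five copies of a high-coverage read plus one low-coverage read.
reads = ["ATGGCGTA"] * 5 + ["TTTTCCCC"]
kept = list(digital_normalization(reads, k=4, cutoff=2))
print(kept)
```

In this toy run the redundant copies of the abundant read are dropped once its estimated coverage reaches the cutoff, while the single low-coverage read is kept. Because k-mers from discarded reads are never counted, most errors in the redundant reads never enter the counting structure.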
Witness the power of this fully operational 
set of sequence analysis methods: 
1. Assembling soil metagenomes. 
Howe et al., PNAS, 2014 (w/Tiedje) 
2. Understanding bone-eating worm symbionts. 
Goffredi et al., ISME, 2014. 
3. An ultra-deep look at the lamprey transcriptome. 
Scott et al., in preparation (w/Li) 
4. Understanding development in Molgulid ascidians. 
Stolfi et al, eLife 2014; etc.
Open science 
Guiding principle: methods that aren’t broadly 
available aren’t very useful. 
(=> Preprints, open source code, blog posts, Twitter, 
training, etc.) 
Estimated ~1000 users of our software. 
Diginorm now included in Trinity software from Broad 
Institute (~10,000 users) 
Illumina TruSeq long-read technology now 
incorporates our approach (~100,000 users)
Current research: 
Compressive algorithms for sequence 
analysis 
[Figure: workflow diagram. Labels: Raw data (~10-100 GB); Compression (~2 GB); Analysis; "Information" (~1 GB); Database & integration.] 
Can we enable and accelerate sequence-based 
inquiry by making all basic analysis 
easier and some analyses possible?
The data challenge in biology 
In 5-10 years, we will have nigh-infinite data. 
(Genomic, transcriptomic, proteomic, metabolomic, 
…?) 
We currently have no good way of querying, 
exploring, investigating, or mining these data sets, 
especially across multiple locations. 
Moreover, most data is unavailable until after 
publication… 
…which, in practice, means it will be lost.
Infrastructure: distributed graph database server 
[Figure: architecture diagram. Components: Web interface + API; Compute server (Galaxy? Arvados?); Data/Info; Raw data sets; Graph query layer; Public servers; "Walled garden" server; Private server; Upload/submit (NCBI, KBase); Import (MG-RAST, SRA, EBI).]
“Data Intensive Biology” 
• Increasingly, relevant data is out there or can be 
generated fairly inexpensively. 
• But what does the data mean? How can we get it to yield 
putative answers? How can we integrate it with other 
people’s data? 
• Virtually nobody in biology is trained to do this. 
• Virtually nobody in biology is being trained in how to do 
this.
Summer NGS workshop (2010-2017)
Perspectives on training 
• Prediction: The single biggest 
challenge facing biology over the 
next 20 years is the lack of data 
analysis training (see: NIH DIWG 
report) 
• Data analysis is not turning the 
crank; it is an intellectual exercise 
on par with experimental design or 
paper writing. 
• Training is systematically 
undervalued in academia (!?)
Training - looking forward 
• NIH “Big Data 2 Knowledge” (BD2K) will be investing 
~$20-40m in training each year (my estimate). 
Biomedical science increasingly depends on data 
analysis. 
• Moore, Sloan Foundations are investing heavily in training 
(see: Software Carpentry) 
• NSF BIO Centers have stated that “training is the second 
most important problem that all of us have”. 
We need to figure out solutions…
Funding
Students and postdocs 
Former: 
• Dr. Jason Pell (Google NYC) 
• Asst Professor Adina Howe (Iowa State) 
Current: 
• Dr. Likit Preeyanon (MMG) 
• Elijah Lowe (CSE) 
• Qingpeng Zhang (CSE) 
• Jaron Guo (MMG) 
• Camille Scott (CSE) 
• Michael Crusoe 
• Luiz Irber (CSE) 
• Dr. Sherine Awad (MMG)

 
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...Silpa
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .Poonam Aher Patil
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusNazaninKarimi6
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPirithiRaju
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxMohamedFarag457087
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxFarihaAbdulRasheed
 
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts ServiceJustdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Servicemonikaservice1
 
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Silpa
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticssakshisoni2385
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfrohankumarsinghrore1
 
Sector 62, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 62, Noida Call girls :8448380779 Model Escorts | 100% verifiedSector 62, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 62, Noida Call girls :8448380779 Model Escorts | 100% verifiedDelhi Call girls
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxseri bangash
 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfSumit Kumar yadav
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)Areesha Ahmad
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)Areesha Ahmad
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPirithiRaju
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learninglevieagacer
 
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICESAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICEayushi9330
 

Dernier (20)

FAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical Science
 
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virus
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptx
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
 
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts ServiceJustdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
 
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
Sector 62, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 62, Noida Call girls :8448380779 Model Escorts | 100% verifiedSector 62, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 62, Noida Call girls :8448380779 Model Escorts | 100% verified
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptx
 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdf
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
 
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICESAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 

2014 bangkok-talk

  • 1. NON-INTENSIVE BIOLOGY: OPPORTUNITIES AND CHALLENGES OF NEXT-GEN SEQUENCING C. Titus Brown Assistant Professor MMG / CSE
  • 2. Lansing, Michigan -> Davis, California
  • 3. We practice open science! Everything discussed here: • Code: github.com/ged-lab/ ; BSD license • Blog: http://ivory.idyll.org/blog (‘titus brown blog’) • Twitter: @ctitusbrown • Grants on Lab Web site: http://ged.msu.edu/research.html • Papers available as preprints. • All my talks are available at slideshare.net/c.titus.brown/
  • 4. Sequencing! • Sequencing of DNA and RNA. • Single genomes • Transcriptomes • Natural populations (tags) • Environmental samples/microbial populations (metagenomics) • Cheap and massively scalable sequencing of DNA and RNA.
  • 5. Sequencing technology • Major, dramatic changes in our ability to sequence DNA and RNA quickly and cheaply. • Majority of deployed techniques depend on (variations of) a single trick: “polony” sequencing. No cloning. • Single-molecule sequencing coming along fast, but not yet ready for prime time.
  • 6.
  • 7.
  • 8. Two specific concepts: • First, sequencing everything at random is very much easier than sequencing a specific gene region. (For example, it will soon be easier and cheaper to shotgun-sequence all of E. coli than it is to get a single good plasmid sequence.) • Second, if you are sequencing on a 2-D substrate (wells, or surfaces, or whatnot) then any increase in density (smaller wells, or better imaging) leads to a squared increase in the number of sequences.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14. Some numbers • For under $1,000 per sample, the Illumina HiSeq machine will generate: • 200,000,000 reads • Each of length ~150 • In under a week. • x 16 samples/run. • That’s almost 500 Gbp of sequence, or just over 160x human genome…
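The arithmetic behind this slide can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the human genome size of ~3 Gbp is an assumption (it is the value implied by the slide's "160x"), and the variable names are illustrative.

```python
# Back-of-the-envelope check of the per-run HiSeq yield quoted on the slide.
READS_PER_SAMPLE = 200_000_000
READ_LENGTH_BP = 150
SAMPLES_PER_RUN = 16
HUMAN_GENOME_BP = 3_000_000_000  # assumption: ~3 Gbp haploid human genome

bases_per_sample = READS_PER_SAMPLE * READ_LENGTH_BP   # bases from one sample
bases_per_run = bases_per_sample * SAMPLES_PER_RUN     # bases from a full run
run_gbp = bases_per_run / 1e9                          # same figure in Gbp
human_fold = bases_per_run / HUMAN_GENOME_BP           # fold coverage of one human genome

print(f"{run_gbp:.0f} Gbp per run, about {human_fold:.0f}x a human genome")
```

Under these assumptions a run comes to 480 Gbp, which matches the slide's "almost 500 Gbp".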
  • 15. Shotgun sequencing • Collect samples; • Extract DNA or RNA; • Feed into sequencer; • Computationally analyze. “Sequence it all and let the bioinformaticians sort it out” (Image: Wikipedia, Environmental shotgun sequencing.png)
  • 16. The challenges of non-model sequencing • Missing or low quality genome reference. • Evolutionarily distant. • Most extant computational tools focus on model organisms – • Assume low polymorphism (internal variation) • Assume reference genome • Assume somewhat reliable functional annotation • More significant compute infrastructure …and cannot easily or directly be used on critters of interest.
  • 17. Shotgun sequencing analysis goals: • Assembly (what is the text?) • Produces new genomes & transcriptomes. • Gene discovery for enzymes, drug targets, etc. • Counting (how many copies of each book?) • Measure gene expression levels, protein-DNA interactions • Variant calling (how does each edition vary?) • Discover genetic variation: genotyping, linkage studies… • Allele-specific expression analysis.
  • 18. Shotgun sequencing & assembly http://eofdreams.com/library.html; http://www.theshreddingservices.com/2011/11/paper-shredding-services-small-business/; http://schoolworkhelper.net/charles-dickens%E2%80%99-tale-of-two-cities-summary-analysis/
  • 19. Shotgun sequencing analysis goals: • Assembly (what is the text?) • Produces new genomes & transcriptomes. • Gene discovery for enzymes, drug targets, etc. • Counting (how many copies of each book?) • Measure gene expression levels, protein-DNA interactions • Variant calling (how does each edition vary?) • Discover genetic variation: genotyping, linkage studies… • Allele-specific expression analysis.
  • 20. Assembly It was the best of times, it was the wor , it was the worst of times, it was the isdom, it was the age of foolishness mes, it was the age of wisdom, it was th It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness …but for lots and lots of fragments!
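The fragment puzzle on this slide can be sketched as a toy greedy overlap assembler. This is an illustration of the idea only, not the method used by any real assembler (the talk later points to de Bruijn graphs): the function names are hypothetical, the fragments are error-free, and the fragment boundaries are reconstructed from the slide text.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the largest overlap."""
    frags = list(fragments)
    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left that overlaps; stop with several contigs
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags


fragments = [
    "It was the best of times, it was the wor",
    ", it was the worst of times, it was the ",
    "isdom, it was the age of foolishness",
    "mes, it was the age of wisdom, it was th",
]
contigs = greedy_assemble(fragments)
print(contigs[0])
```

Note that the repeated phrase "it was the" already creates a spurious shorter overlap between two of the fragments; with real data, repeats and sequencing errors are what make assembly hard.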
  • 21. Mapping: locate reads in reference http://en.wikipedia.org/wiki/File:Mapping_Reads.png
  • 22. Variant detection after mapping http://www.kenkraaijeveld.nl/genomics/bioinformatics/
  • 23. Looking forward 5 years… Navin et al., 2011
  • 24. Some basic math: • 1000 single cells from a tumor… • …sequenced to 40x haploid coverage with Illumina… • …yields 120 Gbp each cell… • …or 120 Tbp of data. • HiSeq X10 can do the sequencing in ~3 weeks. • The variant calling will require 2,000 CPU weeks… • …so, given ~2,000 computers, can do this all in one month.
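The numbers on this slide follow from simple multiplication, sketched below. The ~3 Gbp haploid genome size is an assumption (it is what makes 40x come to 120 Gbp), and the variable names are illustrative.

```python
# Rough data-volume and turnaround estimate for the single-cell tumor example.
CELLS = 1000
COVERAGE = 40                        # fold coverage per cell
HAPLOID_GENOME_BP = 3_000_000_000    # assumption: ~3 Gbp haploid human genome
SEQUENCING_WEEKS = 3                 # HiSeq X10 estimate from the slide
CPU_WEEKS = 2000                     # variant-calling estimate from the slide
COMPUTERS = 2000

bp_per_cell = COVERAGE * HAPLOID_GENOME_BP   # 120 Gbp per cell
total_bp = bp_per_cell * CELLS               # 120 Tbp overall
compute_weeks = CPU_WEEKS / COMPUTERS        # wall-clock weeks if perfectly parallel
total_weeks = SEQUENCING_WEEKS + compute_weeks

print(f"{bp_per_cell / 1e9:.0f} Gbp per cell, {total_bp / 1e12:.0f} Tbp total")
print(f"about {total_weeks:.0f} weeks end to end")
```

The one-month figure assumes the variant calling parallelizes perfectly across machines and only starts once sequencing is finished.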
  • 25. Similar math applies: • Pathogen detection in blood; • Environmental sequencing; • Sequencing rare DNA from circulating blood. • Two issues: • Volume of data & compute infrastructure; • Latency for clinical applications.
  • 26. The Data Deluge (a traditional requirement for these talks)
  • 27. Lab approach: Lossy compression (Reduce volume of data & compute infrastructure requirements) Raw data (~10-100 GB) Analysis "Information" ~1 GB "Information" "Information" "Information" "Information" Database & integration Compression (~2 GB) Lossy compression can substantially reduce data size while retaining information needed for later (re)analysis.
  • 33. Outline • The Molgulid story: investigating non-model ascidians (this is the biology) • Meditations on data analysis. • Methods, methods, methods. • Training, training, training. • Concluding thoughts
  • 34. The Molgula Story – an int’l collaboration Elijah Lowe (MSU; Naples?) Billie Swalla (UW, BEACON) Lionel Christiaen (NYU); Claudia Racioppi (Naples; NYU)
  • 35. …to the urochordates we go! Putnam et al., Nature, 2008; modified from Swalla 2001.
  • 36. Filter feeding adults Molgula oculata Molgula occulta Molgula oculata Ciona intestinalis Elijah Lowe; collaboration w/Billie Swalla
  • 37. Challenging organisms to work on! Molgula occulta & M. oculata: • Only spawn ~1 month out of the year • Located off the northern coast of France • Hybrids not found outside of lab conditions • Species cannot be cultured • Wet lab techniques are not fully developed for species • No genomic resources (as of 2008).
  • 38. Billie Swalla, Nadine Peyriéras, Alberto Stolfi
  • 39. Tail loss and notochord a) M. oculata b) hybrid (occulta egg x oculata sperm) c) M. occulta Notochord cells in orange Swalla, B. et al. Science, Vol 274, Issue 5290, 1205-1208 , 15 November 1996
  • 40. Molgula clades – tail loss is derived
  • 41. Solitary ascidians have determinant and invariant cleavage. Some species have colored cytoplasms. (Boltenia villosa) The cell lineage is very similar in Ciona, Phallusia, Halocynthia roretzi & Molgula oculata.
  • 43.
  • 44. Notochord formation (convergence & extension) in ascidians is highly conserved. Ciona savignyi Jiang and Smith, 2007
  • 45. Notochord Formation in Molgulids Molgula oculata notochord (40 cells, converged & extended) Molgula occulta no notochord (20 cells, not converged & extended) Hybrid notochord (20 cells, converged & extended) Swalla and Jeffery, 1996
  • 46. First we applied mRNAseq… Lowe et al., in review (PeerJ). https://peerj.com/preprints/505/
  • 47. …which gave us entire transcriptomes… Lowe et al., in review (PeerJ). https://peerj.com/preprints/505/
  • 48. …then we sequenced their genomes... • 3 species: Molgula occidentalis (tailed) – “MOXI” Molgula oculata (tailed) – “MOCU” Molgula occulta (tail-less) – “MOCC” • 3 lanes: 300-400 bp; 650-750 bp; 900-1000 bp • ≥ 200X coverage each genome De novo assembly by Elijah Lowe (MSU) Stolfi et al., eLife, 2014; http://dx.doi.org/10.7554/eLife.03728
  • 49. …which gave us most of their genes (and regulatory elements?) Genome assembly statistics: Stolfi et al., eLife, 2014; http://dx.doi.org/10.7554/eLife.03728
  • 50. Shift in differentially expressed genes from gastrulation to neurulation M. ocu vs. M. occ gastrula M. ocu vs. M. occ neurula Differentially expressed during neurulation in M. ocu vs M. occ Elijah Lowe
  • 51. Notochord gene expression similar to tailed species -10 -5 0 5 10 15 -10 -5 0 5 10 15 Expression difference Hybrid vs Parent species log2(hybrid)-log2(oculata) log2(hybrid)-log2(occulta) Elijah Lowe
  • 52. Heterochronic Shift in Molgulidae Development *79 genes examined across six species
  • 53. Transgenics of reporter constructs (“Mutual intelligibility” across ~350 my) Stolfi et al., eLife, 2014; http://dx.doi.org/10.7554/eLife.03728
  • 54. Prickle is a key part of the notochord program. Veeman, M., et al., 2007 • Planar cell polarity (PCP) pathway • Involved in convergence and extension
  • 55. Prickle expressed in notochord cells of tailless ascidians. Mita et al Zool. Sci., 2010 M. occulta gastrulation Ciona intestinalis Satoh Nature Reviews Genetics 4, 2003 FGF Bra Pk Elijah Lowe
  • 56. (Re)booting the Molgula -- • Determined conservation of cardiopharyngeal developmental program, despite shifts in cis-regulatory sequences (Stolfi et al, eLife, 2014). • Examining heterochronic shifts in developmental timing (tail loss) (Maliska et al., in preparation). • Connecting evolutionary shifts in developmental gene regulatory networks with conserved molecular profiles (Lowe et al, submitted; Lowe et al., in preparation).
  • 57. More thoughts on Molgula • One grad student, two transcriptomes, three genomes, four years… • Genomic resources are enabling a sprawling international collaboration (UW/BEACON, MSU/BEACON, NYU, Naples, Paris) • Methods development key!
  • 58. How Science Works Data Analysis Data generation
  • 59. Luckily, data analysis is cheap and easy!
  • 60. Err, well, actually… Data generation Data Analysis http://www.pixelpog.com/ftpimages/GnomesAttack.jpg
  • 61. It is now easy to generate sequencing data sets of such a size and scale that the first round analysis cannot even be completed.
  • 62. My research: theoretical => applied solutions to scale. Theoretical advances in data structures and algorithms Practically useful & usable implementations, at scale. Demonstrated effectiveness on real data.
  • 63. My research: three methods. 1. Adaptation of a suite of probabilistic data structures for representing set membership and counting (Bloom filters and CountMin Sketch). (Zhang et al., PLoS One, 2014.) 2. An online streaming approach to lossy compression of sequencing data. (Brown et al., arXiv, 2012; Howe et al., PNAS, 2014.) 3. Compressible de Bruijn graph representation for assembly. (Pell et al., PNAS, 2012.)
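To make the first method concrete, here is a minimal Count-Min Sketch for approximate k-mer counting. It is a sketch of the general data structure only, not the lab's implementation: the class name, the table sizes, and the use of salted `hashlib.blake2b` hashes are all assumptions chosen for illustration.

```python
import hashlib


class CountMinSketch:
    """Fixed-memory approximate counter; it may overcount but never undercounts."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.tables = [[0] * width for _ in range(depth)]

    def _slots(self, item):
        # One independent hash per row, obtained by salting blake2b with the row index.
        data = item.encode("ascii")
        for row in range(self.depth):
            digest = hashlib.blake2b(
                data, digest_size=8, salt=row.to_bytes(8, "little")
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, item):
        for row, col in self._slots(item):
            self.tables[row][col] += 1

    def count(self, item):
        # Collisions can only inflate a cell, so the minimum is the best estimate.
        return min(self.tables[row][col] for row, col in self._slots(item))


def kmers(seq, k):
    """Yield every substring of length k from seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


sketch = CountMinSketch()
for kmer in kmers("ATGGCGATGGCG", 4):
    sketch.add(kmer)

print(sketch.count("ATGG"))  # at least 2, since ATGG occurs twice
```

Memory use is fixed by `width * depth` regardless of how many distinct k-mers are seen, which is the property that makes this kind of structure attractive at sequencing scale; the price is a one-sided counting error.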
  • 64. Method #2 - Digital normalization (a computational version of library normalization) Suppose you have a dilution factor of A (10) to B(1). To get 10x of B you need to get 100x of A! Overkill!! This 100x will consume disk space and, because of errors, memory. We can discard it for you…
  • 71. Digital normalization retains information, while discarding data and errors
  • 72. Digital normalization approach A digital analog to cDNA library normalization, diginorm: • Streaming & single pass: looks at each read at most once; • Does not “collect” the majority of errors; • Keeps all low-coverage reads; • Smooths out coverage of sequencing. => Enables analyses that are otherwise completely impossible.
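The streaming behaviour listed on this slide can be sketched in a few lines. This is a simplified illustration, not the lab's implementation: it uses an exact `Counter` where a probabilistic counter would be used at scale, and the function name, the tiny `k`, and the `cutoff` value are assumptions picked so the example is easy to follow.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return every substring of length k from seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def digital_normalize(reads, k=5, cutoff=5):
    """Single pass: keep a read only if its estimated coverage is below cutoff."""
    counts = Counter()  # assumption: exact counts stand in for a probabilistic counter
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k
        # The median k-mer count so far estimates how well this read is already covered.
        if median(counts[km] for km in read_kmers) < cutoff:
            counts.update(read_kmers)  # only kept reads contribute to the counts
            yield read


abundant = "ACGTTGCAAGGCTTA"
rare = "TTTTCCCCGGGGAAAA"
reads = [abundant] * 50 + [rare]

kept = list(digital_normalize(reads))
print(len(kept), "of", len(reads), "reads kept")
```

In this toy input the abundant read is kept only until its coverage reaches the cutoff, while the single rare read is always kept, which is the "keeps all low-coverage reads" and "smooths out coverage" behaviour described above.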
  • 73. Witness the power of this fully operational set of sequence analysis methods: 1. Assembling soil metagenomes. Howe et al., PNAS, 2014 (w/Tiedje) 2. Understanding bone-eating worm symbionts. Goffredi et al., ISME, 2014. 3. An ultra-deep look at the lamprey transcriptome. Scott et al., in preparation (w/Li) 4. Understanding development in Molgulid ascidians. Stolfi et al, eLife 2014; etc.
  • 74. Open science Guiding principle: methods that aren’t broadly available aren’t very useful. (=> Preprints, open source code, blog posts, Twitter, training, etc.) Estimated ~1000 users of our software. Diginorm now included in Trinity software from Broad Institute (~10,000 users) Illumina TruSeq long-read technology now incorporates our approach (~100,000 users)
  • 75. Current research: Compressive algorithms for sequence analysis Raw data (~10-100 GB) Analysis "Information" ~1 GB "Information" "Information" "Information" "Information" Database & integration Compression (~2 GB) Can we enable and accelerate sequence-based inquiry by making all basic analysis easier and some analyses possible?
  • 76. The data challenge in biology In 5-10 years, we will have nigh-infinite data. (Genomic, transcriptomic, proteomic, metabolomic, …?) We currently have no good way of querying, exploring, investigating, or mining these data sets, especially across multiple locations. Moreover, most data is unavailable until after publication… …which, in practice, means it will be lost.
  • 77. Infrastructure: distributed graph database server Web interface + API Compute server (Galaxy? Arvados?) Data/ Info Raw data sets Public servers "Walled garden" server Private server Graph query layer Upload/submit (NCBI, KBase) Import (MG-RAST, SRA, EBI)
  • 78. “Data Intensive Biology” • Increasingly, relevant data is out there or can be generated fairly inexpensively. • But what does the data mean? How can we get it to yield putative answers? How can we integrate it with other people’s data? • Virtually nobody in biology is trained to do this. • Virtually nobody in biology is being trained in how to do this.
  • 79. Summer NGS workshop (2010-2017)
  • 80. Perspectives on training • Prediction: The single biggest challenge facing biology over the next 20 years is the lack of data analysis training (see: NIH DIWG report) • Data analysis is not turning the crank; it is an intellectual exercise on par with experimental design or paper writing. • Training is systematically undervalued in academia (!?)
  • 81. Training - looking forward • NIH “Big Data 2 Knowledge” (BD2K) will be investing ~$20-40m in training each year (my estimate). Biomedical science increasingly depends on data analysis. • Moore, Sloan Foundations are investing heavily in training (see: Software Carpentry) • NSF BIO Centers have stated that “training is the second most important problem that all of us have”. We need to figure out solutions…
  • 83. Students and postdocs Former: • Dr. Jason Pell (Google NYC) • Asst Professor Adina Howe (Iowa State) • Current: • Dr. Likit Preeyanon (MMG) • Elijah Lowe (CSE) • Qingpeng Zhang (CSE) • Jaron Guo (MMG) • Camille Scott (CSE) • Michael Crusoe • Luiz Irber (CSE) • Dr. Sherine Awad (MMG)