C. Titus Brown
Assistant Professor
MMG, CSE, BEACON
Michigan State University
May 2014
ctb@msu.edu
Applying mRNAseq to non-model organisms:
challenges, opportunities, and solutions
We practice open science!
Everything discussed here:
• Code: github.com/ged-lab/ ; BSD license
• Blog: http://ivory.idyll.org/blog (‘titus brown blog’)
• Twitter: @ctitusbrown
• Grants on Lab Web site: http://ged.msu.edu/research.html
• Preprints available.
Sequencing has become very
inexpensive.
Sequencing costs
• Approximately $1000 of mRNAseq will yield a decent transcriptome.
• Multiple samples will allow you to generate gene inventories.
• For the ascidian project I will show you,
  • 1 graduate student,
  • 2 transcriptomes,
  • 3 genomes…
Mapping => quantitation
Reference transcriptome required.
Interpreting RNAseq requires gene
models:
http://www.hitseq.com/images/RNA-seq_AS.jpg
The challenges of non-model
transcriptomics
• Missing or low-quality genome reference.
• Evolutionarily distant.
• Most extant computational tools focus on model organisms:
  • Assume low polymorphism (internal variation)
  • Assume reference genome
  • Assume somewhat reliable functional annotation
  • More significant compute infrastructure
…and cannot easily or directly be used on critters of interest.
Outline
1. Challenges of non-model
transcriptomics.
2. Lamprey: too much data, not enough
genome
3. Digital normalization as a coping
mechanism
4. …applied to Molgulid ascidians…
5. …and back to lamprey.
6. More transcriptome challenges
7. What’s next?
Sea lamprey in the Great Lakes
• Non-native
• Parasite of medium to large fishes
• Caused populations of host fishes to crash
Li Lab / Y-W C-D
The problem of lamprey:
• Diverged at base of vertebrates; evolutionarily distant from model organisms.
• Large, complicated genome (~2 Gb)
• Relatively little existing sequence.
• We sequenced the liver genome…
Lamprey has incomplete genomic sequence
J. Smith et al., PNAS 2009
Evidence of somatic recombination; 100s of Mb of sequence eliminated from genome during development.
More recent evidence (unpub., J. Smith et al.) suggests that this loss is developmentally regulated, results in changes in gene expression (due to loss of genes!), and is tissue specific.
Liver genome is not the entire genome.
Lamprey tissues for which we have mRNAseq
• embryo stages (late blastula, gastrula, neurula, 22b, neural-crest migration, 24c1, 24c2)
• adult intestine
• adult kidney
• brain paired
• freshwater (gill, intestine, kidney)
• larval (gill, kidney, liver, intestine)
• juvenile (intestine, liver, kidney)
• lips
• metamorphosis 1 (intestine, kidney)
• metamorphosis 2 (liver, intestine,
• metamorphosis 3 (intestine, kidney)
• metamorphosis 4 (intestine, kidney)
• metamorphosis 5 (liver, intestine, kidney)
• metamorphosis 6 (intestine, kidney)
• metamorphosis 7 (intestine, kidney)
• monocytes
• brain (0, 3, 21 dpi)
• spinal cord (0, 3, 21 dpi)
• supraneural tissue
• small parasite distal intestine, kidney, proximal intestine
• salt water (gill, intestine)
• ovulatory female head skin
• preovulatory female eye
• preovulatory female tail skin
• prespermiating male gill
• mature adult male rope tissue
• spermiating male gill
• spermiating male head skin
• spermiating male muscle
Assembly
It was the best of times, it was the wor
, it was the worst of times, it was the
isdom, it was the age of foolishness
mes, it was the age of wisdom, it was th
It was the best of times, it was the worst of times, it was
the age of wisdom, it was the age of foolishness
…but for lots and lots of fragments!
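The overlap idea on this slide can be sketched as a toy greedy merger. This is only an illustration under stated assumptions: the function names (`overlap`, `greedy_assemble`) and the `min_len` parameter are hypothetical, the fragments are error-free, and real transcriptome assemblers typically work on De Bruijn graphs rather than pairwise overlaps.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of contigs with the longest overlap."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates
    while True:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]


# The four fragments shown on the slide.
fragments = [
    "It was the best of times, it was the wor",
    ", it was the worst of times, it was the",
    "isdom, it was the age of foolishness",
    "mes, it was the age of wisdom, it was th",
]
contigs = greedy_assemble(fragments)
print(contigs)
```

Note that the repeated phrase "it was the" creates a spurious overlap between fragments; the greedy rule happens to resolve it here, but repeats like this are exactly what makes real assembly hard.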
Shared low-level transcripts may not reach the threshold for assembly.
Main problem (4 years ago):
We have a massive amount of data
that challenges existing computers
when we try to assemble it all
together.
Solution: Digital normalization
(a computational version of library normalization)
Suppose you have a dilution factor of A (10) to B (1). To get 10x coverage of B you need to get 100x of A! Overkill!!
This 100x will consume disk space and, because of errors, memory.
We can discard it for you…
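The arithmetic behind the dilution example can be written out as a small sketch. The function name `required_coverage` and the dictionary layout are hypothetical, chosen only for illustration; the 10:1 ratio and 10x target come from the slide.

```python
def required_coverage(relative_abundance: dict[str, float],
                      target: float) -> dict[str, float]:
    """Coverage each transcript ends up with once the rarest one reaches `target`."""
    rarest = min(relative_abundance.values())
    return {name: target * abundance / rarest
            for name, abundance in relative_abundance.items()}


# A is 10 times more abundant than B; we want 10x coverage of B.
coverage = required_coverage({"A": 10, "B": 1}, target=10)
print(coverage)
```

Everything above the target for A is redundant for assembly purposes, which is the excess that normalization aims to discard.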
Digital normalization
Digital normalization approach
A digital analog to cDNA library normalization, diginorm:
• Is single pass: looks at each read only once;
• Does not “collect” the majority of errors;
• Keeps all low-coverage reads;
• Smooths out coverage of sequencing.
=> Enables analyses that are otherwise completely impossible.
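A minimal sketch of the single-pass idea, not the khmer implementation. Assumptions that are not on the slides: coverage of a read is estimated as the median count of its k-mers seen so far, counts are held in an exact in-memory `Counter` (a real tool would need a far more memory-efficient counting structure), reverse complements are ignored, and the `k` and `cutoff` defaults are illustrative values only.

```python
from collections import Counter
from statistics import median
from typing import Iterable, Iterator


def kmers(read: str, k: int) -> list[str]:
    """All overlapping substrings of length k in the read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def diginorm(reads: Iterable[str], k: int = 20, cutoff: int = 20) -> Iterator[str]:
    """Yield reads whose estimated coverage is still below `cutoff`.

    Each read is examined exactly once. Discarded reads never update
    the counts, so their k-mers (including erroneous ones) are not stored.
    """
    counts: Counter = Counter()
    for read in reads:
        words = kmers(read, k)
        if not words:
            continue  # read shorter than k
        if median(counts[w] for w in words) < cutoff:
            counts.update(words)
            yield read


# Toy data: one highly abundant read and one rare read.
abundant = "ACGTTGCAAG"
rare = "TTTTGGGGCC"
kept = list(diginorm([abundant] * 50 + [rare], k=4, cutoff=5))
print(len(kept), kept)
```

In this toy run the abundant read is kept only until its estimated coverage reaches the cutoff, while the rare read is always kept, which mirrors the "keeps all low-coverage reads" and "smooths out coverage" bullets.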
Evaluating diginorm – how?
• Can’t assemble lamprey w/o diginorm; are results any good & how would we know?
• Need comparative data set
• …ascidians!
Looking at the Molgula…
Putnam et al., 2008, Nature. Modified from Swalla 2001
Sea squirts!
Molgula oculata
Molgula occulta
Molgula oculata Ciona intestinalis
Elijah Lowe; collaboration w/Billie Swalla
Challenging organisms to work on --
• Only spawn ~1 month out of the year
• Located off the northern coast of France
• Hybrids not found outside of lab conditions
• Species cannot be cultured
• Wet lab techniques are not fully developed for species
Tail loss and notochord genes
a) M. oculata b) hybrid (occulta egg x oculata sperm) c) M. occulta
Notochord cells in orange. Swalla, B. et al., Science, Vol 274, Issue 5290, 1205-1208, 15 November 1996
Diginorm applied to Molgula
embryonic mRNAseq
Substantial time savings (3-5x)
Much less RAM
Elijah Lowe
Question: does it matter what
assembly pipeline you use? (No)
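[Figure: four-way Venn diagram comparing the assemblies Diginorm V/O, Raw V/O, Diginorm trinity, and Raw trinity. Region counts as extracted: 3, 70, 25, 1, 36, 13563, 35, 13, 7, 4, 23, 8, 1, 6, 5.]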
Numbers are putative orthologs (reciprocal
best hits) w/Ciona intestinalis, calculated for
each assembly.
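Reciprocal best hits can be sketched as follows, assuming the best hit in each direction has already been computed by some similarity search. The function name, the dictionary representation, and the sequence identifiers are all hypothetical; nothing here reflects the actual scripts used for the slide.

```python
def reciprocal_best_hits(a_to_b: dict[str, str],
                         b_to_a: dict[str, str]) -> set[tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    return {(a, b) for a, b in a_to_b.items() if b_to_a.get(b) == a}


# Best hit of each assembled transcript against the reference, and back.
assembly_to_ciona = {"molgula_1": "ciona_A",
                     "molgula_2": "ciona_B",
                     "molgula_3": "ciona_A"}
ciona_to_assembly = {"ciona_A": "molgula_1",
                     "ciona_B": "molgula_9"}

orthologs = reciprocal_best_hits(assembly_to_ciona, ciona_to_assembly)
print(orthologs)
```

Counting the resulting pairs per assembly would give the kind of numbers compared in the figure.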
Elijah Lowe
How complete are these
transcriptomes?
Elijah Lowe
Shift in differentially expressed genes
from gastrulation to neurulation
M. ocu vs. M. occ gastrula M. ocu vs. M. occ neurula
Differentially expressed during neurulation in M. ocu vs M. occ
Notochord gene expression similar to
tailed species
[Figure: scatter plot of expression difference, hybrid vs parent species. Axes: log2(hybrid)-log2(oculata) and log2(hybrid)-log2(occulta), each spanning -10 to 15.]
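The plotted quantity is a difference of log2 expression values. A minimal sketch, assuming expression is given as non-negative counts; the `pseudocount` added to avoid log2(0) is an assumption of this sketch and may not match how the figure was produced.

```python
import math


def log2_difference(hybrid: float, parent: float,
                    pseudocount: float = 1.0) -> float:
    """log2(hybrid) - log2(parent), with a pseudocount to handle zeros."""
    return math.log2(hybrid + pseudocount) - math.log2(parent + pseudocount)


# A gene expressed in the hybrid but silent in one parent, and an unchanged gene.
print(log2_difference(7, 0))
print(log2_difference(3, 3))
```

Values near zero on one axis mean the hybrid resembles that parent for the gene in question.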
M. occulta transgenic NoTrlc
Alberto Stolfi & Lionel Christiaen
Lionel Christiaen (NYU); Claudia Racioppi (Stazione Zoologica Napoli)
Enabling Molgula research…
• Develop candidate genes to generate hypotheses about gene network evolution;
• Rapid development of genomic resources => reporter constructs.
Doesn’t answer any biological questions directly, but enables us to go looking for things much faster!
Transcriptome assembly
thoughts
• We can (now) assemble really big data sets, and get pretty good results.
• We have lots of evidence (some presented here :) that some assemblies are not strongly affected by digital normalization.
(Note: the normalization algorithm is now a standard part of the Trinity mRNAseq pipeline.)
Transcriptome results - lamprey
• Started with 5.1 billion reads from 50 different tissues.
(4 years of computational research, and about 1 month of compute time, GO HERE)
Ended with:
Lamprey transcriptome basic
stats
• 616,000 transcripts (!)
• 263,000 transcript families (!)
(This seems like a lot.)
• Only 20,436 transcript families have transcripts > 1 kb
(compare with mouse: 17,331 of 29,769 genes are > 1 kb)
So, the rule-of-thumb estimate is not that far off for long transcripts.
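The "> 1 kb" count is a simple filter over transcript families. A sketch, assuming each transcript is available as a (family id, length in bases) pair; this representation, the function name, and the toy data are hypothetical.

```python
from typing import Iterable


def families_with_long_transcripts(transcripts: Iterable[tuple[str, int]],
                                   min_length: int = 1000) -> set[str]:
    """Families that contain at least one transcript longer than `min_length`."""
    return {family for family, length in transcripts if length > min_length}


# Toy input: (transcript family id, transcript length in bases).
toy = [("fam1", 350), ("fam1", 1800), ("fam2", 420), ("fam3", 2500)]
long_families = families_with_long_transcripts(toy)
print(len(long_families), sorted(long_families))
```

For scale, the mouse comparison on the slide works out to 17,331 / 29,769, or about 58% of genes, while 20,436 of 263,000 lamprey families is about 8%.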
Common vs rare genes
#transcripts
# samples
Camille Scott
Can look at transcripts by tissue --
Camille Scott
Too… many… samples…
Camille Scott
Presence/absence clustering
Expression-based clustering
Some known biology recapitulated; and… ???
Camille Scott
Next steps with lamprey
• Far more complete transcriptome than the one generated from the genome!
• (…but suffering from contamination, oversensitivity to unprocessed transcripts, …?)
• Enabling studies in:
  • Basal vertebrate phylogeny
  • Biliary atresia
  • Evolutionary origin of brown fat (previously thought to be mammalian only!)
  • Pheromonal response in adults
  • Spinal cord regeneration
Next challenges
OK, we can deal with volume of data,
make pretty pictures, and ... Now what?
Contamination!
Both experimental and “real” contaminants are big problems.
Camille Scott
Pathway predictions vary
dramatically depending on data
set, annotation
Likit Preeyanon
KEGG pathway comparison across several different gene annotation sets for chicken
The problem of lopsided gene characterization is
pervasive: e.g., the brain "ignorome"
"...ignorome genes do not differ from well-studied genes in terms of connectivity in coexpression
networks. Nor do they differ with respect to numbers of orthologs, paralogs, or protein domains.
The major distinguishing characteristic between these sets of genes is date of discovery, early
discovery being associated with greater research momentum—a genomic bandwagon effect."
Ref.: Pandey et al. (2014), PLoS ONE 9, e88889. Slide courtesy Erich Schwarz
Practical implications of diginorm
• Data is (essentially) free;
• For some problems, analysis is now cheaper than data gathering (i.e. essentially free);
• …plus, we can run most of our approaches in the cloud (per-hour rental compute resources).
1. khmer-protocols
• Effort to provide standard “cheap” assembly protocols for the cloud.
• Entirely copy/paste; ~2-6 days from raw reads to assembly, annotations, and differential expression analysis.
• Open, versioned, forkable, citable.
(“Don’t bother me unless it doesn’t work.”)
Read cleaning => Diginorm => Assembly => Annotation => RSEM differential expression
CC0; BSD; on github; in reStructuredText.
A few thoughts on our
approach…
• Explicitly a “protocol”: explicit steps, copy-paste, customizable.
• No requirement for computational expertise or significant computational hardware.
• ~1-5 days to teach a bench biologist to use.
• $100-150 of rental compute (“cloud computing”)…
• …for $1000 data set.
• Adding in quality control and internal validation steps.
2. Data availability is important for
annotating distant sequences
[Figure legend: Anything else; Mollusc; Cephalopod; no similarity]
Can we incentivize data sharing?
• ~$100-$150/transcriptome in the cloud
• Offer to analyze people’s existing data for free, IFF they open it up within a year.
See:
• CephSeq white paper.
• “Dead Sea Scrolls & Open Marine Transcriptome
Project” blog post;
First results: Loligo
genomic/transcriptome resources
Putting other people’s sequences where my
mouth is:
w/Josh Rosenthal and Benton Grav
Acknowledgements
Lab members involved
• Adina Howe (w/Tiedje)
• Jason Pell
• Arend Hintze
• Qingpeng Zhang
• Elijah Lowe
• Likit Preeyanon
• Jiarong Guo
• Tim Brom
• Kanchan Pavangadkar
• Eric McDonald
• Camille Scott
• Jordan Fish
• Michael Crusoe
• Leigh Sheneman
Collaborators
• Billie Swalla (UW)
• Josh Rosenthal (UPR)
• Weiming Li, MSU
• Ona Bloom (Feinstein), Jen Morgan (MBL), Joe Buxbaum (MSSM)
Funding
USDA NIFA; NSF IOS; NIH; BEACON.
Elijah Lowe (MSU); C. Titus Brown (MSU); Billie J. Swalla (UW)
Current research (khmer software)
• Efficient online counting of k-mers
• Trimming reads on abundance
• Efficient De Bruijn graph representations
• Read abundance normalization
• Streaming algorithms for assembly, variant calling, and error correction
• Cloud assembly protocols
• Efficient graph labeling & exploration
• Data set partitioning approaches
• Assembly-free comparison of data sets
• HMM-guided assembly
• Efficient search for target genes
Thanks!

Contenu connexe

Tendances

ASM Microbe 2017: Reaching the Parts Other Methods Can't: Long Reads for Micr...
ASM Microbe 2017: Reaching the Parts Other Methods Can't: Long Reads for Micr...ASM Microbe 2017: Reaching the Parts Other Methods Can't: Long Reads for Micr...
ASM Microbe 2017: Reaching the Parts Other Methods Can't: Long Reads for Micr...Nick Loman
 
Lets Make a Mammoth
Lets Make a Mammoth  Lets Make a Mammoth
Lets Make a Mammoth Cheche Salas
 
Bioinformatics tools for the diagnostic laboratory - T.Seemann - Antimicrobi...
Bioinformatics tools for the diagnostic laboratory -  T.Seemann - Antimicrobi...Bioinformatics tools for the diagnostic laboratory -  T.Seemann - Antimicrobi...
Bioinformatics tools for the diagnostic laboratory - T.Seemann - Antimicrobi...Torsten Seemann
 
Rnaseq basics ngs_application1
Rnaseq basics ngs_application1Rnaseq basics ngs_application1
Rnaseq basics ngs_application1Yaoyu Wang
 
Data Management for Quantitative Biology - Data sources (Next generation tech...
Data Management for Quantitative Biology - Data sources (Next generation tech...Data Management for Quantitative Biology - Data sources (Next generation tech...
Data Management for Quantitative Biology - Data sources (Next generation tech...QBiC_Tue
 
New Generation Sequencing Technologies: an overview
New Generation Sequencing Technologies: an overviewNew Generation Sequencing Technologies: an overview
New Generation Sequencing Technologies: an overviewPaolo Dametto
 
Next Generation Sequencing Informatics - Challenges and Opportunities
Next Generation Sequencing Informatics - Challenges and OpportunitiesNext Generation Sequencing Informatics - Challenges and Opportunities
Next Generation Sequencing Informatics - Challenges and OpportunitiesChung-Tsai Su
 
[2013.10.29] albertsen genomics metagenomics
[2013.10.29] albertsen genomics metagenomics[2013.10.29] albertsen genomics metagenomics
[2013.10.29] albertsen genomics metagenomicsMads Albertsen
 
Whole genome sequencing of bacteria & analysis
Whole genome sequencing of bacteria & analysisWhole genome sequencing of bacteria & analysis
Whole genome sequencing of bacteria & analysisdrelamuruganvet
 
Cross-Kingdom Standards in Genomics, Epigenomics and Metagenomics
Cross-Kingdom Standards in Genomics, Epigenomics and MetagenomicsCross-Kingdom Standards in Genomics, Epigenomics and Metagenomics
Cross-Kingdom Standards in Genomics, Epigenomics and Metagenomics Christopher Mason
 
Comparison between RNASeq and Microarray for Gene Expression Analysis
Comparison between RNASeq and Microarray for Gene Expression AnalysisComparison between RNASeq and Microarray for Gene Expression Analysis
Comparison between RNASeq and Microarray for Gene Expression AnalysisYaoyu Wang
 
Knowing Your NGS Upstream: Alignment and Variants
Knowing Your NGS Upstream: Alignment and VariantsKnowing Your NGS Upstream: Alignment and Variants
Knowing Your NGS Upstream: Alignment and VariantsGolden Helix Inc
 
2013 stamps-intro-assembly
2013 stamps-intro-assembly2013 stamps-intro-assembly
2013 stamps-intro-assemblyc.titus.brown
 
Bayesian Taxonomic Assignment for the Next-Generation Metagenomics
Bayesian Taxonomic Assignment for the Next-Generation MetagenomicsBayesian Taxonomic Assignment for the Next-Generation Metagenomics
Bayesian Taxonomic Assignment for the Next-Generation MetagenomicsJonathan Eisen
 

Tendances (20)

ASM Microbe 2017: Reaching the Parts Other Methods Can't: Long Reads for Micr...
ASM Microbe 2017: Reaching the Parts Other Methods Can't: Long Reads for Micr...ASM Microbe 2017: Reaching the Parts Other Methods Can't: Long Reads for Micr...
ASM Microbe 2017: Reaching the Parts Other Methods Can't: Long Reads for Micr...
 
Lets Make a Mammoth
Lets Make a Mammoth  Lets Make a Mammoth
Lets Make a Mammoth
 
Bioinformatics tools for the diagnostic laboratory - T.Seemann - Antimicrobi...
Bioinformatics tools for the diagnostic laboratory -  T.Seemann - Antimicrobi...Bioinformatics tools for the diagnostic laboratory -  T.Seemann - Antimicrobi...
Bioinformatics tools for the diagnostic laboratory - T.Seemann - Antimicrobi...
 
Rnaseq basics ngs_application1
Rnaseq basics ngs_application1Rnaseq basics ngs_application1
Rnaseq basics ngs_application1
 
Data Management for Quantitative Biology - Data sources (Next generation tech...
Data Management for Quantitative Biology - Data sources (Next generation tech...Data Management for Quantitative Biology - Data sources (Next generation tech...
Data Management for Quantitative Biology - Data sources (Next generation tech...
 
Future of metagenomics
Future of metagenomicsFuture of metagenomics
Future of metagenomics
 
New Generation Sequencing Technologies: an overview
New Generation Sequencing Technologies: an overviewNew Generation Sequencing Technologies: an overview
New Generation Sequencing Technologies: an overview
 
Genome Curation using Apollo
Genome Curation using ApolloGenome Curation using Apollo
Genome Curation using Apollo
 
Hertweck bbl2012
Hertweck bbl2012Hertweck bbl2012
Hertweck bbl2012
 
Next Generation Sequencing Informatics - Challenges and Opportunities
Next Generation Sequencing Informatics - Challenges and OpportunitiesNext Generation Sequencing Informatics - Challenges and Opportunities
Next Generation Sequencing Informatics - Challenges and Opportunities
 
NGS and the molecular basis of disease: a practical view
NGS and the molecular basis of disease: a practical viewNGS and the molecular basis of disease: a practical view
NGS and the molecular basis of disease: a practical view
 
Big data nebraska
Big data nebraskaBig data nebraska
Big data nebraska
 
2012 oslo-talk
2012 oslo-talk2012 oslo-talk
2012 oslo-talk
 
[2013.10.29] albertsen genomics metagenomics
[2013.10.29] albertsen genomics metagenomics[2013.10.29] albertsen genomics metagenomics
[2013.10.29] albertsen genomics metagenomics
 
Whole genome sequencing of bacteria & analysis
Whole genome sequencing of bacteria & analysisWhole genome sequencing of bacteria & analysis
Whole genome sequencing of bacteria & analysis
 
Cross-Kingdom Standards in Genomics, Epigenomics and Metagenomics
Cross-Kingdom Standards in Genomics, Epigenomics and MetagenomicsCross-Kingdom Standards in Genomics, Epigenomics and Metagenomics
Cross-Kingdom Standards in Genomics, Epigenomics and Metagenomics
 
Comparison between RNASeq and Microarray for Gene Expression Analysis
Comparison between RNASeq and Microarray for Gene Expression AnalysisComparison between RNASeq and Microarray for Gene Expression Analysis
Comparison between RNASeq and Microarray for Gene Expression Analysis
 
Knowing Your NGS Upstream: Alignment and Variants
Knowing Your NGS Upstream: Alignment and VariantsKnowing Your NGS Upstream: Alignment and Variants
Knowing Your NGS Upstream: Alignment and Variants
 
2013 stamps-intro-assembly
2013 stamps-intro-assembly2013 stamps-intro-assembly
2013 stamps-intro-assembly
 
Bayesian Taxonomic Assignment for the Next-Generation Metagenomics
Bayesian Taxonomic Assignment for the Next-Generation MetagenomicsBayesian Taxonomic Assignment for the Next-Generation Metagenomics
Bayesian Taxonomic Assignment for the Next-Generation Metagenomics
 

En vedette

Linuxtag 2012 - continuous delivery - dream to reality
Linuxtag 2012  - continuous delivery - dream to realityLinuxtag 2012  - continuous delivery - dream to reality
Linuxtag 2012 - continuous delivery - dream to realityClément Escoffier
 
유기화학 2nd
유기화학 2nd유기화학 2nd
유기화학 2ndshinkyung
 
Motoholics Sponsorship Proposal 2010
Motoholics Sponsorship Proposal 2010Motoholics Sponsorship Proposal 2010
Motoholics Sponsorship Proposal 2010Gaurab Dutta
 
Legal Issues Important for Doing Business in the U.S. | Martijn Steger
Legal Issues Important for Doing Business in the U.S. | Martijn StegerLegal Issues Important for Doing Business in the U.S. | Martijn Steger
Legal Issues Important for Doing Business in the U.S. | Martijn StegerKegler Brown Hill + Ritter
 
2013 stamps-intro-assembly
2013 stamps-intro-assembly2013 stamps-intro-assembly
2013 stamps-intro-assemblyc.titus.brown
 
Know Your Enemy
Know Your EnemyKnow Your Enemy
Know Your Enemytlineshill
 
Circles of San Antonio Community Coalition and Bexar County DWI Task Force Ho...
Circles of San Antonio Community Coalition and Bexar County DWI Task Force Ho...Circles of San Antonio Community Coalition and Bexar County DWI Task Force Ho...
Circles of San Antonio Community Coalition and Bexar County DWI Task Force Ho...Circles of San Antonio Community Coalition
 
Avysta Presentation
Avysta PresentationAvysta Presentation
Avysta Presentationguest95d5ba
 
GAME TECHNOLOGIES USAGE FOR USERS ATTRACTION TO EDUCATIONAL RESOURCES
GAME TECHNOLOGIES USAGE FOR USERS ATTRACTION TO EDUCATIONAL RESOURCESGAME TECHNOLOGIES USAGE FOR USERS ATTRACTION TO EDUCATIONAL RESOURCES
GAME TECHNOLOGIES USAGE FOR USERS ATTRACTION TO EDUCATIONAL RESOURCESAlexander Lavrov
 
Where to focus event innovation? - An audience led approach
Where to focus event innovation? - An audience led approachWhere to focus event innovation? - An audience led approach
Where to focus event innovation? - An audience led approachLive Union
 
One Step Online School Simplified
One Step Online School SimplifiedOne Step Online School Simplified
One Step Online School SimplifiedChineseTeachers.com
 
Managing International Risks + Corporate Investigations
Managing International Risks + Corporate InvestigationsManaging International Risks + Corporate Investigations
Managing International Risks + Corporate InvestigationsKegler Brown Hill + Ritter
 

En vedette (20)

h-ubu : CDI in JavaScript
h-ubu : CDI in JavaScripth-ubu : CDI in JavaScript
h-ubu : CDI in JavaScript
 
Linuxtag 2012 - continuous delivery - dream to reality
Linuxtag 2012  - continuous delivery - dream to realityLinuxtag 2012  - continuous delivery - dream to reality
Linuxtag 2012 - continuous delivery - dream to reality
 
OSGi - beyond the myth
OSGi -  beyond the mythOSGi -  beyond the myth
OSGi - beyond the myth
 
Review Adobe Wallaby
Review Adobe WallabyReview Adobe Wallaby
Review Adobe Wallaby
 
유기화학 2nd
유기화학 2nd유기화학 2nd
유기화학 2nd
 
2014 Professional Responsibility Seminar
2014 Professional Responsibility Seminar2014 Professional Responsibility Seminar
2014 Professional Responsibility Seminar
 
Motoholics Sponsorship Proposal 2010
Motoholics Sponsorship Proposal 2010Motoholics Sponsorship Proposal 2010
Motoholics Sponsorship Proposal 2010
 
Legal Issues Important for Doing Business in the U.S. | Martijn Steger
Legal Issues Important for Doing Business in the U.S. | Martijn StegerLegal Issues Important for Doing Business in the U.S. | Martijn Steger
Legal Issues Important for Doing Business in the U.S. | Martijn Steger
 
2013 stamps-intro-assembly
2013 stamps-intro-assembly2013 stamps-intro-assembly
2013 stamps-intro-assembly
 
Know Your Enemy
Know Your EnemyKnow Your Enemy
Know Your Enemy
 
Circles of San Antonio Community Coalition and Bexar County DWI Task Force Ho...
Circles of San Antonio Community Coalition and Bexar County DWI Task Force Ho...Circles of San Antonio Community Coalition and Bexar County DWI Task Force Ho...
Circles of San Antonio Community Coalition and Bexar County DWI Task Force Ho...
 
2015 pag-chicken
2015 pag-chicken2015 pag-chicken
2015 pag-chicken
 
Avysta Presentation
Avysta PresentationAvysta Presentation
Avysta Presentation
 
GAME TECHNOLOGIES USAGE FOR USERS ATTRACTION TO EDUCATIONAL RESOURCES
GAME TECHNOLOGIES USAGE FOR USERS ATTRACTION TO EDUCATIONAL RESOURCESGAME TECHNOLOGIES USAGE FOR USERS ATTRACTION TO EDUCATIONAL RESOURCES
GAME TECHNOLOGIES USAGE FOR USERS ATTRACTION TO EDUCATIONAL RESOURCES
 
Cope Manifesto
Cope ManifestoCope Manifesto
Cope Manifesto
 
Where to focus event innovation? - An audience led approach
Where to focus event innovation? - An audience led approachWhere to focus event innovation? - An audience led approach
Where to focus event innovation? - An audience led approach
 
One Step Online School Simplified
One Step Online School SimplifiedOne Step Online School Simplified
One Step Online School Simplified
 
Sceneries
SceneriesSceneries
Sceneries
 
Langkah Membuat Blogspot
Langkah Membuat BlogspotLangkah Membuat Blogspot
Langkah Membuat Blogspot
 
Managing International Risks + Corporate Investigations
Managing International Risks + Corporate InvestigationsManaging International Risks + Corporate Investigations
Managing International Risks + Corporate Investigations
 

Similaire à 2014 naples

2014 whitney-research
2014 whitney-research2014 whitney-research
2014 whitney-researchc.titus.brown
 
2012 hpcuserforum talk
2012 hpcuserforum talk2012 hpcuserforum talk
2012 hpcuserforum talkc.titus.brown
 
scRNA-Seq Lecture - Stem Cell Network RNA-Seq Workshop 2017
scRNA-Seq Lecture - Stem Cell Network RNA-Seq Workshop 2017scRNA-Seq Lecture - Stem Cell Network RNA-Seq Workshop 2017
scRNA-Seq Lecture - Stem Cell Network RNA-Seq Workshop 2017David Cook
 
2014 marine-microbes-grc
2014 marine-microbes-grc2014 marine-microbes-grc
2014 marine-microbes-grcc.titus.brown
 
Marzillier_09052014.pdf
Marzillier_09052014.pdfMarzillier_09052014.pdf
Marzillier_09052014.pdf7006ASWATHIRR
 
Emerging challenges in data-intensive genomics
Emerging challenges in data-intensive genomicsEmerging challenges in data-intensive genomics
Emerging challenges in data-intensive genomicsmikaelhuss
 
Group 5 DNA Tech - Ecology & Envt
Group 5 DNA Tech - Ecology & EnvtGroup 5 DNA Tech - Ecology & Envt
Group 5 DNA Tech - Ecology & EnvtJessica Kabigting
 
Comparative genomics and proteomics
Comparative genomics and proteomicsComparative genomics and proteomics
Comparative genomics and proteomicsNikhil Aggarwal
 

Similaire à 2014 naples (20)

2014 villefranche
2014 villefranche2014 villefranche
2014 villefranche
 
2014 bangkok-talk
2014 bangkok-talk2014 bangkok-talk
2014 bangkok-talk
 
2014 whitney-research
2014 whitney-research2014 whitney-research
2014 whitney-research
 
2016 bergen-sars
2016 bergen-sars2016 bergen-sars
2016 bergen-sars
 
2012 hpcuserforum talk
2012 hpcuserforum talk2012 hpcuserforum talk
2012 hpcuserforum talk
 
2014 mmg-talk
2014 mmg-talk2014 mmg-talk
2014 mmg-talk
 
scRNA-Seq Lecture - Stem Cell Network RNA-Seq Workshop 2017
scRNA-Seq Lecture - Stem Cell Network RNA-Seq Workshop 2017scRNA-Seq Lecture - Stem Cell Network RNA-Seq Workshop 2017
scRNA-Seq Lecture - Stem Cell Network RNA-Seq Workshop 2017
 
2014 davis-talk
2014 davis-talk2014 davis-talk
2014 davis-talk
 
2016 davis-plantbio
2016 davis-plantbio2016 davis-plantbio
2016 davis-plantbio
 
2014 marine-microbes-grc
2014 marine-microbes-grc2014 marine-microbes-grc
2014 marine-microbes-grc
 
rheumatoid arthritis
rheumatoid arthritisrheumatoid arthritis
rheumatoid arthritis
 
Marzillier_09052014.pdf
Marzillier_09052014.pdfMarzillier_09052014.pdf
Marzillier_09052014.pdf
 
Sweden_eemis_big_data
Sweden_eemis_big_dataSweden_eemis_big_data
Sweden_eemis_big_data
 
2014 sage-talk
2014 sage-talk2014 sage-talk
2014 sage-talk
 
Outcrossing
OutcrossingOutcrossing
Outcrossing
 
Thesis biobix
Thesis biobixThesis biobix
Thesis biobix
 
Emerging challenges in data-intensive genomics
Emerging challenges in data-intensive genomicsEmerging challenges in data-intensive genomics
Emerging challenges in data-intensive genomics
 
Group 5 DNA Tech - Ecology & Envt
Group 5 DNA Tech - Ecology & EnvtGroup 5 DNA Tech - Ecology & Envt
Group 5 DNA Tech - Ecology & Envt
 
2015 pycon-talk
2015 pycon-talk2015 pycon-talk
2015 pycon-talk
 
Comparative genomics and proteomics
Comparative genomics and proteomicsComparative genomics and proteomics
Comparative genomics and proteomics
 

Plus de c.titus.brown

Plus de c.titus.brown (20)

2016 davis-biotech
2016 davis-biotech2016 davis-biotech
2016 davis-biotech
 
2015 genome-center
2015 genome-center2015 genome-center
2015 genome-center
 
2015 beacon-metagenome-tutorial
2015 beacon-metagenome-tutorial2015 beacon-metagenome-tutorial
2015 beacon-metagenome-tutorial
 
2015 aem-grs-keynote
2015 aem-grs-keynote2015 aem-grs-keynote
2015 aem-grs-keynote
 
2015 msu-code-review
2015 msu-code-review2015 msu-code-review
2015 msu-code-review
 
2015 illinois-talk
2015 illinois-talk2015 illinois-talk
2015 illinois-talk
 
2015 mcgill-talk
2015 mcgill-talk2015 mcgill-talk
2015 mcgill-talk
 
2015 opencon-webcast
2015 opencon-webcast2015 opencon-webcast
2015 opencon-webcast
 
2015 vancouver-vanbug
2015 vancouver-vanbug2015 vancouver-vanbug
2015 vancouver-vanbug
 
2015 osu-metagenome
2015 osu-metagenome2015 osu-metagenome
2015 osu-metagenome
 
2015 ohsu-metagenome
2015 ohsu-metagenome2015 ohsu-metagenome
2015 ohsu-metagenome
 
2015 balti-and-bioinformatics
2015 balti-and-bioinformatics2015 balti-and-bioinformatics
2015 balti-and-bioinformatics
 
2015 pag-metagenome
2015 pag-metagenome2015 pag-metagenome
2015 pag-metagenome
 
2014 nyu-bio-talk
2014 nyu-bio-talk2014 nyu-bio-talk
2014 nyu-bio-talk
 
2014 anu-canberra-streaming
2014 anu-canberra-streaming2014 anu-canberra-streaming
2014 anu-canberra-streaming
 
2014 nicta-reproducibility
2014 nicta-reproducibility2014 nicta-reproducibility
2014 nicta-reproducibility
 
2014 aus-agta
2014 aus-agta2014 aus-agta
2014 aus-agta
 
2014 abic-talk
2014 abic-talk2014 abic-talk
2014 abic-talk
 
2014 nci-edrn
2014 nci-edrn2014 nci-edrn
2014 nci-edrn
 
2014 wcgalp
2014 wcgalp2014 wcgalp
2014 wcgalp
 

Dernier

Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)Areesha Ahmad
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSarthak Sekhar Mondal
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000Sapana Sha
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
VIRUSES structure and classification ppt by Dr.Prince C P
2014 naples

  • 1. C. Titus Brown Assistant Professor MMG, CSE, BEACON Michigan State University May 2014 ctb@msu.edu Applying mRNAseq to non-model organisms: challenges, opportunities, and solutions
  • 2. We practice open science! Everything discussed here:  Code: github.com/ged-lab/ ; BSD license  Blog: http://ivory.idyll.org/blog (‘titus brown blog’)  Twitter: @ctitusbrown  Grants on Lab Web site: http://ged.msu.edu/research.html  Preprints available.
  • 3. Sequencing has become very inexpensive.
  • 4. Sequencing costs  Approximately $1000 of mRNAseq will yield a decent transcriptome.  Multiple samples will allow you to generate gene inventories.  For the ascidian project I will show you,  1 graduate student,  2 transcriptomes,  3 genomes…
  • 5.
  • 6.
  • 7. Mapping => quantitation Reference transcriptome required.
  • 8. Interpreting RNAseq requires gene models: http://www.hitseq.com/images/RNA-seq_AS.jpg
  • 9. The challenges of non-model transcriptomics  Missing or low quality genome reference.  Evolutionarily distant.  Most extant computational tools focus on model organisms –  Assume low polymorphism (internal variation)  Assume reference genome  Assume somewhat reliable functional annotation  More significant compute infrastructure …and cannot easily or directly be used on critters of interest.
  • 10. Outline 1. Challenges of non-model transcriptomics. 2. Lamprey: too much data, not enough genome 3. Digital normalization as a coping mechanism 4. …applied to Molgulid ascidians… 5. …and back to lamprey. 6. More transcriptome challenges 7. What’s next?
  • 11. Sea lamprey in the Great Lakes  Non-native  Parasite of medium to large fishes  Caused populations of host fishes to crash Li Lab / Y-W C-D
  • 12. The problem of lamprey:  Diverged at base of vertebrates; evolutionarily distant from model organisms.  Large, complicated genome (~2 GB)  Relatively little existing sequence.  We sequenced the liver genome…
  • 13. Lamprey has incomplete genomic sequence J. Smith et al., PNAS 2009 Evidence of somatic recombination; 100s of Mb of sequence eliminated from genome during development. More recent evidence (unpub, J. Smith et al.) suggests that this loss is developmentally regulated, results in changes in gene expression (due to loss of genes!), and is tissue specific. Liver genome is not the entire genome.
  • 14. Lamprey tissues for which we have mRNAseq: embryo stages (late blastula, gastrula, neurula, 22b, neural-crest migration, 24c1, 24c2) metamorphosis 3 (intestine, kidney) ovulatory female head skin adult intestine metamorphosis 4 (intestine, kidney) preovulatory female eye adult kidney metamorphosis 5 (liver, intestine, kidney) preovulatory female tail skin brain paired metamorphosis 6 (intestine, kidney) prespermiating male gill freshwater (gill, intestine, kidney) metamorphosis 7 (intestine, kidney) mature adult male rope tissue larval (gill, kidney, liver, intestine) monocytes spermiating male gill juvenile (intestine, liver, kidney) brain (0, 3, 21 dpi) spermiating male head skin lips spinal cord (0, 3, 21 dpi) supraneural tissue metamorphosis 1 (intestine, kidney) spermiating male muscle small parasite distal intestine, kidney, proximal intestine metamorphosis 2 (liver, intestine, salt water (gill, intestine)
  • 15. Assembly Fragments: “It was the best of times, it was the wor” / “, it was the worst of times, it was the” / “isdom, it was the age of foolishness” / “mes, it was the age of wisdom, it was th”. Assembled: “It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness” …but for lots and lots of fragments!
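The fragment puzzle on the assembly slide can be sketched as a greedy suffix-prefix merge. This is only a toy illustration, not the assembler used in the talk: the fragment strings are a transcription of the slide text, and `overlap`, `greedy_assemble`, and `min_len` are hypothetical names chosen for this example.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]
    return contigs


fragments = [
    "It was the best of times, it was the wor",
    ", it was the worst of times, it was the",
    "isdom, it was the age of foolishness",
    "mes, it was the age of wisdom, it was th",
]
contigs = greedy_assemble(fragments)
print(contigs[0])
```

The repeated phrase "it was the" creates a shorter spurious overlap between two of the fragments; the greedy choice of the longest overlap happens to avoid it here, but repeats like this are one reason real assemblies are hard.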
  • 16. Shared low-level transcripts may not reach the threshold for assembly.
  • 17. Main problem (4 years ago): We have a massive amount of data that challenges existing computers when we try to assemble it all together.
  • 18. Solution: Digital normalization (a computational version of library normalization) Suppose you have a dilution factor of A (10) to B(1). To get 10x of B you need to get 100x of A! Overkill!! This 100x will consume disk space and, because of errors, memory. We can discard it for you…
  • 25. Digital normalization approach A digital analog to cDNA library normalization, diginorm:  Is single pass: looks at each read only once;  Does not “collect” the majority of errors;  Keeps all low-coverage reads;  Smooths out coverage of sequencing. => Enables analyses that are otherwise completely impossible.
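A minimal single-pass sketch of the idea on this slide, under stated assumptions: a read is kept only if the median abundance of its k-mers, counted over the reads kept so far, is below a coverage cutoff. A plain in-memory `Counter` stands in here for the memory-efficient k-mer counting in khmer, and `k`, `cutoff`, and the toy reads are illustrative values, not the settings used in the talk.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def diginorm(reads, k=4, cutoff=3):
    """Single pass: keep a read only while its estimated coverage is low."""
    counts = Counter()
    kept = []
    for read in reads:
        ks = kmers(read, k)
        if not ks:
            continue  # read shorter than k
        if median(counts[x] for x in ks) < cutoff:
            kept.append(read)
            counts.update(ks)  # only kept reads contribute to the counts
    return kept


# Toy 10:1 abundance ratio, as in the dilution example (A high, B low).
read_a = "ACGTTGCAAGCT"
read_b = "GGATCCTTAGGA"
reads = [read_a] * 100 + [read_b] * 10
kept = diginorm(reads)
print(len(reads), "reads in,", len(kept), "reads kept")
```

Because discarded reads never update the counts, most of the redundant high-coverage reads (and the errors they carry) are dropped, while low-coverage reads are all retained.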
  • 26. Evaluating diginorm – how?  Can’t assemble lamprey w/o diginorm; are results any good & how would we know?  Need comparative data set  …ascidians!
  • 27. Looking at the Molgula… Putnam et al., 2008, Nature. Modified from Swalla 2001
  • 28. Sea squirts! Molgula oculata Molgula occulta Molgula oculata Ciona intestinalis Elijah Lowe; collaboration w/Billie Swalla
  • 29. Challenging organisms to work on --  Only spawn ~1 month out of the year  Located off the northern coast of France  Hybrids not found outside of lab conditions  Species cannot be cultured  Wet lab techniques are not fully developed for species
  • 30. Tail loss and notochord genes a) M. oculata b) hybrid (occulta egg x oculata sperm) c) M. occulta Notochord cells in orange Swalla, B. et al. Science, Vol 274, Issue 5290, 1205-1208 , 15 November 1996
  • 31. Diginorm applied to Molgula embryonic mRNAseq
  • 33. Question: does it matter what assembly pipeline you use? (No) [Four-way Venn diagram of the assemblies Diginorm V/O, Raw V/O, Diginorm Trinity, and Raw Trinity; region counts as extracted: 3, 70, 25, 1, 36, 13563, 35, 13, 7, 4, 23, 8, 1, 6, 5.] Numbers are putative orthologs (reciprocal best hits) w/Ciona intestinalis, calculated for each assembly. Elijah Lowe
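Reciprocal best hits can be sketched as follows. This is a hedged illustration of the general technique, not the pipeline behind the slide: the input is assumed to be `(query, subject, score)` tuples, such as might be parsed from a similarity search in each direction, and all identifiers below are made up.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject."""
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical transcript and reference gene identifiers.
assembly_vs_reference = [
    ("tx1", "geneA", 250.0),
    ("tx1", "geneB", 90.0),
    ("tx2", "geneA", 120.0),
    ("tx3", "geneC", 300.0),
]
reference_vs_assembly = [
    ("geneA", "tx1", 245.0),
    ("geneA", "tx2", 118.0),
    ("geneB", "tx1", 88.0),
    ("geneC", "tx3", 310.0),
]
orthologs = reciprocal_best_hits(assembly_vs_reference, reference_vs_assembly)
print(orthologs)
```

Running this once per assembly and comparing the resulting gene sets is one way to produce counts like those in the Venn diagram.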
  • 34. How complete are these transcriptomes? Elijah Lowe
  • 35. Shift in differentially expressed genes from gastrulation to neurulation M. ocu vs. M. occ gastrula M. ocu vs. M. occ neurula Differentially expressed during neurulation in M. ocu vs M. occ
  • 36. Notochord gene expression similar to tailed species [Scatter plot: expression difference, hybrid vs. parent species; axes are log2(hybrid)-log2(oculata) and log2(hybrid)-log2(occulta), each spanning roughly -10 to 15.]
  • 37. M. occulta transgenic NoTrlc Alberto Stolfi & Lionel Christiaen
  • 38. Lionel Christiaen Claudia Racioppi NYU Stazione Zoologica Napoli
  • 39. Enabling Molgula research…  Develop candidate genes to generate hypotheses about gene network evolution;  Rapid development of genomic resources => reporter constructs. Doesn’t answer any biological questions directly, but enables us to go looking for things much faster!
  • 40. Transcriptome assembly thoughts  We can (now) assemble really big data sets, and get pretty good results.  We have lots of evidence (some presented here :) that some assemblies are not strongly affected by digital normalization. (Note: normalization algorithm is now standard part of Trinity mRNAseq pipeline.)
  • 41. Transcriptome results - lamprey  Started with 5.1 billion reads from 50 different tissues. (4 years of computational research, and about 1 month of compute time, GO HERE) Ended with:
  • 42. Lamprey transcriptome basic stats  616,000 transcripts (!)  263,000 transcript families (!) (This seems like a lot.)
  • 43. Lamprey transcriptome basic stats  616,000 transcripts  263,000 transcript families  Only 20436 transcript families have transcripts > 1kb (compare with mouse: 17331 of 29769 genes are > 1kb) So, estimation by thumb ~ not that off, for long transcripts.
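For scale, the slide's numbers work out to roughly 8% of lamprey transcript families (20436 of about 263,000) having a transcript over 1 kb, against about 58% of mouse genes (17331 of 29769). A count like this could be computed along the following lines; the `(family_id, length_bp)` input format and the function name are assumptions made for this sketch.

```python
def families_with_long_transcript(transcripts, min_len=1000):
    """Count families whose longest transcript exceeds min_len bases.

    transcripts: iterable of (family_id, length_bp) pairs.
    """
    longest = {}
    for family, length in transcripts:
        longest[family] = max(length, longest.get(family, 0))
    return sum(1 for n in longest.values() if n > min_len)


# Toy data: two of the four families have a transcript longer than 1 kb.
toy = [("fam1", 350), ("fam1", 2400), ("fam2", 800),
       ("fam3", 1000), ("fam4", 1500)]
n_long = families_with_long_transcript(toy)
print(n_long, "families with a transcript > 1 kb")
```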
  • 44. Common vs rare genes #transcripts # samples Camille Scott
  • 45. Can look at transcripts by tissue - - Camille Scott
  • 46. Too… many… samples… Camille Scott Presence/absence clustering
  • 47. Expression-based clustering Some known biology recapitulated; and… ??? Camille Scott
  • 48. Next steps with lamprey  Far more complete transcriptome than the one generated from the genome!  (…but suffering from contamination, oversensitivity to unprocessed transcripts, …?)  Enabling studies in –  Basal vertebrate phylogeny  Biliary atresia  Evolutionary origin of brown fat (previously thought to be mammalian only!)  Pheromonal response in adults  Spinal cord regeneration
  • 49. Next challenges OK, we can deal with volume of data, make pretty pictures, and ... Now what?
  • 50. Contamination! Both experimental and “real” contaminants are big problems. Camille Scott
  • 51. Pathway predictions vary dramatically depending on data set, annotation Likit Preeyanon KEGG pathway comparison across several different gene annotation sets for chicken
  • 52. The problem of lopsided gene characterization is pervasive: e.g., the brain "ignorome" "...ignorome genes do not differ from well-studied genes in terms of connectivity in coexpression networks. Nor do they differ with respect to numbers of orthologs, paralogs, or protein domains. The major distinguishing characteristic between these sets of genes is date of discovery, early discovery being associated with greater research momentum—a genomic bandwagon effect." Ref.: Pandey et al. (2014), PLoS One 11, e88889. Slide courtesy Erich Schwarz
  • 53. Practical implications of diginorm  Data is (essentially) free;  For some problems, analysis is now cheaper than data gathering (i.e. essentially free);  …plus, we can run most of our approaches in the cloud (per-hour rental compute resources).
  • 54. 1. khmer-protocols  Effort to provide standard “cheap” assembly protocols for the cloud.  Entirely copy/paste; ~2-6 days from raw reads to assembly, annotations, and differential expression analysis.  Open, versioned, forkable, citable. (“Don’t bother me unless it doesn’t work.”) Pipeline: Read cleaning → Diginorm → Assembly → Annotation → RSEM differential expression
  • 55. CC0; BSD; on github; in reStructuredText.
  • 56. A few thoughts on our approach…  Explicitly a “protocol” – explicit steps, copy-paste, customizable.  No requirement for computational expertise or significant computational hardware.  ~1-5 days to teach a bench biologist to use.  $100-150 of rental compute (“cloud computing”)…  …for $1000 data set.  Adding in quality control and internal validation steps.
  • 57. 2. Data availability is important for annotating distant sequences Anything else Mollusc Cephalopod no similarity
  • 58. Can we incentivize data sharing?  ~$100-$150/transcriptome in the cloud  Offer to analyze people’s existing data for free, IFF they open it up within a year. See: • CephSeq white paper. • “Dead Sea Scrolls & Open Marine Transcriptome Project” blog post;
  • 59. First results: Loligo genomic/transcriptome resources Putting other people’s sequences where my mouth is: w/Josh Rosenthal and Benton Grav
  • 60. Acknowledgements Lab members involved Collaborators  Adina Howe (w/Tiedje)  Jason Pell  Arend Hintze  Qingpeng Zhang  Elijah Lowe  Likit Preeyanon  Jiarong Guo  Tim Brom  Kanchan Pavangadkar  Eric McDonald  Camille Scott  Jordan Fish  Michael Crusoe  Leigh Sheneman  Billie Swalla (UW)  Josh Rosenthal (UPR)  Weiming Li, MSU  Ona Bloom (Feinstein), Jen Morgan (MBL), Joe Buxbaum (MSSM) Funding USDA NIFA; NSF IOS; NIH; BEACON.
  • 62. C. Titus Brown Billie J. Swalla MSU UW
  • 63. Current research (khmer software): Efficient online counting of k-mers; Trimming reads on abundance; Efficient De Bruijn graph representations; Read abundance normalization; Streaming algorithms for assembly, variant calling, and error correction; Cloud assembly protocols; Efficient graph labeling & exploration; Data set partitioning approaches; Assembly-free comparison of data sets; HMM-guided assembly; Efficient search for target genes