BEACON 101: Making use of new sequencing technologies C. Titus Brown ctb@msu.edu Comp Sci & Micro Michigan State University
Outline “Next”-generation sequencing. Dealing with the data – our research. What are they teaching kids these days, anyway?
But first, some background… The kinds of technology I’ll be talking about are being used by many BEACON groups, and will probably be used by many more within the next few years. Sequencing advances are (IMO) one of the most stunning technological breakthroughs in biology in the last 20 years. As a mid-level BEACON bureaucrat (TG leader! Course instructor!) I’m interested in: Enabling interesting science. Finding fun new problems to tackle. Developing a training & education plan so that we produce tech-savvy students and junior faculty.
In particular… At the last BEACON Congress, we had a “bioinformatics sandbox” session. Only MSU folk could attend (short notice!) About 8 labs, all using next-gen sequencing… …and 2 labs, working on methods for analyzing data. (Hi!) I know there are more people out there, on both sides of the equation.  Who are you??
OK, Back to…I. Sequencing! Sequencing of DNA and RNA. Single genomes Transcriptomes Natural populations (tags) Environmental samples/microbial populations (metagenomics) Cheap and massively scalable sequencing of DNA and RNA.
Sequencing technology Major, dramatic changes in our ability to sequence DNA and RNA quickly and cheaply. Majority of deployed techniques depend on (variations of) a single trick: “polony” sequencing.  No cloning. Single-molecule sequencing coming along fast, but not yet ready for prime time.
Two specific concepts: First, sequencing everything at random is very much easier than sequencing a specific gene region.  (For example, it will soon be easier and cheaper to shotgun-sequence all of E. coli than it is to get a single good plasmid sequence.) Second, if you are sequencing on a 2-D substrate (wells, or surfaces, or whatnot) then any increase in linear density (smaller wells, or better imaging) leads to a squared increase in the number of sequences.
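The second point is just area scaling, and a few lines of arithmetic make it concrete. This is a minimal sketch that assumes a square substrate with features laid out on a regular square grid; the function name and the sizes are illustrative, not taken from any real instrument.

```python
def features_on_square(side_um, pitch_um):
    """Count features on a square substrate laid out on a square grid.

    side_um  -- edge length of the substrate, in micrometres (illustrative)
    pitch_um -- centre-to-centre spacing between features (illustrative)
    """
    per_side = side_um // pitch_um   # features that fit along one edge
    return per_side * per_side       # the grid is two-dimensional


coarse = features_on_square(1000, 2)   # 500 x 500 features
fine = features_on_square(1000, 1)     # 1000 x 1000 features
print(fine // coarse)                  # halving the pitch gives 4x the features
```

Halving the spacing doubles the count along each edge, so the total goes up by a factor of four; a 10x improvement in linear density would give 100x the sequences.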
Novel genome sequencing
Some numbers For under $1,000 per sample, the Illumina HiSeq machine will generate: 100,000,000 reads Each of length ~100 In under a week. x 16 samples/run. That’s 160 Gb of sequence, or just over 50x human genome…
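The arithmetic behind those numbers can be checked directly. This sketch uses the read count, read length, and samples per run quoted above; the human genome size of about 3.1 Gb is an assumption added here for the coverage estimate.

```python
READS_PER_SAMPLE = 100_000_000   # from the slide
READ_LENGTH_BP = 100             # from the slide (approximate)
SAMPLES_PER_RUN = 16             # from the slide
HUMAN_GENOME_BP = 3_100_000_000  # assumption: roughly 3.1 Gb haploid genome


def run_yield_bp(reads=READS_PER_SAMPLE, length=READ_LENGTH_BP,
                 samples=SAMPLES_PER_RUN):
    """Total bases produced by one run."""
    return reads * length * samples


def fold_coverage(total_bp, genome_bp=HUMAN_GENOME_BP):
    """How many times over the run could cover a genome of the given size."""
    return total_bp / genome_bp


total = run_yield_bp()
print(total / 1e9, "Gb per run")
print(round(fold_coverage(total), 1), "x human genome")
```

Each sample yields about 10 Gb, and 16 samples give 160 Gb, which is a little over 50 human genomes' worth of sequence.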
How do you choose a sequencing approach? Choose one: Long reads (low sampling, but easier to work with) Deep random sampling (quantitative sequencing, quite sensitive) The answer will depend on what exactly you want to do.  Generally I prefer the shorter reads. Find someone who pays obsessive attention to this stuff.  (Hi!)
Data analysis! In general, it now takes longer to analyze the data than it does to generate the data. That is, suppose you already know exactly what to do and simply want to run your analysis. By and large, you can generate a large enough amount of data in one week that you cannot run the analysis of it in the following week. …this is steadily shifting towards the “more data” side, too. (This is really a paradigm shift for many areas of biology.)
Your basic data file.
>895:5:1:1276:16683/1
GTCGCTTTGCGATGTTTGTCGGGTGCATCTTTTGGGAACAGCAAGTTTTGGAATGATCCCTGCACTTTCATCGGAACACC
>895:5:1:1558:16140/2
CCGTTCCAGAGATATGACCCGTTTTAATGAACGCTGCCAGTTGACAAATTATTTTCCAAAATTAGCAATTGCGTGGGTTCTTTTCCATCTAAACAGCTTCTGGGCTTTATGCTG
>895:5:1:1581:10052/1
TTACAGACGTCGTTCTAACTAATTTGTGACGAAAATTGCCCACAATTATGACTATATGTGGAATTTTG
>895:5:1:1824:4518/2
CCAAATTAGTTAGAATGACGTTTGTAACCGTATTCCGGTGCAACTTTGTGAATAATTTCTAACTGTAAAAATTTTTGGCAAAACCAAGTTTGCCGGCCGCAACCGCAAC
>895:5:1:1945:14960/1
CTGATTTTGCAATGTTACTGACATGGGTATGCCAGTTGTGATTATTGGCGACTGCAACTCCCAACAATGATACTGTTTACTTTTGTGTGAATGAACATTTATTCATCCTTGGGT
…
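A file like this is FASTA-style: a header line starting with ">" followed by the sequence. Here is a minimal sketch of reading it in plain Python. The function names are made up for illustration, and the "/1" or "/2" suffix is assumed to mark which member of a read pair the record is; real pipelines would normally use an established parser instead.

```python
def parse_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)   # emit the previous record
            name, chunks = line[1:], []
        else:
            chunks.append(line)               # sequence may span several lines
    if name is not None:
        yield name, "".join(chunks)           # emit the final record


def pair_member(name):
    """Return '1' or '2' from a name ending in /1 or /2 (assumed convention)."""
    return name.rsplit("/", 1)[1]


example = [
    ">895:5:1:1276:16683/1",
    "GTCGCTTTGCGATGTTTGTCGGGTGCATCTTTTGGGAACAGCAAGTTTTGGAATGATCCCTGCACTTTCATCGGAACACC",
    ">895:5:1:1558:16140/2",
    "CCGTTCCAGAGATATGACCCGTTTTAATGAACGCTGCCAGTTGACAAATTATTTTCCAAAATTAGCAATTGCGTGGGTTCTTTTCCATCTAAACAGCTTCTGGGCTTTATGCTG",
]
records = list(parse_fasta(example))
for read_name, seq in records:
    print(read_name, "pair member", pair_member(read_name), "length", len(seq))
```

Because `parse_fasta` is a generator, it never holds more than one record in memory, which matters once the file has hundreds of millions of reads.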
What now?
Mapping U. Colorado http://genomics-course.jasondk.org/?p=395 Many fast & efficient computational solutions exist. You have to figure out how to choose parameters to maximize sensitivity/specificity, and when to validate.
Whole genome shotgun sequencing & assembly Randomly fragment & sequence from DNA; reassemble computationally. UMD assembly primer (cbcb.umd.edu)
Data analysis challenges Choosing a software suite/pipeline/analysis approach. Scaling chosen approach to volume of data (2-200x what they designed it for) Efficiently running software. Integrating analysis results and extracting desired information. Understanding what you’ve done in sufficient detail to design & perform requisite computational controls.
Data analysis challenges, cont’d The rate of change is itself accelerating: New tools, approaches every month. More data, data types, chemistries every month. Increasing commercialization (so getting an honest answer from the companies is basically impossible) But… opportunities are great!  Jump on in!
What does the future hold? “Prediction is very difficult, especially about the future.” -- Niels Bohr More, cheaper sequencing: plan for a world where you can sequence anything you sample, to any depth you want, for arbitrarily small amounts of money.  Seriously. Solutions to the majority of the scaling issues in data analysis (but not the scientific issues…)
Questions?
II. Our research “Making sense of sequence” “Surfing the data tsunami” There are a number of fascinating challenges at the intersection of genomics and the rest of biology; they require appropriate (ab)use of computational techniques, applied to data sets from interesting critters and/or experimental setups. (Evolution turns out to be especially interesting in this regard.)
Frontiers in sequencing new stuff… There are many, many interesting critters for which we have essentially no genomic or transcriptomic information. Next-gen sequencing has now made these organisms accessible to investigation. But dealing with organisms for which there is no reference genome is … challenging.
Whole genome shotgun sequencing & assembly Randomly fragment & sequence from DNA; reassemble computationally. UMD assembly primer (cbcb.umd.edu)
A brief intro to shotgun assembly. Fragments: “It was the best of times, it was the wor” / “, it was the worst of times, it was the ” / “isdom, it was the age of foolishness” / “mes, it was the age of wisdom, it was th”. Assembled: “It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness”. …but for 2 bn+ fragments. Not subdivisible; not easy to distribute; memory intensive.
Assemble based on word overlaps: “the quick brown fox jumped” + “jumped over the lazy dog” => “the quick brown fox jumped over the lazy dog”. Repeats do cause problems: my chemical romance: nanana nanana, batman!
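The overlap idea can be sketched in a few lines. This is a toy illustration working on plain text, with hypothetical function names and exact suffix/prefix matching; real assemblers have to cope with sequencing errors, reverse complements, and billions of reads, none of which is handled here.

```python
def all_overlaps(a, b, min_len=2):
    """Every length k >= min_len where the last k chars of a equal the first k of b."""
    longest = min(len(a), len(b))
    return [k for k in range(longest, min_len - 1, -1) if a.endswith(b[:k])]


def merge(a, b, min_len=2):
    """Join a and b on their longest overlap, or return None if there is none."""
    overlaps = all_overlaps(a, b, min_len)
    if not overlaps:
        return None
    return a + b[overlaps[0]:]


print(merge("the quick brown fox jumped", "jumped over the lazy dog"))

# Repeats make the overlap ambiguous: there are several equally valid ways
# to line these two fragments up, so the true length is unknowable.
print(all_overlaps("nanana", "nanana"))
```

The first call rebuilds the full sentence from its two halves. The second shows why repeats hurt: "nanana" overlaps itself at lengths 6, 4, and 2, and nothing in the fragments says which alignment is the right one.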
Project I: metagenomics Wild microbes!
These microbes mediate important geobiological processes (e.g. nitrogen reduction)
Ecology & evolution of these habitats??
SAMPLING LOCATIONS
Sampling strategy per site [Figure: sampling layout around a reference soil sample, with spacings labelled 1 cm, 1 m, and 10 m] Soil cores: 1 inch diameter, 4 inches deep Total: 8 Reference metagenomes + 64 spatially separated cores (pyrotag sequencing)
Great Prairie sequencing summary 200x human genome…! > 10x more challenging (total diversity)
Subdividing reads by connection “Partitioning” => assembly on multiple computers
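One way to picture partitioning: treat two reads as connected if they share a k-mer, then split the data set into connected components that can be assembled separately. The sketch below is a toy version under that assumption, using exact k-mer matches and a union-find structure; the function names and the tiny k are illustrative, and it is not a description of the actual software used for this project.

```python
from collections import defaultdict


def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def partition_reads(reads, k=4):
    """Group reads into partitions; reads sharing any k-mer end up together."""
    parent = list(range(len(reads)))

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # k-mer -> index of the first read containing it
    for idx, read in enumerate(reads):
        for kmer in kmers(read, k):
            if kmer in first_seen:
                root_a, root_b = find(idx), find(first_seen[kmer])
                if root_a != root_b:
                    parent[root_a] = root_b   # merge the two partitions
            else:
                first_seen[kmer] = idx

    groups = defaultdict(list)
    for idx, read in enumerate(reads):
        groups[find(idx)].append(read)
    return list(groups.values())


toy_reads = ["AAAACCCC", "CCCCGGGG", "TTTTATAT", "ATATGCGC"]
for group in partition_reads(toy_reads):
    print(group)
```

Here the first two reads share "CCCC" and the last two share "ATAT", so the four reads fall into two partitions, each of which could be handed to a different computer. The sketch ignores reverse complements and sequencing errors, and it keeps every k-mer in memory, which is exactly the part that does not scale.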
Project II: transcriptomics Developmental change in non-model ascidians, the Molgula
Molgula questions What happened to the downstream tail gene network in the tailless ascidian? What are the genomic adaptations that made the Molgulidae particularly susceptible to tail loss? (e.g. Manx/bobcat) How does tail loss actually work, functionally? Heterochrony of metamorphosis?
Preliminary round of sequencing (Illumina 76 bp x 2, ~250 bp insert size)
We can count by allele!
Molgula/emerging story Looks like notochord/tail cells are being specified, but cell movement isn’t happening. May be failure in convergence/extension? Computational leads => experimental validation.
Research goals “Better science through superior computation” Enable interesting biology downstream of sequence analysis. Also, provide tools to others.
Questions?
III. (Graduate) education! Biology is fast becoming data-intensive. This requires expertise that is not traditionally part of many biologists’ training. More generally, “computational science” in biology is really at least three different things: Data analysis (data => hypothesis discovery/validation) Modeling/simulation (ecology models, protein structure, etc.) Instantiation of biological system (e.g. evolution). I’m avoiding theory and (non-digital) experiment, which are yet separate skills…
…and worse… Increasingly, biological understanding relies on computational analysis and inference. Computational intuition and informed skepticism (a.k.a. “scientific method”…) isn’t taught to biologists.
…and worst. All of this rests on a “bedrock” foundation of Badly written or inflexible software that’s difficult to run or install. Scripts written quickly and without reflection or testing. Ineffective computer use. …and a general lack of regard for reproducibility and replication.
Cultural problems? Physics, in particular, has a history of computation, and a robust computational culture… but not bio so much. “Many undergrads got into biology because they were interested in science, but didn’t like the math required for physics and chemistry.  I have bad news for them…” -- me Bad news?  Computation is increasingly important in bio. Good news?  Computation != math. Better news: BEACON is enriched for grads, postdocs, and faculty that live at this interface.  It’s a good crowd.
So what do we do? BEACON course: “Computational Science for (Evolutionary) Biologists”, v2.0 (alpha) 1. Teach programming for computational scientists. 2. Teach computational science strategies/thinking. 3. Touch on reproducibility, RCR, and data management. 4. Keep it interesting enough that people don’t “check out” 5. …try to figure out remote interaction: currently teaching across MSU (15), UT Austin (3), UW Seattle (2), and U Idaho (3).
What is class like? Tuesdays: programming HW due; discussion of computational stuff. Thursdays: reading HW due; group presentation; discussion. Groups split between MSU & (other); in-group teleconf (iPads and FaceTime), whiteboard (Jot!) (Yes, we bought 16 iPads for the course.  BEACON now owns 16 iPads.)
(Jot! Demo)
The course is still a work in progress You can ask your local students, too, but – In-class interaction is possible, but still hard. Group dynamics! Now across 1000s of miles! Not everyone is great at technology multitasking (although kids these days…) That whole “mixed background” thing is extra challenging.  BEACON students are so diverse that you can’t rely on all of them really knowing anything specific.  But they all know so much individually that you risk boring them. Sigh.
Educational future Increasingly, BEACON graduate students cannot be placed into easy categories (Michelle Vogel, Tasneem Pierce). Can we really split these people into “bio” and “compu” folk?  No… nor should we want to, necessarily; whole point of BEACON! Can we make the courses more distributed to take advantage of remote faculty expertise? Last year: no way in heck. This year, the tech is working better.  Plus, iPads! Your opinions welcome, especially if it involves less work for me. Note: options for faculty & postdocs, too: summer course w/Dworkin.

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 

BEACON 101: Sequencing tech

  • 1. BEACON 101: Making use of new sequencing technologies C. Titus Brown ctb@msu.edu Comp Sci & Micro Michigan State University
  • 2. Outline “Next”-generation sequencing. Dealing with the data – our research. What are they teaching kids these days, anyway?
  • 3. But first, some background… The kinds of technology I’ll be talking about are being used by many BEACON groups, and will probably be used by many more within the next few years. Sequencing advances are (IMO) one of the most stunning technological breakthroughs in biology in the last 20 years. As a mid-level BEACON bureaucrat (TG leader! Course instructor!) I’m interested in: Enabling interesting science. Finding fun new problems to tackle. Developing a training & education plan so that we produce tech-savvy students and junior faculty.
  • 4. In particular… At the last BEACON Congress, we had a “bioinformatics sandbox” session. Only MSU folk could attend (short notice!) About 8 labs, all using next-gen sequencing… …and 2 labs, working on methods for analyzing data. (Hi!) I know there are more people out there, on both sides of the equation. Who are you??
  • 5. OK, Back to…I. Sequencing! Sequencing of DNA and RNA. Single genomes Transcriptomes Natural populations (tags) Environmental samples/microbial populations (metagenomics) Cheap and massively scalable sequencing of DNA and RNA.
  • 6. Sequencing technology Major, dramatic changes in our ability to sequence DNA and RNA quickly and cheaply. Majority of deployed techniques depend on (variations of) a single trick: “polony” sequencing. No cloning. Single-molecule sequencing coming along fast, but not yet ready for prime time.
  • 7.
  • 8.
  • 9. Two specific concepts: First, sequencing everything at random is very much easier than sequencing a specific gene region. (For example, it will soon be easier and cheaper to shotgun-sequence all of E. coli than it is to get a single good plasmid sequence.) Second, if you are sequencing on a 2-D substrate (wells, or surfaces, or whatnot) then any increase in density (smaller wells, or better imaging) leads to a squared increase in the number of sequences.
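The second point on this slide can be illustrated with a small calculation. This is only a sketch: the chip size and well pitches are made-up numbers, and `features_on_chip` is a hypothetical helper, not anything from a real instrument. It shows that halving the spacing between wells on a square surface roughly quadruples the number of wells.

```python
def features_on_chip(side_um: float, pitch_um: float) -> int:
    """Count wells on a square chip, given centre-to-centre spacing (pitch).

    Both arguments are illustrative values in micrometres, not real specs.
    """
    per_side = int(side_um // pitch_um)  # wells that fit along one edge
    return per_side * per_side           # a 2-D grid: per_side squared


# Hypothetical chip: 10 mm on a side, wells spaced 2 um vs. 1 um apart.
base = features_on_chip(10_000, 2.0)
dense = features_on_chip(10_000, 1.0)

print(f"2 um pitch: {base:,} wells")
print(f"1 um pitch: {dense:,} wells")
print(f"density gain from halving the pitch: {dense / base:.0f}x")
```

A 2x improvement in linear density gives a 4x gain in sequences per run, which is why modest engineering improvements translate into large jumps in output.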
  • 11.
  • 12.
  • 13.
  • 14.
  • 15. Some numbers For under $1,000 per sample, the Illumina HiSeq machine will generate: 100,000,000 reads Each of length ~100 In under a week. x 16 samples/run. That’s 160 Gb of sequence, or just over 50x human genome…
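The arithmetic on this slide can be checked in a few lines. The read count, read length, and samples per run come from the slide; the human genome size of about 3.1 Gb is an assumption added here for the comparison.

```python
# Figures quoted on the slide.
READS_PER_SAMPLE = 100_000_000
READ_LENGTH_BP = 100
SAMPLES_PER_RUN = 16

# Assumption (not from the slide): haploid human genome is roughly 3.1 Gb.
HUMAN_GENOME_BP = 3_100_000_000

bases_per_sample = READS_PER_SAMPLE * READ_LENGTH_BP
bases_per_run = bases_per_sample * SAMPLES_PER_RUN
coverage = bases_per_run / HUMAN_GENOME_BP

print(f"per sample: {bases_per_sample / 1e9:.0f} Gb")
print(f"per run:    {bases_per_run / 1e9:.0f} Gb")
print(f"equivalent human genome coverage: about {coverage:.0f}x")
```

Under that assumption the run works out to 160 Gb, a little over 50 human genomes' worth of sequence, which matches the slide.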
  • 16. How do you choose a sequencing approach? Choose one: Long reads (low sampling, but easier to work with) Deep random sampling (quantitative sequencing, quite sensitive) The answer will depend on what exactly you want to do. Generally I prefer the shorter reads. Find someone who pays obsessive attention to this stuff. (Hi!)
  • 17. Data analysis! In general, it now takes longer to analyze the data than it does to generate the data. That is, suppose you already know exactly what to do and simply want to run your analysis. By and large, you can generate a large enough amount of data in one week that you cannot run the analysis of it in the following week. …this is steadily shifting towards the “more data” side, too. (This is really a paradigm shift for many areas of biology.)
  • 18. Your basic data file.
      >895:5:1:1276:16683/1
      GTCGCTTTGCGATGTTTGTCGGGTGCATCTTTTGGGAACAGCAAGTTTTGGAATGATCCCTGCACTTTCATCGGAACACC
      >895:5:1:1558:16140/2
      CCGTTCCAGAGATATGACCCGTTTTAATGAACGCTGCCAGTTGACAAATTATTTTCCAAAATTAGCAATTGCGTGGGTTCTTTTCCATCTAAACAGCTTCTGGGCTTTATGCTG
      >895:5:1:1581:10052/1
      TTACAGACGTCGTTCTAACTAATTTGTGACGAAAATTGCCCACAATTATGACTATATGTGGAATTTTG
      >895:5:1:1824:4518/2
      CCAAATTAGTTAGAATGACGTTTGTAACCGTATTCCGGTGCAACTTTGTGAATAATTTCTAACTGTAAAAATTTTTGGCAAAACCAAGTTTGCCGGCCGCAACCGCAAC
      >895:5:1:1945:14960/1
      CTGATTTTGCAATGTTACTGACATGGGTATGCCAGTTGTGATTATTGGCGACTGCAACTCCCAACAATGATACTGTTTACTTTTGTGTGAATGAACATTTATTCATCCTTGGGT
      …
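The records on this slide look like FASTA: a `>` header line naming the read, followed by the sequence. A minimal sketch of reading such a file might look like the following. The `read_fasta` function is a hypothetical helper written for illustration (real projects usually rely on an existing parser), and the two records are copied from the slide.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            if name is not None:          # emit the previous record
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)           # sequences may span several lines
    if name is not None:                  # emit the final record
        yield name, "".join(chunks)


# Two of the reads shown on the slide, held in memory instead of a file.
example = io.StringIO(
    ">895:5:1:1276:16683/1\n"
    "GTCGCTTTGCGATGTTTGTCGGGTGCATCTTTTGGGAACAGCAAGTTTTGGAATGATCCCTGCACTTTCATCGGAACACC\n"
    ">895:5:1:1581:10052/1\n"
    "TTACAGACGTCGTTCTAACTAATTTGTGACGAAAATTGCCCACAATTATGACTATATGTGGAATTTTG\n"
)

records = list(read_fasta(example))
for name, seq in records:
    print(name, len(seq), "bp")
```

Because the function is a generator, it never holds more than one record in memory, which matters once the file contains hundreds of millions of reads.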
  • 20. Mapping U. Colorado http://genomics-course.jasondk.org/?p=395 Many fast & efficient computational solutions exist. You have to figure out how to choose parameters to maximize sensitivity/specificity, and when to validate.
  • 21. Whole genome shotgun sequencing & assembly Randomly fragment & sequence from DNA; reassemble computationally. UMD assembly primer (cbcb.umd.edu)
  • 22. Data analysis challenges Choosing a software suite/pipeline/analysis approach. Scaling chosen approach to volume of data (2-200x what they designed it for) Efficiently running software. Integrating analysis results and extracting desired information. Understanding what you’ve done in sufficient detail to design & perform requisite computational controls.
  • 23. Data analysis challenges, cont’d The rate of change is itself accelerating: New tools, approaches every month. More data, data types, chemistries every month. Increasing commercialization (so getting an honest answer from the companies is basically impossible) But… opportunities are great! Jump on in!
  • 24. What does the future hold? “Prediction is very difficult, especially about the future.” -- Niels Bohr More, cheaper sequencing: plan for a world where you can sequence anything you sample, to any depth you want, for arbitrarily small amounts of money. Seriously. Solutions to the majority of the scaling issues in data analysis (but not the scientific issues…)
  • 26. II. Our research “Making sense of sequence” “Surfing the data tsunami” There are a number of fascinating challenges at the intersection of genomics and the rest of biology; they require appropriate (ab)use of computational techniques, applied to data sets from interesting critters and/or experimental setups. (Evolution turns out to be especially interesting in this regard.)
  • 27. Frontiers in sequencing new stuff… There are many, many interesting critters for which we have essentially no genomic or transcriptomic information. Next-gen sequencing has now made these organisms accessible to investigation. But dealing with organisms for which there is no reference genome is … challenging.
  • 28. Whole genome shotgun sequencing & assembly Randomly fragment & sequence from DNA; reassemble computationally. UMD assembly primer (cbcb.umd.edu)
  • 29. A brief intro to shotgun assembly Fragments: “It was the best of times, it was the wor” / “, it was the worst of times, it was the” / “isdom, it was the age of foolishness” / “mes, it was the age of wisdom, it was th” => “It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness” …but for 2 bn+ fragments. Not subdivisible; not easy to distribute; memory intensive.
  • 30. Assemble based on word overlaps: “the quick brown fox jumped” + “jumped over the lazy dog” => “the quick brown fox jumped over the lazy dog”. Repeats do cause problems: “my chemical romance: nanana” + “nanana, batman!”
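The overlap idea on the last two slides can be sketched as a toy greedy assembler: repeatedly merge the two fragments with the longest suffix-to-prefix overlap. This is an illustration only, not the method used by any particular assembler. The function names and the `min_len` threshold are assumptions, and the quadratic all-against-all comparison would never scale to billions of reads.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0  # no overlap of at least min_len characters


def greedy_assemble(fragments, min_len: int = 3):
    """Merge the best-overlapping pair until no overlaps remain."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates
    while len(frags) > 1:
        best = (0, None, None)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # remaining fragments cannot be joined
        merged = frags[i] + frags[j][n:]
        frags = [f for k, f in enumerate(frags) if k not in (i, j)] + [merged]
    return frags


fox = greedy_assemble(["the quick brown fox jumped", "jumped over the lazy dog"])
print(fox)

dickens = greedy_assemble([
    "It was the best of times, it was the wor",
    ", it was the worst of times, it was the",
    "isdom, it was the age of foolishness",
    "mes, it was the age of wisdom, it was th",
])
print(dickens)
```

The Dickens fragments contain a misleading overlap: “, it was th” ends the last fragment and begins the second one. Greedy merging gets the right answer here only because the true overlaps are longer. With real repeats that is not guaranteed, which is the problem the slide points at.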
  • 31.
  • 32. These microbes mediate important geobiological processes (e.g. nitrogen reduction)
  • 33.
  • 34.
  • 36. Sampling strategy per site Soil cores: 1 inch diameter, 4 inches deep. Total: 8 reference metagenomes + 64 spatially separated cores (pyrotag sequencing). [Figure: core layout around a reference soil sample, with spacings labeled 1 cM, 1 M, and 10 M.]
  • 37. Great Prairie sequencing summary 200x human genome…! > 10x more challenging (total diversity)
  • 38. Subdividing reads by connection “Partitioning” => assembly on multiple computers
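One way to picture “subdividing reads by connection” is to treat two reads as connected when they share a k-mer, then split the data set into connected groups that could each be assembled on a separate machine. The sketch below is an assumption about the general idea, not the actual implementation behind this slide: it uses a plain dictionary and union-find, ignores reverse complements and sequencing errors, and the reads and `k` value are invented.

```python
from collections import defaultdict


def partition_reads(reads, k: int = 5):
    """Group reads that are linked, directly or transitively, by a shared k-mer."""
    parent = list(range(len(reads)))

    def find(x: int) -> int:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}  # k-mer -> index of the first read containing it
    for idx, read in enumerate(reads):
        for start in range(len(read) - k + 1):
            kmer = read[start:start + k]
            if kmer in first_seen:
                ra, rb = find(idx), find(first_seen[kmer])
                if ra != rb:
                    parent[ra] = rb  # join the two groups
            else:
                first_seen[kmer] = idx

    groups = defaultdict(list)
    for idx, read in enumerate(reads):
        groups[find(idx)].append(read)
    return list(groups.values())


# Invented reads: the first two overlap each other, as do the last two.
reads = ["ACGTACGGATT", "ACGGATTTCA", "GGGGCCCCAT", "GCCCCATATA"]
partitions = partition_reads(reads)
for number, group in enumerate(partitions, start=1):
    print(f"partition {number}: {group}")
```

Each partition can then be handed to an assembler independently. The hard part at real scale is storing the k-mer table, since a dictionary like `first_seen` would not fit in memory for a 200x-human-genome data set.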
  • 39. Project II: transcriptomics Developmental change in non-model ascidians, the Molgula
  • 40. Molgula questions What happened to the downstream tail gene network in the tailless ascidian? What are the genomic adaptations that made the Molgulidae particularly susceptible to tail loss? (e.g. Manx/bobcat) How does tail loss actually work, functionally? Heterochrony of metamorphosis?
  • 41. Preliminary round of sequencing (Illumina 76 bp x 2, ~250 bp insert size)
  • 42. We can count by allele!
  • 43. Molgula/emerging story Looks like notochord/tail cells are being specified, but cell movement isn’t happening. May be failure in convergence/extension? Computational leads => experimental validation.
  • 44. Research goals “Better science through superior computation” Enable interesting biology downstream of sequence analysis. Also, provide tools to others.
  • 46. III. (Graduate) education! Biology is fast becoming data-intensive. This requires expertise that is not traditionally part of many biologists’ training. More generally, “computational science” in biology is really at least three different things: Data analysis (data => hypothesis discovery/validation) Modeling/simulation (ecology models, protein structure, etc.) Instantiation of biological system (e.g. evolution). I’m avoiding theory and (non-digital) experiment, which are yet separate skills…
  • 47. …and worse… Increasingly, biological understanding relies on computational analysis and inference. Computational intuition and informed skepticism (a.k.a. “scientific method”…) isn’t taught to biologists.
  • 48. …and worst. All of this rests on a “bedrock” foundation of Badly written or inflexible software that’s difficult to run or install. Scripts written quickly and without reflection or testing. Ineffective computer use. …and a general lack of regard for reproducibility and replication.
  • 49. Cultural problems? Physics, in particular, has a history of computation, and a robust computational culture… but not bio so much. “Many undergrads got into biology because they were interested in science, but didn’t like the math required for physics and chemistry. I have bad news for them…” -- me Bad news? Computation is increasingly important in bio. Good news? Computation != math. Better news: BEACON is enriched for grads, postdocs, and faculty that live at this interface. It’s a good crowd.
  • 50. So what do we do? BEACON course: “Computational Science for (Evolutionary) Biologists”, v2.0 (alpha) 1. Teach programming for computational scientists. 2. Teach computational science strategies/thinking. 3. Touch on reproducibility, RCR, and data management. 4. Keep it interesting enough that people don’t “check out” 5. …try to figure out remote interaction: currently teaching across MSU (15), UT Austin (3), UW Seattle (2), and U Idaho (3).
  • 51. What is class like? Tuesdays: programming HW due; discussion of computational stuff. Thursdays: reading HW due; group presentation; discussion. Groups split between MSU & (other); in-group teleconf (iPads and FaceTime), whiteboard (Jot!) (Yes, we bought 16 iPads for the course. BEACON now owns 16 iPads.)
  • 53. The course is still a work in progress You can ask your local students, too, but – In-class interaction is possible, but still hard. Group dynamics! Now across 1000s of miles! Not everyone is great at technology multitasking (although kids these days…) That whole “mixed background” thing is extra challenging. BEACON students are so diverse that you can’t rely on all of them really knowing anything specific. But they all know so much individually that you risk boring them. Sigh.
  • 54. Educational future Increasingly, BEACON graduate students cannot be placed into easy categories (Michelle Vogel, Tasneem Pierce). Can we really split these people into “bio” and “compu” folk? No… nor should we want to, necessarily; whole point of BEACON! Can we make the courses more distributed to take advantage of remote faculty expertise? Last year: no way in heck. This year, the tech is working better. Plus, iPads! Your opinions welcome, especially if it involves less work for me. Note: options for faculty & postdocs, too: summer course w/Dworkin.
  • 55. Concluding thoughts Sequencing is awesome, and presents fantastic opportunities. Very, very exciting world!!! Taking advantage of it currently requires expertise that’s hard to teach, learn. That shouldn’t stop us! BEACON could be argued into helping: Workshops, courses, training, etc. I will be visiting UT Austin & U Idaho (Oct 17-21), and UW Seattle (Nov 14-18) and would love to chat about this stuff.