Ce diaporama a bien été signalé.
Nous utilisons votre profil LinkedIn et vos données d’activité pour vous proposer des publicités personnalisées et pertinentes. Vous pouvez changer vos préférences de publicités à tout moment.

Long read sequencing - WEHI bioinformatics seminar - tue 16 june 2015

8 009 vues

Publié le

Long read sequencing - the good, the bad, and the really cool. Covers Illumina SLR, Pacbio RSII and Oxford Nanopore as of June 2015. Discusses bioinformatics differences of long reads over short reads.

Publié dans : Sciences
  • Earn $500 for taking a 1 hour paid survey! read more... ◆◆◆ https://tinyurl.com/realmoneystreams2019
       Répondre 
    Voulez-vous vraiment ?  Oui  Non
    Votre message apparaîtra ici
  • Your opinions matter! get paid for them! click here for more info...♣♣♣ https://tinyurl.com/realmoneystreams2019
       Répondre 
    Voulez-vous vraiment ?  Oui  Non
    Votre message apparaîtra ici
  • Hello! Get Your Professional Job-Winning Resume Here - Check our website! https://vk.cc/818RFv
       Répondre 
    Voulez-vous vraiment ?  Oui  Non
    Votre message apparaîtra ici

Long read sequencing - WEHI bioinformatics seminar - tue 16 june 2015

  1. 1. Long read sequencing A/Prof Torsten Seemann WEHI Bioinformatics Seminar - Melbourne, AU - Tue 16 June 2015 The good, the bad, and the really cool.
  2. 2. My new home Doherty Centre for Applied Microbial Genomics
  3. 3. Why do we need long reads?
  4. 4. Short reads Long reads Inspired by Jason Chin @infoecho #SFAF2015
  5. 5. Repeats Repeat copy 1 Repeat copy 2 Collapsed repeat consensus 1 locus 4 contigs
  6. 6. Long reads can span repeats Repeat copy 1 Repeat copy 2 long reads
  7. 7. Long reads untangle graphs 100 bp 5000 bp1000 bp
  8. 8. Completed genomes
  9. 9. Heterozygosity diploid 4 pieces 2 haplotypes short reads long reads
  10. 10. Phased haplotypes
  11. 11. Structural variation The missing heritability - not just SNPs
  12. 12. Long read technologies
  13. 13. Two flavours ∷ Synthetic long reads : Still needs an Illumina short-read sequencer : Molecular biology tricks + local assembly / constraints : Illumina SLR10X Genomics, Dovetail ∷ Genuine long reads : The real deal - no tricks! : Pacific Biosciences, Oxford Nanopore
  14. 14. Illumina SLR “Moleculo” synthetic long reads
  15. 15. Adapted from http://www.nature.com/nbt/journal/v32/n3/abs/nbt.2833.html Synthetic long reads 1. Genomic DNA 2. Shear ~10 kbp fragments 3. Dilute ~500 fragments per well 4. Amplify 5. Shear to ~ 500 bp fragments 6. Barcode (384) 7. Short read sequencing 8. De-barcode into pools 9. De novo assemble each pool 10. Get ~500 x 10 kbp “long reads”
  16. 16. Illumina SLR - “reads” Q50
  17. 17. Pacific Biosciences It’s already here and it works.
  18. 18. Pacific Biosciences RSII Sequencing Robotics Compute Operator* * not included
  19. 19. Pacbio in Australia today
  20. 20. Pacific Biosciences RSII 2015 ARC LIEF CIA Tim Stinear Installed June 2015 at Doherty Institute Just started running!
  21. 21. PacBio: technology ∷ Polymerase bound to bottom of ZMW μ-well ∷ Incorporation of fluorescent nucleotides measured in real time ∷ 3 hour “movies”
  22. 22. Pacbio: reads Adapted from https://speakerdeck.com/pacbio/track-1-de-novo-assembly DNA fragment with hairpins
  23. 23. PacBio: our first two SMRT cells Yield: 2.3 Gbp No. reads: 275,906 Mean length: 8387 bp N50 length: 11782 bp Polymerase reads Subreads / CCS
  24. 24. PacBio: error rate Single read: 86% 30x Consensus: 99.999%
  25. 25. PacBio: main applications ∷ Finished genomes ∷ Full length cDNA (mRNA isoforms) ∷ Extreme GC sequence ∷ HLA / MHC / KIR haplotyping ∷ Base modifications (methylation)
  26. 26. PacBio: bioinformatics ∷ All in GitHub ∷ SMRT Portal : Nice GUI : Cloud ready : Linux backend : Cluster ready ∷ Cmdline tools ∷ Good docs
  27. 27. Oxford Nanopore The new kid on the block.
  28. 28. MinION - the device
  29. 29. PromethION - large scale device ∷ 48 independent flow cells ∷ On board ASIC ∷ Runs Python ∷ Optional compute
  30. 30. Nanopore - technology Signal is measured from 5 bases Timing is irregular Base modifications do alter the signal
  31. 31. Nanopore - reads 2D - normal 1D complement 1D template 2D - full hairpin complement template
  32. 32. Nanopore - read lengths Read length is not limited by technology but by library preparation. Can get >100kbp reads. But not trivial to do so! Read length
  33. 33. Nanopore - error rate ∷ 5-mer errors ∷ Homopolymer issues ∷ Not modelling base mods yet ∷ Changes with pore & motor enzyme Percent identity (aligned)
  34. 34. MinION - applications ∷ Same as PacBio plus.... ∷ Portable sequencing : in the field eg. Josh Quick in Guinea for Ebola : in hospitals - infection control : monitoring - water/food supply, production facilities : at the GP - pathogen test in 10 min from blood prick? : spit in a home device every morning?
  35. 35. Disruptive technology Or just another sequencer?
  36. 36. “Run until” Dynamically adjust sequencing yield
  37. 37. “Read until” ∷ Can access events/bases during reading : remember reads are long 40 kbp : examine first 100 bp say (40 bp/sec currently) : can decide to stop reading and eject molecule! ∷ This is a killer app! : only want pathogens? eject if human DNA : only want exome? eject if not exonic looking : controlled with Python code
  38. 38. “Fast mode” ∷ 2015 / MkI : enzyme deliberately slowed down - ASIC can’t keep up! : ~40 bases / sec / channel : 40 bp x 24 h x 512 channels ~ 2 Gbp ∷ 2016 / MkII : new ASIC with ~3000 channels x 500 bp / sec : 500 bp x 1 week x 3000 channels ~ 900 Gbp ∷ PromethION : has 48 flow cells . . . ~ 400 Tbp / week !?!
  39. 39. VolTRAX - library prep
  40. 40. A new business model ∷ No capital or reagent costs : Instrument will be free : Flow cells will be free : Only pay for what you want to sequence : Min. $20 and ~$1000 for a 100x human genome ∷ But I’ll scam the system! : Flowcell stats sent back to base : Won’t send you new flow cells if they look unused
  41. 41. How will our job change?
  42. 42. Some things never change ∷ Don’t worry! : 50% of our job will always be converting file formats ☺ ∷ HDF5 - Hierarchial Data Format : groups, multi-dimensional, indexed, random access : Pacbio and Nanopore produce .h5 files ∷ Can extract FASTQ from HDF5 easily
  43. 43. New work patterns ∷ Less aligning to reference, more de novo ∷ Learning to work with haplotypes ∷ More graph-based methods ∷ Complex structural variant information: VCF4.2 ∷ Smaller data files ?
  44. 44. Read alignment ∷ Reads are 15% error, mainly indels ∷ PacBio : BLASR, Daligner, MHAP : BWA MEM: bwa mem -x pacbio ∷ Nanopore : BWA MEM: bwa mem -x ont : MarginAlign - sum over possible alignments
  45. 45. De novo assembly ∷ Pacbio : Very modular system: Overlap, Layout, Consensus : Polylploid aware assembly possible ∷ MinION : Higher error rate, but rapid community development : NanoCorrect + Celera Assembler + NanoPolish ∷ For existing assemblies : Gap-filling, scaffolding/breaking, hybrid assembly
  46. 46. Streaming analysis ∷ We are not going to keep all this data : Need to think streaming analyses ∷ Extract info we need and discard : Cheaper to resequence? ∷ Lots of new applications : Much scope for method development : Even more scope for biological discovery
  47. 47. Conclusion
  48. 48. Exciting times! ∷ Genomics is changing all the time : but science will press on ∷ Pipelines are often short lived : except maybe clinical / accredited ones ∷ Bioinformaticians need to be able to adapt : focus on key skills not specific apps
  49. 49. Acknowledgments Doherty Institute ∷ Tim Stinear ∷ Ben Howden VLSCI ∷ Andrew Lonie ∷ Helen Gardiner ∷ Dieter Bulach ∷ Simon Gladman Twitter/Blogs ∷ Nick Loman ∷ Lex Nederbragt ∷ Keith Robison ∷ Konrad Paszkiewicz Oxford Nanopore ∷ Clive Brown ∷ Gordon Sanghera Millennium Science ∷ Paul Lacaze ∷ Matthew Frazer ∷ Rubber Chicken Pacific Biosciences ∷ Siddarth Singh ∷ Jason Chin ∷ Stephen Turner All the other CIs on the LIEF grant
  50. 50. Contact http://tseemann.github.io t.seemann@unimelb.edu.au @torstenseemann
  51. 51. The End Thank you for listening.

×