Thoughts on the feasibility of an Assemblathon 3 contest

A *draft* version of a talk presented at the 2015 Genome 10K meeting. These are slides I prepared for my PI (Ian Korf) to use. The final version of the talk may differ substantially from what is shown here.

This talk sets out some ideas about what went wrong with the Assemblathon 2 contest and what we could learn from it, should there be an Assemblathon 3 contest.

Published in: Science

  1. 1. Thoughts on the feasibility of… Ian Korf
  2. 2. Please note: this is a draft version of a talk. I.e. these are slides I prepared for Ian Korf to use at the Genome 10K meeting. His final version of this talk will no doubt add/remove much material. Keith Bradnam 2015-03-04
  3. 3. DNA sequencers keep on getting smaller… …the challenges of genome assembly seem to keep getting bigger.
  4. 4. Let the people speak… (image: flickr.com/incrediblehow/)
  5. 5. *If* there was to be an Assemblathon 3, what suggestions or ideas would you have for it? Please tweet them using hashtag #A3wishlist
  6. 6. "Hybrid-approaches with PacBio, Nanopore, and Illumina data; non-model systems; egalitarian genomics"
        "Polyploid assembly and haplotype reconstruction"
        "Give people assemblies + reads, competition on best prediction which assembly is ‘best’ on different metrics"
        "I vote axolotl or lungfish for Assemblathon 3! Big, repetitive, interesting, useful genomes"
        "Large/complex/repetitive marine genomes; high heterozygosity (no inbred lines); crustacean/sharks; Illumina 250 bp paired ends + PacBio + optical maps"
        "PacBio vs Illumina assemblies, Illumina with low PacBio coverage to fill gaps + correcting PacBio errors with Illumina"
        "Polyploid (highly heterozygous) genome assembly challenge with emphasis on sub-genome (haplotype) deconvolution"
  8. 8. A lot of people seem to want to assemble something really difficult! This presumes that we have already mastered assembly of haploid, low-repeat-content, average-sized genomes.
  9. 9. Problems with Assemblathon 2 (image: flickr.com/incrediblehow/)
  10. 10. Too many species!
  11. 11. Community effort was diluted across different species (only 2 teams assembled all 3 genomes). Multiple species presented more data management issues.
  12. 12. One species?
  13. 13. 285x coverage of the parrot genome: unrealistic amounts of sequence data available
  14. 14. 285x: unrealistic amounts of sequence data available. It is not typical to generate so much sequence data for a genome assembly, and most researchers cannot afford that much sequencing.
  15. 15. Make the assembly challenge representative of a real world scenario
  16. 16. Give teams a virtual budget and let them buy sequencing resources $$$
  17. 17. Budget Team Illumina Moleculo PacBio Oxford Nanopore $5,000 Team A 20x 5x Team B 40x Team C 10x 10x $50,000 Team A 30x 20x 5x 2x Team B 50x 10x Team C 75x 30x 10x Could allow teams to 'buy' sequences from a mix of platforms
  18. 18. (Budget table as on slide 17.) Could potentially have two different budgets available (the budgets here are purely illustrative)
  19. 19. (Budget table as on slide 17.) Fictional example to show that different teams could use different strategies
  20. 20. Low amounts of useful validation data
  21. 21. Low amounts of useful validation data. PacBio data could have been held back to validate assemblies, but it was not, and in the end only a few teams used it. There was no good transcript data. Fosmids and optical maps were available, but not for all species.
  22. 22. • More Fosmid and/or BAC sequences? • Transcript(ome) data? • Long read sequence data? • Synteny information? • Tools such as Irys from BioNano Genomics?
  23. 23. Documentation for how assemblies were made was often poor or missing altogether
  24. 24. • Require reproducible assembly instructions at the time of submission • Request better information relating to computer architecture used to make assembly
  25. 25. Other considerations for Assemblathon 3 (image: flickr.com/incrediblehow/)
  26. 26. Two different sequence file formats have been developed that can represent haplotype variation in a genome assembly: GFA and FASTG
  27. 27. Two different sequence file formats have been developed that can represent haplotype variation in a genome assembly: GFA and FASTG. Neither format seems to have been widely adopted… plus there are no (?) downstream bioinformatics tools that work with these formats. Would requiring either format deter participation?
  28. 28. Encourage multiple entries per team? assembly_1a.fasta assembly_1b.fasta
  29. 29. Encourage multiple entries per team? assembly_1a.fasta assembly_1b.fasta Some of the better assemblies in Assemblathon 2 were the 'experimental' entries.
  30. 30. What species? (image: flickr.com/incrediblehow/)
  31. 31. How about an endangered species?
  32. 32. How about an endangered species? Assemblathon 3 could become a shining example of conservation genomics, and choosing an endangered species might help attract more community support. Also good PR!
  33. 33. How about an endangered species? California Condor (Gymnogyps californianus) Image from http://www.manataka.org/
  34. 34. How about an endangered species? California Condor (Gymnogyps californianus) Image from http://www.manataka.org/ Critically endangered. BAC resources may be available.
  35. 35. Tuatara (Sphenodon punctatus) Image from https://student.societyforscience.org/
  36. 36. Tuatara (Sphenodon punctatus) Image from https://student.societyforscience.org/ A 'living fossil'. Low risk of extinction. BAC libraries and partial transcriptome exist.
  37. 37. Spiny rat (Tokudaia spp.) Image from https://wikimedia.org/
  38. 38. Spiny rat (Tokudaia spp.) Image from https://wikimedia.org/ Endangered. Transcriptome available.
  39. 39. But does it have to be a Genome 10K species?
  40. 40. But does it have to be a Genome 10K species? If the species is eukaryotic and has a large genome, this would still be useful to assess assemblers that could be used for other Genome 10K species.
  41. 41. White abalone (Haliotis sorenseni) Image from https://wikimedia.org/
  42. 42. White abalone (Haliotis sorenseni) Image from https://wikimedia.org/ Estimated genome size: 1.7–2.0 Gbp. Native to California and Mexico. Critically endangered — first marine invertebrate to be listed under the Endangered Species Act.
  43. 43. Successfully bred the first white abalone in captivity in 2012.
  44. 44. Gary Cherr, Director, Bodega Marine Laboratory; Principal Investigator for abalone captive breeding program
  45. 45. "The restoration of the white abalone in the wild, the first time this would ever have been attempted for a listed marine [invertebrate], may depend on the genome being sequenced." Gary Cherr, Director, Bodega Marine Laboratory; Principal Investigator for abalone captive breeding program
  46. 46. "There’s probably a few thousand left in the wild. But because they’re so far apart, they’re effectively sterile. Their population could be effectively extinct already." Kristin Aquilino Manager of abalone captive breeding program
  47. 47. Summary (image: flickr.com/incrediblehow/)
  48. 48. • People seem to want very different things out of a possible Assemblathon 3 contest
  49. 49. • Trying to please everyone — rather than focusing on something achievable and helpful to the ultimate users of genome assembly software — might not be the most productive strategy
  50. 50. From Wikimedia Commons
  51. 51. Three months later…
  52. 52. From http://flickr.com/markturner/
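
Slides 16 to 19 float the idea of giving each team a virtual budget to spend on sequence data from a mix of platforms. As a minimal sketch of how organizers might check a team's purchase against its budget, the Python below prices a plan by coverage. The per-platform prices, the team plans, and the names `COST_PER_1X`, `plan_cost` and `within_budget` are all hypothetical. Only the platform names and the $5,000 budget come from the slides, and the slides themselves describe the budget figures as illustrative.

```python
from typing import Dict

# HYPOTHETICAL prices in dollars per 1x of genome coverage.
# These are placeholders for illustration, not real sequencing quotes.
COST_PER_1X: Dict[str, float] = {
    "Illumina": 100.0,
    "Moleculo": 400.0,
    "PacBio": 500.0,
    "Oxford Nanopore": 800.0,
}


def plan_cost(plan: Dict[str, float],
              prices: Dict[str, float] = COST_PER_1X) -> float:
    """Return the total cost of a plan mapping platform name -> coverage (x)."""
    unknown = set(plan) - set(prices)
    if unknown:
        raise ValueError(f"No price listed for: {sorted(unknown)}")
    # Assumes cost scales linearly with coverage, which is a simplification.
    return sum(prices[platform] * coverage
               for platform, coverage in plan.items())


def within_budget(plan: Dict[str, float], budget: float,
                  prices: Dict[str, float] = COST_PER_1X) -> bool:
    """True if the plan can be bought without exceeding the budget."""
    return plan_cost(plan, prices) <= budget


# HYPOTHETICAL team plans; the platform assignments are made up for the example.
PLANS: Dict[str, Dict[str, float]] = {
    "Team A": {"Illumina": 20, "PacBio": 5},
    "Team B": {"Illumina": 40},
    "Team C": {"Illumina": 10, "Oxford Nanopore": 10},
}

if __name__ == "__main__":
    budget = 5_000
    for team, plan in PLANS.items():
        status = "ok" if within_budget(plan, budget) else "over budget"
        print(f"{team}: ${plan_cost(plan):,.0f} ({status})")
```

A real contest would need to agree on actual prices per platform and decide whether cost should scale linearly with coverage. The point of the sketch is only that a budget rule of this kind is easy to state and to check automatically at submission time.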
