www.citrusgreening.org
High quality arthropod genome assembly with
single molecule reads and long-range
scaffolding
Prashant S Hosmani (1), Mirella Flores-Gonzalez (1), Wayne Hunter (2), Lukas A. Mueller (1), Susan Brown (3), and Surya Saha (1)
(1) Boyce Thompson Institute; (2) USDA-ARS U.S. Horticultural Research Laboratory; (3) Kansas State University
ss2489@cornell.edu @SahaSurya
Entomology 2017
Advances in Arthropod Genomics Workshop
Acknowledgements
Mueller Lab: Mirella Flores, Prashant Hosmani
Kansas State University: Sue Brown
Cornell University/BTI: Michelle (Cilia) Heck
USDA/ARS: Wayne Hunter, Robert Shatters
University of California, Davis: Carolyn Slupsky
Indian River State College: Tom D’elia
Citrus Greening: Huanglongbing
• Most significant disease of citrus worldwide
• More than $4.5 billion in lost citrus production and more than 8,200 lost jobs
(2006/07 to 2010/11)
• Associated with the Gram-negative bacterium Candidatus Liberibacter asiaticus (CLas)
• Spread by insect vector, Diaphorina citri (Asian citrus psyllid, ACP)
Annie Kruse
Omics resources and databases are required for
identification of targets for interdiction
[Diagram labels: Genome, Annotation, Pathway Databases, Expression Networks, ... for Host, Vector, and Pathogen; Target for interdiction molecules]
Current Illumina assembly: Genome Diaci1.1

Contigs        161,988
Total length   485 Mb
Longest        1 Mb
Shortest       201 bp
Ns             19.3 Mb
Scaffold N50   109,898 bp
Contig N50     34,407 bp

Highly fragmented
Many examples of misassemblies!!
http://biobeans.blogspot.com/2012/11/bioinformatics-genome-assembly.html
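The contig and scaffold N50 values quoted for each assembly follow the usual definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. A minimal sketch of that calculation is below. The function name `n50` and the toy lengths are illustrative assumptions, not values from any D. citri assembly or code used by the authors.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length that pushes the running sum to half the total is the N50.
        if running >= half_total:
            return length
    raise ValueError("n50() needs at least one sequence length")


# Hypothetical toy contig lengths, chosen only to illustrate the calculation.
toy_contigs = [100, 80, 60, 40, 20]
print(n50(toy_contigs))  # prints 80
```

In practice the lengths would come from the assembly FASTA (one length per contig or scaffold), which is why a highly fragmented assembly with many short contigs has a low N50.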
PacBio assembly

                    Error rate 0.013   Error rate 0.015
Number of contigs   7,832              8,030
Total bases         462.8 Mb           493.1 Mb
Longest             1.6 Mb             1.7 Mb
Shortest            4.4 Kb             5 Kb
Average length      59.9 Kb            61.4 Kb
Contig N50          85.8 Kb            92.6 Kb

Contiguous assembly with longer contigs
Multiple individuals in DNA sample
Koren 2017
http://canu.readthedocs.io/en/stable/
PBJelly scaffolding

                    Canu assembly   Scaffolded assembly v1.9
Number of contigs   7,832           8,352
Total bases         462.8 Mb        591.7 Mb
Longest             1.6 Mb          2 Mb
Shortest            4.4 Kb          1.5 Kb
Average length      59 Kb           70.8 Kb
Contig N50          85.8 Kb         115.8 Kb

5,290 gap extensions
535 gaps filled
Number of Ns: 0 bp
English 2012
                    v1.91     v1.92 REFERENCE   v1.92 ALTERNATE
Number of contigs   3,681     1,918             1,763
Total bases         596 Mb    513 Mb            83.4 Mb
Longest             4.2 Mb    4.2 Mb            760.6 Kb
Shortest            1.5 Kb    6 Kb              1.5 Kb
Average length      162 Kb    267 Kb            47.3 Kb
Contig N50          620 Kb    755.7 Kb          75.1 Kb
Ns                  5.1 Mb    4.6 Mb            467 Kb

500 ng input DNA from single male psyllid
Duplicated contigs added to alternate assembly
https://github.com/Gabaldonlab/redundans
https://github.com/broadinstitute/pilon/wiki
Error correction
• DNA sequencing data
• RNA sequencing data
• Duplication removal
• Scaffolding
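The v1.91 and v1.92 figures can be cross-checked with simple arithmetic. The sketch below assumes, as the note about duplicated contigs suggests but the slide does not state outright, that v1.92 splits the v1.91 contigs into a reference set and an alternate set. The dictionary names and the 1 Mb rounding tolerance are illustrative assumptions.

```python
# Values transcribed from the slide table (contig counts; total bases in Mb).
v1_91 = {"contigs": 3_681, "total_mb": 596.0}
reference = {"contigs": 1_918, "total_mb": 513.0}
alternate = {"contigs": 1_763, "total_mb": 83.4}

# Contig counts should add up exactly if v1.92 is a partition of v1.91.
contigs_match = reference["contigs"] + alternate["contigs"] == v1_91["contigs"]

# The slide rounds totals, so allow a tolerance of 1 Mb (an assumption).
split_total_mb = reference["total_mb"] + alternate["total_mb"]
bases_match = abs(split_total_mb - v1_91["total_mb"]) <= 1.0

print(contigs_match, bases_match)  # prints True True
```

Both checks pass on the reported numbers, which is consistent with contigs being moved between sets rather than discarded.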
Gene isoform sequencing (Iso-Seq)
Accurate gene models are necessary for targeting assays
• The majority of genes are alternatively spliced to produce multiple transcript isoforms.
• Iso-Seq generates full-length cDNA sequences (full-length transcripts and gene isoforms).
Current MCOT (de novo and genome-based) transcriptome is useful but fragmented
Korf 2013
Sequencing full-length gene isoforms
Mapping to D. citri genome
Isoforms mapped to D. citri v1.92
Total isoforms: 314,275
Iso-Seq provides a comprehensive (de novo and genome-based) transcriptome with full-length transcripts and a range of isoforms

                                  Counts
Number of genes                   18,799 (30,562 in MCOT)
Number of isoforms                61,086
Average number of isoforms/gene   3.24
N50                               2.7 Kb
Longest                           9 Kb
Shortest                          100 bp
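The average number of isoforms per gene follows directly from the two counts in the table. A short check, using only the numbers reported on the slide (the variable names are illustrative):

```python
# Counts as reported on the slide for Iso-Seq isoforms mapped to D. citri v1.92.
genes = 18_799
isoforms = 61_086

isoforms_per_gene = isoforms / genes
print(f"{isoforms_per_gene:.3f}")  # prints 3.249
```

The ratio is about 3.249, which the slide reports as 3.24.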
Evaluating the assembly
BUSCO: Benchmarking sets of Universal Single-Copy Orthologs, based on a set of 3350 single-copy orthologs from hemipteran species

             Complete   Fragmented   Missing
Diaci 1.1    74.8%      0.3%         24.9%
Diaci 1.92   85.2%      0.1%         14.7%

Paired-end RNAseq alignment

             Overall alignment rate   Concordant alignment rate
Diaci 1.1    82%                      0.62%
Diaci 1.92   88%                      60%

Average length of aligned coding sequence

             MCOT      Isoseq (full-length transcripts)
Diaci 1.1    1054 bp   470 bp
Diaci 1.92   1321 bp   699 bp
Improved genome and annotation will expedite
identification of targets for interdiction
[Diagram labels: Genome (PacBio v1.92), Annotation (Iso-Seq), Pathway Databases, Expression Networks, ... for Host, Vector, and Pathogen; Target for interdiction molecules]
Thank you!!
Utilizing system biology resources to decipher a tritrophic disease complex
Prashant Hosmani
Wednesday, 10:30 AM - 10:45 AM
Member Symposium: Applying Emerging Genomic Techniques to Control Invasive Species
