SL4.0 and ITAG4.0

solgenomics

Solanum lycopersicum Heinz 1706 genome assembly and annotation SL4.0 and ITAG4.0

SL4.0 assembly
● 80x PacBio coverage with RSII and Sequel (13 kb read N50)
● Canu assembly (N50 5.5 Mb)
● Hi-C scaffolding (12 chromosomes and unplaced contigs)
● Corrected with Illumina DNA-seq reads (60x coverage)
● Filtered for mitochondrial and chloroplast contigs
● Validated with Bionano optical maps and 10x linked reads
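
The N50 values quoted for the reads (13 kb) and the Canu contigs (5.5 Mb) summarize a length distribution: N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total bases. A minimal Python sketch of that calculation follows. The function name and the toy lengths are illustrative assumptions, not part of the SL4.0 pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length that pushes the running sum to half the total is the N50.
        if running >= half_total:
            return length
    raise ValueError("no lengths given")


# Hypothetical contig lengths, for illustration only.
toy_contigs = [8, 5, 4, 2, 1]
toy_n50 = n50(toy_contigs)
print(toy_n50)
```

On real data the lengths would come from the assembly FASTA index rather than a hand-written list.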

Comparison with the previous assemblies
Genome assembly version                              SL4.0          SL3.0          SL2.5
Assembly size (bp)                             782,520,133    828,076,956    823,944,041
Non-N bases (bp)                               782,475,302    746,357,470    737,636,348
Ns (bp)                                             44,831     81,719,486     86,307,693
Chr 00 / unplaced contig size (bp)               9,643,350     20,852,292     21,805,821
Number of Chr 00 contigs                               152          3,141          4,410
Repeat content (RepeatModeler/RepeatMasker)         64.19%         56.39%         56.34%
Repeat content (REPET)                              71.77%         61.55%         60.94%
Assembly completeness (k-mer based)                 99.24%         98.96%         98.83%
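
The size, N and Chr 00 rows are linked by simple arithmetic: non-N bases equal assembly size minus Ns, and the gap fraction drops from roughly 10% in SL3.0 and SL2.5 to under 0.01% in SL4.0. The short Python sketch below recomputes these derived values from the figures in the comparison. The dictionary layout and the helper name are assumptions made for illustration.

```python
# Figures copied from the assembly comparison; keys are illustrative names.
assemblies = {
    "SL4.0": {"size": 782_520_133, "n_bases": 44_831, "chr00": 9_643_350},
    "SL3.0": {"size": 828_076_956, "n_bases": 81_719_486, "chr00": 20_852_292},
    "SL2.5": {"size": 823_944_041, "n_bases": 86_307_693, "chr00": 21_805_821},
}


def summarize(stats):
    """Derive non-N bases, gap percentage and unplaced percentage for one assembly."""
    return {
        "non_n": stats["size"] - stats["n_bases"],
        "gap_pct": 100 * stats["n_bases"] / stats["size"],
        "unplaced_pct": 100 * stats["chr00"] / stats["size"],
    }


summary = {name: summarize(stats) for name, stats in assemblies.items()}
for name, row in summary.items():
    print(
        f"{name}: non-N {row['non_n']:,} bp, "
        f"gaps {row['gap_pct']:.4f}%, unplaced {row['unplaced_pct']:.2f}%"
    )
```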

SL3.0 vs SL4.0
Genome assembly co-linearity

Input data for genome annotation
- Full-length cDNA sequenced using PacBio Iso-Seq (Breaker and Mature green fruit stages)
- Illumina RNA-seq data from >1,300 libraries with >14 billion reads
- Disease resistance data (Martin and Jones labs)
- 3’ and 5’ UTR-enriched data (Giovannoni, Aharoni and Sinha labs)
- Public data from NCBI SRA
- NCBI EST sequences (~300 K)
- Full-length cDNA sequences (~13 K) from Micro-Tom (Aoki et al., 2010)

Annotation of protein-coding gene models
Annotation version                  ITAG4.0   ITAG2.4
Number of protein-coding genes       34,075    34,725
Average transcript length             1,303     1,209
Average number of exons per gene       4.74      4.61
Fraction of genes with 5' UTR          0.49      0.34
Fraction of genes with 3' UTR          0.58      0.41

Long non-coding RNAs in ITAG4.0: 5,874, with 6,694 alternatively spliced isoforms

Annotation Edit Distance (AED)
Annotation Edit Distance (AED) provides a means to evaluate the quality of annotations given the evidence set. The AED cumulative plot shows improvements in ITAG4.0 compared to ITAG2.4.
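
The slides do not spell out how AED is computed. As commonly defined (for example in the MAKER pipeline), AED = 1 - (SN + SP) / 2, where SN is the fraction of evidence bases covered by the annotation and SP is the fraction of annotation bases supported by evidence, so 0 means perfect agreement and 1 means no support. The Python sketch below assumes that definition and uses hypothetical exon coordinates. It is not the ITAG4.0 implementation.

```python
def covered_bases(exons):
    """Expand 1-based inclusive (start, end) exon intervals into a set of positions."""
    bases = set()
    for start, end in exons:
        bases.update(range(start, end + 1))
    return bases


def aed(annotation_exons, evidence_exons):
    """Base-level AED under the assumed definition 1 - (SN + SP) / 2."""
    annotation = covered_bases(annotation_exons)
    evidence = covered_bases(evidence_exons)
    if not annotation or not evidence:
        return 1.0  # no overlap is possible, so treat as unsupported
    shared = len(annotation & evidence)
    sensitivity = shared / len(evidence)    # evidence bases recovered by the annotation
    specificity = shared / len(annotation)  # annotation bases backed by evidence
    return 1.0 - (sensitivity + specificity) / 2


def cumulative_fraction(aed_values, threshold):
    """Fraction of gene models with AED at or below a threshold (one point on a cumulative plot)."""
    return sum(value <= threshold for value in aed_values) / len(aed_values)


# Hypothetical gene model and aligned evidence, for illustration only.
model_exons = [(1, 100), (201, 300)]
evidence_exons = [(1, 100), (201, 250)]
example_aed = aed(model_exons, evidence_exons)
print(example_aed)
```

A cumulative curve that rises faster at low AED values indicates that more gene models agree closely with their evidence.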

Novel protein-coding genes in ITAG4.0
Novel genes in ITAG4.0 are enriched in stress response genes. GO terms enriched in novel genes are shown by fold enrichment and by the minus log10 of their corresponding P-values.
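
The slides do not say which statistical test produced the P-values. One common choice for GO enrichment is a one-sided hypergeometric test, and the Python sketch below assumes it. The function name and the counts are hypothetical and are not taken from the ITAG4.0 analysis.

```python
from math import comb, log10


def go_enrichment(novel_with_term, novel_total, background_with_term, background_total):
    """Return (fold enrichment, P-value, -log10 P) for one GO term.

    Assumes the novel genes are a subset of the background gene set and
    uses a one-sided hypergeometric test (an assumption, see the lead-in).
    """
    # Ratio of the term's frequency in the novel set to its frequency in the background.
    fold = (novel_with_term * background_total) / (novel_total * background_with_term)
    # P(X >= novel_with_term) when drawing novel_total genes without replacement.
    upper = min(novel_total, background_with_term)
    tail = sum(
        comb(background_with_term, k)
        * comb(background_total - background_with_term, novel_total - k)
        for k in range(novel_with_term, upper + 1)
    )
    p_value = tail / comb(background_total, novel_total)
    return fold, p_value, -log10(p_value)


# Hypothetical counts: 6 of 20 novel genes carry the term, versus 10 of 100 genes overall.
fold, p_value, neg_log10_p = go_enrichment(6, 20, 10, 100)
print(f"fold enrichment {fold:.2f}, P = {p_value:.3g}, -log10(P) = {neg_log10_p:.2f}")
```

Real analyses usually also correct the P-values for multiple testing across GO terms before plotting them.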

Thank you!
Submit your annotation corrections using the Tomato Apollo annotation editor; contact SGN for an account
https://solgenomics.net/contact/form

Sol Genomics Network https://solgenomics.net/
