5. Input data for genome annotation
- Full-length cDNA sequenced using PacBio IsoSeq (Breaker and Mature green fruit stages)
- RNAseq Illumina data from >1,300 libraries with >14 billion reads
- Disease resistance data (Martin and Jones labs)
- 3’ and 5’ UTR enriched data (Giovannoni, Aharoni and Sinha labs)
- Public data from NCBI SRA
- NCBI EST sequences (~300 K)
- Full-length cDNA sequences (~13 K) from Micro-Tom (Aoki et al., 2010)
6. Annotation of protein-coding gene models
                                    ITAG4.0   ITAG2.4
Number of protein-coding genes       34,075    34,725
Average transcript length             1,303     1,209
Average number of exons per gene       4.74      4.61
Fraction of genes with 5' UTR          0.49      0.34
Fraction of genes with 3' UTR          0.58      0.41
Long non-coding RNAs in ITAG4.0: 5,874, with 6,694 alternatively spliced isoforms
7. Annotation Edit Distance (AED)
Annotation Edit Distance (AED) provides a means to evaluate the quality of annotations given the evidence set.
The AED cumulative plot shows improvements in ITAG4.0 compared to ITAG2.4.
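As an illustration of how an AED score and a cumulative AED curve can be computed, here is a minimal sketch. It assumes the commonly used definition AED = 1 - (SN + SP) / 2, where SN is the fraction of evidence bases covered by the annotation and SP is the fraction of annotation bases covered by the evidence; the poster does not state the exact procedure used for ITAG4.0. The function names and the toy coordinates are hypothetical.

```python
def covered_bases(intervals):
    """Return the set of positions covered by inclusive (start, end) intervals."""
    bases = set()
    for start, end in intervals:
        bases.update(range(start, end + 1))
    return bases


def aed(annotation_exons, evidence_exons):
    """AED = 1 - (SN + SP) / 2 at the nucleotide level (assumed definition).

    0.0 means the annotation agrees perfectly with the evidence;
    1.0 means there is no overlapping evidence at all.
    """
    ann = covered_bases(annotation_exons)
    evi = covered_bases(evidence_exons)
    if not ann or not evi:
        return 1.0
    overlap = len(ann & evi)
    sensitivity = overlap / len(evi)   # share of evidence bases that are annotated
    specificity = overlap / len(ann)   # share of annotated bases backed by evidence
    return 1.0 - (sensitivity + specificity) / 2.0


def cumulative_fraction(aed_scores, thresholds):
    """Fraction of gene models with AED <= each threshold (points of a cumulative plot)."""
    n = len(aed_scores)
    return [sum(1 for s in aed_scores if s <= t) / n for t in thresholds]


# Toy gene model: the second exon extends 50 bp beyond the evidence.
score = aed([(1, 100), (201, 300)], [(1, 100), (201, 250)])
curve = cumulative_fraction([0.0, score, 0.5, 1.0], [0.25, 0.5, 1.0])
```

In a cumulative plot of this kind, an annotation set whose curve rises faster at low AED values has a larger share of gene models that agree closely with the evidence.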
8. Novel protein-coding genes in ITAG4.0
Novel genes in ITAG4.0 are enriched in stress response genes.
GO terms enriched in novel genes are shown with their fold enrichment and the minus log10 of their corresponding P-values.
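The two quantities reported for each GO term can be sketched as follows. This is only an illustration: the poster does not say which statistical test was used, so the one-sided hypergeometric test here is an assumption, and the function names and counts are hypothetical toy values rather than ITAG4.0 data.

```python
from math import comb, log10


def fold_enrichment(k, n, K, N):
    """Ratio of the term's frequency among novel genes to its background frequency.

    k: novel genes annotated with the term    n: all novel genes
    K: background genes with the term         N: all background genes
    """
    return (k / n) / (K / N)


def hypergeom_pvalue(k, n, K, N):
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n) (assumed test)."""
    total = comb(N, n)
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


def minus_log10(p):
    """Transform a P-value so that smaller P-values give larger scores."""
    return -log10(p)


# Toy counts: 2 of 4 novel genes carry the term, versus 5 of 20 in the background.
fe = fold_enrichment(2, 4, 5, 20)
p = hypergeom_pvalue(2, 4, 5, 20)
score = minus_log10(p)
```

In practice, P-values from many GO terms would usually be corrected for multiple testing before the minus log10 transform is applied.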
9. Thank you!
Submit your annotation corrections using the Tomato Apollo annotation editor. Contact SGN for an account:
https://solgenomics.net/contact/form