Barbados Workshop on the Computational Identification 
and Analysis of Transposable Elements 
April 18th - 25th, 2014 
Florian Maumus with Hadi Quesneville (URGI-INRA, Versailles, France)
REPET package (Hadi Quesneville)
Genome => TEdenovo => TEannot => Repeat annotation
De novo repeatome detection 
Deep repeatome annotation 
Repeat annotation in large genomes
Repeat complement = Repeatome 
The Repeatome includes: 
Transposable elements 
Endogenous viruses 
Tandem repeats 
Ribozymes 
Genes 
… 
= What you get with repeat-finders!
Burst and Decay
Dark matter, the genomic humus 
Burst: « Repeats » (detected)
Decay: old repeats (detectable?)
Melt: dark matter (background noise)
Complexity of the repeatome 
Young: turnover ++, recent activity +++
Old: turnover -, recent activity -
Different history, different challenges 
Maize 
2.3 Gb genome 
About 85% repeats 
Human 
3.2 Gb genome 
About 50% repeats
LECA: 
Core eukaryotic genes + 
Copia, Gypsy, LINEs, 
DNA transposons… 
TEs have been jumping around genes over evolutionary time
Archeology toolbox
[Figure: an archaeologist's tool roll (trowels, scale rulers, hand shovel, brush, mason line, line level, plumb bob, knife)]
Repeatome toolbox 
K-mer strict : Tallymer, DSK 
K-mer based : RepeatScout, P-clouds 
Similarity-based : e.g. Recon 
Combined 
RepeatModeler (RepeatScout + Recon) 
TEdenovo (Recon + Piler + Grouper; + RepeatScout in v2.2)
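The "k-mer strict" idea in the toolbox above can be sketched in a few lines of Python. This is only an illustration of exact k-mer counting, not how Tallymer or DSK are implemented; the function names and the `min_count` parameter are hypothetical, and real tools also handle reverse complements and memory limits, which this sketch ignores.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every overlapping k-mer in a DNA string (forward strand only)."""
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if "N" not in kmer:  # skip windows containing ambiguous bases
            counts[kmer] += 1
    return counts


def repeated_kmers(sequence: str, k: int, min_count: int = 2) -> dict:
    """Keep only k-mers seen at least min_count times (hypothetical cutoff)."""
    return {kmer: n for kmer, n in count_kmers(sequence, k).items() if n >= min_count}


if __name__ == "__main__":
    toy = "ACGTACGTACGT"
    print(repeated_kmers(toy, k=4, min_count=3))
```

On the toy string, only `ACGT` reaches three occurrences; lowering `min_count` to 2 also reports the shifted windows `CGTA`, `GTAC` and `TACG`.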
REPET: TEdenovo 
TEdenovo pipeline => Consensus library 
+ RepeatScout (v2.2) 
REPET Classification utility
REPET Classification tool 
Input: consensus library 
Evidence searched: tandem repeats (TR search, Tandem Repeat Finder); BLASTx and tBLASTx against Repbase; Pfam and GyDB HMM profiles; rDNA, tRNA and host genes 
Summary of evidence => proposed classification 
Consensus 1: termLTRs; 0.12% TR; Bx: AtGypsy; Btx: none; profiles: IN, RT => LTR retrotransposon 
Consensus 2: none; 0.32% TR; Bx: none; Btx: none; profiles: LRR => Host gene 
Consensus 3: none; 0.23% TR; Bx: none; Btx: none; profiles: none => Unclassified
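The step from a summary of evidence to a proposed class can be illustrated with a toy rule set. The sketch below only reproduces the three example consensus from the slide; the `Evidence` fields, the profile sets and the rules themselves are assumptions made for illustration and are not the decision rules of the REPET classifier.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical profile groupings, taken only from the three slide examples.
TE_PROFILES = {"IN", "RT"}   # integrase, reverse transcriptase
HOST_PROFILES = {"LRR"}      # leucine-rich repeat, a host gene domain


@dataclass
class Evidence:
    terminal_structure: Optional[str] = None   # e.g. "termLTRs"
    blastx_hit: Optional[str] = None           # e.g. "AtGypsy"
    tblastx_hit: Optional[str] = None
    profiles: List[str] = field(default_factory=list)


def propose_classification(ev: Evidence) -> str:
    """Toy rules: LTRs plus TE coding evidence, else host domain, else nothing."""
    profiles = set(ev.profiles)
    te_support = bool(TE_PROFILES & profiles) or bool(ev.blastx_hit) or bool(ev.tblastx_hit)
    if ev.terminal_structure == "termLTRs" and te_support:
        return "LTR retrotransposon"
    if HOST_PROFILES & profiles:
        return "Host gene"
    return "Unclassified"


if __name__ == "__main__":
    examples = {
        "Consensus 1": Evidence("termLTRs", "AtGypsy", None, ["IN", "RT"]),
        "Consensus 2": Evidence(None, None, None, ["LRR"]),
        "Consensus 3": Evidence(),
    }
    for name, ev in examples.items():
        print(name, "=>", propose_classification(ev))
```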
REPET: TEannot 
TEannot pipeline => genome annotation
TEdenovo, RepeatScout, RepeatModeler: performance, complementarity?
Experimental model 
Arabidopsis thaliana 
120 Mb
Consensus sequences
Sensitivity & Specificity 
[Figure: genome coverage, 0 to 50 Mb]
[Figure: Sensitivity, percent reference coverage (0 to 100), for All (TEdenovo+RS+RM), RepeatScout, RepeatModeler, TEdenovo, TRF and Tallymer]
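A "percent reference coverage" figure of this kind can be computed from two sets of genomic intervals. The following is a minimal sketch, assuming half-open `(start, end)` coordinates on a single sequence; the function names are hypothetical and this is not the script used for the slides.

```python
def merge(intervals):
    """Merge overlapping or touching half-open intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered_bases(intervals):
    """Number of distinct bases covered by a set of intervals."""
    return sum(end - start for start, end in merge(intervals))


def overlap_bases(a, b):
    """Number of bases covered by both interval sets (two-pointer sweep)."""
    a, b = merge(a), merge(b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if hi > lo:
            total += hi - lo
        # advance whichever interval ends first
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def percent_reference_coverage(predicted, reference):
    """Share of reference bases that the predicted annotation also covers."""
    ref = covered_bases(reference)
    return 100.0 * overlap_bases(predicted, reference) / ref if ref else 0.0


if __name__ == "__main__":
    reference = [(0, 100), (200, 300)]
    predicted = [(50, 150), (250, 260), (255, 270)]
    print(percent_reference_coverage(predicted, reference))
```

Swapping the two arguments gives the share of the predicted annotation that falls inside the reference, which is one way to look at specificity.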
[Figure: Biological sensitivity, percent 24-nt sRNA coverage (Lister et al., 2008), 0 to 100, for All (TEdenovo+RS+RM), RepeatScout, RepeatModeler and TEdenovo]
[Figure: genome coverage increase (%), 0 to 35, for TEdenovo, RepeatModeler and RepeatScout]
REPET, RepeatScout, and RepeatModeler employ complementary computational methods that together better represent repeatome complexity.
Conclusions I 
TEdenovo outperforms RepeatModeler and RepeatScout 
Greater coverage with fewer consensus sequences, longer consensus sequences and longer copies 
Complementarity of TEdenovo, RepeatModeler and RepeatScout 
=> Comprehensive annotation of complex repeatomes
De novo repeatome detection 
Deep repeatome annotation 
Repeat annotation in large genomes
Experimental model: Arabidopsis, 120 Mb 
[Figure: genome from 0% to 100%, partitioned into CDS, repeatome and dark matter]
Three strategies with REPET: 
Annotate genome with genomic copies 
Use relaxed parameters for HSP detection 
Use P-clouds to detect short repeat fragments
Iterative annotation 
Annotate genome with genomic copies 
(Expand the knowledge)
Genome => TEdenovo => Consensus => TEannot => Genomic copies => TEannot => Genomic copies => TEannot => Genomic copies => TEannot => Genomic copies
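The loop in this diagram can be sketched as follows. It assumes that each round feeds the genomic copies found in the previous round back in as the next library, which is one reading of the slide; `iterative_annotation` and `toy_annotate` are hypothetical names, and the toy matcher (windows within one mismatch of a library sequence) merely stands in for TEannot.

```python
def toy_annotate(genome, library, max_mismatch=1):
    """Stand-in for an annotation step: report genome windows that are
    within max_mismatch substitutions of any library sequence."""
    if not library:
        return set()
    size = len(next(iter(library)))
    hits = set()
    for i in range(len(genome) - size + 1):
        window = genome[i:i + size]
        for ref in library:
            if sum(a != b for a, b in zip(window, ref)) <= max_mismatch:
                hits.add(window)
                break
    return hits


def iterative_annotation(genome, consensus_library, annotate, rounds=4):
    """Annotate, then reuse the copies found as the library for the next round."""
    library = set(consensus_library)
    per_round = []
    for _ in range(rounds):
        copies = annotate(genome, library)
        per_round.append(copies)
        library = copies  # assumption: copies replace the previous library
    return per_round


if __name__ == "__main__":
    genome = "AAAACCAAATCCAATTCCATTT"
    results = iterative_annotation(genome, {"AAAA"}, toy_annotate, rounds=3)
    for n, copies in enumerate(results, start=1):
        print("round", n, sorted(copies))
```

In the toy genome, the diverged copy `AATT` is only reached in round 2 and `ATTT` only in round 3, which is the "expand the knowledge" effect. The same toy also picks up unrelated windows as the rounds proceed, a reminder that each round trades specificity for sensitivity.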
[Figure: Iterative annotation (annotate genome with genomic copies). Percent coverage (0 to 100) of Genome, Reference, 24-nt sRNA, Tallymer, RepeatScout and RepeatModeler for TEdenovo_1, TEdenovo_2, TEdenovo_3 and TEdenovo_4]
[Figure: Dinucleotide composition (AA to TT, y-axis from -0.05 to 0.15) for CDS, TEdenovo, delta_2vs1, delta_3vs2 and delta_4vs3]
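A dinucleotide profile of the kind compared here can be computed as below. This is a minimal sketch: the slide does not say how the plotted values were normalized, so plain relative frequencies are used, and `composition_difference` is a hypothetical helper for comparing two sequence sets.

```python
from collections import Counter
from itertools import product

# The 16 dinucleotides in a fixed order: AA, AC, AG, AT, CA, ...
DINUCLEOTIDES = ["".join(pair) for pair in product("ACGT", repeat=2)]


def dinucleotide_frequencies(sequence: str) -> dict:
    """Relative frequency of each dinucleotide over overlapping windows.
    Windows containing anything other than A, C, G, T are ignored."""
    seq = sequence.upper()
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(counts[d] for d in DINUCLEOTIDES)
    return {d: (counts[d] / total if total else 0.0) for d in DINUCLEOTIDES}


def composition_difference(seq_a: str, seq_b: str) -> dict:
    """Per-dinucleotide frequency difference between two sequences."""
    fa = dinucleotide_frequencies(seq_a)
    fb = dinucleotide_frequencies(seq_b)
    return {d: fa[d] - fb[d] for d in DINUCLEOTIDES}


if __name__ == "__main__":
    for dinuc, freq in dinucleotide_frequencies("ACGTACGTTTAA").items():
        if freq:
            print(dinuc, round(freq, 3))
```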
Relevance 
Genome annotation using the delta_2vs1 copies masks as much as 23 Mb (19.5%) of the genome. 
It covers 66% of the reference annotation and 56% of the TEdenovo annotation. 
The supplementary annotations from TEdenovo_2 are highly representative of the A. thaliana repeatome.
Relaxed (parameters) annotation 
Default : Identity > 90%, Evalue < 1e-300 
Cool : Identity > 85%, Evalue < 1e-50 
Soft : Identity > 80%, Evalue < 1e-20 
[Figure: consensus size]
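The three parameter sets amount to different filters on high scoring pairs. The sketch below applies the thresholds from the slide to a list of HSP records; the `Hsp` record layout and the `filter_hsps` function are assumptions for illustration and do not reflect how REPET is configured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hsp:
    query: str
    subject: str
    identity: float  # percent identity
    evalue: float


# (minimum identity in percent, maximum E-value), as listed on the slide
PRESETS = {
    "default": (90.0, 1e-300),
    "cool": (85.0, 1e-50),
    "soft": (80.0, 1e-20),
}


def filter_hsps(hsps, preset):
    """Keep HSPs strictly above the identity cutoff and below the E-value cutoff."""
    min_identity, max_evalue = PRESETS[preset]
    return [h for h in hsps if h.identity > min_identity and h.evalue < max_evalue]


if __name__ == "__main__":
    hsps = [
        Hsp("chunk1", "chunk7", 95.0, 0.0),
        Hsp("chunk2", "chunk9", 87.0, 1e-60),
        Hsp("chunk3", "chunk4", 82.0, 1e-25),
        Hsp("chunk5", "chunk6", 75.0, 1e-100),
    ]
    for name in PRESETS:
        print(name, len(filter_hsps(hsps, name)))
```

Each relaxation admits HSPs that the stricter preset rejects, which is how older, more diverged copies enter the clustering step.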
[Figure: Relaxed (parameters) annotation. Percent coverage (0 to 100) of 24-nt sRNA, Tallymer, Reference, RepeatScout and RepeatModeler for TEdenovo_1, TEdenovo_cool, TEdenovo_soft and TEdenovo_soft_2]
[Figure: copy/consensus identity along chr1 for TEdenovo, Cool and Soft]
Deep annotation of the A. thaliana repeatome 
RepeatScout + RepeatModeler + TEdenovo + Repbase (+ Buisine et al.) => remove redundancy => bundle library => TEannot 
[Figure: consensus size]
Deep annotation of the A. thaliana repeatome 
[Diagram labels: selected, not selected, TEannot, P-clouds, complete bundle annotation]
[Figure: copies, consensus and P-clouds; in-cloud k-mers (De Koning et al.)]
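The notion of "in-cloud k-mers" can be illustrated with a much-simplified sketch: start from a frequent core k-mer and pull in k-mers one substitution away whose counts pass a lower cutoff. The published P-clouds method uses several tiered cutoffs and builds many clouds; the single-cloud logic, the function names and both cutoffs here are assumptions for illustration only.

```python
from collections import Counter


def neighbours(kmer):
    """All k-mers that differ from kmer by exactly one substitution."""
    for i, base in enumerate(kmer):
        for alt in "ACGT":
            if alt != base:
                yield kmer[:i] + alt + kmer[i + 1:]


def build_cloud(counts: Counter, core_cutoff: int, outer_cutoff: int) -> set:
    """Grow one cloud around the most frequent k-mer.
    The core must reach core_cutoff; members are added transitively when
    they are one substitution from a member and reach outer_cutoff."""
    if not counts:
        return set()
    core, n = counts.most_common(1)[0]
    if n < core_cutoff:
        return set()
    cloud = {core}
    frontier = [core]
    while frontier:
        current = frontier.pop()
        for candidate in neighbours(current):
            if candidate not in cloud and counts.get(candidate, 0) >= outer_cutoff:
                cloud.add(candidate)
                frontier.append(candidate)
    return cloud


if __name__ == "__main__":
    counts = Counter({"AAAA": 10, "AAAT": 3, "AATT": 2, "AAAC": 1, "CCCC": 5})
    print(sorted(build_cloud(counts, core_cutoff=8, outer_cutoff=2)))
```

Genome positions covered by any k-mer in the cloud would then be flagged as repeat-derived, which is how short, degraded fragments that no alignment picks up can still be detected.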
• TEdenovo
• Bundle
• Bundle + P-clouds 
=> Repeated and repeat-derived sequences make up at least 30% of the A. thaliana genome 
Enhanced repeat detection in gene-rich regions
[Figure: Arabidopsis repeats browser, with tracks for Genes, Buisine et al., RepeatScout, RepeatModeler, REPET, Deep annotations and 24-nt sRNA]
Conclusions II 
Innovative approaches for deep repeatome annotation 
About one third of the A. thaliana genome is of repetitive origin (vs 24%) 
Increased sensitivity and detection of old repeat remnants 
Improved genome evolution and epigenetic analyses 
Continuum between repeatome and genomic dark matter 
Time
De novo repeatome detection 
Deep repeatome annotation 
Repeat annotation in large genomes
All genomes should benefit from the greater quality of TEdenovo 
Adapted from Nina V. Fedoroff (2012) and Steven M. Carr
Limitations with REPET 
All-by-all genome comparison => LOTS (Gb) of high scoring pairs (HSPs) 
HSP files > 1 Gb are not handled by Piler 
Grouper can run for weeks 
Until recently, it was impossible to run TEdenovo on whole large and/or highly repetitive genomes
Solutions 
Use a sample of the whole genome as input for TEdenovo (e.g. 300 Mb), as recommended for RepeatModeler
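One way to draw such a sample is to cut the sequences into fixed-size chunks and pick chunks at random until the target size is reached. The slides do not say how the input subset was chosen, so the chunking strategy, the function name and all parameters below are assumptions.

```python
import random


def sample_chunks(seq_lengths, target_bp, chunk_bp, seed=0):
    """Pick random chunks (name, start, end) until target_bp is reached.
    seq_lengths maps sequence names to their lengths in bp."""
    chunks = []
    for name, length in seq_lengths.items():
        for start in range(0, length, chunk_bp):
            chunks.append((name, start, min(start + chunk_bp, length)))
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    rng.shuffle(chunks)
    picked, total = [], 0
    for name, start, end in chunks:
        if total >= target_bp:
            break
        picked.append((name, start, end))
        total += end - start
    return sorted(picked)


if __name__ == "__main__":
    lengths = {"chr1": 1000, "chr2": 500}
    sample = sample_chunks(lengths, target_bp=600, chunk_bp=100)
    print(sample, sum(end - start for _, start, end in sample))
```

For a real run, the lengths would come from the genome FASTA index and the targets would be on the order of hundreds of megabases, with chunks large enough to contain full-length elements.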
Tomato genomes 
S. pennellii : 942 Mb 
S. lycopersicum : 782 Mb
320 Mb input => TEdenovo (n HSP >= 5) => Consensus library => TEannot 
[Figure: genome scale, 0 to 1 Gb]
82% of the Solanum pennellii ATGC space masked
Conclusions III 
Efficient annotation of large plant genomes with REPET 
Still quite a long process !
De novo repeat annotation in large genomes 
Future developments 
Parallelize Grouper 
Parallelize the “Long join” procedure 
Establish phylum-specific approaches 
Develop strategies to annotate genomes with different composition, e.g. old, complex repeatomes as compared to large plant genomes
De novo repeat annotation in large genomes 
Future challenges & perspectives 
Offer the TEdenovo and TEannot pipelines on Galaxy 
Deliver a REPET build for use in the cloud
Véronique Jamilloux 
Tina Alaeitabar 
Timothée Chaumier 
Olivier Inizan 
Mark Moissette 
Hadi Quesneville 
THANK YOU!

Bits of the Green Junk
