Bits of the Green Junk
Barbados Workshop on the Computational Identification 
and Analysis of Transposable Elements 
April 18th - 25th, 2014 
Florian Maumus with Hadi Quesneville (URGI-INRA, Versailles, France)
REPET package 
Genome -> TEdenovo -> TEannot -> Repeat annotation 
Hadi Quesneville
De novo repeatome detection 
Deep repeatome annotation 
Repeat annotation in large genomes
Repeat complement = Repeatome 
The Repeatome includes: 
Transposable elements 
Endogenous viruses 
Tandem repeats 
Ribozymes 
Genes 
… 
= What you get with repeat-finders!
Burst and Decay
Dark matter, the genomic humus 
Stage:      Burst       Decay         Melt
Sequences:  "Repeats"   Old repeats   Dark matter
Detection:  Detected    Detectable?   Background noise
Complexity of the repeatome 
Young repeatome: turnover ++, recent activity +++ 
Old repeatome: turnover -, recent activity -
Different history, different challenges 
Maize 
2.3 Gb genome 
About 85% repeats 
Human 
3.2 Gb genome 
About 50% repeats
LECA (last eukaryotic common ancestor): 
Core eukaryotic genes + Copia, Gypsy, LINEs, DNA transposons… 
TEs have been jumping around genes over evolutionary time
Archaeology toolbox
[Figure: contents of an archaeologist's tool kit (tool roll, trowels, scale rulers, hand shovel, brush, mason line, line pegs, line level, plumb bob, knife)]
Repeatome toolbox 
Strict k-mer: Tallymer, DSK 
K-mer based: RepeatScout, P-clouds 
Similarity: e.g. Recon 
Combined: 
RepeatModeler (RepeatScout + Recon) 
TEdenovo (Recon + Piler + Grouper; + RepeatScout in v2.2)
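To make the "strict k-mer" idea in the toolbox concrete, here is a minimal sketch that counts exact k-mers and keeps those seen more than once. It is only an illustration of the principle; it is not how Tallymer or DSK are implemented, and the function name `repeated_kmers` and its parameters are hypothetical.

```python
from collections import Counter


def repeated_kmers(sequence, k, min_count=2):
    """Return the exact k-mers that occur at least min_count times.

    Illustration only: real k-mer counters use indexed or disk-based
    structures to scale to whole genomes.
    """
    sequence = sequence.upper()
    # Slide a window of width k over the sequence and count each word.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    return {kmer: n for kmer, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    print(repeated_kmers("ACGTACGTTT", 4))
```

Exact counting of this kind only sees identical words, which is why the slide contrasts it with similarity-based tools that tolerate divergence between copies.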
REPET: TEdenovo 
TEdenovo pipeline (+ RepeatScout in v2.2) -> Consensus library 
REPET Classification utility
REPET Classification tool 
Input: consensus library 
Evidence searched: TR search (Tandem Repeat Finder); BLASTx and tBLASTx (Repbase); Pfam HMM; GyDB HMM; rDNA; tRNA; host genes 
Summary of evidence and proposed classification:
Consensus   Structure   TR      BLASTx (Bx)   tBLASTx (Btx)   Profiles   Proposed classification
1           termLTRs    0.12%   AtGypsy       none            IN, RT     LTR retro
2           none        0.32%   none          none            LRR        Host gene
3           none        0.23%   none          none            none       Unclassified
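The step from a summary of evidence to a proposed classification can be sketched as a small rule table. The rules below are an assumption written only to reproduce the three examples on the slide; they are not the decision rules of the REPET classification tool, and the names `propose_classification`, `TE_PROFILES` and `HOST_GENE_PROFILES` are hypothetical.

```python
# Hypothetical profile groupings, chosen only to match the slide's examples.
TE_PROFILES = {"IN", "RT"}      # integrase, reverse transcriptase
HOST_GENE_PROFILES = {"LRR"}    # leucine-rich repeat domain


def propose_classification(evidence):
    """Toy rule-based classifier over one consensus' evidence summary."""
    profiles = set(evidence.get("profiles", []))
    # Terminal LTRs plus retrotransposon protein domains.
    if evidence.get("structure") == "termLTRs" and profiles & TE_PROFILES:
        return "LTR retro"
    # Host-gene domain and no TE-related domain.
    if profiles & HOST_GENE_PROFILES and not profiles & TE_PROFILES:
        return "Host gene"
    return "Unclassified"


if __name__ == "__main__":
    slide_examples = [
        {"structure": "termLTRs", "blastx": "AtGypsy", "profiles": ["IN", "RT"]},
        {"structure": None, "blastx": None, "profiles": ["LRR"]},
        {"structure": None, "blastx": None, "profiles": []},
    ]
    for example in slide_examples:
        print(propose_classification(example))
```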
REPET: TEannot 
TEannot pipeline -> genome annotation
Performance, complementarity? 
TEdenovo, RepeatScout, RepeatModeler
Experimental model 
Arabidopsis thaliana 
120 Mb
Consensus sequences
Sensitivity & Specificity 
[Figure: genome coverage, axis 0 to 50 Mb]
Sensitivity 
[Figure: percent reference coverage (0 to 100) for All, TEdenovo+RS+RM, RepeatScout, RepeatModeler, TEdenovo, TRF and Tallymer]
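Sensitivity measured as percent reference coverage can be sketched as an interval-overlap computation: the share of reference-annotated bases that a new annotation also covers. This is a minimal sketch assuming half-open (start, end) intervals on a single sequence; the function names are hypothetical and the slides do not say how the authors computed their figures.

```python
def merge(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def percent_reference_coverage(reference, predicted):
    """Percent of reference bases that are also covered by predictions."""
    ref, pred = merge(reference), merge(predicted)
    ref_bp = sum(end - start for start, end in ref)
    if ref_bp == 0:
        return 0.0
    overlap = 0
    i = j = 0
    # Sweep both sorted interval lists once, adding up shared bases.
    while i < len(ref) and j < len(pred):
        lo = max(ref[i][0], pred[j][0])
        hi = min(ref[i][1], pred[j][1])
        if lo < hi:
            overlap += hi - lo
        if ref[i][1] < pred[j][1]:
            i += 1
        else:
            j += 1
    return 100.0 * overlap / ref_bp


if __name__ == "__main__":
    print(percent_reference_coverage([(0, 100), (200, 300)], [(50, 150), (250, 260)]))
```

The same function can be reused with another track as the reference, for example loci covered by 24-nt sRNA.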
Biological Sensitivity 
[Figure: percent 24-nt sRNA coverage (Lister et al., 2008), 0 to 100, for All, TEdenovo+RS+RM, RepeatScout, RepeatModeler and TEdenovo]
[Figure: genome coverage increase (%), 0 to 35, for TEdenovo, RepeatModeler and RepeatScout]
REPET, RepeatScout, and RepeatModeler employ 
complementary computational methods that together 
better represent repeatome complexity.
Conclusions I 
TEdenovo outperforms RepeatModeler and RepeatScout 
Greater coverage with: 
Fewer consensus sequences 
Longer consensus sequences 
Longer copies 
TEdenovo, RepeatModeler and RepeatScout are complementary 
Together they give a comprehensive annotation of complex repeatomes
De novo repeatome detection 
Deep repeatome annotation 
Repeat annotation in large genomes
Experimental model: Arabidopsis, 120 Mb 
[Figure: genome fraction from 0% to 100%, split into CDS, repeatome and dark matter] 
Three strategies with REPET: 
Annotate genome with genomic copies 
Use relaxed parameters for HSP detection 
Use P-clouds to detect short repeat fragments
Iterative annotation 
Annotate genome with genomic copies 
(Expand the knowledge)
Workflow: Genome -> TEdenovo -> Consensus -> TEannot -> Genomic copies -> TEannot -> Genomic copies -> TEannot -> Genomic copies -> TEannot -> Genomic copies
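The iterative loop above, in which the copies found in one round become the query library of the next, can be sketched with a toy annotation step. Here `annotate` is a hypothetical stand-in for TEannot that reports windows within a fixed number of mismatches of any library sequence; the real pipeline relies on alignment tools, so this only illustrates why each round can reach more divergent copies.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def annotate(genome, library, max_mismatches=1):
    """Toy annotation: map start position -> matching window."""
    hits = {}
    for seq in library:
        k = len(seq)
        for start in range(len(genome) - k + 1):
            window = genome[start:start + k]
            if hamming(window, seq) <= max_mismatches:
                hits[start] = window
    return hits


def iterative_annotation(genome, consensus_library, rounds=4, max_mismatches=1):
    """Annotate, then reuse the genomic copies as the next round's library."""
    library = set(consensus_library)
    positions_per_round = []
    for _ in range(rounds):
        hits = annotate(genome, library, max_mismatches)
        positions_per_round.append(sorted(hits))
        library = set(hits.values())  # genomic copies replace the consensus
    return positions_per_round


if __name__ == "__main__":
    spacer = "T" * 10
    # One copy is 1 mismatch from the consensus, the other is 2 mismatches away.
    genome = spacer + "ACGTACGA" + spacer + "ACGTACCA" + spacer
    print(iterative_annotation(genome, ["ACGTACGT"]))
```

In this toy genome the second copy is too divergent to be found from the consensus, but it is found in round two from the first genomic copy.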
Iterative annotation: annotate genome with genomic copies 
[Figure: bar chart, 0 to 100, with categories Genome, Reference, 24-nt sRNA, Tallymer, RepeatScout and RepeatModeler, and series TEdenovo_1, TEdenovo_2, TEdenovo_3 and TEdenovo_4]
Dinucleotide composition 
[Figure: values from -0.05 to 0.15 for the 16 dinucleotides (AA to TT) in CDS, TEdenovo, delta_2vs1, delta_3vs2 and delta_4vs3]
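A dinucleotide composition profile of the kind compared here can be computed as follows. The slide does not state which statistic is plotted; because the axis includes negative values, this sketch assumes a bias defined as observed dinucleotide frequency minus the frequency expected from single-base frequencies. That definition and the name `dinucleotide_bias` are assumptions.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def dinucleotide_bias(sequence):
    """Observed minus expected frequency for each of the 16 dinucleotides."""
    sequence = sequence.upper()
    mono = Counter(base for base in sequence if base in BASES)
    # Count overlapping pairs, skipping any pair that contains N or other symbols.
    pairs = Counter(
        sequence[i:i + 2]
        for i in range(len(sequence) - 1)
        if sequence[i] in BASES and sequence[i + 1] in BASES
    )
    total_mono = sum(mono.values())
    total_pairs = sum(pairs.values())
    bias = {}
    for first, second in product(BASES, repeat=2):
        if total_pairs == 0:
            bias[first + second] = 0.0
            continue
        observed = pairs[first + second] / total_pairs
        expected = (mono[first] / total_mono) * (mono[second] / total_mono)
        bias[first + second] = observed - expected
    return bias


if __name__ == "__main__":
    profile = dinucleotide_bias("ACACACAC")
    print(round(profile["AC"], 3), round(profile["AA"], 3))
```

Computing the profile separately for each set of sequences (CDS, first-round annotation, newly gained annotation) gives vectors that can be compared directly.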
Relevance 
Genome annotation using the delta_2vs1 copies 
masks as much as 23 Mb (19.5%) of the genome. 
It covers 66% of the reference annotation 
and 56% of the TEdenovo annotation. 
The supplementary annotations from TEdenovo_2 
are highly representative of the A. thaliana repeatome.
Relaxed (parameters) annotation 
Default: identity > 90%, E-value < 1e-300 
Cool: identity > 85%, E-value < 1e-50 
Soft: identity > 80%, E-value < 1e-20 
[Figure: consensus size]
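The three parameter sets can be read as filters on high-scoring pairs (HSPs). The sketch below applies the thresholds from the slide to a list of HSP records; the record layout (`identity` in percent, `evalue`) and the names `PRESETS` and `filter_hsps` are assumptions, not REPET configuration keys.

```python
# Thresholds taken from the slide: (minimum percent identity, maximum E-value).
PRESETS = {
    "default": (90.0, 1e-300),
    "cool": (85.0, 1e-50),
    "soft": (80.0, 1e-20),
}


def filter_hsps(hsps, preset):
    """Keep HSPs that pass the identity and E-value cutoffs of a preset."""
    min_identity, max_evalue = PRESETS[preset]
    return [
        hsp for hsp in hsps
        if hsp["identity"] > min_identity and hsp["evalue"] < max_evalue
    ]


if __name__ == "__main__":
    hsps = [
        {"identity": 95.0, "evalue": 0.0},
        {"identity": 87.0, "evalue": 1e-60},
        {"identity": 82.0, "evalue": 1e-25},
        {"identity": 70.0, "evalue": 1e-5},
    ]
    for name in PRESETS:
        print(name, len(filter_hsps(hsps, name)))
```

Each relaxation admits more divergent pairs, which is the intended route to older repeat copies at the cost of a noisier input.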
Relaxed (parameters) annotation 
[Figure: bar chart, 0 to 100, with categories 24-nt sRNA, Tallymer, Reference, RepeatScout and RepeatModeler, and series TEdenovo_1, TEdenovo_cool, TEdenovo_soft and TEdenovo_soft_2]
Copy/consensus identity along chr1 
[Figure: tracks for TEdenovo, Cool and Soft]
Deep annotation of the A. thaliana repeatome 
RepeatScout + RepeatModeler + TEdenovo + Repbase (+ Buisine et al.) -> Remove redundancy -> Bundle library -> TEannot 
[Figure: consensus size]
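The "remove redundancy" step that turns several libraries into one bundle can be illustrated with a deliberately strict rule: drop any sequence that is fully contained, on either strand, in a longer sequence already kept. This is an assumption for illustration; the slides do not describe the criterion used, and in practice similarity thresholds are applied rather than exact containment. All names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def remove_redundancy(library):
    """Keep sequences not contained (either strand) in a longer kept one."""
    kept = {}
    # Visit longest sequences first so that shorter duplicates are dropped.
    ordered = sorted(library.items(), key=lambda item: (-len(item[1]), item[0]))
    for name, seq in ordered:
        seq = seq.upper()
        rc = reverse_complement(seq)
        if any(seq in longer or rc in longer for longer in kept.values()):
            continue
        kept[name] = seq
    return kept


if __name__ == "__main__":
    merged = {
        "RS_1": "ACGTTGCA",        # contained in RM_1
        "RM_1": "GGACGTTGCATT",
        "TEd_1": "AATGCAACGTCC",   # reverse complement of RM_1
        "Rb_1": "TTTTCCCC",
    }
    print(sorted(remove_redundancy(merged)))
```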
Deep annotation of the A. thaliana repeatome 
[Diagram: selected and not selected consensus sequences, TEannot, P-clouds, complete bundle annotation]
Copies Consensus P-clouds 
In-cloud k-mers 
De Koning et al.
• TEdenovo
• Bundle
• Bundle + P-clouds 
=> Repeated and repeat-derived sequences make up at least 30% of the A. thaliana genome 
Enhanced repeat detection in gene-rich regions
Arabidopsis repeats browser 
[Tracks: Genes, Buisine et al., RepeatScout, RepeatModeler, REPET, Deep annotations, 24-nt sRNA]
Conclusions II 
Innovative approaches for deep repeatome annotation 
About one third of the A. thaliana genome is of repetitive origin (vs 24%) 
Increased sensitivity and detection of old repeat remnants 
Improved genome evolution and epigenetic analyses 
Continuum over time between repeatome and genomic dark matter
De novo repeatome detection 
Deep repeatome annotation 
Repeat annotation in large genomes
All genomes should benefit from the higher quality of TEdenovo 
Adapted from Nina V. Fedoroff (2012) and Steven M. Carr
Limitations with REPET 
All-by-all genome comparison => LOTS (Gb) of high-scoring pairs (HSPs) 
HSP files > 1 Gb are not handled by Piler 
Grouper can run for weeks 
Until recently, it was impossible to run TEdenovo on whole large and/or highly 
repetitive genomes
Solutions 
Use a sample of the whole genome as input for TEdenovo (e.g. 300 Mb), 
as recommended for RepeatModeler
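Drawing a sample of a fixed size from a large genome can be sketched as below. The slides do not say how the sample was drawn, so this sketch assumes random, non-overlapping fixed-size chunks picked until the target is reached; `sample_genome`, `chunk_bp` and `seed` are hypothetical names and parameters.

```python
import random


def sample_genome(sequences, target_bp, chunk_bp=1000, seed=0):
    """Pick random chunks until about target_bp bases are selected.

    sequences maps a sequence name to its string. Returns the sorted
    list of (name, start, end) chunks and the total sampled length.
    """
    chunks = []
    for name, seq in sequences.items():
        for start in range(0, len(seq), chunk_bp):
            chunks.append((name, start, min(start + chunk_bp, len(seq))))
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    rng.shuffle(chunks)
    picked, total = [], 0
    for name, start, end in chunks:
        if total >= target_bp:
            break
        picked.append((name, start, end))
        total += end - start
    return sorted(picked), total


if __name__ == "__main__":
    toy = {"chr1": "A" * 5000, "chr2": "C" * 3000}
    regions, size = sample_genome(toy, target_bp=3000)
    print(size, regions)
```

For a real run the chunk size would be far larger than in this toy example, so that full-length elements fit inside a chunk.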
Tomato genomes 
S. pennellii : 942 Mb 
S. lycopersicum : 782 Mb
320 Mb input -> TEdenovo (n HSP >= 5) -> Consensus library -> TEannot 
[Figure: scale 0 to 1 Gb]
82% of the Solanum pennellii ATGC space masked
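A masked fraction of the "ATGC space" can be computed as shown below, taking ATGC space to mean all bases other than N and assuming a soft-masked assembly in which repeats are written in lowercase. Both points are assumptions; the slide does not specify the masking format, and `masked_fraction_atgc` is a hypothetical name.

```python
def masked_fraction_atgc(sequence):
    """Fraction of A/C/G/T bases that are soft-masked (lowercase).

    N and other ambiguity symbols are excluded from the denominator.
    """
    atgc = sum(1 for base in sequence if base in "ACGTacgt")
    masked = sum(1 for base in sequence if base in "acgt")
    return masked / atgc if atgc else 0.0


if __name__ == "__main__":
    print(round(masked_fraction_atgc("AAacgtNNNN"), 3))
```

Excluding N matters for draft assemblies, where gaps would otherwise dilute the apparent repeat content.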
Conclusions III 
Efficient annotation of large plant genomes with REPET 
Still quite a long process!
De novo repeat annotation in large genomes 
Future developments 
Parallelize Grouper 
Parallelize the "Long join" procedure 
Establish phylum-specific approaches 
Develop strategies to annotate genomes with different composition: 
old, complex repeatomes as compared to large plant genomes
De novo repeat annotation in large genomes 
Future challenges & perspectives 
Offer the TEdenovo and TEannot pipelines on Galaxy 
Deliver a REPET build for use in the cloud
Véronique Jamilloux 
Tina Alaeitabar 
Timothée Chaumier 
Olivier Inizan 
Mark Moissette 
Hadi Quesneville 
THANK YOU!
