Ecological and Biogeographical Approaches to Transposable Element and GO term interaction in arthropod genomes
1. Are transposable elements associated with particular gene functions?
Kristian Brevik (1), Ania Muszewska (2), Benjamin Pelissie (3), Sean Schoville (3), Yolanda Chen (1)
1) Department of Plant and Soil Science, University of Vermont
2) Institute of Biochemistry and Biophysics, Polish Academy of Sciences
3) Department of Entomology, University of Wisconsin-Madison
2.
1. Genome Assembly
2. Gene Prediction
3. Gene Ontology Mapping
4. Transposable Element Detection
5. Interaction Detection
6. Network Analysis
- Transposable Elements (TEs) are mobile DNA elements that move and copy themselves within genomes
- TEs are distributed non-randomly
- Different types of TE have different distributions
Bos taurus, Saylor et al. 2013
Zea mays, He et al. 2013
3. Why do TE distributions vary?
- Where they insert (near genes, away from genes, etc.)
- How error-prone they are
- How they transpose
  - RNA TEs (retrotransposons)
  - DNA TEs (DNA transposons)
- What types of mutations they can generate
4. TEs were initially named “controlling elements” by McClintock in the 1940s, but now many genome analyses remove them
What about other interaction types, including neutral, commensal, and cascade interactions?
“TEs are parasitic, selfish junk, which occasionally may be functional (mutualism)”
5. How about adapting ecological/biogeographical methods?
Study the spatial distribution and abundance of transposable elements within the genome, and their interactions with other elements
This approach fits between whole-genome TE percentage summaries (“genome %’s”) and TE-specific functional analysis
6. What will these methods reveal?
- How TEs are distributed throughout the genome, and their relative abundances
- Comparisons between species using diversity and abundance measures
  - Connectance
  - Nestedness
  - Shannon Diversity
7-13. Overview of Methods
Genomes A, B, C
Gene Predictions A, B, C
Ontologies A, B, C (ontologies are a way to classify putative gene functions)
TE Detection: all genomes together, giving TEs A, B, C
Interactions A, B, C
Ecological Metrics: comparisons within and between species
15. Manual annotations are biased towards what is interesting, so we use a computational approach:
Method: MAKER (i5k)
16. Focus on the function/role of genes:
We used Pfam to find the Gene Ontology terms associated with predicted genes
1404 GO terms found (out of ~45,000 in databases)
Method: HMMER2GO
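The domain-to-function step can be sketched in a few lines of Python. This is only an illustration of the idea, not the HMMER2GO implementation: it assumes mapping lines in the pfam2go style distributed by the GO Consortium, and the gene names and domain hits are hypothetical sample data.

```python
from collections import defaultdict

# Illustrative lines in the pfam2go style (assumed format:
# "Pfam:<id> <name> > GO:<term name> ; GO:<accession>").
PFAM2GO_LINES = [
    "!comment lines start with an exclamation mark",
    "Pfam:PF00067 p450 > GO:monooxygenase activity ; GO:0004497",
    "Pfam:PF00067 p450 > GO:heme binding ; GO:0020037",
    "Pfam:PF00001 7tm_1 > GO:G protein-coupled receptor activity ; GO:0004930",
]

# Hypothetical Pfam domain hits per predicted gene (e.g. from an HMM search).
GENE_TO_PFAM = {"geneA": ["PF00067"], "geneB": ["PF00001", "PF00067"], "geneC": []}


def parse_pfam2go(lines):
    """Return {pfam_id: set of GO accessions} from pfam2go-style lines."""
    mapping = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):
            continue  # skip blanks and comments
        left, _, right = line.partition(" > ")
        pfam_id = left.split()[0].replace("Pfam:", "")
        go_id = right.rsplit(";", 1)[-1].strip()  # accession follows the last ';'
        mapping[pfam_id].add(go_id)
    return mapping


def genes_to_go(gene_to_pfam, pfam_to_go):
    """Collect the GO accessions reachable from each gene's Pfam domains."""
    return {
        gene: sorted({go for pf in pfams for go in pfam_to_go.get(pf, ())})
        for gene, pfams in gene_to_pfam.items()
    }


pfam_to_go = parse_pfam2go(PFAM2GO_LINES)
gene_go = genes_to_go(GENE_TO_PFAM, pfam_to_go)
print(gene_go)
```

Genes without any domain hit (geneC here) end up with no GO terms, which is one reason only a fraction of all GO terms is recovered.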
17. Transposable Elements vary between species
18. Method: RepeatModeler (run on all genomes simultaneously), RepeatMasker
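Once TEs are annotated, per-class totals can be tallied from the annotation table. The sketch below assumes the whitespace-separated layout of a RepeatMasker `.out` file (query sequence, begin, end, and repeat class/family in columns 5, 6, 7 and 11); the rows are made-up sample data, and overlapping annotations are not merged.

```python
from collections import Counter

# Hypothetical excerpt in the RepeatMasker .out layout (assumed column order).
SAMPLE_OUT = """\
   SW   perc perc perc  query     position in query    matching  repeat        position in repeat
score   div. del. ins.  sequence  begin end   (left)   repeat    class/family  begin end (left)  ID

  463   1.3  0.6  1.7  scaffold1  101   350   (9650) +  rnd-1_family-1  LINE/RTE   1   250  (0)   1
  310   5.0  0.0  0.0  scaffold1  900   999   (9001) C  rnd-1_family-7  DNA/hAT    (0) 100  1     2
  250   8.2  1.1  0.4  scaffold2  10    209   (5000) +  rnd-1_family-1  LINE/RTE   5   204  (46)  3
"""


def bp_per_class(out_text):
    """Sum annotated base pairs per repeat class/family."""
    totals = Counter()
    for line in out_text.splitlines():
        fields = line.split()
        # Data rows start with an integer score; header and blank lines do not.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        begin, end = int(fields[5]), int(fields[6])  # 1-based, inclusive
        totals[fields[10]] += end - begin + 1
    return totals


totals = bp_per_class(SAMPLE_OUT)
print(totals)
```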
19. Assumption of what constituted an “interaction” between genes and TEs
- Full gene + 2000 base pairs on either side
- Captures regulatory regions and exons
Method: bedtools “slop”
Figure: Transposable Element (Feschotte et al., Nature Reviews Genetics 9, 397-405, May 2008)
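The windowing rule above can be illustrated without bedtools. This is a minimal sketch under stated assumptions: coordinates are 0-based and half-open (BED style), the scaffold length, genes and TEs are hypothetical, and the nested loop stands in for an interval-intersection tool.

```python
def slop(start, end, chrom_len, flank=2000):
    """Extend an interval by `flank` bp on each side, clipped to the chromosome."""
    return max(0, start - flank), min(chrom_len, end + flank)


def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


# Hypothetical sample data: (chromosome, start, end, name).
CHROM_LEN = {"scaffold1": 50_000}
GENES = [("scaffold1", 1_000, 4_000, "geneA"), ("scaffold1", 20_000, 23_000, "geneB")]
TES = [("scaffold1", 5_500, 5_900, "LINE/RTE"), ("scaffold1", 30_000, 30_400, "DNA/hAT")]

interactions = []
for chrom, g_start, g_end, gene in GENES:
    w_start, w_end = slop(g_start, g_end, CHROM_LEN[chrom])
    for te_chrom, t_start, t_end, te in TES:
        if te_chrom == chrom and overlaps(w_start, w_end, t_start, t_end):
            interactions.append((gene, te))

print(interactions)
```

Here the LINE/RTE copy lies outside geneA itself but inside its 2000 bp flank, so it counts as an interaction; the DNA/hAT copy is too far from either gene.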
20. There are tens of thousands of interactions between genes and TEs in each species
Intersections Species
16666 Hyalella azteca
19785 Frankliniella occidentalis
30284 Homalodisca vitripennis
32703 Pachypsylla venusta
34527 Copidosoma floridanum
37403 Gerris buenoi
46281 Anoplophora glabripennis
47667 Cimex lectularius
49201 Leptinotarsa decemlineata
65444 Eurytemora affinis
97492 Blattella germanica
21. Shannon Diversity (takes into account the number of TE types and their evenness): 1.15-2.00 across species
Species Shannon Diversity
Anoplophora glabripennis 1.7330207
Bactrocera dorsalis 1.9955195
Blattella germanica 1.7062933
Cimex lectularius 1.9012871
Copidosoma floridanum 1.6126607
Eurytemora affinis 1.2553917
Frankliniella occidentalis 1.1525005
Gerris buenoi 1.6512879
Homalodisca vitripennis 1.5631955
Hyalella azteca 1.5241026
Leptinotarsa decemlineata 1.9160786
Pachypsylla venusta 1.6212771
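For reference, Shannon diversity is H = -Σ p_i ln p_i, where p_i is the proportion of TE copies belonging to type i. The sketch below assumes the natural logarithm (the slides do not state the base) and uses made-up counts.

```python
import math


def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln(p_i)) over non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical TE copy counts per TE type in two genomes.
even_genome = [10, 10, 10, 10]   # four equally common TE types
skewed_genome = [97, 1, 1, 1]    # one TE type dominates

print(shannon_diversity(even_genome))
print(shannon_diversity(skewed_genome))
```

Both genomes have four TE types, but the skewed one scores much lower because evenness is part of the index.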
23. For network analysis, we looked at how many TEs were found in each GO term “habitat”
GO terms as “habitats” and TEs as “species”
Bipartite Network: no TE-TE or Gene-Gene interactions
(GO terms are ~ gene functions)
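The habitat-by-species framing amounts to a count matrix with TE types as rows and GO terms as columns. A minimal sketch, assuming the interactions have already been reduced to (TE type, GO term) pairs; the pairs shown are hypothetical.

```python
from collections import Counter

# Hypothetical (TE type, GO term) pairs, one per detected interaction.
PAIRS = [
    ("LINE/RTE", "GO:0004497"),
    ("LINE/RTE", "GO:0004497"),
    ("LINE/RTE", "GO:0004930"),
    ("DNA/hAT", "GO:0004930"),
]


def build_matrix(pairs):
    """Return (TE labels, GO labels, count matrix) for a bipartite network."""
    counts = Counter(pairs)
    tes = sorted({te for te, _ in pairs})
    gos = sorted({go for _, go in pairs})
    # Rows are TE types ("species"), columns are GO terms ("habitats").
    matrix = [[counts[(te, go)] for go in gos] for te in tes]
    return tes, gos, matrix


tes, gos, matrix = build_matrix(PAIRS)
print(tes, gos, matrix)
```

Because every edge links a TE type to a GO term, the matrix has no TE-TE or gene-gene cells, which is what makes the network bipartite.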
25. Bipartite Network (analogous to pollinator/plant networks)
● Connectance: proportion of possible connections which are realized
Species Connectance
Leptinotarsa decemlineata 0.147
Blattella germanica 0.14
Pachypsylla venusta 0.132
Homalodisca vitripennis 0.0988
Eurytemora affinis 0.0965
Hyalella azteca 0.0936
Gerris buenoi 0.0849
Anoplophora glabripennis 0.0847
Copidosoma floridanum 0.075
Cimex lectularius 0.075
Bactrocera dorsalis 0.0676
Frankliniella occidentalis 0.0495
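Connectance for a bipartite matrix is the number of non-zero cells divided by rows × columns. A minimal sketch with a hypothetical TE-by-GO count matrix:

```python
def connectance(matrix):
    """Realized links divided by possible links (rows x columns)."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    links = sum(1 for row in matrix for cell in row if cell > 0)
    return links / (n_rows * n_cols)


# Hypothetical counts: rows are TE types, columns are GO terms.
example = [
    [0, 1],
    [2, 1],
]
print(connectance(example))  # 3 of 4 possible links are realized
```

Counts are reduced to presence/absence here, so a cell with two interactions contributes the same single link as a cell with one.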
26. Results: Which transposable elements are found in certain “GO term habitats”?
- Standardized by the number of base pairs of each GO term (summed over genes)
~148 significant interactions shared between species (z-test, Bonferroni corrected)
23443 without correction
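The slides do not spell out the exact test, so the following is only one plausible reading: a two-sided z-test on a binomial null in which a TE type lands in a GO term in proportion to that term's share of base pairs, followed by a Bonferroni threshold. All function names and numbers are assumptions for illustration.

```python
import math


def z_test_enrichment(observed, total_te, go_bp, total_bp):
    """Z-score and two-sided p-value against a length-proportional null."""
    p = go_bp / total_bp                      # expected share from base pairs
    expected = total_te * p
    sd = math.sqrt(total_te * p * (1 - p))    # binomial standard deviation
    z = (observed - expected) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


def bonferroni_significant(p_values, alpha=0.05):
    """Flag tests that pass after dividing alpha by the number of tests."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]


# Hypothetical case: 50 copies observed where about 10 are expected.
z, p_value = z_test_enrichment(observed=50, total_te=1000, go_bp=10_000, total_bp=1_000_000)
print(z, p_value)
print(bonferroni_significant([1e-6, 0.03, 0.5]))
```

The middle p-value (0.03) would pass an uncorrected 0.05 cutoff but fails after correction, which mirrors the large gap between corrected and uncorrected counts above.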
27. Interactions shared between the most species
29. Nestedness:
Can show whether GO terms contain subsets of the TEs found in more “populated” GO terms,
or whether GO terms are “specialized” habitats
Method: WNODF in falcon (R)
Figure axes: Habitats; TEs/Species
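To make the idea concrete, here is a sketch of the simpler binary NODF metric (nestedness based on overlap and decreasing fill). It is not the weighted WNODF computed by falcon; it ignores counts and is shown only to illustrate what "nested" means for a TE-by-GO matrix.

```python
from itertools import combinations


def _paired_overlap(vectors):
    """Pairwise nestedness scores (0-100) for a list of binary vectors."""
    scores = []
    for a, b in combinations(vectors, 2):
        fill_a, fill_b = sum(a), sum(b)
        # Equal fill (or an empty vector) scores zero by definition.
        if fill_a == fill_b or min(fill_a, fill_b) == 0:
            scores.append(0.0)
            continue
        richer, poorer = (a, b) if fill_a > fill_b else (b, a)
        shared = sum(1 for x, y in zip(richer, poorer) if x and y)
        scores.append(100.0 * shared / sum(poorer))
    return scores


def nodf(matrix):
    """Binary NODF: mean paired overlap across all row and column pairs."""
    binary = [[1 if cell > 0 else 0 for cell in row] for row in matrix]
    columns = [list(col) for col in zip(*binary)]
    scores = _paired_overlap(binary) + _paired_overlap(columns)
    return sum(scores) / len(scores)


nested = [[1, 1, 1], [1, 1, 0], [1, 0, 0]]       # each row is a subset of the one above
specialized = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # each TE type has its own habitat
print(nodf(nested), nodf(specialized))
```

A high score corresponds to the first reading on the slide (sparser GO terms hold subsets of the TEs in fuller ones); a low score corresponds to specialized habitats.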
32. Summary/Next Steps
● There are differences between species
○ Nestedness (generalist TEs vs specialist TEs)
○ Overall Shannon Diversity
○ Connectance
● Expand and refine
○ Additional species
○ Finer-grained analyses
○ ...
33. As each step is refined, this analysis will (hopefully!) improve
Assembly
- sequencing
- assembly methods
- even the human genome still has gaps!
Gene Prediction
- which program(s)?
- include manual annotation?
- OrthoDB?
- all genes? Conserved regions?
Transposable Element Detection
- which program(s)?
- need ‘ground-truthing’
- all genomes simultaneously, or separately then merge?
- filtering?
Intersections
- co-occurrence = interaction?
- What is biologically relevant?
- How many kb (5 up, 2 down)?
- 3D distance
- motif preference
- epigenetics (histone mods!)
- chromatin formation
Gene Ontology
- which program?
- GOSLIM?
- ~48,000 GO terms
- always changing
Network Analysis
- which program?
- What metrics are useful?
- ‘genome ecology’
- bipartite or not?
- interactions between genes
- motif preferences? (CCTGG, etc.)
- null hypothesis for TE distributions
Other Stuff
34. Acknowledgements
Assembly, Gene Prediction, Transposable Element Detection, Intersections, Gene Ontology, Network Analysis
1. Sean Schoville, Benjamin Pelissie, Michael Crossley, Zachary Cohen
2. Yolanda Chen, Chase Stratton, Elisabeth Hodgdon, Jorge Ruiz-Arocho, Paolo Filho, Andrea Swan, Sean Quigley
3. Stephanie McKay
4. i5k
5. All the genome project leads!
Funding: UVM USDA Hatch Grant
35. Questions?