1. AdAptA project
Local adaptation in marginal
alpine populations: an integrated perspective
Carlos Lara-Romero
ETH. April 2015.
4. Theoretical background
• Alpine environments are highly vulnerable to global warming
• Main response of alpine plants: upward range shifts tracking their current climatic niche
• Mediterranean alpine plants: upward migration is not an option (the escalator effect)
• Adaptation and phenotypic plasticity are the main responses to new environmental conditions
Pauli et al 2012 Science, Marris 2007 Nature, Dullinger et al 2012 Glob. Ecol. Biogeogr., Lara-Romero et al 2014 PLoS ONE
5. Objectives & Study species
OBJECTIVES
[1] To assess the main limitations on reproductive performance of Mediterranean alpine
plants and to test whether local adaptation at small spatial scales has a significant effect on their
fitness.
Silene ciliata Pourret (A Mediterranean alpine specialist)
9. Silene ciliata Pourret (A Mediterranean alpine specialist)
Results
• Significant variation in vegetative and reproductive traits
between low and high elevations
• Summer drought: selective pressure at low elevations
• Seedling establishment: demographic bottleneck
[Figure: P (mm) and T (ºC) plotted against Elevation]
Giménez-Benavides et al 2007 Annals of Botany, García-Fernández et al 2012 Oikos, Lara-Romero et al 2014 PLoS ONE
10. Silene ciliata Pourret (A Mediterranean alpine specialist)
Results
• Significant variation in vegetative and reproductive traits
between low and high elevations
• Summer drought: selective pressure at low elevations
• Seedling establishment: demographic bottleneck
• Local adaptation at seedling stage: drought tolerance
Giménez-Benavides et al 2007 Annals of Botany, García-Fernández et al 2012 Oikos, Lara-Romero et al 2014 PLoS ONE
11. Objectives
Prof. Alex Widmer, Dr. Niklaus Zemp
OBJECTIVES
[1] To assess the main limitations on reproductive performance of Mediterranean alpine
plants and to test whether local adaptation at small spatial scales has a significant effect on their
fitness.
[2] To identify genes expressed during the development of S. ciliata seedlings and select
candidate genes that may be involved in adaptation processes.
12. Genomic data
Transcriptome comparisons between high and low populations during the seedling stage
[Figure: sampling design across Mountain 1, Mountain 2 and Mountain 3]
6 seedlings: 3 High vs 3 Low
1 seedling per population (n = 6)
13. Work flow. Genomic data
Seed collection & greenhouse sowing
RNA extraction and Illumina sequencing
Reference-based transcriptome assembly (BWA, Silene latifolia reference genome)
SNP calling with Reads2SNP, High vs Low: candidate genes
Differential expression, High vs Low: candidate genes
Functional annotation & enrichment analysis
[Figure: aligned reads TGTCGGTCT and TGTCAGTCT illustrating a G/A SNP]
14. Work flow. Genomic data
Seed collection & greenhouse sowing
RNA extraction and Illumina sequencing
Reference-based transcriptome assembly (BWA, Silene latifolia reference genome)
De novo transcriptome assembly
SNP calling with Reads2SNP, High vs Low: candidate genes
Differential expression, Optimal (High) vs Marginal (Low): candidate genes
Functional annotation & enrichment analysis
[Figure: aligned reads TGTCGGTCT and TGTCAGTCT illustrating a G/A SNP]
16. Genomic data
Pilot study
Study design (n = 6) limits detection of outlier SNPs
Classical approaches (e.g., pairwise Fst) cannot be implemented
How can candidate genes be detected from a single individual per population?
18. Differential expression analysis
129 contigs differentially expressed
GO term & Enrichment analysis
• 114 contigs annotated
• Response to extracellular stimulus (n = 9) & external stimulus (n = 19) overrepresented
Comparison of expression levels (RPKM) between high and low elevations
RPKM (Reads per kilobase per million mapped reads)
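RPKM, as expanded on this slide, scales a contig's read count by the contig length in kilobases and by the sequencing depth in millions of mapped reads. A minimal Python sketch of that normalisation follows; the function name and the example numbers are hypothetical and do not come from the study.

```python
def rpkm(read_count, contig_length_bp, total_mapped_reads):
    """Reads per kilobase per million mapped reads for one contig."""
    kilobases = contig_length_bp / 1_000
    millions_mapped = total_mapped_reads / 1_000_000
    return read_count / (kilobases * millions_mapped)


# Hypothetical example: 500 reads on a 2,000 bp contig, 10 million mapped reads.
example_rpkm = rpkm(500, 2_000, 10_000_000)
print(example_rpkm)
```

Averaging such values per contig within each elevation would give the two quantities compared in the figure.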
20. SNP calling & outlier detection
Reads2SNP
• 7 reads needed to infer genotype
• Deletion of paralogous SNPs
• Biallelic SNPs with no missing data
• Depth of coverage and posterior probability did not affect outlier detection.
147 118 SNPs & 12 688 contigs (mean = 13.7 SNPs per contig)
Strategies for selection of candidate genes
[1] Contingency table and Pearson’s Chi-square test (χ²)
[2] Dispersal parameter (m; Muller et al 2010 Evolutionary Applications)
[3] Allele frequency differentials (AFDs)
22. SNP calling & outlier detection
Contingency table and Pearson’s Chi-square test (χ²)
Gene i with 3 SNPs
        SNP #1   SNP #2   SNP #3   Plant      Environmental variable
High    A1 A1    A1 A1    A1 A1    Plant #1   2 400 m
High    A1 A1    A2 A1    A1 A1    Plant #2   2 370 m
High    A1 A2    A1 A1    A1 A2    Plant #3   2 450 m
Low     A2 A2    A2 A2    A2 A2    Plant #4   1 750 m
Low     A2 A2    A2 A1    A1 A2    Plant #5   1 650 m
Low     A1 A2    A2 A2    A2 A2    Plant #6   1 900 m
Contingency table
      High   Low   Expected
A1    14     3     9
A2    4      15    9
Selection Candidate genes
• Outlier: p value < 0.05 after FDR correction
• 646 genes (contigs) selected
• Enrichment analysis (GO term, biological processes)
• Single-organism metabolic processes (n = 155)
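The contingency-table strategy can be sketched in a few lines of Python. The block below computes Pearson's chi-square statistic for the High/Low allele counts shown on the slide, with expected counts derived from the row and column totals, and then applies a false discovery rate correction. The slides do not say which FDR procedure was used, so the Benjamini-Hochberg step here is an assumption, and the extra p-values are hypothetical.

```python
import math


def chi_square_2x2(table):
    """Pearson's chi-square statistic and p-value (1 d.f.) for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # Survival function of chi-square with 1 d.f.: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda k: p_values[k])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Rows are alleles (A1, A2); columns are elevations (High, Low), as on the slide.
slide_table = [[14, 3], [4, 15]]
chi2, p = chi_square_2x2(slide_table)
print(round(chi2, 3), p)

# Hypothetical p-values for three further genes, only to show the correction step.
adjusted = benjamini_hochberg([p, 0.04, 0.20, 0.75])
print([q < 0.05 for q in adjusted])
```

In this toy run only the first gene stays below 0.05 after correction, which illustrates why a raw p-value of 0.04 would not be enough to call an outlier.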
25. SNP calling & outlier detection
Dispersal parameter (mx)
Muller et al 2010 Evolutionary Applications
Gene i with 3 SNPs
        SNP #1   SNP #2   SNP #3   Plant      Environmental variable
High    A1 A1    A1 A1    A1 A1    Plant #1   2 400 m
High    A1 A1    A2 A1    A1 A1    Plant #2   2 370 m
High    A1 A2    A1 A1    A1 A2    Plant #3   2 450 m
Low     A2 A2    A2 A2    A2 A2    Plant #4   1 750 m
Low     A2 A2    A2 A1    A1 A2    Plant #5   1 650 m
Low     A1 A2    A2 A2    A2 A2    Plant #6   1 900 m
[Figure: A2 alleles placed along the High-Low elevation gradient, with their barycentre β]
β = 1937.5 m
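The β value on this slide can be reproduced from the genotype table. The Python sketch below takes allele A2 at SNP #1, averages the elevations of the populations that carry it to get the barycentre, and then averages the distances to that barycentre. Counting each carrier population once, regardless of how many copies it holds, and using absolute elevation differences as the distance are both assumptions; the exact definition in Muller et al 2010 may differ.

```python
def barycentre(elevations):
    """Mean elevation of the populations that carry the allele."""
    return sum(elevations) / len(elevations)


def dispersal_parameter(elevations):
    """Average absolute distance of the carrier populations to the barycentre."""
    beta = barycentre(elevations)
    return sum(abs(e - beta) for e in elevations) / len(elevations)


# Elevation (m) of each plant and its genotype at SNP #1, from the slide table.
elevation = {1: 2400, 2: 2370, 3: 2450, 4: 1750, 5: 1650, 6: 1900}
snp1 = {1: ("A1", "A1"), 2: ("A1", "A1"), 3: ("A1", "A2"),
        4: ("A2", "A2"), 5: ("A2", "A2"), 6: ("A1", "A2")}

# Populations carrying at least one copy of A2 (one plant per population).
a2_carriers = [elevation[plant] for plant, genotype in snp1.items() if "A2" in genotype]

beta = barycentre(a2_carriers)
m_a2 = dispersal_parameter(a2_carriers)
print(beta, m_a2)
```

Under these assumptions the barycentre comes out at 1937.5 m, matching the slide, and the dispersal of A2 is 256.25 m.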
28. SNP calling & outlier detection
Dispersal parameter (mx)
Muller et al 2010 Evolutionary Applications
Gene i with 3 SNPs
        SNP #1   SNP #2   SNP #3   Plant      Environmental variable
High    A1 A1    A1 A1    A1 A1    Plant #1   2 400 m
High    A1 A1    A2 A1    A1 A1    Plant #2   2 370 m
High    A1 A2    A1 A1    A1 A2    Plant #3   2 450 m
Low     A2 A2    A2 A2    A2 A2    Plant #4   1 750 m
Low     A2 A2    A2 A1    A1 A2    Plant #5   1 650 m
Low     A1 A2    A2 A2    A2 A2    Plant #6   1 900 m
[Figure: A2 alleles along the High-Low elevation gradient, with barycentre β and distances mi1, mi2, mi3, mi4 to it]
Selection Candidate genes
• Dispersion of each allele (mx): average distance of the allele to β
• Outlier: permutations to detect alleles more geographically clustered than expected at random
• 486 candidate genes
• Enrichment analysis (Biolog. process)
• Lipid metabolic process (n = 53)
• Single-organism metabolic processes (n = 59)
• Generation of precursor metabolites and energy (n = 31)
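The permutation step can be illustrated with the same example allele (A2 at SNP #1). The sketch below repeatedly reassigns the allele to a random set of populations of the same size and reports how often the permuted dispersal is at least as small as the observed one. The permutation scheme, the number of permutations, and the function names are assumptions made for illustration, not the procedure used in the study.

```python
import random


def dispersal(elevations):
    """Average absolute distance of the carrier elevations to their mean."""
    beta = sum(elevations) / len(elevations)
    return sum(abs(e - beta) for e in elevations) / len(elevations)


def clustering_p_value(all_elevations, carrier_flags, n_perm=10_000, seed=1):
    """Share of permutations whose dispersal is <= the observed one."""
    rng = random.Random(seed)
    observed = dispersal([e for e, c in zip(all_elevations, carrier_flags) if c])
    n_carriers = sum(carrier_flags)
    hits = 0
    for _ in range(n_perm):
        # Reassign the allele to a random set of populations of the same size.
        shuffled = rng.sample(all_elevations, n_carriers)
        if dispersal(shuffled) <= observed:
            hits += 1
    return observed, hits / n_perm


# Elevations (m) of Plants #1 to #6 and whether each carries A2 at SNP #1.
elevations = [2400, 2370, 2450, 1750, 1650, 1900]
carries_a2 = [False, False, True, True, True, True]

observed_m, p_value = clustering_p_value(elevations, carries_a2)
print(observed_m, p_value)
```

With six populations there are only 15 ways to place an allele found in four of them, so p-values from a design like this are necessarily coarse.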
29. SNP calling & outlier detection
Minor allele frequency differentials (AFDs) between high and low elevations
[Figure: frequency distribution of AFDs; x-axis AFD, y-axis Frequency]
Turner et al 2010 Nature; Stölting et al 2015 New Phytologist
30. SNP calling & outlier detection
Minor allele frequency differentials (AFDs) between high and low elevations
[Figure: frequency distribution of AFDs; x-axis in SDs from the genome-wide average (-3 to +3), y-axis Frequency]
Selection Candidate genes
• Outlier: AFD more than 3 SDs from the genome-wide average (p-value < 0.001)
• 1222 SNPs & 419 candidate genes
• Enrichment analysis (Biolog. process)
• Carbohydrate metabolic process
Turner et al 2010 Nature; Stölting et al 2015 New Phytologist
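The AFD criterion can be sketched as follows. The block computes the difference in the frequency of one allele between high and low plants, then flags SNPs whose AFD lies more than 3 SDs from the mean across all SNPs. Tracking A2 as the allele of interest, keeping the sign of the difference, and the genome-wide values themselves are assumptions made for illustration.

```python
import statistics


def afd(high_genotypes, low_genotypes, allele="A2"):
    """Frequency of `allele` at high elevation minus its frequency at low elevation."""
    def freq(genotypes):
        copies = sum(g.count(allele) for g in genotypes)
        return copies / (2 * len(genotypes))
    return freq(high_genotypes) - freq(low_genotypes)


def outlier_indices(afds, n_sd=3.0):
    """Indices of SNPs whose AFD lies more than n_sd SDs from the mean AFD."""
    mean = statistics.mean(afds)
    sd = statistics.pstdev(afds)
    return [i for i, x in enumerate(afds) if sd > 0 and abs(x - mean) > n_sd * sd]


# SNP #1 of the example gene: Plants #1-#3 (high) vs Plants #4-#6 (low).
snp1_afd = afd([("A1", "A1"), ("A1", "A1"), ("A1", "A2")],
               [("A2", "A2"), ("A2", "A2"), ("A1", "A2")])
print(snp1_afd)

# Hypothetical genome-wide AFDs: 20 undifferentiated SNPs plus SNP #1.
genome_wide = [0.0] * 20 + [snp1_afd]
print(outlier_indices(genome_wide))
```

With three diploid plants per elevation, allele frequencies move in steps of 1/6, so the AFD distribution from this design is discrete rather than smooth.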
31. SNP calling & outlier detection
SNP overlap among different selection approaches
[Figure: Venn diagram showing the extent of overlap among the selection approaches based on allele frequencies (Dispersal param., Allele freq., AFD); region counts: 336, 20, 606, 124, 6, 13, 275]
6 genes overlapped among three approaches
GO TERM: response to stress & metabolic process
163 genes overlapped among at least two approaches
• 143 annotated genes
• Enrichment analysis (before FDR correction)
- Response to abiotic stimulus (n = 53)
- Response to stress (n = 59)
- Several additional terms related to metabolic processes and response to stimulus
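Counting the overlap among the three candidate lists amounts to simple set bookkeeping. The Python sketch below tallies how many approaches selected each gene; the contig identifiers are hypothetical, since the actual candidate lists are not given in the slides.

```python
from collections import Counter


def overlap_by_count(*candidate_sets):
    """Map each gene to the number of approaches that selected it."""
    counts = Counter()
    for candidates in candidate_sets:
        counts.update(set(candidates))
    return counts


# Hypothetical contig identifiers; the real candidate lists are not in the slides.
chi_square = {"contig_1", "contig_2", "contig_3", "contig_4"}
dispersal = {"contig_2", "contig_3", "contig_5"}
afd = {"contig_3", "contig_4", "contig_6"}

counts = overlap_by_count(chi_square, dispersal, afd)
in_all_three = sorted(g for g, n in counts.items() if n == 3)
in_at_least_two = sorted(g for g, n in counts.items() if n >= 2)
print(in_all_three, in_at_least_two)
```

Genes selected by at least two approaches include those selected by all three, which is how the 163 and the 6 on this slide relate to each other.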
32. Thanks for your attention
Prof. Jose M. Iriondo
Group leader
Javier Morente-López
Ph.D student
Luisa Rubio
Ph.D student
Dr. Alfredo García-Fernández
Editor's Notes
In my talk I will give a brief description of the project I am working on, and I will also show you some of my work during my stay at ETH, together with a particularly relevant question.
My work at ETH is part of the ADAPTA project, which aims to provide an integrated perspective on local adaptation in marginal alpine populations.
First of all, let me give you some of the theoretical background behind the project.
High-mountain plant species are among the organisms considered to be especially vulnerable to global warming.
The main response of alpine plants to the new climatic conditions appears to be upward range shifts tracking their current climatic niche.
The critical point is that upward migration is not an option for Mediterranean alpine plants, because these species already inhabit summit areas, and their dispersal limitations constrain latitudinal migration.
This means that phenotypic plasticity and/or genetic change through an adaptive evolutionary process have to be the main response…
Taking S. ciliata as the study species, the project aimed to…
The species is a Mediterranean alpine specialist distributed across the northern Mediterranean basin.
Our study relies on populations from central Spain, which share a common evolutionary history.
I do not have time to discuss this part of the project in depth, but I am going to show some results that are relevant to my talk.
Previous studies on the demography of the species identified significant differences in ecologically relevant traits related to reproductive performance and vegetative growth.
On the other hand, the elevational gradient is associated with an environmental stress gradient, with the lowest populations experiencing the most stressful conditions, constraining seedling establishment…
…which seems to be the main demographic bottleneck of the species.
Common garden experiments, however, showed evidence for local adaptation in seed germination and seedling survival in these low-edge populations …
…and some experiments under controlled conditions suggest that this adaptation could be related to drought tolerance.
With this in mind, our second objective is to elucidate the genetic basis of these responses. More specifically, we aimed to… (read from slide)
We lack experience with NGS and genome-wide association studies, but luckily we have been able to establish a collaboration with ETH, and particularly with Alex and Nik, who are helping us with our first steps in this topic.
We aim to perform transcriptome comparisons between high and low elevations during the seedling stage.
So far, we have sequenced the transcriptomes of 6 seedlings grown under controlled conditions (one for each of the 6 study populations: 1 located at high elevation and 1 at low elevation in each of three mountains).
High elevations are meant to represent the environmental optimum for the species, whereas low elevations represent marginal environmental conditions (warmer and drier).
At the moment we are following a reference-based transcriptome assembly,
but we are also working on a de novo transcriptome assembly that we will have ready in the coming months.
The transcriptome analyses involve identifying polymorphisms and differential expression levels in candidate genes between individuals from high and low elevations.
We expect to find some candidate genes related to responses to abiotic stimuli, particularly drought stress.
From the sample size you can infer that our work is currently at a very early stage. Our aim at this point is to identify some good candidate genes to be used in subsequent steps.
Question…
We have implemented some alternative strategies for selection of candidate genes.
Now I will briefly explain these strategies.
Regarding the differential expression analysis, the figure shows the comparison of expression levels between high and low elevations.
We estimated the mean RPKM per contig and per elevation.
One hundred and twenty-nine genes were differentially expressed between elevations. These genes are represented by red and blue circles in the figure.
We then performed an enrichment analysis to find which biological processes were over-represented in this set of candidate genes, and detected two terms over-represented compared with all genes covered by the reference-based assembly.
Namely…
Expression estimation for transcripts. A) Left: comparison of RPKM (reads per kilobase per million mapped reads) among elevations. B) Distribution of RPKM differentials observed between high and low elevations.
We used Reads2SNP for SNP calling. After several filtering steps we identified about one hundred and fifty thousand SNPs distributed across twelve thousand eight hundred contigs, with an average of fourteen SNPs per contig.
We have implemented three alternative strategies for selecting candidate genes, based on the frequency distribution of the SNPs within and between elevations.
In the first approach we used the frequency distribution of the alleles to construct contingency tables.
The table at the top shows an example for a gene with 3 SNPs.
The distribution of the first and second alleles in the six plants is shown in columns. With this information we can construct a contingency table with the observed frequencies of the first and second alleles in high and low populations.
To select candidate genes we applied Pearson’s chi-square test to identify genes with a non-random distribution of alleles among elevations.
Using this framework, we detected 646 candidate genes. Enrichment analysis detected only one significantly enriched process.
The second approach is based on the dispersal parameter, previously proposed to detect geographical clustering of individual alleles.
Instead of using the geographical location of each plant, as in previous studies, we used the elevation of each population.
To explain this approach, let’s consider an example with the second allele of the first SNP.
First, for each allele, we computed the mean elevation, or barycentre.
Then we estimated the average distance of the allele to the barycentre: this value is called the dispersion of an allele or dispersal parameter.
Singletons were excluded from the data set; this means that we only considered alleles present in at least two populations.
To select candidate genes we used permutations to detect alleles more geographically clustered than expected at random.
With this approach we selected four hundred and eighty-six genes that were enriched for three GO terms. Namely…
In the third approach we estimated minor allele frequency differentials between elevations.
A SNP was considered an outlier if its AFD was more than 3 SDs from the genome-wide average.
We detected 1 222 highly differentiated SNPs between high and low elevations, distributed across 419 candidate genes enriched for carbohydrate metabolic process.
Before I finish, I would like to show you this Venn diagram showing… (read from slide)
6 genes overlapped among all approaches and 163 among at least two approaches. Enrichment analysis detected enrichment for biological processes related to response to stress and abiotic stimulus, and several additional terms related to metabolic processes.
False discovery rate correction
Which estimates are more appropriate?
Which additional estimates should be investigated?