Early generation selection in an intra population recurrent selection breeding program within a synthetic population
1. Early generation selection in a recurrent selection
breeding program within a synthetic population
–
Using genome-wide markers to speed up the process
Seminar on genomic selection 17/10/2014
Tuong-Vi Cao, UMR AGAP, CIRAD-BIOS
2. Genomic selection, based on genome-wide genotype-phenotype
relations, is a promising approach for breeding:
1. to access more selection candidates (higher
intensity of selection) and
2. to reduce the duration of selection cycles (maximize
genetic gain per unit time).
This is all the more interesting since molecular
information is becoming more accessible while
phenotypic information is becoming the limiting factor in terms of
resource allocation.
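The two levers above can be read off the standard breeder's equation for genetic gain per unit time. The symbols below are not in the slides and are introduced here only for illustration: $i$ is the selection intensity, $\rho$ the accuracy of selection, $\sigma_A$ the additive genetic standard deviation and $L$ the duration of one selection cycle.

$$
\frac{\Delta G}{t} = \frac{i \, \rho \, \sigma_A}{L}
$$

Screening more candidates raises $i$, and shorter cycles reduce $L$; both increase the gain per unit time provided the accuracy $\rho$ does not drop too much.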
3. The upland rice breeding program of CIAT initiated this
approach. First results (based on cross-validation within the
calibration population data) showed that such an approach is
feasible, but the overall accuracy is rather low. Some reasons
have already been identified (only one year*location evaluation,
only additive effects modelled).
4. My present contribution is about:
1. the way the phenotypic predictor may be defined and
modelled to take into account dominance and
epistatic interactions, and
2. the way to integrate markers to further reduce the
duration of selection cycles.
5. What has been done and what is the question ?
[Figure: breeding scheme from the synthetic populations to the S1:2 progenies]
• EELL 2008: four synthetic populations (PCT-4A, PCT-4B, PCT-4C, PCT-11) segregating for the ms gene: ½ [ms ms] + ½ [ms Ms] (MS and MF plants). Extraction of 100 S0:1 progenies per population on MF plants.
• EEP 2010: S0:1 progenies segregating: ¼ [Ms Ms] + ½ [Ms ms] + ¼ [ms ms].
• EEP 2010: seed increase through SSD, giving 392 S1:2 progenies segregating for the ms gene.
• EEP 2011 A: DNA extraction of 8 S1:2 plants (labelled pl1 to pl4 in the figure) and genotyping for the ms locus.
7. What has been done and what is the question ?
[Figure: from the S1:2 progenies to the calibration population]
• 392 S1:2 progenies segregating for the ms gene: ¼ [Ms Ms] + ½ [Ms ms] + ¼ [ms ms].
• Choice of one [Ms Ms] plant per S1:2 progeny to constitute the calibration population.
• Bulk seed increase (S2:3).
• DNA extraction of 15 S2:3 plants per progeny; GBS genotyping to infer the genotype of S2 plants.
• Phenotyping of S2:4 progenies to calibrate the model.
8. What has been done and what is the question ?
• The S2 population is an option as the base population structure for
calibration because partially fixed material:
– is more homogeneous and easier to phenotype (minimum intra-progeny
variation and maximum between-progeny variation)
– minimizes the bias due to dominance effects.
• However, it is time- and resource-consuming:
– to produce the material to calibrate the prediction model (S2
population to be sampled, S2:3 bulks to be genotyped, S2:4
progenies to be phenotyped)
– to advance the breeding material to the S2 generation before
it can be predicted in each cycle.
• Hence, is it possible to save time and resources through:
– early phenotyping for calibrating the model?
– early prediction of breeding candidates?
9. Genetic model
• For simplicity, let us suppose two biallelic loci M and N.
• Let $M_iN_k / M_jN_l$ be a genotype in the S0 generation.
• The genotypic value is

$$
\begin{aligned}
{}^{S0}G_{ijkl} ={}& A_i + A_j + A_k + A_l + D_{ij} + D_{kl} \\
&+ AA_{ik} + AA_{jl} + AA_{il} + AA_{jk} \\
&+ AD_{ijk} + AD_{ijl} + AD_{ikl} + AD_{jkl} + DD_{ijkl}
\end{aligned}
$$

where:
– $A_i, A_j, A_k, A_l$: additive effects associated with alleles i or j of the M
locus and alleles k or l of the N locus
– $D_{ij}, D_{kl}$: dominance effects associated with the M and N loci
respectively
– $AA$: additive*additive epistasis associated with one
allele of the M locus and one allele of the N locus
– $AD$: additive*dominance epistasis associated with 2
alleles of one locus and 1 allele of the other locus
– $DD_{ijkl}$: dominance*dominance epistasis associated
with all alleles
10. Genetic model
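• At meiosis, the genotype produces four gametes with
frequencies depending on the recombination rate r.
• If selfed, the genotype produces ten genotypes in the S1
generation …

Gametes (with their respective frequencies) and genotypic value of each genotype obtained by combining them:

$$
\begin{array}{l|cccc}
 & M_iN_k\ \left(\tfrac{1-r}{2}\right) & M_iN_l\ \left(\tfrac{r}{2}\right) & M_jN_k\ \left(\tfrac{r}{2}\right) & M_jN_l\ \left(\tfrac{1-r}{2}\right) \\
\hline
M_iN_k\ \left(\tfrac{1-r}{2}\right) & G_{iikk} & G_{iikl} & G_{ijkk} & G_{ijkl} \\
M_iN_l\ \left(\tfrac{r}{2}\right) & G_{iikl} & G_{iill} & G_{ijkl} & G_{ijll} \\
M_jN_k\ \left(\tfrac{r}{2}\right) & G_{ijkk} & G_{ijkl} & G_{jjkk} & G_{jjkl} \\
M_jN_l\ \left(\tfrac{1-r}{2}\right) & G_{ijkl} & G_{ijll} & G_{jjkl} & G_{jjll}
\end{array}
$$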
11. Genetic model
• With respective frequencies shown below:
– Non-recombinant double homozygote genotypes: Giikk and Gjjll, each ¼ (1-r)²
– Recombinant double homozygote genotypes: Giill and Gjjkk, each ¼ r²
– Non-recombinant double heterozygote genotype: Gijkl (MiNk / MjNl), ½ (1-r)²
– Recombinant double heterozygote genotype: Gijkl (MiNl / MjNk), ½ r²
– Partially recombinant genotypes, homozygote for one locus and
heterozygote for the other locus: Giikl, Gijkk, Gijll and Gjjkl, each ½ r (1-r)
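The frequencies above can be checked by combining the four gametes pairwise. The following is a minimal sketch; the function names and the value r = 1/10 are illustrative assumptions, not part of the presentation.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def gamete_freqs(r):
    """Gametes of the S0 double heterozygote MiNk / MjNl (coupling phase)."""
    return {
        ("i", "k"): (1 - r) / 2,  # non-recombinant
        ("j", "l"): (1 - r) / 2,  # non-recombinant
        ("i", "l"): r / 2,        # recombinant
        ("j", "k"): r / 2,        # recombinant
    }


def s1_freqs(r):
    """Frequencies of the S1 genotypes obtained by selfing the S0 plant."""
    gametes = gamete_freqs(r)
    freqs = Counter()
    for (g1, f1), (g2, f2) in product(gametes.items(), repeat=2):
        # A genotype is an unordered pair of gametes, so phase is kept and
        # the two double heterozygotes stay distinct.
        freqs[tuple(sorted((g1, g2)))] += f1 * f2
    return dict(freqs)


r = Fraction(1, 10)  # arbitrary recombination rate chosen for the check
freqs = s1_freqs(r)
for genotype, freq in sorted(freqs.items()):
    print(genotype, freq)
```

Using exact fractions makes it easy to compare each value with the closed-form expressions, for example ¼ (1-r)² for Giikk.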
12. Genetic components of generation means
• The frequencies form a vector, V1, associated with the S1
generation (genotypes in the order Giikk, Gjjll, Giill, Gjjkk,
non-recombinant Gijkl, recombinant Gijkl, Giikl, Gijkk, Gijll, Gjjkl):

$$
V_1 = \left( \tfrac14(1-r)^2,\ \tfrac14(1-r)^2,\ \tfrac14 r^2,\ \tfrac14 r^2,\ \tfrac12(1-r)^2,\ \tfrac12 r^2,\ \tfrac12 r(1-r),\ \tfrac12 r(1-r),\ \tfrac12 r(1-r),\ \tfrac12 r(1-r) \right)^{T}
$$

• If V2 is the vector of frequencies of the
S2 generation, then one can find the
relationship between V1 and V2 …
13. Genetic components of generation means
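• This relation is V2 = M*V1
• It holds for any pair of
successive generations
(Vn+1=M*Vn).
• The M matrix is used to
estimate genotypic
values and genetic
covariances between
successive generations.

With rows and columns in the genotype order Giikk, Gjjll, Giill, Gjjkk, non-recombinant Gijkl, recombinant Gijkl, Giikl, Gijkk, Gijll, Gjjkl:

$$
M = \begin{pmatrix}
1 & 0 & 0 & 0 & \tfrac14(1-r)^2 & \tfrac14 r^2 & \tfrac14 & \tfrac14 & 0 & 0 \\
0 & 1 & 0 & 0 & \tfrac14(1-r)^2 & \tfrac14 r^2 & 0 & 0 & \tfrac14 & \tfrac14 \\
0 & 0 & 1 & 0 & \tfrac14 r^2 & \tfrac14(1-r)^2 & \tfrac14 & 0 & \tfrac14 & 0 \\
0 & 0 & 0 & 1 & \tfrac14 r^2 & \tfrac14(1-r)^2 & 0 & \tfrac14 & 0 & \tfrac14 \\
0 & 0 & 0 & 0 & \tfrac12(1-r)^2 & \tfrac12 r^2 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & \tfrac12 r^2 & \tfrac12(1-r)^2 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & \tfrac12 r(1-r) & \tfrac12 r(1-r) & \tfrac12 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & \tfrac12 r(1-r) & \tfrac12 r(1-r) & 0 & \tfrac12 & 0 & 0 \\
0 & 0 & 0 & 0 & \tfrac12 r(1-r) & \tfrac12 r(1-r) & 0 & 0 & \tfrac12 & 0 \\
0 & 0 & 0 & 0 & \tfrac12 r(1-r) & \tfrac12 r(1-r) & 0 & 0 & 0 & \tfrac12
\end{pmatrix}
$$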
Ongoing questions:
• Is it possible to relate the frequencies of any generation (including RILs) directly to
those of the first generation (i.e. S0 plant or F1 cross)?
• If yes, is it also possible to relate any generation mean and genetic covariance
to those of the unselfed S0 plant or F1 cross?
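One way to explore these questions numerically is to build the selfing matrix M and apply it repeatedly to the frequency vector of the S0 plant. The sketch below assumes the genotype order of the frequency table (Giikk, Gjjll, Giill, Gjjkk, non-recombinant Gijkl, recombinant Gijkl, Giikl, Gijkk, Gijll, Gjjkl); the function name, the value r = 0.1 and the number of generations are illustrative choices.

```python
import numpy as np

# Index of each genotype in the frequency vectors (assumed order).
GENOTYPES = ["iikk", "jjll", "iill", "jjkk", "ijkl_nonrec",
             "ijkl_rec", "iikl", "ijkk", "ijll", "jjkl"]


def selfing_matrix(r):
    """Matrix M such that V(n+1) = M @ V(n) under selfing."""
    a, b, c = (1 - r) ** 2, r ** 2, r * (1 - r)
    M = np.zeros((10, 10))
    # Double homozygotes breed true.
    M[:4, :4] = np.eye(4)
    # Offspring of the non-recombinant and recombinant double heterozygotes.
    M[:, 4] = [a / 4, a / 4, b / 4, b / 4, a / 2, b / 2, c / 2, c / 2, c / 2, c / 2]
    M[:, 5] = [b / 4, b / 4, a / 4, a / 4, b / 2, a / 2, c / 2, c / 2, c / 2, c / 2]
    # Single heterozygotes: 1/2 stays as is, 1/4 goes to each homozygote.
    offspring_homozygotes = {6: (0, 2), 7: (0, 3), 8: (1, 2), 9: (1, 3)}
    for col, homs in offspring_homozygotes.items():
        M[col, col] = 0.5
        M[list(homs), col] = 0.25
    return M


r = 0.1
M = selfing_matrix(r)

V0 = np.zeros(10)
V0[4] = 1.0  # the S0 plant is the non-recombinant double heterozygote
V1 = M @ V0  # S1 frequencies
V_ril = np.linalg.matrix_power(M, 60) @ V0  # close to complete fixation

for name, freq in zip(GENOTYPES, V_ril):
    print(f"{name:12s} {freq:.4f}")
```

Because Vn = Mⁿ V0, any generation can be related to the S0 plant through a matrix power. In this sketch the recombinant homozygotes converge to r/(1+2r) each, which agrees with the classical proportion 2r/(1+2r) of recombinant lines among RILs obtained by selfing.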
14. Genetic components of generation means
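• Thus, if r = ½ (for simplicity), the genotypic mean value of the S1
progeny of an S0 plant/cross $M_iN_k / M_jN_l$ is:

$$
\begin{aligned}
{}^{S1}G_{ijkl} ={}& A_i + A_j + A_k + A_l \\
&+ \tfrac12 (D_{ij} + D_{kl}) + \tfrac14 (D_{ii} + D_{jj} + D_{kk} + D_{ll}) \\
&+ AA_{ik} + AA_{jl} + AA_{il} + AA_{jk} \\
&+ \tfrac12 (AD_{ijk} + AD_{ijl} + AD_{ikl} + AD_{jkl}) \\
&+ \tfrac14 (AD_{iik} + AD_{ikk} + AD_{jjl} + AD_{jll} + AD_{iil} + AD_{ill} + AD_{jjk} + AD_{jkk}) \\
&+ \tfrac14 DD_{ijkl} + \tfrac18 (DD_{iikl} + DD_{ijkk} + DD_{ijll} + DD_{jjkl}) \\
&+ \tfrac1{16} (DD_{iikk} + DD_{jjll} + DD_{iill} + DD_{jjkk})
\end{aligned}
$$

• If successive generations are allowed to segregate and
recombine until complete fixation (i.e. neither selection nor
drift), the expected mean value of the RILs will be:

$$
\begin{aligned}
{}^{S\infty}G_{ijkl} ={}& A_i + A_j + A_k + A_l + \tfrac12 (D_{ii} + D_{jj} + D_{kk} + D_{ll}) \\
&+ AA_{ik} + AA_{jl} + AA_{il} + AA_{jk} \\
&+ \tfrac12 (AD_{iik} + AD_{ikk} + AD_{jjl} + AD_{jll} + AD_{iil} + AD_{ill} + AD_{jjk} + AD_{jkk}) \\
&+ \tfrac14 (DD_{iikk} + DD_{jjll} + DD_{iill} + DD_{jjkk})
\end{aligned}
$$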
15. Line value concept : definition and prediction
• Line value (LV) is the mean value of all RILs that a plant or a
cross can produce through successive selfings (or haplo-diploidisation).
• LV may be predicted from any pair of successive generations:

$$
2\,{}^{S_{n+1}}G - {}^{S_n}G
$$

• If an F1 and its selfed F2 are both phenotyped, then [2*GF2-GF1]
predicts the mean value of the RILs derivable from the cross. The
genetic components may be written as follows:

$$
\begin{aligned}
2\,G_{F2} - G_{F1} ={}& A_i + A_j + A_k + A_l + \tfrac12 (D_{ii} + D_{jj} + D_{kk} + D_{ll}) \\
&+ AA_{ik} + AA_{jl} + AA_{il} + AA_{jk} \\
&+ \tfrac12 (AD_{iik} + AD_{ikk} + AD_{jjl} + AD_{jll} + AD_{iil} + AD_{ill} + AD_{jjk} + AD_{jkk}) \\
&+ \tfrac18 (DD_{iikk} + DD_{jjll} + DD_{iill} + DD_{jjkk}) \\
&+ \tfrac14 (DD_{iikl} + DD_{ijkk} + DD_{ijll} + DD_{jjkl}) - \tfrac12 DD_{ijkl}
\end{aligned}
$$

• This predictor equals the expected LV (${}^{S\infty}G_{ijkl}$) except for the DD
terms.
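The claim that [2*GF2-GF1] matches the line value except for DD terms can be checked numerically. The sketch below is an illustration under stated assumptions: r = ½ (independent loci), randomly drawn effect values, and hypothetical helper names `eff` and `genotypic_value`.

```python
import random
from itertools import product

rng = random.Random(0)
effects = {}


def eff(kind, *alleles):
    """Randomly drawn, cached effect for one term of the genetic model."""
    key = (kind,) + alleles
    if key not in effects:
        effects[key] = rng.gauss(0.0, 1.0)
    return effects[key]


def genotypic_value(m, n):
    """Value of a genotype with allele pair m at locus M and n at locus N."""
    a, b = sorted(m)
    c, d = sorted(n)
    g = eff("A", a) + eff("A", b) + eff("A", c) + eff("A", d)
    g += eff("D", a, b) + eff("D", c, d)
    g += sum(eff("AA", x, y) for x in (a, b) for y in (c, d))
    g += eff("AD", a, b, c) + eff("AD", a, b, d)  # dominance M x additive N
    g += eff("AD", a, c, d) + eff("AD", b, c, d)  # additive M x dominance N
    g += eff("DD", a, b, c, d)
    return g


# With r = 1/2 the two loci segregate independently in the F2 (S1).
locus_m = {("i", "i"): 0.25, ("i", "j"): 0.5, ("j", "j"): 0.25}
locus_n = {("k", "k"): 0.25, ("k", "l"): 0.5, ("l", "l"): 0.25}

g_f1 = genotypic_value(("i", "j"), ("k", "l"))
g_f2 = sum(fm * fn * genotypic_value(m, n)
           for (m, fm), (n, fn) in product(locus_m.items(), locus_n.items()))
# Expected RIL mean: the four double homozygotes, each with frequency 1/4.
lv = sum(0.25 * genotypic_value((m, m), (n, n)) for m in "ij" for n in "kl")

predictor = 2 * g_f2 - g_f1

# DD terms carried by heterozygote structures (prediction only) and the
# extra DD terms carried by homozygote structures (line value only).
dd_het = 0.25 * (eff("DD", "i", "i", "k", "l") + eff("DD", "i", "j", "k", "k")
                 + eff("DD", "i", "j", "l", "l") + eff("DD", "j", "j", "k", "l")) \
    - 0.5 * eff("DD", "i", "j", "k", "l")
dd_hom = 0.125 * sum(eff("DD", m, m, n, n) for m in "ij" for n in "kl")

print(predictor - lv, dd_het - dd_hom)  # the two numbers should coincide
```

All A, D, AA and AD terms cancel in the difference, so only dominance*dominance terms separate the predictor from the line value in this sketch.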
16. Line value concept : definition and prediction
• The expected line value (${}^{S\infty}G_{ijkl}$) and its prediction (2*GF2-GF1)
differ in their DD terms:
– The prediction includes the quantity

$$
DD = \tfrac14 (DD_{iikl} + DD_{ijkk} + DD_{ijll} + DD_{jjkl}) - \tfrac12 DD_{ijkl}
$$

which is associated with heterozygote structures.
– While the line value includes the quantity

$$
DD' = \tfrac18 (DD_{iikk} + DD_{jjll} + DD_{iill} + DD_{jjkk})
$$

associated with homozygote structures.
This means that if DD=DD'=0, then the prediction of LV
obtained from early generations will be exactly equal to
the expected LV (${}^{S\infty}G_{ijkl}$).
17. Applying LV concept to RS breeding scheme:
advantages & specific aspects
• Efficient and early prediction of the potential of plants or
crosses to produce high-performing inbred lines, even for traits
with dominance and epistatic interactions.
• In the context of the CIAT rice breeding scheme, single S0
plants cannot be phenotyped properly, so successive
selfed generations can be used to construct the predictor of interest,
which is [2 * S2Gijkl - S1Gijkl] or [2 * S3Gijkl - S2Gijkl],
depending on the quantity of seeds needed for
phenotyping (i.e. single-location versus multi-location
experimentation).
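Why any pair of successive selfed generations works can be illustrated on a single locus, where heterozygosity is halved at each selfing. This is a minimal sketch; the function names and the numeric values (homozygote mean 10, heterozygote value 14) are hypothetical.

```python
def line_value_predictor(mean_next, mean_current):
    """LV predictor from the means of two successive selfing generations."""
    return 2.0 * mean_next - mean_current


def generation_mean(n, hom_mean, het_value):
    """Single-locus mean of generation Sn descended from a heterozygous S0.

    The heterozygote frequency is (1/2)**n; the remainder is split
    equally between the two homozygotes, whose mean is hom_mean.
    """
    h = 0.5 ** n
    return (1.0 - h) * hom_mean + h * het_value


hom_mean, het_value = 10.0, 14.0  # hypothetical values with dominance
predictions = [
    line_value_predictor(generation_mean(n + 1, hom_mean, het_value),
                         generation_mean(n, hom_mean, het_value))
    for n in range(4)  # pairs (S0, S1), (S1, S2), (S2, S3), (S3, S4)
]
print(predictions)
```

In this single-locus setting the heterozygote contribution cancels for every pair, so [2 * S2 - S1] and [2 * S3 - S2] return the same homozygote mean; the choice between them then depends on seed quantity, as stated above.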
18. Applying LV concept to RS breeding scheme:
advantages & specific aspects
• Advantages of the LV predictor compared with the S2:4 predictor:
– Time saved in the calibration process (1 or 2 generations)
– Time saved in each selection cycle (prediction of S0:2
progenies instead of S2:4 progenies)
– No bias due to dominance (unlike single-generation phenotyping)
• Specific aspects to focus on:
– Bulk multiplication of seeds is mandatory (to maintain the allelic
frequencies on which the equations rely)
– The ms locus controlling male sterility is difficult to manage if
genotyping for the locus is not available to differentiate S0 plants
– The number of progenies to be phenotyped is halved for equal
resources (as two generations need to be
phenotyped)
19. Further accelerating the process
using genome-wide markers
• Line value may be used as the phenotype in a genomic model
instead of the value of a single selfed progeny. The procedure
consists of:
– GBS genotyping of S0 plants,
– phenotyping of S1 and S2 (or S2 and S3) progenies.
• Gain at two levels compared with S2 genotyping and S2:4
phenotyping:
– Calibration takes 2 generations (S1 and S2) or 3 generations (S2
and S3) instead of 4 generations
– Prediction takes place on S0 plants directly, without advancing
the material to the S2 generation
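The procedure above can be sketched with simulated data. The slides do not specify the genomic model, so ridge regression on marker genotypes is used here purely as a stand-in; the population sizes, marker coding (0/1/2), effect sizes, noise levels and the penalty `lam` are all assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n_train, n_cand, n_markers = 300, 50, 200
n_total = n_train + n_cand

# Simulated GBS genotypes of S0 plants, coded as allele counts 0/1/2.
X = rng.integers(0, 3, size=(n_total, n_markers)).astype(float)
true_lv = X @ rng.normal(0.0, 0.1, n_markers)  # simulated line values

# Simulated progeny means: a dominance-like deviation that is halved at
# each selfing generation, plus independent phenotyping noise.
dominance = rng.normal(0.0, 1.0, n_total)
s1_mean = true_lv + 0.5 * dominance + rng.normal(0.0, 0.3, n_total)
s2_mean = true_lv + 0.25 * dominance + rng.normal(0.0, 0.3, n_total)

# Line value predictor used as the phenotype of the genomic model.
lv_phenotype = 2.0 * s2_mean - s1_mean


def ridge_fit(X, y, lam):
    """Ridge regression in dual form (n x n system, convenient when n << p)."""
    mu_x, mu_y = X.mean(axis=0), y.mean()
    Xc = X - mu_x
    alpha = np.linalg.solve(Xc @ Xc.T + lam * np.eye(len(y)), y - mu_y)
    return Xc.T @ alpha, mu_x, mu_y


train, cand = slice(0, n_train), slice(n_train, n_total)
beta, mu_x, mu_y = ridge_fit(X[train], lv_phenotype[train], lam=45.0)

# Prediction on S0 candidates from their marker genotypes only.
predicted = (X[cand] - mu_x) @ beta + mu_y
accuracy = np.corrcoef(predicted, true_lv[cand])[0, 1]
print(f"accuracy on simulated candidates: {accuracy:.2f}")
```

In this simulation the dominance-like deviation cancels in 2 * S2 - S1, at the cost of a noisier phenotype, which mirrors the trade-off between bias and the number of progenies that can be phenotyped.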
20. Further accelerating the process
using genome-wide markers
Procedure when genotyping of the ms locus is available:
– Genotyping of S0 plants for the ms locus
– GBS genotyping of only those S0 plants that carry the [Ms Ms] genotype at the ms
locus
– Seed increase of [Ms Ms] S0 plants until the S2 or S3 generation
– Phenotyping of S1 + S2 (or S2 + S3)
21. Conclusion
This procedure optimises some aspects of the GS scheme:
• Calibration of the model based on very early generations
• Early prediction of the breeding population (S0). This
maximizes the genetic gain per unit time.
• Line value predictors are less biased by complex effects, in particular
dominance, even if these may be important in early generations
My presentation offers reflections and questions based on the experience of genomic selection conducted on upland rice at CIAT.
To begin with, I recall what has been done and try to identify some aspects that can be optimised.