1) Trait-associated SNPs provide insights into the genetic basis of heterosis or hybrid vigor in maize. GWAS identified over 1,000 associations between SNPs and seven yield-related traits.
2) Including dominance effects in models explains more of the observed heterosis and genetic variation than additive effects alone. The ratio of SNPs exhibiting positive versus negative dominance is correlated with heterosis for a given trait.
3) Field-based phenotyping using sensors on robots and UAVs can study dynamic traits influenced by environment and GxE interactions, overcoming limitations of endpoint traits in controlled conditions. This will improve predictive models for plant breeding and variety recommendations.
2015. Patrick Schnable. Trait-associated SNPs provide insights into heterosis in maize
1. Trait-associated SNPs provide insights
into heterosis in maize
Patrick S. Schnable
Iowa State University
China Agricultural University
Data2Bio, LLC
ICRISAT
19 February 2015
2. How to Translate Genomic Data into
Biological Understanding and Crop
Improvement?
B73 Reference Genome NGS data in NCBI SRA (Feb. 2014)
zmHapMap1
zmHapMap2
CAU resequencing
ISU Zeanome (RNA-seq)
Ames Diversity Panel
IBM RILs RNA-seq
CAAS resequencing
And many others
Schnable, Ware et al., Science, 2009
Tera (10^12) bases
$32M
3. Associate Genes (or genetic markers)
with Traits
• Which of the ~50,000 maize genes control
important traits?
• GWAS (Genome-wide association studies)
– Typically conducted on diversity panels
– By exploiting historical recombination events they
yield higher resolution associations than QTL
studies
– Identifies associations between genetic markers
(e.g., SNPs) and traits
• Forward and reverse genetics
4. Approaches for GWAS
$$Y = \mu + \sum_{i=1}^{4} b_i P_i + \sum_{l=1}^{25} a_l S_l + d \cdot \mathrm{SNP} + e$$
Y: phenotypic trait;
P_i: fixed effect of cross-type population (N=4);
S_l: fixed effect of sub-population (N=25).
• Single-marker GWAS approach
– SNP effects tested one at a time
– Using PLINK command line tool
• Stepwise regression approach
– SNPs fitted in a step-wise manner
– Using GenSel4 Stepwise (alpha=0.05,MaxMarkers=300)
• Bayesian-based approach
– SNPs fitted simultaneously into a model
– Using GenSel4 BayesC (chainLength=41,000, burnin=1,000)
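As a rough illustration of the single-marker approach (not the PLINK or GenSel4 implementation), the model on this slide can be fitted by ordinary least squares, one SNP at a time. The sketch below is pure Python; the function names, the 0/1/2 genotype coding, and the choice to report a t-statistic for the SNP term are assumptions made for illustration.

```python
import math

def invert_matrix(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def snp_t_statistic(y, cross_type, subpop, snp):
    """Fit y ~ cross_type + subpop + snp by OLS; return (d_hat, t) for the SNP."""
    # Dummy-code both fixed effects, dropping the first level as baseline.
    cross_levels = sorted(set(cross_type))[1:]
    sub_levels = sorted(set(subpop))[1:]
    rows = []
    for c, s, g in zip(cross_type, subpop, snp):
        row = [1.0]
        row += [1.0 if c == lv else 0.0 for lv in cross_levels]
        row += [1.0 if s == lv else 0.0 for lv in sub_levels]
        row.append(float(g))  # SNP column is last
        rows.append(row)
    n, k = len(rows), len(rows[0])
    if n <= k:
        raise ValueError("need more observations than model terms")
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    inv = invert_matrix(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(rows, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)  # residual variance
    se = math.sqrt(sigma2 * inv[-1][-1])          # standard error of d_hat
    return beta[-1], beta[-1] / se
```

In a genome scan this function would be called once per SNP and the resulting statistics compared against a multiple-testing threshold; the stepwise and Bayesian approaches differ in that they fit many SNPs in the same model.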
5. GWAS for Yield-Related Traits
Kernel Count
Total Kernel Weight
Avg. Kernel Weight
Cob Length
Cob Diameter
Cob Weight
Kernel Row Number
Jinliang Yang
(杨金良)
Jeff Ross-Ibarra Lab,
UC Davis
6. Nested Association Mapping (NAM) Population
Yu, J. et al. Genetics 2008;178:539-551
Four related populations (N=7,000
lines):
• NAM RILs (N=5,000 lines) + IBM
RILs (N=300 lines)
• Subset of MxRILs (N=300 lines;
IBM + NAM)
• Subset of BxRILs (N=800 lines;
IBM + NAM)
• NAM Partial Diallel (N=250 lines)
7. High Density Genotypic Data
• SNPs from three sources:
– Maize HapMap1* (1.6M)
– Maize HapMap2* (18.4M)
– Our RNA-seq SNPs from
5 tissues (4.9M)
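[Figure: Venn diagram of the three SNP sources. Sites unique to each source: HapMap1 0.7 M, HapMap2 16.6 M, RNA-seq 3.2 M. Overlapping sites, with concordance among overlapping variant sites where given: 0.4 M (98.7%), 1.2 M (96.6%), 0.3 M (96.9%), 0.2 M.]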
Imputation or
Projection
NAM RILs
BxNAM RILs
MxNAM RILs
NAM Diallels
*Gore, M.A., et al., Science, 2009;
Chia, J-M, et al., Nature Genetics, 2012.
Merging and Filtering
Minor Allele Freq. (MAF) >= 0.1
SNP Missing Rate < 0.6
Merged SNP set
13.0M
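The two filtering thresholds on this slide can be sketched as a per-SNP check. This is only an illustration under assumed conventions: genotypes coded as 0/1/2 copies of one allele, `None` for a missing call, and a hypothetical function name; it is not the pipeline used to build the merged set.

```python
def snp_passes_filters(genotypes, min_maf=0.1, max_missing=0.6):
    """Return True if a SNP meets MAF >= min_maf and missing rate < max_missing.

    genotypes: one call per line, coded 0/1/2, or None when the call is missing.
    """
    n = len(genotypes)
    called = [g for g in genotypes if g is not None]
    if n == 0 or not called:
        return False
    missing_rate = (n - len(called)) / n
    if missing_rate >= max_missing:
        return False
    # Frequency of the counted allele among called genotypes (two alleles each).
    freq = sum(called) / (2 * len(called))
    maf = min(freq, 1 - freq)
    return maf >= min_maf
```

For example, `snp_passes_filters([0, 0, 2, 2, 1, None, None, 0, 1, 2])` keeps the SNP (20% missing, allele frequency 0.5), while a SNP with 70% missing calls or a nearly monomorphic SNP is dropped.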
8. Phenotypic distributions
• CD=Cob Diameter, AKW=Avg. Kernel Weight, CL=Cob Length, CW=Cob
Weight, KC=Kernel Count, TKW=Total Kernel Weight
Based on ~100k observations/trait from 9 locations; ~20% our data and 80%
from Brown, P. J., et al., PLoS Genetics, 2011
9. Different GWAS Approaches are
Complementary
40/77 (52%) KAVs, representing 39 chromosomal bins
(bin size =100kb), have been cross-validated.
[Figure: TAS plotted by single-variant GWAS (-log10(P-value)) and Bayesian-based GWAS (ModelFreq); legend: genotyped TAS, cross-validated TAS.]
• Bayesian-based and single-variant: N=16/21 (76%)
• Bayesian-based: N=9/26 (35%)
• Single-variant: N=10/15 (67%)
• Stepwise regression: N=6/14 (43%)
13. Missing heritability
Inclusion of dominant gene action
improves predictions
[Figure: percentage of HPH heritability. Panels: four GWAS populations; only diallel population. Legend: additive, dominance, general heritability, missing heritability.]
14. Classical Models for Heterosis
• Complementation (additive or dominant gene action): AA bb x aa BB → Aa Bb
• Over-dominance (over-dominant gene action)
Zamir
15. Degree of Dominance for TASs
Degree of dominance (h), where d denotes dominant
effect and a denotes additive effect:
$$h = \frac{d}{a}$$
[Figure: diagram illustrating the additive effect a and the dominant effect d.]
• positive dominance: h > 0.5
• negative dominance: h < -0.5
• additive: -0.5 <= h <= 0.5
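A minimal sketch of how the thresholds on this slide could be applied to a single TAS. The function names are hypothetical, and `effects_from_means` uses the common textbook definitions of a and d from genotype class means, which is an assumption here and not necessarily how the effects were estimated in the study.

```python
def effects_from_means(mean_aa, mean_ab, mean_bb):
    """Assumed textbook definitions: a is half the difference between the two
    homozygote means; d is the heterozygote's deviation from their midpoint."""
    a = (mean_bb - mean_aa) / 2
    d = mean_ab - (mean_aa + mean_bb) / 2
    return a, d

def degree_of_dominance(a, d):
    """h = d / a, as defined on the slide."""
    if a == 0:
        raise ValueError("additive effect is zero; h is undefined")
    return d / a

def classify_gene_action(h):
    """Apply the slide's cutoffs: h > 0.5, h < -0.5, otherwise additive."""
    if h > 0.5:
        return "positive dominance"
    if h < -0.5:
        return "negative dominance"
    return "additive"
```

With genotype means of 10, 15 and 14, for instance, a = 2 and d = 3, so h = 1.5 and the locus is classed as positive dominance.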
16. Trait Associated SNP Effects
*Dominance includes true dominance, over-dominance and pseudo-overdominance
17. Phenotype (P) = Genotype (G) + Environment (E) + GxE
• Genotype: NGS revolution and GBS
• Environment: weather, soil type, water,
nutrients, disease pressure, agronomic
practices etc.
• GxE interactions complicate phenotypic
predictions, but offer fascinating avenues
of investigation
[Figure: U.S. Drought Monitor map, October 1, 2013 (valid 7 a.m. EDT; released Thursday, Oct. 3, 2013). Author: David Miskus, NOAA/NWS/NCEP/CPC. Intensity classes: D0 Abnormally Dry, D1 Moderate Drought, D2 Severe Drought, D3 Extreme Drought, D4 Exceptional Drought. Drought impact types: S = short-term, typically less than 6 months (e.g. agriculture, grasslands); L = long-term, typically greater than 6 months (e.g. hydrology, ecology).]
18. E and GxE complicate
phenotypic predictions
• Strategies for dealing with “E” and “GxE”
– Study traits that are stable across E
– Conduct studies in controlled environments,
taking E and GxE out of the equation
– Control for and study the effects of E and GxE
statistically…embrace the opportunity to gain
a deeper understanding of the underlying
biology
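One simple way to picture the statistical strategy is to split a table of genotype-by-environment means into a grand mean, genotype main effects, environment main effects, and a GxE remainder, mirroring P = G + E + GxE. The sketch below assumes a complete, balanced table and a hypothetical function name; real field data would normally call for a mixed model.

```python
def decompose_gxe(table):
    """Split a genotype-by-environment table of means into additive parts.

    table[g][e] is the mean phenotype of genotype g in environment e.
    Returns (grand_mean, g_effects, e_effects, gxe) such that
    table[g][e] == grand_mean + g_effects[g] + e_effects[e] + gxe[g][e].
    """
    n_g, n_e = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n_g * n_e)
    # Main effects are deviations of row and column means from the grand mean.
    g_eff = [sum(row) / n_e - grand for row in table]
    e_eff = [sum(table[g][e] for g in range(n_g)) / n_g - grand
             for e in range(n_e)]
    # Whatever the main effects do not explain is the interaction term.
    gxe = [[table[g][e] - grand - g_eff[g] - e_eff[e] for e in range(n_e)]
           for g in range(n_g)]
    return grand, g_eff, e_eff, gxe
```

For the table `[[10, 12], [14, 20]]` the grand mean is 14, the genotype effects are -3 and +3, the environment effects are -2 and +2, and the interaction terms are +1, -1, -1, +1: the second genotype gains more from the second environment than the first genotype does.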
28. Phenotype (P) = Genotype (G) + Environment (E) + GxE
Predictive Models Will:
• Improve the accuracy of selection in plant breeding
programs, thereby increasing the rate of genetic
gain per year
• Enhance our ability to efficiently breed crops to
withstand the increased weather variability
associated with global climate change
• Improve our ability to provide farmers with
evidence-based recommendations for the appropriate
varieties to plant in a given field, under a particular
management practice in a given year, leading to
greater farmer profits and enhanced yield stability
29. Summary
• DNA sequence variation (SNPs) can explain 40-70% of
genetic variation (considering only additive gene action)
or 80-90% (including dominant gene action)
• Dominant effects explain much of the missing heritability
• Ratio of loci exhibiting positive dominant gene action to
those exhibiting negative dominant gene action is correlated
with the degree of heterosis for that trait
• Determining which loci confer positive and negative
heterosis for specific traits may increase our ability to
predict hybrid performance
• Phenomics is a bottleneck in GWAS, GS and breeding
• Field-based sensors will allow us to study the genetics of
dynamic traits rather than being limited to end-point traits
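The ratio mentioned in the summary can be computed per trait from the degree-of-dominance values of its trait-associated SNPs. This sketch reuses the ±0.5 cutoffs from the degree-of-dominance slide; the function name and the input format (a plain list of h values) are assumptions for illustration.

```python
def dominance_ratio(h_values, threshold=0.5):
    """Count loci with positive (h > threshold) and negative (h < -threshold)
    dominance and return (n_positive, n_negative, ratio).

    ratio is None when no locus shows negative dominance.
    """
    n_pos = sum(1 for h in h_values if h > threshold)
    n_neg = sum(1 for h in h_values if h < -threshold)
    ratio = n_pos / n_neg if n_neg else None
    return n_pos, n_neg, ratio
```

For a trait with h values `[0.8, 1.2, -0.7, 0.1, 0.6]` this gives three positive loci, one negative locus, and a ratio of 3.0; comparing such ratios across traits is the kind of summary the correlation with heterosis refers to.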
30. PSS has IP and equity interests
in Data2Bio LLC
31. Data2Bio, LLC
• Founded in 2010, Data2Bio designs, executes, analyzes and interprets research projects involving next-generation sequencing
• Core strengths are experimental design, genomics, bioinformatics, and breeding support
• Academic and private-sector customers on all continents except Antarctica
• Proprietary genomic technologies associated with DNA barcoding and genotyping-by-sequencing (tGBS™), as well as proprietary bioinformatic pipelines