Jie Zheng at the #ICG12 GigaScience Prize Track: PhenoSpD: an atlas of phenotypic correlations and a multiple testing correction for the human phenome. ICG12, Shenzhen, 26th October 2017
1. PhenoSpD: an integrated toolkit for phenotypic
correlation estimation and multiple testing
correction using GWAS summary statistics
Jie Zheng
The 12th International Conference on Genomics
27th Oct 2017
4. Phenome wide association study (PheWAS)
• A PheWAS tests one or more genetic
variants for association with many
phenotypes.
• PheWAS is now commonplace, e.g.
• MR-PheWAS. Millard et al, Sci Rep,
2015
• Haycock et al, JAMA Oncology, 2017
It is likely that longer telomeres increase risk for
several cancers but reduce risk for some non-
neoplastic diseases, including cardiovascular
diseases.
5. Post GWAS era: a database of harmonized GWAS summary
data at the MRC Integrative Epidemiology Unit in Bristol
6. The network of post GWAS analysis software
[Diagram: a centralized database linked to PhenoSpD, MR-Base and LD Hub]
13. Scope of MR-Base
[Diagram: components of MR-Base]
• Database: ~2000 GWAS (1100 with full data)
• SNP lookups
• 12 two-sample MR methodologies
• MR-Base R package
14. PhenoSpD: why do we need it?
• Molecular phenotypes such as
metabolites are highly correlated.
• Multiple testing correction is
therefore difficult: Bonferroni
correction is overly conservative
for correlated traits.
• When individual-level phenotype
data are available, the phenotypic
correlation matrix can be calculated
easily.
• In practice, however, individual-level
phenotype data are usually not available.
• In MR-Base / LD Hub, we only have
GWAS summary statistics.
• We need a way to correct for multiple
testing using summary statistics alone!
Wurtz et al, J Am Coll Cardiol. 2013
15. PhenoSpD: how it works
1. Harmonize GWAS summary statistics
2. Estimate phenotypic correlation matrix
using metaCCA / LD score regression
3. Apply Spectral decomposition (SpD) to
estimate the equivalent number of
independent variables in the
phenotypic correlation matrix
16. MetaCCA
• Summary statistics-based multivariate association
testing using canonical correlation analysis –
Cichonska et al Bioinformatics 2016
• As a by-product, it provides a way to estimate the
phenotypic correlation matrix Σ_YY: each entry is
estimated as the Pearson correlation between the
regression coefficients (betas) of two GWASs
• The assumption is that both traits are measured in the
same samples
• PS: 1000 Genomes is not the best option for
estimating the LD matrix between SNPs. See Benner et
al AJHG 2017, and LDstore
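The beta-correlation idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the metaCCA or PhenoSpD implementation: the dictionary layout (SNP id mapped to effect allele, other allele, beta), the function names and the toy numbers are all assumptions made for the example, and strand flips or palindromic SNPs are not handled.

```python
import numpy as np

def harmonise(gwas_a, gwas_b):
    """Align two GWAS on shared SNPs and a common effect allele.

    Each input is a hypothetical dict: SNP id -> (effect_allele, other_allele, beta).
    """
    betas_a, betas_b = [], []
    for snp, (ea_a, oa_a, beta_a) in gwas_a.items():
        if snp not in gwas_b:
            continue  # SNP missing from the second GWAS
        ea_b, oa_b, beta_b = gwas_b[snp]
        if (ea_a, oa_a) == (oa_b, ea_b):
            beta_b = -beta_b  # effect allele swapped: flip the sign
        elif (ea_a, oa_a) != (ea_b, oa_b):
            continue  # alleles do not match: drop the SNP
        betas_a.append(beta_a)
        betas_b.append(beta_b)
    return np.array(betas_a), np.array(betas_b)

def beta_correlation(betas_a, betas_b):
    """Pearson correlation between two aligned vectors of betas."""
    return float(np.corrcoef(betas_a, betas_b)[0, 1])

# Toy, made-up summary statistics for two traits.
gwas_a = {"rs1": ("A", "G", 0.10), "rs2": ("C", "T", -0.20),
          "rs3": ("A", "C", 0.05), "rs4": ("G", "T", 0.30)}
gwas_b = {"rs1": ("A", "G", 0.20), "rs2": ("T", "C", 0.40),
          "rs3": ("A", "C", 0.10), "rs5": ("G", "T", 0.10)}

aligned_a, aligned_b = harmonise(gwas_a, gwas_b)
r_estimate = beta_correlation(aligned_a, aligned_b)
```

In real use the correlation would be taken over a very large number of SNPs; three SNPs are used here only to keep the example readable.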
17. LD score regression
• Method to estimate SNP heritability and
genetic correlations -- Bulik-Sullivan et al NG
2014, 2015
• It also provides a way to estimate the phenotypic
correlation between two traits, which is the
intercept term of the bivariate LD score
regression.
• Compared with metaCCA, it adjusts for sample
overlap automatically
• Both genetic and phenotypic correlation
matrices can be found in LD Hub
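To make the role of the intercept concrete, here is a deliberately simplified sketch of the bivariate regression: the product of the two traits' z-scores at each SNP is regressed on that SNP's LD score, and the intercept is read off. The function name and the toy inputs are assumptions for illustration. The real LD score regression software uses weighted regression and a block jackknife, neither of which is reproduced here, and the intercept matches the phenotypic correlation only when the two GWASs share all of their samples.

```python
import numpy as np

def bivariate_ldsc_intercept(z1, z2, ld_scores):
    """Unweighted OLS of z1*z2 on LD score; returns the intercept only.

    Simplification: real LD score regression is weighted and
    jackknifed, which this sketch omits.
    """
    product = np.asarray(z1, dtype=float) * np.asarray(z2, dtype=float)
    # np.polyfit returns coefficients from highest degree down: [slope, intercept]
    slope, intercept = np.polyfit(np.asarray(ld_scores, dtype=float), product, deg=1)
    return float(intercept)

# Toy inputs built so that z1*z2 = 0.3 + 0.05 * LD score exactly.
ld_scores = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
z1 = np.ones_like(ld_scores)
z2 = 0.3 + 0.05 * ld_scores

intercept = bivariate_ldsc_intercept(z1, z2, ld_scores)
```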
18. SNPSpD and MatSpD
• SNPSpD: A simple correction for multiple testing for SNPs in LD using
spectral decomposition (SpD). Nyholt 2004 AJHG
• MatSpD (MatrixSpD): estimates the equivalent number of independent
variables in a correlation (r) matrix
• The same method can be used to estimate the number of
independent variables in a phenotypic correlation matrix
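As a rough sketch of the spectral decomposition step, the snippet below applies the effective-number formula from Nyholt 2004, Meff = 1 + (M - 1)(1 - Var(λ)/M), to the eigenvalues of a correlation matrix. The function name is hypothetical, and this is not the MatSpD or PhenoSpD code, which may report additional estimators.

```python
import numpy as np

def effective_number_nyholt(corr):
    """Effective number of independent variables (Nyholt 2004 formula)."""
    corr = np.asarray(corr, dtype=float)
    m = corr.shape[0]
    if m == 1:
        return 1.0
    # Eigenvalues of a symmetric correlation matrix.
    eigenvalues = np.linalg.eigvalsh(corr)
    # Sample variance of the eigenvalues (ddof=1).
    var_lambda = np.var(eigenvalues, ddof=1)
    return float(1.0 + (m - 1.0) * (1.0 - var_lambda / m))

# Two limiting cases: uncorrelated traits and perfectly correlated traits.
meff_independent = effective_number_nyholt(np.eye(4))
meff_identical = effective_number_nyholt(np.ones((4, 4)))
```

Four uncorrelated traits count as four tests, while four perfectly correlated traits count as one; real phenotypic correlation matrices fall between these extremes.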
19. Simulation
• How accurate is the estimation of phenotypic correlation from GWAS results?
• Do any parameters strongly affect this estimation?
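One way to explore these questions is a small simulation. The sketch below is an assumed setup, not the simulation design used in the talk: it draws genotypes with no true effect on either trait, generates two correlated phenotypes in the same individuals, runs a per-SNP regression for each trait, and compares the correlation of the betas with the correlation of the phenotypes. The sample size, SNP count, allele frequency and seed are arbitrary choices.

```python
import numpy as np

def simulate_beta_correlation(n_samples=2000, n_snps=500, pheno_corr=0.5, seed=0):
    """Return (observed phenotypic correlation, correlation of GWAS betas)."""
    rng = np.random.default_rng(seed)
    # Genotypes coded 0/1/2 with allele frequency 0.3 (assumed); no true effects.
    genotypes = rng.binomial(2, 0.3, size=(n_samples, n_snps)).astype(float)
    # Two phenotypes measured in the same samples with a set correlation.
    cov = [[1.0, pheno_corr], [pheno_corr, 1.0]]
    phenos = rng.multivariate_normal([0.0, 0.0], cov, size=n_samples)
    # Per-SNP simple linear regression slopes for both traits at once.
    g = genotypes - genotypes.mean(axis=0)
    y = phenos - phenos.mean(axis=0)
    betas = (g.T @ y) / (g ** 2).sum(axis=0)[:, None]  # shape (n_snps, 2)
    observed = float(np.corrcoef(phenos[:, 0], phenos[:, 1])[0, 1])
    estimated = float(np.corrcoef(betas[:, 0], betas[:, 1])[0, 1])
    return observed, estimated

observed, estimated = simulate_beta_correlation()
```

Varying `n_snps`, `n_samples` or the degree of sample overlap in a setup like this is one way to see which parameters influence the estimate.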
21. Accuracy tests using real data
The estimated phenotypic correlations have
good agreement with observed phenotypic
correlations
The exceptions are traits with limited sample size
(therefore limited sample overlap).
• Shin et al provided the observed phenotypic correlation matrix for 452 metabolites, which can be used as a
test dataset
• So we compared the observed phenotypic correlation with the estimated phenotypic correlation using
PhenoSpD.
22. Growing importance of PhenoSpD
• PhenoSpD is particularly useful for multiple GWASs from the same
samples, e.g. complex molecular traits such as metabolites and
cytokines
• It can also be applied to all traits in MR-Base / LD Hub by
splitting the traits into groups, e.g. the traits in the GIANT consortium
are very likely to be correlated, and the majority of them come from
the same sample
23. Real case application in MR-Base and LD Hub
Number of independent traits in MR-Base

Consortium / First author  Category                  N_traits  N_SNPs   N_correlations  N_independent_traits
Kettunen                   Blood metabolites         123       9826292  7503            44.9
Shin                       Metabolites               451       2482345  101475          324.4
Roederer                   Immune system phenotypes  151       1585187  11325           94.2
CARDIOGRAM                                           2         335391   1               1
TRICL                                                4         335391   6               3
TAG                                                  4         1449634  6               3.98
SSGAC                                                7         1449634  21              6
PGC                                                  4         335391   6               3.644
Leptin                                               2         1449634  1               1
MAGIC                                                16        1449634  120             11.098
IIBDGC                                               3         335391   3               2
Hrgene                                               8         1449634  28              7
HaemGen                                              6         1449634  15              5
GPC                                                  6         1449634  15              5
GLGC                                                 4         1449634  6               3
GIANT                                                15        1449634  105             10.1097
GEFOS                                                3         1449634  3               3
CKDGen                                               9         335391   36              8
EGG                                                  4         1449634  6               4
GIS                                                  2         2029112  1               1
GUGC                                                 2         2449580  1               1
ENIGMA                                               7         7237736  21              6
UK Biobank                                           5         9440243  9               5
Others                                               24        /        /               24
All                                                  862       /        120713          577.3317

Number of independent traits in LD Hub

Consortium / First author  Category                  N_traits  N_SNPs   N_correlations  N_independent_traits
All traits                 All traits                221       /        24310           134.1167
24. Growing importance of PhenoSpD
• There is great potential to apply PhenoSpD to multiple traits in
large-scale biobanks and cohorts such as UK Biobank, China Kadoorie
Biobank and the HUNT study (all traits in one sample)
25. UK Biobank release from Ben Neale’s group
• Rapid GWAS of thousands of phenotypes for 337,000 samples in the
UK Biobank
(http://www.nealelab.is/blog/2017/7/19/rapid-gwas-of-thousands-of-phenotypes-for-337000-samples-in-the-uk-biobank)
• GWAS summary statistics from 337,000
European samples are available for
over 2,400 human traits; anyone can
access and download the results.
• ~600 traits are heritable, and these are
the most valuable data
26. PhenoSpD application
• Assess the potential causal relationship between genetic variation, DNA methylation and 139
complex traits.
• PhenoSpD reduces the 139 outcomes to 62 independent outcomes
Hypothesis free MR of DNA methylation on 139 human traits
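The practical effect of this reduction can be shown with simple arithmetic, assuming the effective number of outcomes is used as the denominator of a Bonferroni-style threshold. The significance level of 0.05 is an assumption for illustration and is not stated on the slide.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold for a family-wise error rate alpha."""
    return alpha / n_tests

alpha = 0.05  # assumed family-wise error rate, not taken from the slides

naive = bonferroni_threshold(alpha, 139)    # treats all 139 outcomes as independent
adjusted = bonferroni_threshold(alpha, 62)  # uses the 62 independent outcomes

print(f"naive threshold:    {naive:.2e}")
print(f"adjusted threshold: {adjusted:.2e}")
```

The adjusted threshold is a little more than twice as permissive as the naive one, which is where the gain in power over a plain Bonferroni correction comes from.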
27. Links for PhenoSpD
• PhenoSpD Paper is on bioRxiv:
https://www.biorxiv.org/content/early/2017/07/25/148627
• R scripts of PhenoSpD can be found on MRC-IEU github:
https://github.com/MRCIEU/PhenoSpD
• LD Hub: http://ldsc.broadinstitute.org/ldhub/
• MR-Base: www.mrbase.org
28. Acknowledgements
• LD Hub team: Jie Zheng, David M Evans, Benjamin Neale
• MR-Base team: Gibran Hemani, Jie Zheng, George Davey Smith, Tom Gaunt, Philip Haycock
• PhenoSpD team: Jie Zheng, Tom Richardson, Louise Millard, Gibran Hemani, Chris Raistrick, Bjarni Vilhjalmsson, Philip Haycock, Tom Gaunt