Speed, Holmes & Balding - bioRxiv - August 2019
Evaluating and improving heritability models using summary statistics
There is currently much debate regarding the best way to model how heritability varies across the genome. The authors of GCTA recommend the GCTA-LDMS-I Model and the authors of LD Score Regression recommend the Baseline LD Model, while David Balding and colleagues have instead recommended the LDAK Model. Here Speed, Holmes and Balding provide a statistical framework for assessing heritability models using summary statistics from genome-wide association studies. Using data from studies of 31 complex human traits (average sample size 136,000), they show that the Baseline LD Model is the most realistic of the existing heritability models, but that it can be significantly improved by incorporating features from the LDAK Model. Their framework also provides a method for estimating the selection-related parameter α from summary statistics; applying it, they find strong evidence (P<1e-6) of negative selection for traits including height, systolic blood pressure and college education.
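The summary does not spell out how α enters a heritability model. In the LDAK and GCTA literature it is commonly defined through the assumption that the expected heritability of SNP j scales as [f_j(1 − f_j)]^(1+α), where f_j is the minor allele frequency (MAF); that form is treated here as an assumption, not as the exact model used in this paper. The Python sketch below spreads a total SNP heritability across SNPs under that assumption and shows why a negative α corresponds to larger per-allele effects at rarer SNPs, the pattern expected under negative selection. The helper names and the optional per-SNP weights argument (a stand-in for weightings such as those used by the LDAK Model) are hypothetical.

```python
def expected_snp_heritability(mafs, alpha, h2_total, weights=None):
    """Share h2_total across SNPs, assuming E[h2_j] is proportional to
    w_j * [f_j * (1 - f_j)] ** (1 + alpha) (assumed parameterization)."""
    if weights is None:
        weights = [1.0] * len(mafs)  # no per-SNP weighting
    raw = [w * (f * (1.0 - f)) ** (1.0 + alpha) for f, w in zip(mafs, weights)]
    total = sum(raw)
    return [h2_total * r / total for r in raw]


def per_allele_effect_variance(mafs, h2_per_snp):
    """For a standardized phenotype, h2_j = 2 f_j (1 - f_j) * beta_j**2,
    so the implied expected squared effect is h2_j / (2 f_j (1 - f_j))."""
    return [h / (2.0 * f * (1.0 - f)) for f, h in zip(mafs, h2_per_snp)]


# Toy minor allele frequencies (illustrative only, not data from the paper).
mafs = [0.01, 0.05, 0.2, 0.5]
h2 = expected_snp_heritability(mafs, alpha=-0.25, h2_total=0.5)
beta2 = per_allele_effect_variance(mafs, h2)
# With alpha < 0, beta2 decreases as MAF increases: rarer SNPs carry
# larger expected per-allele effects.
print([round(b, 4) for b in beta2])
```

Under this parameterization, α = −1 gives every SNP the same expected heritability, while α = 0 gives every SNP the same expected per-allele effect variance; negative values in between tilt effect sizes towards rarer SNPs.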
Figure 1. Genetic architecture estimates from different heritability models
a & b, Average estimates of SNP heritability and confounding bias from twelve heritability models; values are relative to those from the BLD-LDAK Model, and calculated using either the 14 UK Biobank (UKBb) or 17 Public GWAS (bar heights indicate 95% confidence intervals). There are 24 versions of the GCTA+1Fun and LDAK+1Fun Models (one for each function indicator); values here correspond to the versions with highest average log likelihood (logl). c, Concordance correlation coefficient between average estimates of functional enrichments from seven heritability models; values are calculated using either the 14 UKBb or 17 Public GWAS. d, Average estimates of the proportion of SNP heritability contributed by coding SNPs, conserved regions and DNase I hypersensitive sites (three of the 24 functional categories of SNPs) from seven heritability models; values are calculated using either the 14 UKBb or 17 Public GWAS (vertical segments indicate 95% confidence intervals). The estimates of functional enrichments are obtained by dividing these estimates by the proportion of SNPs in each category (dashed horizontal lines). e, Estimates of α for the 14 UKBb GWAS, obtained using the BLD-LDAK+Alpha Model (horizontal lines indicate 95% confidence intervals). α is significantly negative (P<0.05/31) for the eight red traits. See main text for details of the heritability models.
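Three quantities in this caption are simple to compute once estimates are in hand: the concordance correlation coefficient in panel c (Lin's statistic, which unlike Pearson's correlation also penalizes systematic shifts between two sets of estimates), the enrichment in panel d (a category's share of SNP heritability divided by its share of SNPs), and the P<0.05/31 significance threshold in panel e. The Python sketch below computes all three. Function names and inputs are hypothetical, and the caption does not say whether 1/n or 1/(n − 1) moments were used; 1/n is assumed here.

```python
def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient, using 1/n moments:
    2 cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2.0 * cxy / (vx + vy + (mx - my) ** 2)


def enrichment(h2_share, snp_share):
    """Enrichment of a SNP category: its share of SNP heritability
    divided by its share of SNPs."""
    return h2_share / snp_share


# Bonferroni-style threshold from panel e: 0.05 spread over 31 traits.
P_THRESHOLD = 0.05 / 31

# Toy inputs (hypothetical values, not estimates from the paper).
est_model_a = [1.0, 2.0, 3.0]
est_model_b = [2.0, 3.0, 4.0]  # same ordering, shifted up by 1
print(concordance_correlation(est_model_a, est_model_b))  # about 4/7, although Pearson r = 1
print(enrichment(0.10, 0.02))  # 2% of SNPs but 10% of h2: about 5-fold enrichment
print(P_THRESHOLD)
```

The toy comparison shows why concordance suits panel c: two models whose enrichment estimates are perfectly correlated but systematically offset still score well below 1.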