Petko P. Fiziev https://orcid.org/0000-0002-1572-4621, Jeremy McRae https://orcid.org/0000-0003-3411-9248, Jacob C. Ulirsch https://orcid.org/0000-0002-7947-0827, Jacqueline S. Dron https://orcid.org/0000-0002-3045-6530, Tobias Hamp, Yanshen Yang, Pierrick Wainschtein https://orcid.org/0000-0002-5203-6481, Zijian Ni https://orcid.org/0000-0003-1181-8337, Joshua G. Schraiber, Hong Gao https://orcid.org/0000-0001-6274-4513, Dylan Cable, Yair Field https://orcid.org/0000-0002-5327-1678, Francois Aguet https://orcid.org/0000-0001-9414-300X, Marc Fasnacht, Ahmed Metwally https://orcid.org/0000-0002-0155-7412, Jeffrey Rogers https://orcid.org/0000-0002-7374-6490, Tomas Marques-Bonet https://orcid.org/0000-0002-5597-3075, Heidi L. Rehm https://orcid.org/0000-0002-6025-0015, Anne O’Donnell-Luria https://orcid.org/0000-0001-6418-9592, Amit V. Khera https://orcid.org/0000-0001-6535-5839, and Kyle Kai-How Farh https://orcid.org/0000-0001-6947-8537


2 Jun 2023

Vol 380, Issue 6648

Structured Abstract


Genome-wide association studies (GWASs) have identified thousands of common genetic variants that are predictive of common disease susceptibility, but these variants individually have mild effects on disease owing to the effects of natural selection. By contrast, rare genetic variants can have large effects on common disease risk, but their use in genetic risk prediction has been limited to date owing to the difficulty of distinguishing pathogenic from benign variants and estimating the magnitude of their effects.


PrimateAI-3D is a three-dimensional convolutional neural network for missense variant–effect prediction, which was trained with common genetic variants from the population sequencing of 233 primate species. By applying this method to estimate the pathogenicity of rare coding variants in 454,712 UK Biobank individuals, we aimed to improve rare-variant association tests and genetic risk prediction for common diseases and complex traits.


We performed rare-variant burden tests for 90 well-powered, clinically relevant phenotypes in the UK Biobank exome dataset. Stratifying missense variants with PrimateAI-3D greatly improved gene discovery, revealing 73% more significant gene-phenotype associations (false discovery rate <0.05) compared with not using PrimateAI-3D. When benchmarked against prior studies, gene-phenotype pairs identified with our method were better supported by orthogonal genetic evidence from GWAS and genes from related Mendelian disorders. In addition, PrimateAI-3D scores showed the strongest correlation among existing variant interpretation algorithms for predicting the quantitative effects of rare variants on continuous clinical phenotypes.
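The effect of stratifying missense variants by a pathogenicity score can be illustrated with a minimal sketch. It is not the authors' burden test; it assumes a plain Welch t statistic comparing carriers with non-carriers for a single gene, and the names `burden_t`, `scores_by_person`, and `min_score`, as well as all data values, are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two samples with unequal variances."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def burden_t(phenotype, scores_by_person, min_score):
    """Carrier vs non-carrier t statistic for a single gene.

    phenotype: person -> quantitative trait value
    scores_by_person: person -> predicted pathogenicity scores of the rare
        missense variants that person carries in the gene (hypothetical input)
    min_score: only variants scoring at least this much define a carrier
    """
    carriers, non_carriers = [], []
    for person, value in phenotype.items():
        is_carrier = any(s >= min_score for s in scores_by_person.get(person, []))
        (carriers if is_carrier else non_carriers).append(value)
    return welch_t(carriers, non_carriers)


# Made-up toy data: a, b, c carry high-scoring variants and have a shifted
# trait value; d and e carry low-scoring (likely benign) variants.
phenotype = {"a": 1.9, "b": 2.1, "c": 2.0, "d": 0.1,
             "e": -0.1, "f": 0.0, "g": 0.2, "h": -0.2}
scores_by_person = {"a": [0.9], "b": [0.8], "c": [0.95], "d": [0.1], "e": [0.2]}

t_all = burden_t(phenotype, scores_by_person, min_score=0.0)         # every missense carrier
t_stratified = burden_t(phenotype, scores_by_person, min_score=0.7)  # high-scoring only
print(f"all missense: t = {t_all:.2f}; high-scoring only: t = {t_stratified:.2f}")
```

In this toy example, dropping the low-scoring variants removes unaffected individuals from the carrier group, so the carrier and non-carrier groups separate more cleanly and the statistic grows. A real analysis would also correct for covariates and control the false discovery rate across gene-phenotype pairs.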

Having validated our method for finding gene-phenotype relationships, we next constructed a rare-variant polygenic risk score (PRS) model by combining the rare-variant genes for each phenotype, weighting variants by their PrimateAI-3D prediction score and the direction and effect size of each associated gene. For comparison, we constructed common-variant PRS models and evaluated the performance of the two models for genetic risk prediction in a withheld test subset of the cohort. Although common variants better explained overall population variance, rare-variant PRSs had more power at the ends of the distribution to identify individuals at the greatest risk for disease, and thus may be more relevant for population genetic screening and risk management. In contrast to common-variant PRS models derived from European populations, which generalize poorly to non-Europeans, rare-variant PRSs were substantially more portable to different cohorts and ancestry groups that were not seen during model training. Moreover, because the two models incorporate orthogonal information from nonoverlapping sets of variants, we combined rare- and common-variant PRS models into a unified model and observed further improvement in genetic risk prediction for common diseases.
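One way to picture a rare-variant score of this kind is the sketch below. It is an illustration under stated assumptions, not the published model: each associated gene contributes its signed effect size multiplied by the highest pathogenicity score among the rare variants an individual carries in that gene. The max aggregation, the function name `rare_variant_prs`, the gene labels, and every number are hypothetical.

```python
def rare_variant_prs(carried, gene_effect):
    """Toy rare-variant polygenic risk score for one individual.

    carried: gene -> pathogenicity scores (0 to 1) of the rare variants carried
    gene_effect: gene -> signed effect size from a burden test; genes absent
        from this mapping are not associated with the phenotype and are ignored
    """
    return sum(
        gene_effect[gene] * max(scores)  # assumption: strongest variant per gene
        for gene, scores in carried.items()
        if gene in gene_effect and scores
    )


# Hypothetical inputs: one trait-raising gene, one trait-lowering gene.
gene_effect = {"GENE_UP": 1.5, "GENE_DOWN": -1.0}
person = {"GENE_UP": [0.2, 0.8], "GENE_DOWN": [0.5], "GENE_OTHER": [0.9]}

score = rare_variant_prs(person, gene_effect)
print(f"rare-variant PRS: {score:.2f}")
```

Because the gene effects are signed, variants in trait-raising and trait-lowering genes pull the score in opposite directions, and a high-scoring variant in an unassociated gene contributes nothing.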

To understand the extent to which rare-variant PRSs can be expected to improve with increases in discovery cohort size, we repeated our analyses in down-sampled subsets of the UK Biobank cohort. We found that the number of genes contributing to the rare-variant PRS increased linearly, with no signs of plateauing at a half-million exomes. Newly discovered rare-variant genes were strongly enriched at GWAS loci, forming allelic series with effect sizes that were ~10-fold larger on average than the respective common GWAS variant. Among well-powered GWAS loci that could be unambiguously assigned to a single gene, the majority showed subthreshold signal on the rare-variant burden test, indicating that rare penetrant variants exist at a large fraction of GWAS loci and can be incorporated into the rare-variant PRS with further advances in cohort size and variant effect prediction.
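The down-sampling procedure can be sketched as a small harness that reruns a discovery step on random subsets of the cohort and records how many genes are found at each size. This is a schematic only: `downsample_curve` and its parameters are hypothetical names, and the `discover` callable below is a placeholder that stands in for a full burden-test pipeline rather than reproducing any reported result.

```python
import random


def downsample_curve(individuals, fractions, discover, seed=0):
    """Rerun a discovery step on random subsets of the cohort.

    individuals: list of individual identifiers
    fractions: cohort fractions to sample, e.g. [0.25, 0.5, 1.0]
    discover: callable taking a list of individuals and returning the number
        of significant genes found in that subset (user supplied)
    Returns a list of (subset size, number of genes discovered) pairs.
    """
    rng = random.Random(seed)  # fixed seed so the subsets are reproducible
    curve = []
    for fraction in fractions:
        n = round(fraction * len(individuals))
        subset = rng.sample(individuals, n)  # sampling without replacement
        curve.append((n, discover(subset)))
    return curve


# Placeholder discovery step (an assumption for illustration only):
# pretend one gene is found per 100 individuals in the subset.
cohort = list(range(1000))
curve = downsample_curve(cohort, [0.25, 0.5, 1.0], lambda s: len(s) // 100)
for size, n_genes in curve:
    print(size, n_genes)
```

Plotting the resulting pairs against subset size is what would show whether gene discovery is still growing or has begun to plateau.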


Understanding the impact of rare variants in common diseases is of prime interest for both precision medicine and the discovery of drug targets. By leveraging advances in variant effect prediction, we have demonstrated major improvements in rare-variant burden testing and genetic risk prediction. Notably, we observed that nearly all individuals carried at least one rare penetrant variant for the phenotypes we examined, demonstrating the utility of personal genome sequencing for otherwise healthy individuals in the general population.

Polygenic contribution of rare genetic variants to complex human traits, shown for serum cholesterol as a representative example.

(Left) Rare-variant burden tests capture the direction and effect sizes of genes in known lipid biosynthesis pathways. (Top right) When used in a rare-variant polygenic risk score, individuals at opposite ends of the PRS separate into high- and low-cholesterol groups. (Bottom right) Rare variants in these genes have larger effects compared with common variants identified by GWAS and are strongly predictive of individuals who are phenotypic outliers.


We examined 454,712 exomes for genes associated with a wide spectrum of complex traits and common diseases and observed that rare, penetrant mutations in genes implicated by genome-wide association studies confer ~10-fold larger effects than common variants in the same genes. Consequently, an individual at the phenotypic extreme and at the greatest risk for severe, early-onset disease is better identified by a few rare penetrant variants than by the collective action of many common variants with weak effects. By combining rare variants across phenotype-associated genes into a unified genetic risk model, we demonstrate superior portability across diverse global populations compared with common-variant polygenic risk scores, greatly improving the clinical utility of genetic-based risk prediction.

References and Notes


Acknowledgments

We thank D. MacArthur, J. Pritchard, M. Rivas, N. Ersaro, and I. Mitra for helpful discussions, and the participants and investigators in the UK Biobank (Resource Application Number 33751) and MGB studies (protocol 2018P001236) who made this work possible.

Funding: T.M.B. is supported by funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement no. 864203), PID2021-126004NB-100 (MICIIN/FEDER, UE) and Secretaria d'Universitats i Recerca, and CERCA Programme del Departament d'Economia i Coneixement de la Generalitat de Catalunya (GRC 2021 SGR 00177).

Author contributions: P.P.F., J.M., J.C.U., J.S.D., T.H., Y.Y., P.W., Z.N., J.G.S., H.G., A.M., D.C., F.A., M.F., Y.F., and K.K.-H.F. performed the analysis and wrote the manuscript. J.R., T.M.B., H.L.R., A.O.L., A.V.K., and K.K.-H.F. supervised the work.

Competing interests: H.L.R. receives funding from Illumina, Inc. and Microsoft Corporation to support rare disease gene discovery and diagnosis. A.V.K. is an employee of Verve Therapeutics, Inc., and has served as a scientific advisor to Amgen Inc., Novartis AG, Silence Therapeutics PLC, Korro Bio, Inc., Veritas International SL, Color Health, Inc., Third Rock Ventures, Illumina Inc., Ambry Genetics Corporation, and Foresite Labs. A.V.K. holds equity in Verve Therapeutics, Inc., Color Health, Inc., and Foresite Labs. Employees of Illumina, Inc. are indicated in the list of author affiliations. Patents related to this work are (i) "Covariate correction including drug use from temporal data"; filing no. 63/351317; P. Fiziev, J. McRae, and K.-H. Farh; (ii) "Optimized burden test based on nested t tests that maximize separation between carriers and non-carriers"; filing no. 63/351283; P. Fiziev, J. McRae, and K.-H. Farh; (iii) "Rare variant polygenic risk scores"; filing no. 63/351299; P. Fiziev, J. McRae, and K.-H. Farh; and (iv) "Transformer language model for variant pathogenicity"; filing nos. US 17/975,536 and US 17/975,547; J. Ede, T. Hamp, A. Dietrich, Y. Wu, and K.-H. Farh.

Data and materials availability: PrimateAI-3D prediction scores are available with a non-commercial license upon request and are displayed at https://primad.basespace.illumina.com. Source code is available at https://github.com/Illumina/PrimateAI-3D, with archived versions of the rare variant burden test and polygenic score at (79) and (80).



