Abstract
Elevated low-density lipoprotein cholesterol (LDL-C) is a treatable, heritable risk factor for cardiovascular disease. Genome-wide association studies (GWASs) have identified 157 variants associated with lipid levels but are not well suited to assess the impact of rare and low-frequency variants. To determine whether rare or low-frequency coding variants are associated with LDL-C, we exome sequenced 2,005 individuals, including 554 individuals selected for extreme LDL-C (>98th or <2nd percentile). Follow-up analyses included sequencing of 1,302 additional individuals and genotype-based analysis of 52,221 individuals. We observed significant evidence of association between LDL-C and the burden of rare or low-frequency variants in PNPLA5, encoding a phospholipase-domain-containing protein, and both known and previously unidentified variants in PCSK9, LDLR and APOB, three known lipid-related genes. The effect sizes for the burden of rare variants for each associated gene were substantially higher than those observed for individual SNPs identified from GWASs. We replicated the PNPLA5 signal in an independent large-scale sequencing study of 2,084 individuals. In conclusion, this large whole-exome-sequencing study for LDL-C identified a gene not known to be implicated in LDL-C and provides unique insight into the design and analysis of similar experiments.
Introduction
Elevated low-density lipoprotein cholesterol (LDL-C) is one of the cardinal risk factors for coronary artery disease, the leading cause of death in the United States.¹ LDL-C is a complex trait whose variation is influenced by the environment and genes; approximately 40%–50% of the variation is estimated as heritable.²,³ Rare mutations have been identified in families affected by Mendelian forms of lipid-related disorders. Family members carrying these rare variants typically demonstrate extreme lipid phenotypes in childhood and, for those with high LDL-C, premature cardiovascular disease.
Family studies have shown that extremely high cholesterol levels can result from mutations in LDLR (MIM 606945), PCSK9 (MIM 607786), APOB (MIM 107730), ABCG5 (MIM 605459), ABCG8 (MIM 605460), and LDLRAP1 (MIM 605747), whereas extremely low cholesterol levels can result from mutations in PCSK9, MTTP (MIM 590075), APOB (Rahalkar and Hegele⁴), and ANGPTL3⁵ (MIM 603874). Targeted sequencing studies in subjects with low cholesterol levels have detected rare mutations in LDLR,⁶ PCSK9,⁷ and NPC1L1⁸ (MIM 608010), but the overall contribution of rare and low-frequency variants to population variation in cholesterol levels remains poorly defined.
Genome-wide association studies (GWASs) focused primarily on common variants have identified 157 loci associated with lipid levels, including LDL-C.⁹ Although GWASs have identified loci with robust evidence of association with LDL-C, only 10%–12% of the total variance in LDL-C can be attributed to these common variants,⁹ despite 40%–50% estimated heritability.²,³ We evaluated the hypothesis that rare or low-frequency variants, which are not well covered by GWASs and not easily imputed, are also associated with LDL-C.
In the current study, we performed a two-stage association study to evaluate low-frequency variation in protein-coding regions across the genome for association with LDL-C. We examined the spectrum of coding variants in associated genes in an unbiased manner. To address these goals, the NHLBI Grand Opportunity (GO) Exome Sequencing Project (ESP)¹⁰ completed exome sequencing and analysis of 2,005 individuals, including 307 individuals with extremely high and 247 with extremely low LDL-C (>98th percentile or <2nd percentile) from population-based cohorts (stage 1). We followed up the 17 most promising genes in 1,302 additional sequenced individuals, including 157 individuals with extremely high and 144 with extremely low LDL-C (stage 2). We also performed genotype-based follow-up of variants in 15 genes in up to 52,221 participants from population-based cohorts.