Identification of lung cancer histology-specific variants applying Bayesian framework variant prioritization approaches within the TRICL and ILCCO consortia.
ABSTRACT: Large-scale genome-wide association studies (GWAS) have likely uncovered all common variants at the GWAS significance level. Additional variants within the suggestive range (0.0001> P > 5×10(-8)) are, however, still of interest for identifying causal associations. This analysis aimed to apply novel variant prioritization approaches to identify additional lung cancer variants that may not reach the GWAS level. Effects were combined across studies with a total of 33456 controls and 6756 adenocarcinoma (AC; 13 studies), 5061 squamous cell carcinoma (SCC; 12 studies) and 2216 small cell lung cancer cases (9 studies). Based on prior information such as variant physical properties and functional significance, we applied stratified false discovery rates, hierarchical modeling and Bayesian false discovery probabilities for variant prioritization. We conducted a fine mapping analysis as validation of our methods by examining top-ranking novel variants in six independent populations with a total of 3128 cases and 2966 controls. Three novel loci in the suggestive range were identified based on our Bayesian framework analyses: KCNIP4 at 4p15.2 (rs6448050, P = 4.6×10(-7)) and MTMR2 at 11q21 (rs10501831, P = 3.1×10(-6)) with SCC, as well as GAREM at 18q12.1 (rs11662168, P = 3.4×10(-7)) with AC. Use of our prioritization methods validated two of the top three loci associated with SCC (P = 1.05×10(-4) for KCNIP4, represented by rs9799795) and AC (P = 2.16×10(-4) for GAREM, represented by rs3786309) in the independent fine mapping populations. This study highlights the utility of using prior functional data for sequence variants in prioritization analyses to search for robust signals in the suggestive range.
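The Bayesian false discovery probability mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not state its formulas or priors, so the code below assumes the commonly used Wakefield approximate Bayes factor; the function names, the prior standard deviation and the prior probabilities are hypothetical and chosen only for illustration.

```python
import math


def approx_bayes_factor(beta_hat, se, prior_sd):
    """Approximate Bayes factor for H0 (no effect) versus H1.

    Assumes a normal prior N(0, prior_sd^2) on the log odds ratio under H1.
    Values below 1 favour an association.
    """
    v = se ** 2            # sampling variance of the estimate
    w = prior_sd ** 2      # prior variance of the effect (an assumption)
    z = beta_hat / se
    shrink = w / (v + w)
    return math.sqrt((v + w) / v) * math.exp(-0.5 * z * z * shrink)


def bfdp(beta_hat, se, prior_sd, prior_h1):
    """Bayesian false discovery probability: P(H0 | data)."""
    abf = approx_bayes_factor(beta_hat, se, prior_sd)
    prior_odds_h0 = (1.0 - prior_h1) / prior_h1
    x = abf * prior_odds_h0
    return x / (1.0 + x)


# Hypothetical variant: log OR = 0.2 with standard error 0.04.
# A higher prior (e.g. for a variant with functional annotation) lowers the BFDP.
functional = bfdp(0.2, 0.04, prior_sd=0.2, prior_h1=0.01)
anonymous = bfdp(0.2, 0.04, prior_sd=0.2, prior_h1=0.0001)
print(functional, anonymous)
```

With the same association statistics, the variant given the larger prior probability of association receives the smaller BFDP, which is how prior functional information can move a suggestive signal up a ranking.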
Project description:Lung cancer is the leading cause of cancer death worldwide. Although several genetic variants associated with lung cancer have been identified in the past, the stringent selection criteria of genome-wide association studies (GWAS) can lead to missed variants. The objective of this study was to uncover missed variants by using the known association between lung cancer and first-degree family history of lung cancer to enrich the variant prioritization for lung cancer susceptibility regions. In this two-stage GWAS, we first selected a list of variants associated with both lung cancer and family history of lung cancer in four GWAS (3,953 cases, 4,730 controls), then replicated our findings for 30 variants in a meta-analysis of four additional studies (7,510 cases, 7,476 controls). The top-ranked genetic variant in the Discovery set, rs12415204 at chr10q23.33 encoding FFAR4, was validated in the Replication set with an overall OR of 1.09 (95% CI=1.04, 1.14, P=1.63×10(-4)). When combining the two stages of the study, the strongest association was found for rs1158970 at chr4p15.2 encoding KCNIP4, with an OR of 0.89 (95% CI=0.85, 0.94, P=9.64×10(-6)). We performed a stratified analysis of rs12415204 and rs1158970 across all eight studies by age, gender, smoking status, and histology, and found consistent results across strata. Four of the 30 replicated variants act as expression quantitative trait loci (eQTL) sites in 1,111 nontumor lung tissues and meet the genome-wide 10% FDR threshold.
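As a rough consistency check on how an odds ratio, its confidence interval and its P value relate, the sketch below recovers an approximate Wald z statistic from a reported OR and 95% CI. It assumes a symmetric interval on the log scale and a normal approximation; the function name is hypothetical, and because the published figures are rounded, the result can only be expected to agree with a reported P value in order of magnitude.

```python
import math


def wald_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Approximate log OR, standard error, z and two-sided P from an OR and its 95% CI."""
    beta = math.log(odds_ratio)
    # The CI spans 2 * z_crit standard errors on the log scale.
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return beta, se, z, p


# Rounded values reported for rs12415204 in the Replication set.
beta, se, z, p = wald_from_or_ci(1.09, 1.04, 1.14)
print(f"log OR={beta:.4f}, SE={se:.4f}, z={z:.2f}, P={p:.2e}")
```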
Project description:Genome-wide association studies (GWAS) have identified 58 susceptibility alleles across 37 regions associated with the risk of colorectal cancer (CRC) with P < 5×10(-8). Most studies have been conducted in non-Hispanic whites and East Asians; however, the generalizability of these findings and the potential for ethnic-specific risk variation in Hispanic and Latino (HL) individuals have been largely understudied. We describe the first GWAS of common genetic variation contributing to CRC risk in HL (1611 CRC cases and 4330 controls). We also examine known susceptibility alleles and implement imputation-based fine-mapping to identify potential ethnicity-specific association signals in known risk regions. We discovered 17 variants across 4 independent regions that merit further investigation due to suggestive CRC associations (P < 1×10(-6)) at 1p34.3 (rs7528276; odds ratio (OR) = 1.86 (95% confidence interval (CI): 1.47-2.36); P = 2.5×10(-7)), 2q23.3 (rs1367374; OR = 1.37 (95% CI: 1.21-1.55); P = 4.0×10(-7)), 14q24.2 (rs143046984; OR = 1.65 (95% CI: 1.36-2.01); P = 4.1×10(-7)) and 16q12.2 (rs142319636; OR = 1.69 (95% CI: 1.37-2.08); P = 7.8×10(-7)). Among the 57 previously published CRC susceptibility alleles with minor allele frequency ≥1%, 76.5% of SNPs had a consistent direction of effect and 19 (33.3%) were nominally statistically significant (P < 0.05). Further, rs185423955 and rs60892987 were identified as novel secondary susceptibility variants at 3q26.2 (P = 5.3×10(-5)) and 11q12.2 (P = 6.8×10(-5)), respectively. Our findings demonstrate the importance of fine mapping in HL. These results are informative for variant prioritization in functional studies and future risk prediction modeling in minority populations.
Project description:Sepsis is the dysregulated host response to an infection which leads to life-threatening organ dysfunction that varies by host genomic factors. We conducted a genome-wide association study (GWAS) in 740 adult septic patients and focused on 28-day mortality as outcome. Variants with suggestive evidence for an association (p ≤ 10(-5)) were validated in two additional GWA studies (n=3470) and gene coding regions related to the variants were assessed in an independent exome sequencing study (n=74). In the discovery GWAS, we identified 243 autosomal variants which clustered in 14 loci (p ≤ 10(-5)). The best association signal (rs117983287; p=8.16×10(-8)) was observed for a missense variant located at chromosome 9q21.2 in the VPS13A gene. VPS13A was further supported by additional GWAS (p=0.03) and sequencing data (p=0.04). Furthermore, CRISPLD2 (p=5.99×10(-6)) and a region on chromosome 13q21.33 (p=3.34×10(-7)) were supported by both our data and external biological evidence. We found 14 loci with suggestive evidence for an association with 28-day mortality and found supportive, converging evidence for three of them in independent data sets. Elucidating the underlying biological mechanisms of VPS13A, CRISPLD2, and the chromosome 13 locus should be a focus of future research activities.
Project description:The most common side effect of angiotensin-converting enzyme inhibitor (ACEi) drugs is cough. We conducted a genome-wide association study (GWAS) of ACEi-induced cough among 7080 subjects of diverse ancestries in the Electronic Medical Records and Genomics (eMERGE) network. Cases were subjects diagnosed with ACEi-induced cough. Controls were subjects with at least 6 months of ACEi use and no cough. A GWAS (1595 cases and 5485 controls) identified associations on chromosome 4 in an intron of KCNIP4. The strongest association was at rs145489027 (minor allele frequency=0.33, odds ratio (OR)=1.3 (95% confidence interval (CI): 1.2-1.4), P=1.0 × 10(-8)). Replication for six single-nucleotide polymorphisms (SNPs) in KCNIP4 was tested in a second eMERGE population (n=926) and in the Genetics of Diabetes Audit and Research in Tayside, Scotland (GoDARTS) cohort (n=4309). Replication was observed at rs7675300 (OR=1.32 (1.01-1.70), P=0.04) in eMERGE and at rs16870989 and rs1495509 (OR=1.15 (1.01-1.30), P=0.03 for both) in GoDARTS. The combined association at rs1495509 was significant (OR=1.23 (1.15-1.32), P=1.9 × 10(-9)). These results indicate that SNPs in KCNIP4 may modulate ACEi-induced cough risk.
Project description:A form of hereditary cerebellar ataxia has recently been described in the Norwegian Buhund dog breed. This study aimed to identify the genetic cause of the disease. Whole-genome sequencing of two Norwegian Buhund siblings diagnosed with progressive cerebellar ataxia was carried out, and sequences compared with 405 whole genome sequences of dogs of other breeds to filter benign common variants. Nine variants predicted to be deleterious segregated among the genomes in concordance with an autosomal recessive mode of inheritance, only one of which segregated within the breed when genotyped in additional Norwegian Buhunds. In total this variant was assessed in 802 whole genome sequences, and genotyped in an additional 505 unaffected dogs (including 146 Buhunds), and only four affected Norwegian Buhunds were homozygous for the variant. The variant identified, a T to C single nucleotide polymorphism (SNP) (NC_006585.3:g.88890674T>C), is predicted to cause a tryptophan to arginine substitution in a highly conserved region of the potassium voltage-gated channel interacting protein KCNIP4. This gene has not been implicated previously in hereditary ataxia in any species. Evaluation of KCNIP4 protein expression through western blot and immunohistochemical analysis using cerebellum tissue of affected and control dogs demonstrated that the mutation causes a dramatic reduction of KCNIP4 protein expression. The expression of alternative KCNIP4 transcripts within the canine cerebellum, and regional differences in KCNIP4 protein expression, were characterised through RT-PCR and immunohistochemistry respectively. 
The voltage-gated potassium channel protein KCND3 has previously been implicated in spinocerebellar ataxia, and our findings suggest that the Kv4 channel complex KCNIP accessory subunits also have an essential role in voltage-gated potassium channel function in the cerebellum and should be investigated as potential candidate genes for cerebellar ataxia in future studies in other species.
Project description:Genome-wide association studies have identified thousands of susceptibility loci for many human complex traits, and yet for most of these associations the true causal variants remain unknown. Tissue/cell type-specific prediction and prioritization of non-coding regulatory variants will facilitate the identification of causal variants and underlying pathogenic mechanisms for particular complex diseases and traits. By leveraging recent large-scale functional genomics/epigenomics data, we develop an intuitive web server, GWAS4D (http://mulinlab.tmu.edu.cn/gwas4d or http://mulinlab.org/gwas4d), that systematically evaluates GWAS signals and identifies context-specific regulatory variants. The updated web server includes six major features: it (i) updates the regulatory variant prioritization method with our new algorithm; (ii) incorporates 127 tissue/cell type-specific epigenomes; (iii) integrates motifs of 1480 transcriptional regulators from 13 public resources; (iv) uniformly processes Hi-C data and generates significant interactions at 5 kb resolution across 60 tissues/cell types; (v) adds comprehensive non-coding variant functional annotations; and (vi) provides a highly interactive visualization function for SNP-target interaction. Using a GWAS fine-mapped set for 161 coronary artery disease risk loci, we demonstrate that GWAS4D is able to efficiently prioritize disease-causal regulatory variants.
Project description:Over the past 15 years, genome-wide association studies (GWASs) have enabled the systematic identification of genetic loci associated with traits and diseases. However, due to resolution issues and methodological limitations, the true causal variants and genes associated with traits remain difficult to identify. In this post-GWAS era, many biological and computational fine-mapping approaches now aim to solve these issues. Here, we review fine-mapping and gene prioritization approaches that, when combined, will improve the understanding of the underlying mechanisms of complex traits and diseases. Fine-mapping of genetic variants has become increasingly sophisticated: initially, variants were simply overlapped with functional elements, but now the impact of variants on regulatory activity and direct variant-gene 3D interactions can be identified. Moreover, gene manipulation by CRISPR/Cas9, the identification of expression quantitative trait loci and the use of co-expression networks have all increased our understanding of the genes and pathways affected by GWAS loci. However, despite this progress, limitations including the lack of cell-type- and disease-specific data and the ever-increasing complexity of polygenic models of traits pose serious challenges. Indeed, the combination of fine-mapping and gene prioritization by statistical, functional and population-based strategies will be necessary to truly understand how GWAS loci contribute to complex traits and diseases.
Project description:BACKGROUND:Risk variants identified so far for colorectal cancer explain only a small proportion of familial risk of this cancer, particularly in Asians. METHODS:We performed a genome-wide association study (GWAS) of colorectal cancer in East Asians, including 23,572 colorectal cancer cases and 48,700 controls. To identify novel risk loci, we selected 60 promising risk variants for replication using data from 58,131 colorectal cancer cases and 67,347 controls of European descent. To identify additional risk variants in known colorectal cancer loci, we performed conditional analyses in East Asians. RESULTS:An indel variant, rs67052019 at 1p13.3, was found to be associated with colorectal cancer risk at P = 3.9 × 10(-8) in Asians (OR per allele deletion = 1.13, 95% confidence interval = 1.08-1.18). This association was replicated in European descendants using a variant (rs2938616) in complete linkage disequilibrium with rs67052019 (P = 7.7 × 10(-3)). Of the remaining 59 variants, 12 showed an association at P < 0.05 in the European-ancestry study, including rs11108175 and rs9634162 at P < 5 × 10(-8) and two variants with an association near the genome-wide significance level (rs60911071, P = 5.8 × 10(-8); rs62558833, P = 7.5 × 10(-8)) in the combined analyses of Asian- and European-ancestry data. In addition, using data from East Asians, we identified 13 new risk variants at 11 loci reported from previous GWAS. CONCLUSIONS:In this large GWAS, we identified three novel risk loci and two highly suggestive loci for colorectal cancer risk and provided evidence for potential roles of multiple genes and pathways in the etiology of colorectal cancer. In addition, we showed that additional risk variants exist in many colorectal cancer risk loci identified previously. IMPACT:Our study provides novel data to improve the understanding of the genetic basis for colorectal cancer risk.
Project description:Back pain is the #1 cause of years lived with disability worldwide, yet surprisingly little is known regarding the biology underlying this symptom. We conducted a genome-wide association study (GWAS) meta-analysis of chronic back pain (CBP). Adults of European ancestry were included from 15 cohorts in the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) consortium, and from the UK Biobank interim data release. CBP cases were defined as those reporting back pain present for ≥3-6 months; non-cases were included as comparisons ("controls"). Each cohort conducted genotyping using commercially available arrays followed by imputation. GWAS used logistic regression models with additive genetic effects, adjusting for age, sex, study-specific covariates, and population substructure. The threshold for genome-wide significance in the fixed-effect inverse-variance weighted meta-analysis was p<5×10(-8). Suggestive (p<5×10(-7)) and genome-wide significant (p<5×10(-8)) variants were carried forward for replication or further investigation in the remaining UK Biobank participants not included in the discovery sample. The discovery sample comprised 158,025 individuals, including 29,531 CBP cases. A genome-wide significant association was found for the intronic variant rs12310519 in SOX5 (OR 1.08, p = 7.2×10(-10)). This was subsequently replicated in 283,752 UK Biobank participants not included in the discovery sample, including 50,915 cases (OR 1.06, p = 5.3×10(-11)), and exceeded genome-wide significance in joint meta-analysis (OR 1.07, p = 4.5×10(-19)). We found suggestive associations at three other loci in the discovery sample, two of which exceeded genome-wide significance in joint meta-analysis: an intergenic variant, rs7833174, located between CCDC26 and GSDMC (OR 1.05, p = 4.4×10(-13)), and an intronic variant, rs4384683, in DCC (OR 0.97, p = 2.4×10(-10)).
In this first reported meta-analysis of GWAS for CBP, we identified and replicated a genetic locus associated with CBP (SOX5). We also identified 2 other loci that reached genome-wide significance in a 2-stage joint meta-analysis (CCDC26/GSDMC and DCC).
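The fixed-effect inverse-variance weighted meta-analysis named in this abstract can be sketched as follows. This is a minimal illustration of the general technique, not the consortium's pipeline: the function name is hypothetical, and the per-cohort log odds ratios and standard errors are made-up inputs.

```python
import math


def fixed_effect_meta(betas, ses):
    """Combine per-cohort log odds ratios with inverse-variance weights."""
    weights = [1.0 / se ** 2 for se in ses]   # precision of each cohort estimate
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))    # two-sided normal tail probability
    return beta, se, z, p


# Hypothetical per-cohort estimates for one variant (log OR, standard error).
cohort_betas = [0.08, 0.06, 0.09]
cohort_ses = [0.03, 0.02, 0.04]
pooled_beta, pooled_se, pooled_z, pooled_p = fixed_effect_meta(cohort_betas, cohort_ses)
print(f"pooled OR={math.exp(pooled_beta):.3f}, P={pooled_p:.2e}")
print("genome-wide significant:", pooled_p < 5e-8)
```

Larger cohorts have smaller standard errors and therefore dominate the pooled estimate, which is why the pooled standard error is always smaller than that of any single contributing cohort.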