Powerful Set-Based Gene-Environment Interaction Testing Framework for Complex Diseases.
ABSTRACT: Identification of gene-environment interaction (G × E) is important in understanding the etiology of complex diseases. Based on our previously developed Set Based gene EnviRonment InterAction test (SBERIA), in this paper we propose a powerful framework for enhanced set-based G × E testing (eSBERIA). The major challenge of signal aggregation within a set is how to tell signals from noise. eSBERIA tackles this challenge by adaptively aggregating the interaction signals within a set weighted by the strength of the marginal and correlation screening signals. eSBERIA then combines the screening-informed aggregate test with a variance component test to account for the residual signals. Additionally, we develop a case-only extension for eSBERIA (coSBERIA) and an existing set-based method, which boosts the power not only by exploiting the G-E independence assumption but also by avoiding the need to specify main effects for a large number of variants in the set. Through extensive simulation, we show that coSBERIA and eSBERIA are considerably more powerful than existing methods within the case-only and the case-control method categories across a wide range of scenarios. We conduct a genome-wide G × E search by applying our methods to Illumina HumanExome Beadchip data of 10,446 colorectal cancer cases and 10,191 controls and identify two novel interactions between nonsteroidal anti-inflammatory drugs (NSAIDs) and MINK1 and PTCHD3.
Project description:Adenomatous Polyposis Coli (APC) is the most frequently mutated gene in colorectal cancer. APC negatively regulates the Wnt signaling pathway by promoting the degradation of β-catenin, but the extent to which APC exerts Wnt/β-catenin-independent tumor-suppressive activity is unclear. To identify interaction partners and β-catenin-independent targets of endogenous, full-length APC, we applied label-free and multiplexed tandem mass tag-based mass spectrometry. Affinity enrichment-mass spectrometry identified more than 150 previously unidentified APC interaction partners. Moreover, our global proteomic analysis revealed that roughly half of the protein expression changes that occur in response to APC loss are independent of β-catenin. Combining these two analyses, we identified Misshapen-like kinase 1 (MINK1) as a putative substrate of an APC-containing destruction complex. We validated the interaction between endogenous MINK1 and APC and further confirmed the negative, and β-catenin-independent, regulation of MINK1 by APC. Increased Mink1/Msn levels were also observed in mouse intestinal tissue and Drosophila follicular cells expressing mutant Apc/APC when compared with wild-type tissue/cells. Collectively, our results highlight the extent and importance of Wnt-independent APC functions in epithelial biology and disease. IMPLICATIONS: The tumor-suppressive function of APC, the most frequently mutated gene in colorectal cancer, is mainly attributed to its role in β-catenin/Wnt signaling. Our study substantially expands the list of APC interaction partners and reveals that approximately half of the changes in the cellular proteome induced by loss of APC function are mediated by β-catenin-independent mechanisms.
Project description:Identification of gene-environment interaction (G × E) is important in understanding the etiology of complex diseases. However, partially due to the lack of power, there have been very few replicated G × E findings compared to the success in marginal association studies. The existing G × E testing methods mainly focus on improving the power for individual markers. In this paper, we took a different strategy and proposed a set-based gene-environment interaction test (SBERIA), which can improve the power by reducing the multiple testing burdens and aggregating signals within a set. The major challenge of the signal aggregation within a set is how to tell signals from noise and how to determine the direction of the signals. SBERIA takes advantage of the established correlation screening for G × E to guide the aggregation of genotypes within a marker set. The correlation screening has been shown to be an efficient way of selecting potential G × E candidate SNPs in case-control studies for complex diseases. Importantly, the correlation screening in case-control combined samples is independent of the interaction test. With this desirable feature, SBERIA maintains the correct type I error level and can be easily implemented in a regular logistic regression setting. We showed that SBERIA had higher power than benchmark methods in various simulation scenarios, both for common and rare variants. We also applied SBERIA to real genome-wide association studies (GWAS) data of 10,729 colorectal cancer cases and 13,328 controls and found evidence of interaction between the set of known colorectal cancer susceptibility loci and smoking.
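The screening-guided aggregation described above can be illustrated with a minimal Python sketch. It is not the published SBERIA implementation: the cutoff on the absolute G-E correlation (`r_cutoff`), the small weight given to unselected variants (`epsilon`), and all function names are hypothetical choices made for illustration. The sketch only builds the aggregated genotype score; the interaction test itself (a logistic regression including a score-by-exposure term) is not shown.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def screening_weights(genotypes, exposure, r_cutoff=0.1, epsilon=1e-4):
    """Assign each variant a signed weight from its G-E correlation in the combined sample.

    r_cutoff and epsilon are hypothetical values, not taken from the abstract.
    """
    weights = []
    for snp in genotypes:
        r = pearson(snp, exposure)
        if abs(r) >= r_cutoff:
            # Variant passes screening: keep only the direction of the signal.
            weights.append(math.copysign(1.0, r))
        else:
            # Variant fails screening: down-weight it so noise contributes little.
            weights.append(epsilon)
    return weights

def aggregate(genotypes, weights):
    """Weighted sum of genotypes per individual (one score per sample)."""
    n_samples = len(genotypes[0])
    return [sum(w * snp[i] for w, snp in zip(weights, genotypes))
            for i in range(n_samples)]

# Toy data: 3 variants (rows) x 6 individuals, with a binary exposure.
exposure = [0, 0, 0, 1, 1, 1]
genotypes = [
    [0, 0, 1, 1, 2, 2],  # positively correlated with exposure
    [2, 1, 2, 0, 0, 1],  # negatively correlated with exposure
    [1, 0, 1, 1, 0, 1],  # uncorrelated with exposure
]
weights = screening_weights(genotypes, exposure)
scores = aggregate(genotypes, weights)
```

Because the weights come from the G-E correlation in the combined case-control sample rather than from disease status, the screening step in this sketch does not look at the outcome, which mirrors the independence property the abstract relies on for type I error control.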
Project description:Copy number variations (CNVs) can contribute to variable degrees of fitness and/or disease predisposition. Recent studies show that at least 1% of any given genome is copy number variable when compared to the human reference sequence assembly. Homozygous deletions (or CNV nulls) that are found in the normal population are of particular interest because they may serve to define non-essential genes in human biology. In a genomic screen investigating CNV in Autism Spectrum Disorders (ASDs) we detected a heterozygous deletion on chromosome 10p12.1, spanning the Patched-domain containing 3 (PTCHD3) gene, at a frequency of ~1.4% (6/427). This finding seemed interesting, given recent discoveries on the role of another Patched-domain containing gene (PTCHD1) in ASD. Screening of another 177 ASD probands yielded two additional heterozygous deletions bringing the frequency to 1.3% (8/604). The deletion was found at a frequency of ~0.73% (27/3,695) in combined control population from North America and Northern Europe predominately of European ancestry. Screening of the human genome diversity panel (HGDP-CEPH) covering worldwide populations yielded deletions in 7/1,043 unrelated individuals and those detected were confined to individuals of European/Mediterranean/Middle Eastern ancestry. Breakpoint mapping yielded an identical 102,624 bp deletion in all cases and controls tested, suggesting a common ancestral event. Interestingly, this CNV occurs at a break of synteny between humans and mouse. Considering all data, however, no significant association of these rare PTCHD3 deletions with ASD was observed. Notwithstanding, our RNA expression studies detected PTCHD3 in several tissues, and a novel shorter isoform for PTCHD3 was characterized. Expression in transfected COS-7 cells showed PTCHD3 isoforms colocalize with calnexin in the endoplasmic reticulum.
The presence of a patched (Ptc) domain suggested a role for PTCHD3 in various biological processes mediated through the Hedgehog (Hh) signaling pathway. However, further investigation yielded one individual harboring a homozygous deletion (PTCHD3 null) without ASD or any other overt abnormal phenotype. Exon sequencing of PTCHD3 in other individuals with deletions revealed compound point mutations also resulting in a null state. Our data suggest that PTCHD3 may be a non-essential gene in some humans and characterization of this novel CNV at 10p12.1 will facilitate population and disease studies.
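The deletion frequencies quoted in this abstract follow directly from the reported counts. The short Python sketch below recomputes them as a consistency check; the dictionary labels are descriptive names chosen here, not identifiers from the study.

```python
# Carrier counts and sample sizes as reported in the abstract.
counts = {
    "ASD initial screen": (6, 427),
    "ASD combined (427 + 177 probands)": (8, 604),
    "Combined controls": (27, 3695),
    "HGDP-CEPH panel": (7, 1043),
}

# Frequency of the heterozygous PTCHD3 deletion in each group, in percent.
frequencies = {
    group: 100.0 * carriers / total
    for group, (carriers, total) in counts.items()
}

for group, pct in frequencies.items():
    print(f"{group}: {pct:.2f}%")
```

The recomputed values round to the ~1.4%, 1.3%, and ~0.73% figures given in the text.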
Project description:Cytokinesis is initiated by constriction of the cleavage furrow and terminated by abscission of the intercellular bridge that connects two separating daughter cells. The complicated processes of cytokinesis are coordinated by phosphorylation and dephosphorylation mediated by protein kinases and phosphatases. Mammalian Misshapen-like kinase 1 (MINK1) is a member of the germinal center kinases and is known to regulate cytoskeletal organization and oncogene-induced cell senescence. To search for novel regulators of cytokinesis, we performed a screen using a library of siRNAs and found that MINK1 was essential for cytokinesis. Time-lapse analysis revealed that MINK1-depleted cells were able to initiate furrowing but that abscission was disrupted. STRN4 (Zinedin) is a regulatory subunit of protein phosphatase 2A (PP2A) and was recently shown to be a component of a novel protein complex called striatin-interacting phosphatase and kinase (STRIPAK). Mass spectrometry analysis showed that MINK1 was a component of STRIPAK and that MINK1 directly interacted with STRN4. Similar to MINK1 depletion, STRN4-knockdown induced multinucleated cells and inhibited the completion of abscission. In addition, STRN4 reduced MINK1 activity in the presence of catalytic and structural subunits of PP2A. Our study identifies a novel regulatory network of protein kinases and phosphatases that regulate the completion of abscission.
Project description:Pathway analysis has become popular as a secondary analysis strategy for genome-wide association studies (GWAS). Most of the current pathway analysis methods aggregate signals from the main effects of single nucleotide polymorphisms (SNPs) in genes within a pathway without considering the effects of gene-gene interactions. However, gene-gene interactions can also have critical effects on complex diseases. Protein-protein interaction (PPI) networks have been used to define gene pairs for the gene-gene interaction tests. Incorporating the PPI information to define gene pairs for interaction tests within pathways can increase the power for pathway-based association tests. We propose a pathway association test, which aggregates the interaction signals in PPI networks within a pathway, for GWAS with case-control samples. Gene size is properly considered in the test so that genes do not contribute more to the test statistic simply due to their size. Simulation studies were performed to verify that the method is a valid test and can have more power than other pathway association tests in the presence of gene-gene interactions within a pathway under different scenarios. We applied the test to the Wellcome Trust Case Control Consortium GWAS datasets for seven common diseases. The most significant pathway is the chaperones modulate interferon signaling pathway for Crohn's disease (p-value = 0.0003). The pathway modulates interferon gamma, which induces the JAK/STAT pathway that is involved in Crohn's disease. Several other pathways that have functional implications for the seven diseases were also identified. The proposed test based on gene-gene interaction signals in PPI networks can be used as a complementary tool to the current existing pathway analysis methods focusing on main effects of genes. An efficient software implementing the method is freely available at http://puppi.sourceforge.net.
Project description:Host genetic variability may contribute to susceptibility to bacterial meningitis, but which genes contribute to the susceptibility to this complex disease remains undefined. We performed a genetic association study in 469 community-acquired pneumococcal meningitis cases and 2072 population-based controls from the Utrecht Health Project in order to find genetic variants associated with pneumococcal meningitis susceptibility. A HumanExome BeadChip was used to genotype 102,097 SNPs in the collected DNA samples. Associations were tested with the Fisher exact test. None of the genetic variants tested reached Bonferroni-corrected significance (p-value < 5 × 10^-7). Our strongest signals associated with susceptibility to pneumococcal meningitis were rs139064549 on chromosome 1 in the COL11A1 gene (p = 1.51 × 10^-6; G allele OR 3.21 [95% CI 2.05-5.02]) and rs9309464 in the EXOC6B gene on chromosome 2 (p = 6.01 × 10^-5; G allele OR 0.66 [95% CI 0.54-0.81]). The sequence kernel association test (SKAT), which tests for associations between multiple variants in a gene region and pneumococcal meningitis susceptibility, yielded one significantly associated gene, namely COL11A1 (p = 1.03 × 10^-7). Replication studies are needed to validate these results. If replicated, the functionality of these genetic variations should be further studied to identify by which means they influence the pathophysiology of pneumococcal meningitis.
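The single-variant significance threshold quoted above is consistent with a Bonferroni correction over the 102,097 genotyped SNPs. The Python sketch below shows the arithmetic, assuming a family-wise error rate of 0.05 (the abstract does not state the alpha level, so this value is an assumption). It covers only the single-variant tests; the gene-level SKAT threshold depends on the number of genes tested, which the abstract does not give.

```python
N_VARIANTS = 102_097   # SNPs genotyped on the HumanExome BeadChip (from the abstract)
ALPHA = 0.05           # assumed family-wise error rate; not stated in the abstract

# Bonferroni correction: divide alpha by the number of tests performed.
threshold = ALPHA / N_VARIANTS

# Single-variant p-values reported in the abstract.
reported = {
    "rs139064549 (COL11A1)": 1.51e-6,
    "rs9309464 (EXOC6B)": 6.01e-5,
}

# A variant is significant only if its p-value falls below the threshold.
significant = {snp: p < threshold for snp, p in reported.items()}

print(f"Bonferroni threshold: {threshold:.2e}")
for snp, is_sig in significant.items():
    print(f"{snp}: significant = {is_sig}")
```

Under this assumption the threshold is roughly 4.9 × 10^-7, and neither reported single-variant p-value falls below it, in line with the abstract's statement.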
Project description:Cardiovascular (CV)- and lifestyle-associated risk factors (RFs) are increasingly recognized as important for Alzheimer's disease (AD) pathogenesis. Beyond the ε4 allele of apolipoprotein E (APOE), comparatively little is known about whether CV-associated genes also increase risk for AD. Using large genome-wide association studies and validated tools to quantify genetic overlap, we systematically identified single nucleotide polymorphisms (SNPs) jointly associated with AD and one or more CV-associated RFs, namely body mass index (BMI), type 2 diabetes (T2D), coronary artery disease (CAD), waist hip ratio (WHR), total cholesterol (TC), triglycerides (TG), low-density (LDL) and high-density lipoprotein (HDL). In fold enrichment plots, we observed robust genetic enrichment in AD as a function of plasma lipids (TG, TC, LDL, and HDL); we found minimal AD genetic enrichment conditional on BMI, T2D, CAD, and WHR. Beyond APOE, at conjunction FDR < 0.05 we identified 90 SNPs on 19 different chromosomes that were jointly associated with AD and CV-associated outcomes. In meta-analyses across three independent cohorts, we found four novel loci within MBLAC1 (chromosome 7, meta-p = 1.44 × 10^-9), MINK1 (chromosome 17, meta-p = 1.98 × 10^-7) and two chromosome 11 SNPs within the MTCH2/SPI1 region (closest gene = DDB2, meta-p = 7.01 × 10^-7 and closest gene = MYBPC3, meta-p = 5.62 × 10^-8). In a large 'AD-by-proxy' cohort from the UK Biobank, we replicated three of the four novel AD/CV pleiotropic SNPs, namely variants within MINK1, MBLAC1, and DDB2. Expression of MBLAC1, SPI1, MINK1 and DDB2 was differentially altered within postmortem AD brains. Beyond APOE, we show that the polygenic component of AD is enriched for lipid-associated RFs. We pinpoint a subset of cardiovascular-associated genes that strongly increase the risk for AD.
Our collective findings support a disease model in which cardiovascular biology is integral to the development of clinical AD in a subset of individuals.
Project description:Identifying the viability of protein targets is one of the preliminary steps of drug discovery. Determining the ability of a protein to bind drugs in order to modulate its function, termed the druggability, requires a non-trivial amount of time and resources. Inability to properly measure druggability has accounted for a significant portion of failures in drug discovery. This problem is only further exacerbated by the large sample space of proteins involved in human diseases. With these barriers, the druggability space within the human proteome remains unexplored and has made it difficult to develop drugs for numerous diseases. Hence, we present a new feature developed in eFindSite that employs supervised machine learning to predict the druggability of a given protein. Benchmarking calculations against the Non-Redundant data set of Druggable and Less Druggable binding sites demonstrate that an AUC for druggability prediction with eFindSite is as high as 0.88. With eFindSite, we elucidated the human druggability space to be 10,191 proteins. Considering the disease space from the Open Targets Platform and excluding already known targets from the predicted data set reveal 2731 potentially novel therapeutic targets. eFindSite is freely available as a stand-alone software at https://github.com/michal-brylinski/efindsite .
Project description:Steroid-sensitive nephrotic syndrome (SSNS) accounts for >80% of cases of nephrotic syndrome in childhood. However, the etiology and pathogenesis of SSNS remain obscure. Hypothesizing that coding variation may underlie SSNS risk, we conducted an exome array association study of SSNS. We enrolled a discovery set of 363 persons (214 South Asian children with SSNS and 149 controls) and genotyped them using the Illumina HumanExome Beadchip. Four common single nucleotide polymorphisms (SNPs) in HLA-DQA1 and HLA-DQB1 (rs1129740, rs9273349, rs1071630, and rs1140343) were significantly associated with SSNS at or near the Bonferroni-adjusted P value for the number of single variants that were tested (odds ratio, 2.11; 95% confidence interval, 1.56 to 2.86; P = 1.68 × 10^-6; Fisher exact test). Two of these SNPs, the missense variants C34Y (rs1129740) and F41S (rs1071630) in HLA-DQA1, were replicated in an independent cohort of children of white European ancestry with SSNS (100 cases and ≤589 controls; P = 1.42 × 10^-17). In the rare variant gene set-based analysis, the best signal was found in PLCG2 (P = 7.825 × 10^-5). In conclusion, this exome array study identified HLA-DQA1 and PLCG2 missense coding variants as candidate loci for SSNS. The finding of an MHC class II locus underlying SSNS risk suggests a major role for immune response in the pathogenesis of SSNS.