Functional annotation of putative regulatory elements at cancer susceptibility loci.
ABSTRACT: Most cancer-associated genetic variants identified from genome-wide association studies (GWAS) do not obviously change protein structure, leading to the hypothesis that the associations are attributable to regulatory polymorphisms. Translating genetic associations into mechanistic insights can be facilitated by knowledge of the causal regulatory variant (or variants) responsible for the statistical signal. Experimental validation of candidate functional variants is onerous, making bioinformatic approaches necessary to prioritize candidates for laboratory analysis. Thus, a systematic approach for recognizing functional (and, therefore, likely causal) variants in noncoding regions is an important step toward interpreting cancer risk loci. This review provides a detailed introduction to current regulatory variant annotations, followed by an overview of how to leverage these resources to prioritize candidate functional polymorphisms in regulatory regions.
Project description:Background: The identification of causal variants responsible for disease associations from genome-wide association studies (GWAS) facilitates functional understanding of the disease mechanisms implicated by GWAS. One of the earliest GWAS associations to COPD spans an intragenic region within FAM13A, but the causal variants at this locus have not yet been identified. Massively parallel reporter assays (MPRA) can be used to prioritize functional regulatory variants in a high-throughput manner. Methods: We used an integrated approach combining fine-mapping in over 10,000 subjects from COPD GWAS, two MPRA experiments, traditional reporter assays, chromatin conformation capture, and CRISPR-based gene editing to characterize COPD-associated regulatory variants in FAM13A in human bronchial epithelial cell lines. Results: Conditional genetic association and fine-mapping analyses identified two independent COPD association signals in FAM13A. MPRA identified 45 common functional regulatory variants, and six COPD-associated putative functional variants were prioritized for further functional investigation. Three variants demonstrated significant activity in traditional reporter assays, and one variant, rs2013701, was selected for further testing in the endogenous genomic context based on a direction of effect consistent with postulated mechanisms of FAM13A-mediated COPD susceptibility. CRISPR-based genome editing for this variant confirmed allele-specific effects on FAM13A expression and altered rates of cellular proliferation, providing multiple levels of functional characterization for this COPD-associated variant. Conclusions: Comprehensive screening for regulatory variants near FAM13A identified extensive functional regulatory variation within a 250 kb window of FAM13A in HBECs. 
Focused functional evaluation of the COPD-associated functional variants in LD with the two independent association signals in this region prioritized the common variant rs2013701, for which multiple parallel lines of functional evidence confirm allelic effects on FAM13A regulation.
Project description:Genome-wide association studies have identified thousands of susceptibility loci for many human complex traits, and yet for most of these associations the true causal variants remain unknown. Tissue/cell type-specific prediction and prioritization of non-coding regulatory variants will facilitate the identification of causal variants and underlying pathogenic mechanisms for particular complex diseases and traits. By leveraging recent large-scale functional genomics/epigenomics data, we develop an intuitive web server, GWAS4D (http://mulinlab.tmu.edu.cn/gwas4d or http://mulinlab.org/gwas4d), that systematically evaluates GWAS signals and identifies context-specific regulatory variants. The updated web server includes six major features: (i) updates the regulatory variant prioritization method with our new algorithm; (ii) incorporates epigenomic data for 127 tissues/cell types; (iii) integrates motifs of 1480 transcriptional regulators from 13 public resources; (iv) uniformly processes Hi-C data and generates significant interactions at 5 kb resolution across 60 tissues/cell types; (v) adds comprehensive non-coding variant functional annotations; (vi) provides a highly interactive visualization of SNP-target interactions. Using a GWAS fine-mapped set for 161 coronary artery disease risk loci, we demonstrate that GWAS4D is able to efficiently prioritize disease-causal regulatory variants.
Project description:Given the abundance of new genomic projects and gene annotations, researchers trying to pinpoint causal genetic variants face the challenging task of efficiently integrating all current genomic information. The objective of the study was to develop an approach to integrate various genomic annotations for a recently positionally cloned Tst gene (Thiosulfate Sulfur Transferase, synonym Rhodanese) responsible for the Fob3b2 QTL effect on leanness and improved metabolic parameters. The second aim was to identify and prioritize Tst genetic variants that may be causal for the phenotypic effects. A bioinformatics approach was developed to integrate existing knowledge of regulatory elements of the Tst gene. The entire Tst locus along with flanking segments was sequenced in our unique polygenic mouse Fat and Lean strains, which were generated by divergent selection on adiposity for over 60 generations. The bioinformatics-generated regulatory element map of the Tst locus was then combined with genetic variants between the Fat and Lean mice and with comparative analyses of polymorphisms across 17 mouse strains in order to prioritise likely causal polymorphisms. Two candidate regulatory variants were identified, one overlapping an evolutionarily constrained Tst intronic element and the other residing in the seed region of a predicted 3'UTR miRNA binding site. This study developed a map of regulatory elements for the Tst locus in mice and identified candidate genetic variants with increased causal likelihood. This map provides a basis for experimental validation and functional analyses of this novel candidate leanness and antidiabetic gene. Our methodological approach is of general utility for analyzing regulation of loci that have limited annotations and experimental evidence and for identifying candidate causal regulatory genetic variants in post-GWAS or post-QTL-cloning studies.
Project description:Genome-wide association studies (GWAS) have identified thousands of robust and replicable genetic associations for complex disease. However, the identification of the causal variants that underlie these associations has been more difficult. This problem of fine-mapping association signals predates GWAS, but the last few years have seen a surge of studies aimed at pinpointing causal variants using both statistical evidence from large association data sets and functional annotations of genetic variants. Combining these two approaches can often determine not only the causal variant but also the target gene. Recent contributions include analyses of custom genotyping arrays, such as the Immunochip, statistical methods to identify credible sets of causal variants and the addition of functional genomic annotations for coding and non-coding variation to help prioritize variants and discern functional consequence and hence the biological basis of disease risk.
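The "credible set" idea mentioned above can be illustrated with a minimal Python sketch. It assumes a single causal variant per locus, converts each variant's effect estimate and standard error into an approximate Bayes factor using Wakefield's formula, normalises these into posterior probabilities, and keeps the smallest set of variants whose cumulative posterior reaches the requested coverage. The function names, the prior variance of 0.04, and the three example variants are hypothetical and are not taken from any study listed here.

```python
import math


def wakefield_log_abf(beta, se, prior_var=0.04):
    """Log approximate Bayes factor for association vs. no association.

    prior_var is an assumed prior variance on the effect size.
    """
    v = se ** 2
    z2 = (beta / se) ** 2
    shrink = prior_var / (prior_var + v)
    return 0.5 * (math.log(1.0 - shrink) + z2 * shrink)


def credible_set(variants, coverage=0.95, prior_var=0.04):
    """Return (credible set, posteriors) assuming one causal variant.

    variants maps a variant name to a (beta, standard error) pair.
    """
    log_abfs = {
        name: wakefield_log_abf(beta, se, prior_var)
        for name, (beta, se) in variants.items()
    }
    # Subtract the maximum before exponentiating to avoid overflow.
    top = max(log_abfs.values())
    weights = {name: math.exp(v - top) for name, v in log_abfs.items()}
    total = sum(weights.values())
    posteriors = {name: w / total for name, w in weights.items()}

    # Add variants in decreasing posterior order until coverage is reached.
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    chosen, cumulative = [], 0.0
    for name, prob in ranked:
        chosen.append(name)
        cumulative += prob
        if cumulative >= coverage:
            break
    return chosen, posteriors


# Hypothetical summary statistics: (log odds ratio, standard error).
example_variants = {
    "snp_a": (0.20, 0.03),
    "snp_b": (0.18, 0.03),
    "snp_c": (0.05, 0.03),
}
chosen, posteriors = credible_set(example_variants)
```

In practice, functional annotations of the kind described in these abstracts are often used to replace the flat prior implied here with variant-specific priors; that extension is not shown.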
Project description:Founder populations are ideally suited for studies on the clinical effects of alleles that are rare in general populations but occur at higher frequencies in these isolated populations. Whole genome sequencing in 98 Hutterites, a founder population of European descent, and subsequent imputation revealed 660,238 single nucleotide polymorphisms that are rare (<1%) or absent in European populations, but occur at frequencies >1% in the Hutterites. We examined the effects of these rare-in-European variants on plasma lipid levels in 828 Hutterites and applied a Bayesian hierarchical framework to prioritize potentially causal variants based on functional annotations. We identified two novel non-coding rare variants associated with LDL cholesterol (rs17242388 in LDLR) and HDL cholesterol (rs189679427 between GOT2 and APOOP5), and replicated previous associations of a splice variant in APOC3 (rs138326449) with triglycerides and HDL-C. All three variants are at well-replicated loci in GWAS but are independent from and have larger effect sizes than the known common variation in these regions. Candidate eQTL analyses in LCLs in the Hutterites suggest that these rare non-coding variants are likely to mediate their effects on lipid traits by regulating gene expression.
Project description:BACKGROUND: Recent analyses in Greenlandic Inuit identified six genetic polymorphisms (rs74771917, rs3168072, rs12577276, rs7115739, rs174602 and rs174570) in the fatty acid desaturase gene cluster (FADS1-FADS2-FADS3) that are associated with multiple metabolic and anthropometric traits. Our objectives were to systematically assess whether dietary polyunsaturated fatty acid (PUFA) intake modifies the associations between genetic variants in the FADS gene cluster and cardiometabolic traits, and to functionally annotate top-ranking candidates to estimate their regulatory potential. METHODS: Data analyses consisted of the following: interaction analyses between the 6 candidate genetic variants and dietary PUFA intake; gene-centric joint analyses to detect interaction signals in the FADS region; haplotype-centric joint tests across 30 haplotype blocks in the FADS region to refine interaction signals; and functional annotation of top-ranking loci from the previous steps. These analyses were undertaken in Swedish adults from the GLACIER Study (N = 5,160); data on genetic variation and eight cardiometabolic traits were used. RESULTS: Interactions were observed between rs174570 and n-6 PUFA intake on fasting glucose (Pint = 0.005) and between rs174602 and n-3 PUFA intake on total cholesterol (Pint = 0.001). Gene-centric analyses demonstrated a statistically significant interaction effect for FADS and n-3 PUFA on triglycerides (Pint = 0.005) considering genetic main effects as random. Haplotype analyses revealed three blocks (Pint < 0.011) that could drive the interaction between FADS and n-3 PUFA on triglycerides; functional annotation of these regions showed that each block harbours a number of highly functional regulatory variants; FADS2 rs5792235 demonstrated the highest functionality score. CONCLUSIONS: The association between FADS variants and triglycerides may be modified by PUFA intake. 
The intronic FADS2 rs5792235 variant is a potential causal variant in the region, having the highest regulatory potential. However, our results suggest that multiple haplotypes may harbour functional variants in a region, rather than a single causal variant.
Project description:Coronary artery disease (CAD) is the leading cause of mortality and morbidity, driven by both genetic and environmental risk factors. Meta-analyses of genome-wide association studies have identified >150 loci associated with CAD and myocardial infarction susceptibility in humans. A majority of these variants reside in non-coding regions and are co-inherited with hundreds of candidate regulatory variants, presenting a challenge to elucidate their functions. Herein, we use integrative genomic, epigenomic and transcriptomic profiling of perturbed human coronary artery smooth muscle cells and tissues to begin to identify causal regulatory variation and mechanisms responsible for CAD associations. Using these genome-wide maps, we prioritize 64 candidate variants and perform allele-specific binding and expression analyses at seven top candidate loci: 9p21.3, SMAD3, PDGFD, IL6R, BMP1, CCDC97/TGFB1 and LMOD1. We validate our findings in expression quantitative trait loci cohorts, which together reveal new links between CAD associations and regulatory function in the appropriate disease context.
Project description:Capecitabine is an oral 5-fluorouracil (5-FU) pro-drug commonly used to treat colorectal carcinoma and other tumours. About 35% of patients experience dose-limiting toxicity. The few proven genetic biomarkers of 5-FU toxicity are rare variants and polymorphisms, respectively, at candidate loci dihydropyrimidine dehydrogenase (DPYD) and thymidylate synthase (TYMS). We investigated 1456 polymorphisms and rare coding variants near 25 candidate 5-FU pathway genes in 968 UK patients from the QUASAR2 clinical trial. We identified the first common DPYD polymorphisms to be consistently associated with capecitabine toxicity, rs12132152 (toxicity allele frequency (TAF)=0.031, OR=3.83, p=4.31×10^-6) and rs12022243 (TAF=0.196, OR=1.69, p=2.55×10^-5). rs12132152 was particularly strongly associated with hand-foot syndrome (OR=6.1, p=3.6×10^-8). The rs12132152 and rs12022243 associations were independent of each other and of previously reported DPYD toxicity variants. Next-generation sequencing additionally identified rare DPYD variant p.Ala551Thr in one patient with severe toxicity. Using functional predictions and published data, we assigned p.Ala551Thr as causal for toxicity. We found that polymorphism rs2612091, which lies within an intron of ENOSF1, was also associated with capecitabine toxicity (TAF=0.532, OR=1.59, p=5.28×10^-6). ENOSF1 is adjacent to TYMS and there is a poorly characterised regulatory interaction between the two genes/proteins. Unexpectedly, rs2612091 fully explained the previously reported associations between capecitabine toxicity and the supposedly functional TYMS variants, 5'VNTR 2R/3R and 3'UTR 6 bp ins-del. rs2612091 genotypes were, moreover, consistently associated with ENOSF1 mRNA levels, but not with TYMS expression. DPYD harbours rare and common capecitabine toxicity variants. The toxicity polymorphism in the TYMS region may actually act through ENOSF1.
Project description:Genome-wide association studies have identified 20 genomic regions associated with risk of epithelial ovarian cancer (EOC), but many additional risk variants may exist. Here, we evaluated associations between common genetic variants [single nucleotide polymorphisms (SNPs) and indels] in DNA repair genes and EOC risk. We genotyped 2896 common variants at 143 gene loci in DNA samples from 15 397 patients with invasive EOC and controls. We found evidence of associations with EOC risk for variants at the FANCA, EXO1, E2F4, E2F2, CREB5 and CHEK2 genes (P ≤ 0.001). The strongest risk association was for CHEK2 SNP rs17507066 with serous EOC (P = 4.74×10^-7). Additional genotyping and imputation of genotypes from the 1000 Genomes Project identified a slightly more significant association for CHEK2 SNP rs6005807 (r^2 with rs17507066 = 0.84, odds ratio (OR) 1.17, 95% CI 1.11-1.24, P = 1.1×10^-7). We identified 293 variants in the region with likelihood ratios of less than 1:100 for representing the causal variant. Functional annotation identified 25 candidate SNPs that alter transcription factor binding sites within regulatory elements active in EOC precursor tissues. In The Cancer Genome Atlas dataset, CHEK2 gene expression was significantly higher in primary EOCs compared to normal fallopian tube tissues (P = 3.72×10^-8). We also identified an association between genotypes of the candidate causal SNP rs12166475 (r^2 = 0.99 with rs6005807) and CHEK2 expression (P = 2.70×10^-8). These data suggest that common variants at 22q12.1 are associated with risk of serous EOC and that CHEK2 is a plausible target susceptibility gene.
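A likelihood-ratio threshold of the kind mentioned above can be sketched in a few lines of Python. The sketch assumes one common reading of the 1:100 criterion: a variant stays a candidate when its likelihood is no more than 100-fold lower than that of the best-supported variant in the region. That interpretation, the function name, and the log-likelihood values are assumptions for illustration only and are not taken from the study.

```python
import math


def likelihood_ratio_candidates(log_likelihoods, max_odds_against=100.0):
    """Keep variants whose likelihood is within max_odds_against-fold of the best.

    log_likelihoods maps a variant name to the natural-log likelihood of the
    association model fitted with that variant.
    """
    best = max(log_likelihoods.values())
    # A 1:100 likelihood ratio corresponds to a log-likelihood gap of ln(100).
    cutoff = best - math.log(max_odds_against)
    return sorted(
        name for name, value in log_likelihoods.items() if value >= cutoff
    )


# Hypothetical log-likelihoods for three variants in one region.
example = {"top_snp": -1000.0, "near_snp": -1002.0, "far_snp": -1010.0}
candidates = likelihood_ratio_candidates(example)
```

Variants retained by such a filter would then be passed to functional annotation, as the abstract describes.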
Project description:Utilizing data from published tuberculosis (TB) genome-wide association studies (GWAS), we use a bioinformatics pipeline to detect all polymorphisms in linkage disequilibrium (LD) with variants previously implicated in TB disease susceptibility. The probability that these variants had a predicted regulatory function was estimated using RegulomeDB and Ensembl's Variant Effect Predictor. Subsequent genotyping of these 133 predicted regulatory polymorphisms was performed in 400 admixed South African TB cases and 366 healthy controls in a population-based case-control association study to fine-map the causal variant. We detected associations between tuberculosis susceptibility and six intronic polymorphisms located in MARCO, IFNGR2, ASHAS2, ACACA, NISCH and TLR10. Our post-GWAS approach demonstrates the feasibility of combining multiple TB GWAS datasets with linkage information to identify regulatory variants associated with this infectious disease.