DeepSAGE reveals genetic variants associated with alternative polyadenylation and expression of coding and non-coding transcripts.
ABSTRACT: Many disease-associated variants affect gene expression levels (expression quantitative trait loci, eQTLs) and expression profiling using next generation sequencing (NGS) technology is a powerful way to detect these eQTLs. We analyzed 94 total blood samples from healthy volunteers with DeepSAGE to gain specific insight into how genetic variants affect the expression of genes and lengths of 3'-untranslated regions (3'-UTRs). We detected previously unknown cis-eQTL effects for GWAS hits in disease- and physiology-associated traits. Apart from cis-eQTLs that are typically easily identifiable using microarrays or RNA-sequencing, DeepSAGE also revealed many cis-eQTLs for antisense and other non-coding transcripts, often in genomic regions containing retrotransposon-derived elements. We also identified and confirmed SNPs that affect the usage of alternative polyadenylation sites, thereby potentially influencing the stability of messenger RNAs (mRNA). We then combined the power of RNA-sequencing with DeepSAGE by performing a meta-analysis of three datasets, leading to the identification of many more cis-eQTLs. Our results indicate that DeepSAGE data is useful for eQTL mapping of known and unknown transcripts, and for identifying SNPs that affect alternative polyadenylation. Because of the inherent differences between DeepSAGE and RNA-sequencing, our complementary, integrative approach leads to greater insight into the molecular consequences of many disease-associated variants.
Project description:Identifying the downstream effects of disease-associated SNPs is challenging. To help overcome this problem, we performed expression quantitative trait locus (eQTL) meta-analysis in non-transformed peripheral blood samples from 5,311 individuals with replication in 2,775 individuals. We identified and replicated trans eQTLs for 233 SNPs (reflecting 103 independent loci) that were previously associated with complex traits at genome-wide significance. Some of these SNPs affect multiple genes in trans that are known to be altered in individuals with disease: rs4917014, previously associated with systemic lupus erythematosus (SLE), altered gene expression of C1QB and five type I interferon response genes, both hallmarks of SLE. DeepSAGE RNA sequencing showed that rs4917014 strongly alters the 3' UTR levels of IKZF1 in cis, and chromatin immunoprecipitation and sequencing analysis of the trans-regulated genes implicated IKZF1 as the causal gene. Variants associated with cholesterol metabolism and type 1 diabetes showed similar phenomena, indicating that large-scale eQTL mapping provides insight into the downstream effects of many trait-associated variants.
Project description:Recently it has become clear that only a small percentage (7%) of disease-associated single nucleotide polymorphisms (SNPs) are located in protein-coding regions, while the remaining 93% are located in gene regulatory regions or in intergenic regions. Thus, the understanding of how genetic variations control the expression of non-coding RNAs (in a tissue-dependent manner) has far-reaching implications. We tested the association of SNPs with expression levels (eQTLs) of large intergenic non-coding RNAs (lincRNAs), using genome-wide gene expression and genotype data from five different tissues. We identified 112 cis-regulated lincRNAs, of which 45% could be replicated in an independent dataset. We observed that 75% of the SNPs affecting lincRNA expression (lincRNA cis-eQTLs) were specific to lincRNA alone and did not affect the expression of neighboring protein-coding genes. We show that this specific genotype-lincRNA expression correlation is tissue-dependent and that many of these lincRNA cis-eQTL SNPs are also associated with complex traits and diseases.
Project description:Expression quantitative trait loci (eQTLs) mapping and linkage disequilibrium (LD) analysis have been widely employed to interpret findings of genome-wide association studies (GWAS). With the availability of deep sequencing data of 423 lymphoblastoid cell lines (LCLs) from six global populations and the microarray expression data, we performed eQTL analysis, identified more than 228 K SNP cis-eQTLs and 21 K indel cis-eQTLs, and generated an LCL cis-eQTL database. We demonstrate that the percentages of population-shared and population-specific cis-eQTLs are comparable, while indel cis-eQTLs in the population-specific subsection contribute more to gene expression variation than those in the population-shared subsection. We found that cis-eQTLs, especially the population-shared cis-eQTLs, are significantly enriched toward the transcription start site. Moreover, GWAS SNPs cataloged by the National Human Genome Research Institute are enriched for LCL cis-eQTLs. Specifically, 32.8% of GWAS SNPs are LCL cis-eQTLs, among which 12.5% can be tagged by indel cis-eQTLs, suggesting the fundamental contribution of indel cis-eQTLs to GWAS association signals. To search for functional indels and SNPs tagging GWAS SNPs, a pipeline, Post-GWAS Explorer for Functional Indels and SNPs (PExFInS), has been developed, integrating LD analysis, functional annotation from public databases, and cis-eQTL mapping with our LCL cis-eQTL database and other published cis-eQTL datasets.
Project description:Identification of microRNA expression quantitative trait loci (miR-eQTL) can yield insights into regulatory mechanisms of microRNA transcription, and can help elucidate the role of microRNA as mediators of complex traits. Here we present a miR-eQTL mapping study of whole blood from 5,239 individuals, and identify 5,269 cis-miR-eQTLs for 76 mature microRNAs. Forty-nine per cent of cis-miR-eQTLs are located 300-500 kb upstream of their associated intergenic microRNAs, suggesting that distal regulatory elements may affect the interindividual variability in microRNA expression levels. We find that cis-miR-eQTLs are highly enriched for cis-mRNA-eQTLs and regulatory single nucleotide polymorphisms. Among 243 cis-miR-eQTLs that were reported to be associated with complex traits in prior genome-wide association studies, many cis-miR-eQTL miRNAs display differential expression in relation to the corresponding trait (for example, rs7115089, miR-125b-5p and high-density lipoprotein cholesterol). Our study provides a roadmap for understanding the genetic basis of miRNA expression, and sheds light on miRNA involvement in a variety of complex traits.
Project description:The spontaneously hypertensive rat (SHR) is a widely used rodent model of hypertension and metabolic syndrome. Previously we identified thousands of cis-regulated expression quantitative trait loci (eQTLs) across multiple tissues using a panel of rat recombinant inbred (RI) strains derived from Brown Norway and SHR progenitors. These cis-eQTLs represent potential susceptibility loci underlying physiological and pathophysiological traits manifested in SHR. We have prioritized 60 cis-eQTLs and confirmed differential expression between the parental strains by quantitative PCR in 43 (72%) of the eQTL transcripts. Quantitative trait transcript (QTT) analysis in the RI strains showed highly significant correlation between cis-eQTL transcript abundance and clinically relevant traits such as systolic blood pressure and blood glucose, with the physical location of a subset of the cis-eQTLs colocalizing with "physiological" QTLs (pQTLs) for these same traits. These colocalizing correlated cis-eQTLs (c3-eQTLs) are highly attractive as primary susceptibility loci for the colocalizing pQTLs. Furthermore, sequence analysis of the c3-eQTL genes identified single nucleotide polymorphisms (SNPs) that are predicted to affect transcription factor binding affinity, splicing and protein function. These SNPs, which potentially alter transcript abundance and stability, represent strong candidate factors underlying not just eQTL expression phenotypes, but also the correlated metabolic and physiological traits. In conclusion, by integration of genomic sequence, eQTL and QTT datasets we have identified several genes that are strong positional candidates for pathophysiological traits observed in the SHR strain. These findings provide a basis for the functional testing and ultimate elucidation of the molecular basis of these metabolic and cardiovascular phenotypes.
Project description:Obesity is a multi-factorial health problem in which genetic factors play an important role. Limited results have been obtained in single-gene studies using either genomic or transcriptomic data. RNA sequencing technology has shown its potential in gaining accurate knowledge about the transcriptome, and may reveal novel genes affecting complex diseases. Integration of genomic and transcriptomic variation (expression quantitative trait loci [eQTL] mapping) has identified causal variants that affect complex diseases. We integrated transcriptomic data from adipose tissue and genomic data from a porcine model to investigate the mechanisms involved in obesity using a systems genetics approach. Using a selective gene expression profiling approach, we selected 36 animals based on a previously created genomic Obesity Index for RNA sequencing of subcutaneous adipose tissue. Differential expression analysis was performed using the Obesity Index as a continuous variable in a linear model. eQTL mapping was then performed to integrate 60 K porcine SNP chip data with the RNA sequencing data. Results were restricted based on genome-wide significant single nucleotide polymorphisms, detected differentially expressed genes, and previously detected co-expressed gene modules. Further data integration was performed by detecting co-expression patterns among eQTLs and integration with protein data. Differential expression analysis of RNA sequencing data revealed 458 differentially expressed genes. The eQTL mapping resulted in 987 cis-eQTLs and 73 trans-eQTLs (false discovery rate < 0.05), of which the cis-eQTLs were associated with metabolic pathways. We reduced the eQTL search space by focusing on differentially expressed and co-expressed genes and disease-associated single nucleotide polymorphisms to detect obesity-related genes and pathways. Building a co-expression network using eQTLs resulted in the detection of a module strongly associated with lipid pathways. Furthermore, we detected several obesity candidate genes, for example, ENPP1, CTSL, and ABHD12B. To our knowledge, this is the first study to perform an integrated genomics and transcriptomics (eQTL) study using, and modeling, genomic and subcutaneous adipose tissue RNA sequencing data on obesity in a porcine model. We detected several pathways and potential causal genes for obesity. Further validation and investigation may reveal their exact function and association with obesity.
Project description:Numerous studies indicate that non-coding RNAs (ncRNAs) have critical functions across biological processes, and single-nucleotide polymorphisms (SNPs) could contribute to diseases or traits through influencing ncRNA expression. However, the associations between SNPs and ncRNA expression are largely unknown. Therefore, genome-wide expression quantitative trait loci (eQTL) analysis to assess the effects of SNPs on ncRNA expression, especially in multiple cancer types, will help to understand how risk alleles contribute toward tumorigenesis and cancer development. Using genotype data and expression profiles of ncRNAs of >8700 samples from The Cancer Genome Atlas (TCGA), we developed a computational pipeline to systematically identify ncRNA-related eQTLs (ncRNA-eQTLs) across 33 cancer types. We identified a total of 6 133 278 and 721 122 eQTL-ncRNA pairs in cis-eQTL and trans-eQTL analyses, respectively. Further survival analyses identified 8312 eQTLs associated with patient survival times. Furthermore, we linked ncRNA-eQTLs to genome-wide association study (GWAS) data and found 262 332 ncRNA-eQTLs overlapping with known disease- and trait-associated loci. Finally, a user-friendly database, ncRNA-eQTL (http://ibi.hzau.edu.cn/ncRNA-eQTL), was developed for free searching, browsing and downloading of all ncRNA-eQTLs. We anticipate that such an integrative and comprehensive resource will improve our understanding of the mechanistic basis of human complex phenotypic variation, especially for ncRNA- and cancer-related studies.
Project description:Mapping expression quantitative trait loci (eQTLs) has been shown to be a powerful tool for uncovering the genetic underpinnings of many complex traits at the molecular level. In this paper, we present an integrative analysis approach that leverages eQTL data collected from multiple population groups. In particular, our approach effectively identifies multiple independent cis-eQTL signals that are consistent across populations, accounting for population heterogeneity in allele frequencies and linkage disequilibrium patterns. Furthermore, by integrating genomic annotations, our analysis framework enables high-resolution functional analysis of eQTLs. We applied our statistical approach to analyze the GEUVADIS data consisting of samples from five population groups. From this analysis, we concluded that i) joint analysis across population groups greatly improves the power of eQTL discovery and the resolution of fine mapping of causal eQTLs; ii) many genes harbor multiple independent eQTLs in their cis regions; and iii) genetic variants that disrupt transcription factor binding are significantly enriched in eQTLs (p-value = 4.93 × 10^-22).
Project description:BACKGROUND: We aimed to assess whether whole blood expression quantitative trait loci (eQTLs) with effects in cis and trans are robust and can be used to identify regulatory pathways affecting disease susceptibility. MATERIALS AND METHODS: We performed whole-genome eQTL analyses in 890 participants of the KORA F4 study and in two independent replication samples (SHIP-TREND, N = 976 and EGCUT, N = 842) using linear regression models and Bonferroni correction. RESULTS: In the KORA F4 study, 4,116 cis-eQTLs (defined as SNP-probe pairs where the SNP is located within a 500 kb window around the transcription unit) and 94 trans-eQTLs reached genome-wide significance and overall 91% (92% of cis-, 84% of trans-eQTLs) were confirmed in at least one of the two replication studies. Different study designs including distinct laboratory reagents (PAXgene™ vs. Tempus™ tubes) did not affect reproducibility (separate overall replication overlap: 78% and 82%). Immune response pathways were enriched in cis- and trans-eQTLs and significant cis-eQTLs were partly coexistent in other tissues (cross-tissue similarity 40-70%). Furthermore, four chromosomal regions displayed simultaneous impact on multiple gene expression levels in trans, and 746 eQTL-SNPs have been previously reported to have clinical relevance. We demonstrated cross-associations between eQTL-SNPs, gene expression levels in trans, and clinical phenotypes as well as a link between eQTLs and human metabolic traits via modification of gene regulation in cis. CONCLUSIONS: Our data suggest that whole blood is a robust tissue for eQTL analysis and may be used both for biomarker studies and to enhance our understanding of molecular mechanisms underlying gene-disease associations.
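The distance-based cis definition and the Bonferroni correction mentioned in the preceding abstract can be illustrated with a minimal Python sketch. It is not the study's pipeline. Several points are assumptions made only for illustration: the 500 kb window is read as 500 kb of flanking sequence on each side of the transcription unit, every pair that is not cis is labelled trans, and the names `Transcript`, `classify_pair`, `bonferroni_threshold` and `significant` are hypothetical.

```python
from dataclasses import dataclass

# Assumption: "500 kb window" is read as 500 kb on either side of the transcript.
CIS_FLANK_BP = 500_000


@dataclass(frozen=True)
class Transcript:
    """Genomic span of a transcription unit (hypothetical helper type)."""
    chrom: str
    start: int
    end: int


def classify_pair(snp_chrom, snp_pos, transcript, flank=CIS_FLANK_BP):
    """Label a SNP-transcript pair as 'cis' or 'trans' by genomic distance."""
    if snp_chrom != transcript.chrom:
        return "trans"
    if transcript.start - flank <= snp_pos <= transcript.end + flank:
        return "cis"
    # Assumption: same-chromosome pairs outside the window are treated as trans.
    return "trans"


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def significant(p_values, alpha=0.05):
    """Flag each p-value that passes the Bonferroni-corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p <= threshold for p in p_values]
```

In practice the p-values passed to `significant` would come from the per-pair linear regressions, which this sketch does not implement.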
Project description:The simplest definition of cis-eQTLs, as opposed to trans-eQTLs, refers to genetic variants that affect expression in an allele-specific manner, with implications for the underlying mechanism. Yet, due to technical limitations of expression microarrays, the vast majority of eQTL studies performed in the last decade used a genomic distance-based definition as a surrogate for cis, therefore exploring local rather than cis-eQTLs. In this study we use RNAseq to explore allele-specific expression (ASE) in adipose tissue of male and female F1 mice, produced from reciprocal crosses of C57BL/6J and DBA/2J strains. Comparison of the identified cis-eQTLs to local-eQTLs obtained from adipose tissue expression in two previous population-based studies in our laboratory yields poor overlap between the two mapping approaches, while both local-eQTL studies show highly concordant results. Specifically, local-eQTL studies show ~60% overlap between themselves, while only 15-20% of local-eQTLs are identified as cis by ASE, and less than 50% of ASE genes are recovered in local-eQTL studies. Utilizing recently published ENCODE data, we also find that ASE genes show a significant bias for SNP prevalence in DNase I hypersensitive sites that is ASE direction specific. We suggest a new approach to the analysis of allele-specific expression that is more sensitive and accurate than the commonly used Fisher or chi-square statistics. Our analysis indicates that technical differences between the cis and local-eQTL approaches, such as differences in genomic background or sex specificity, account for a relatively small fraction of the discrepancy. Therefore, we suggest that the differences between the two eQTL mapping approaches may facilitate sorting of SNP-eQTL interactions into true cis and trans, and that a considerable portion of local-eQTLs may actually represent trans interactions.
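For orientation, the chi-square baseline that the preceding abstract compares against can be sketched in Python as follows. This is not the new approach the authors propose, which the abstract does not describe. The sketch assumes the simplest setup, a one-degree-of-freedom goodness-of-fit test of a gene's reference and alternative allele read counts against a 1:1 ratio, and the function name `ase_chi_square` is hypothetical.

```python
import math


def ase_chi_square(ref_reads, alt_reads):
    """Chi-square goodness-of-fit test of allelic read counts against 1:1.

    Returns (statistic, p_value). With expected counts n/2 for each allele,
    the statistic reduces to (ref - alt)^2 / (ref + alt), with 1 degree of
    freedom.
    """
    total = ref_reads + alt_reads
    if total <= 0:
        raise ValueError("at least one read is required")
    statistic = (ref_reads - alt_reads) ** 2 / total
    # Survival function of the chi-square distribution with 1 df:
    # P(X >= x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value
```

For example, `ase_chi_square(60, 40)` gives a statistic of 4.0 and a p-value of about 0.0455. A real analysis would also need to handle low read counts, mapping bias and multiple testing, none of which this sketch addresses.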