Allele-specific DNA methylation: beyond imprinting.
ABSTRACT: Allele-specific DNA methylation (ASM) and allele-specific gene expression (ASE) have long been studied in genomic imprinting and X chromosome inactivation. But these types of allelic asymmetries, along with allele-specific transcription factor binding (ASTF), have turned out to be far more pervasive, affecting many non-imprinted autosomal genes in normal human tissues. ASM, ASE and ASTF have now been mapped genome-wide by microarray-based methods and NextGen sequencing. Multiple studies agree that all three types of allelic asymmetries, as well as the related phenomena of expression and methylation quantitative trait loci, are mostly accounted for by cis-acting regulatory polymorphisms. The precise mechanisms by which this occurs are not yet understood, but there are some testable hypotheses and already a few direct clues. Future challenges include achieving higher resolution maps to locate the epicenters of cis-regulated ASM, using this information to test mechanistic models, and applying genome-wide maps of ASE/ASM/ASTF to pinpoint functional regulatory polymorphisms influencing disease susceptibility.
Project description:Over the past decades, genome-wide association studies (GWAS) have identified thousands of phenotype-associated DNA sequence variants that may explain inter-individual phenotypic differences and disease susceptibility. However, translating these associations into causative mechanisms for complex diseases remains a challenge, partly because many of the variants lie in noncoding regions and functional studies are difficult to perform in human population samples. Accumulating evidence suggests a complex crosstalk among genetic variants, allele-specific binding of transcription factors (ABTF), allele-specific DNA methylation patterns (ASM), and environmental factors in disease risk. This review summarizes current studies of the interactions among these factors, with a focus on epigenetic insights. We present two scenarios in which single nucleotide polymorphisms (SNPs), in coding or non-coding regions, confer disease risk by potentially altering epigenetic patterns. While a SNP in a coding region may confer disease risk by altering protein function, a SNP in a non-coding region may contribute to disease by altering ABTF, ASM, and allele-specific gene expression (ASE). The allelic increases or decreases of gene expression are key for disease risk during development. Such ASE can be achieved via either a "SNP-introduced ABTF to ASM" or a "SNP-introduced ASM to ABTF" path. Together with our additional in-depth review of the insulator CTCF, this leads us to propose a working model in which the small effect of a SNP acts through altered ABTF and/or ASM to produce ASE and an eventual disease outcome (the "SNP intensifier" model). In summary, the complex crosstalk among genetic factors, epigenetic patterns, and environmental factors in disease susceptibility requires further investigation.
Project description:Though sequence differences between alleles are often limited to a few polymorphisms, these differences can cause large and widespread allelic variation at the expression level. Such allele-specific expression (ASE) has been extensively explored at the level of transcription but not translation. Here we measured ASE in the diploid yeast Candida albicans at both the transcriptional and translational levels using RNA-seq and ribosome profiling, respectively. Since C. albicans is an obligate diploid, our analysis isolates ASE arising from cis elements in a natural, nonhybrid organism, where allelic effects reflect evolutionary forces. Importantly, we find that ASE arising from translation is of a similar magnitude as transcriptional ASE, both in terms of the number of genes affected and the magnitude of the bias. We further observe coordination between ASE at the levels of transcription and translation for single genes. Specifically, reinforcing relationships, in which transcription and translation favor the same allele, are more frequent than expected by chance, consistent with selective pressure tuning ASE at multiple regulatory steps. Finally, we parameterize alleles based on a range of properties and find that SNP location and predicted mRNA-structure stability are associated with translational ASE in cis. Since this analysis probes more than 4000 allelic pairs spanning a broad range of variations, our data provide a genome-wide view into the relative impact of cis elements that regulate translation.
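As an illustration of the "reinforcing" relationship described in this record, the sketch below labels one gene from allele-resolved read counts. It is a minimal example under stated assumptions, not the authors' analysis: the function name, the pseudocount, and the `min_abs_log2` cutoff are hypothetical, and translational bias is taken here to be the ribosome-footprint allelic log-ratio minus the mRNA allelic log-ratio.

```python
from math import log2


def classify_allelic_regulation(mrna_a, mrna_b, ribo_a, ribo_b,
                                pseudocount=0.5, min_abs_log2=0.5):
    """Label one gene as reinforcing, opposing, or neither.

    mrna_a / mrna_b: RNA-seq read counts for allele A and allele B.
    ribo_a / ribo_b: ribosome-footprint read counts for the same alleles.
    pseudocount and min_abs_log2 are illustrative choices, not published values.
    """
    # Transcriptional bias: log2 ratio of allelic mRNA counts.
    transcription_bias = log2((mrna_a + pseudocount) / (mrna_b + pseudocount))
    # Footprint bias combines transcription and translation.
    footprint_bias = log2((ribo_a + pseudocount) / (ribo_b + pseudocount))
    # Translational bias: what remains after removing the mRNA-level bias.
    translation_bias = footprint_bias - transcription_bias

    if abs(transcription_bias) < min_abs_log2 or abs(translation_bias) < min_abs_log2:
        label = "single-level or none"
    elif transcription_bias * translation_bias > 0:
        label = "reinforcing"  # both levels favor the same allele
    else:
        label = "opposing"     # the two levels favor different alleles
    return label, transcription_bias, translation_bias


if __name__ == "__main__":
    print(classify_allelic_regulation(200, 100, 800, 200))
    print(classify_allelic_regulation(200, 100, 100, 100))
```

A real analysis would add replicate-aware statistics before assigning labels; the fixed cutoff here only keeps the example short.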
Project description:Allelic imbalance is a common phenomenon in mammals that plays an important role in gene regulation. An Allele Specific Expression (ASE) approach can be used to detect variants with a cis-regulatory effect on gene expression. In cattle, this type of study has only been done once, in Holstein. In our study we performed a genome-wide analysis of ASE in 19 Limousine muscle samples. We identified 5,658 ASE SNPs (Single Nucleotide Polymorphisms showing allele specific expression) in 13% of genes with detectable expression in the longissimus thoracis muscle. Interestingly, we found allelic imbalance in the AOX1, PALLD and CAST genes. We also found 2,107 ASE SNPs located within genomic regions associated with meat or carcass traits. In order to identify causative cis-regulatory variants explaining ASE, we searched for SNPs altering binding sites of transcription factors or microRNAs. We identified one SNP in the 3'UTR of PRNP that could be a causal regulatory variant modifying binding sites of several miRNAs. We showed that ASE is frequent within our muscle samples. Our data could be used to elucidate the molecular mechanisms underlying gene expression imbalance.
Project description:Though sequence differences between alleles are often limited to a few polymorphisms, these differences can cause large and widespread allelic variation at the expression level. Such allele-specific expression (ASE) has been extensively explored at the level of transcription but not translation. Here we measured ASE in the diploid yeast Candida albicans at both the transcriptional and translational levels using RNA-seq and ribosome profiling, respectively. Since C. albicans is an obligate diploid, our analysis isolates ASE arising from cis elements in a natural, non-hybrid organism, where allelic effects reflect evolutionary forces. Importantly, we find that ASE arising from translation is of a similar magnitude as transcriptional ASE, both in terms of the number of genes affected and the magnitude of the bias. We further observe coordination between ASE at the levels of transcription and translation for single genes. Specifically, reinforcing relationships—where transcription and translation favor the same allele—are more frequent than expected by chance, consistent with selective pressure tuning ASE at multiple regulatory steps. Finally, we parameterize alleles based on a range of properties and find that SNP location and predicted mRNA-structure stability are associated with translational ASE in cis. Since this analysis probes more than 4,000 allelic pairs spanning a broad range of variations, our data provide a genome-wide view into the relative impacts of cis elements that regulate translation. Two biological replicates of WT Candida albicans ribosome profiling and RNA-seq
Project description:We previously identified sequence-dependent allele-specific methylation (sd-ASM) in adult human peripheral blood leukocytes, in which ASM occurs in cis depending on adjacent polymorphic sequences. A number of groups have identified sd-ASM sites in the human and mouse genomes, illustrating the prevalence of sd-ASM in mammalian genomes. In addition, sd-ASM can lead to sequence-dependent allele-specific expression of neighbouring genes. Imprinted genes also often exhibit parent-of-origin-dependent allele-specific methylation (pd-ASM), which causes parent-of-origin-dependent allele-specific expression. However, whether most of the already known sd-ASM and pd-ASM sites are methylated or hydroxymethylated remains unclear due to technical restrictions. Accordingly, a novel method that enables examination of allelic methylation and hydroxymethylation status and also overcomes the drawbacks of conventional methods is needed. Such a method could also be used to elucidate the mechanisms underlying polymorphism-associated inter-individual differences in disease susceptibility and the mechanism of genomic imprinting. Here, we developed a simple method to determine allelic hydroxymethylation status and identified novel sequence- and parent-of-origin-dependent allele-specific hydroxymethylation sites. Correlation analyses of TF binding sequences and methylation or hydroxymethylation between three mouse strains revealed the involvement of Pax5 in strain-specific methylation and hydroxymethylation in exon 7 of Pdgfrb.
Project description:BACKGROUND: The expression of genes involved in regulating adipogenesis and lipid metabolism may affect economically important fatness traits in pigs. Allele-specific expression (ASE) reflects imbalance between allelic transcript levels and can be used to identify underlying cis-regulatory elements. ASE has not yet been intensively studied in pigs. The aim of this investigation was to analyze the differential allelic expression of four genes, PPARA, PPARG, SREBF1, and PPARGC1A, which are involved in the regulation of fat deposition, in porcine subcutaneous and visceral fat and longissimus dorsi muscle. RESULTS: Quantification of allelic proportions by pyrosequencing revealed that both alleles of PPARG and SREBF1 are expressed at similar levels. PPARGC1A showed the greatest ASE imbalance in fat deposits in Polish Large White (PLW), Polish Landrace and Pietrain pigs, and PPARA in PLW pigs. Significant deviations of the mean PPARGC1A allelic transcript ratio between cDNA and genomic DNA were detected in all tissues, with the most pronounced difference (p < 0.001) in visceral fat of PLW pigs. To search for potential cis-regulatory elements affecting ASE in the PPARGC1A gene, we analyzed the effects of four SNPs (rs337351686, rs340650517, rs336405906 and rs345224049) in the promoter region, but none were associated with ASE in the breeds studied. DNA methylation analysis revealed significant CpG methylation differences between samples showing balanced (allelic transcript ratio ≈1) and imbalanced allelic expression at one CpG site on chromosome 8 (SSC8:18527678) in visceral fat (p = 0.017) and at two CpG sites (SSC8:18525215, p = 0.030; SSC8:18525237, p = 0.031) in subcutaneous fat. CONCLUSIONS: Our analysis of differential allelic expression suggests that PPARGC1A is subject to cis-regulation in porcine fat tissues. Further studies are necessary to identify other regulatory elements located outside the PPARGC1A proximal promoter region.
Project description:Allele-specific expression (ASE) assays can be used to identify cis, trans, and cis-by-trans regulatory variation. Understanding the source of expression variation has important implications for disease susceptibility, phenotypic diversity, and adaptation. While ASE is commonly measured via relative fluorescence at a SNP, next generation sequencing provides an opportunity to measure ASE in an accurate and high-throughput manner using read counts. We introduce a Solexa-based method to perform large numbers of ASE assays using only a single lane of a Solexa flowcell. In brief, transcripts of interest, which contain a known SNP, are PCR enriched and barcoded to enable multiplexing. Then high-throughput sequencing is used to estimate allele-specific expression using sequencing counts. To validate this method, we measured the allelic bias in a dilution series and found high correlations between measured and expected values (r > 0.9, p < 0.001). We applied this method to a set of 5 genes in a Drosophila simulans parental mix, F1 and introgression and found that for these genes the majority of expression divergence can be explained by cis-regulatory variation. We present a new method with the capacity to measure ASE for large numbers of assays using as little as one lane of a Solexa flowcell. This will be a valuable technique for molecular and population genetic studies, as well as for verification of genome-wide data sets.
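To make the idea of estimating ASE from sequencing counts concrete, here is a minimal sketch of one common way to test a single SNP for allelic bias: an exact two-sided binomial test of the reference-allele read count against an expected fraction. The record does not say which statistic the authors used, so the test, the function names, and the default expected fraction of 0.5 are assumptions made for illustration.

```python
from math import exp, lgamma, log


def allelic_ratio(ref_reads, alt_reads):
    """Fraction of SNP-covering reads that carry the reference allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this SNP")
    return ref_reads / total


def _binom_pmf(k, n, p):
    # Binomial probability computed in log space so deep coverage does not overflow.
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coeff + k * log(p) + (n - k) * log(1 - p))


def allelic_bias_p_value(ref_reads, alt_reads, expected_ref_fraction=0.5):
    """Exact two-sided binomial p-value for deviation from the expected fraction.

    expected_ref_fraction could instead be taken from a genomic DNA control
    or a known dilution; 0.5 is only the default assumption of equal expression.
    """
    if not 0 < expected_ref_fraction < 1:
        raise ValueError("expected_ref_fraction must lie strictly between 0 and 1")
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("no reads cover this SNP")
    pmf = [_binom_pmf(k, n, expected_ref_fraction) for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Sum every outcome that is no more likely than the observed one.
    tail = sum(p for p in pmf if p <= observed * (1 + 1e-7))
    return min(1.0, tail)


if __name__ == "__main__":
    print(allelic_ratio(30, 10))
    print(allelic_bias_p_value(30, 10))
```

Read counts are often overdispersed relative to a binomial model, so a beta-binomial or replicate-based test may be more appropriate for real data.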
Project description:Large-scale screening studies carried out to date for genetic variants that affect gene regulation are generally limited to descriptions of differences in allele-specific expression (ASE) detected in vivo. Allele-specific differences in gene expression provide evidence for a model whereby cis-acting genetic variation results in differential expression between alleles. Such gene surveys for regulatory variation are a first step in identifying the specific nucleotide changes that govern gene expression differences, but they leave the underlying mechanisms unexplored. Here, we propose a quantitative genetics approach to perform a genome-wide analysis of ASE differences (GASED). The GASED approach is based on a diallel design that is often used in plant breeding programs to estimate general combining abilities (GCA) of specific inbred lines and to identify high-yielding hybrid combinations of parents based on their specific combining abilities (SCAs). In a context of gene expression, the values of GCA and SCA parameters allow cis- and trans-regulatory changes to be distinguished and imbalances in gene expression to be ascribed to cis-regulatory variation. With this approach, a total of 715 genes could be identified that are likely to carry allelic polymorphisms responsible for at least a 1.5-fold allelic expression difference in a total of 10 diploid Arabidopsis thaliana hybrids. The major strength of the GASED approach, compared to other ASE detection methods, is that it is not restricted to genes with allelic transcript variants. Although a false-positive rate of 9/41 was observed, the GASED approach is a valuable pre-screening method that can accelerate systematic surveys of naturally occurring cis-regulatory variation among inbred lines for laboratory species, such as Arabidopsis, mouse, rat and fruitfly, and economically important crop species, such as corn.
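For readers unfamiliar with combining abilities, the sketch below estimates GCA and SCA for one gene from the mean expression of each hybrid. It assumes a half diallel with no parental lines and no reciprocal crosses (the standard least-squares estimators for that design), which matches the count of 10 hybrids only if 5 parents were used; that reading, the function name, and the data layout are assumptions. It does not reproduce the GASED decision rule for separating cis from trans effects.

```python
from itertools import combinations


def combining_abilities(cross_means, parents):
    """Estimate the mean, GCA per parent, and SCA per cross for a half diallel.

    cross_means maps frozenset({parent_a, parent_b}) to the mean expression of
    that hybrid. Model: y_ab = mu + g_a + g_b + s_ab, with the g values
    constrained to sum to zero.
    """
    n = len(parents)
    if n < 3:
        raise ValueError("at least three parents are needed")
    line_totals = {p: 0.0 for p in parents}
    grand_total = 0.0
    for a, b in combinations(parents, 2):
        y = cross_means[frozenset((a, b))]
        line_totals[a] += y
        line_totals[b] += y
        grand_total += y
    # Overall mean across the n(n-1)/2 crosses.
    mu = 2 * grand_total / (n * (n - 1))
    # General combining ability of each parent.
    gca = {p: (n * line_totals[p] - 2 * grand_total) / (n * (n - 2))
           for p in parents}
    # Specific combining ability: what the additive model leaves unexplained.
    sca = {frozenset((a, b)): cross_means[frozenset((a, b))] - mu - gca[a] - gca[b]
           for a, b in combinations(parents, 2)}
    return mu, gca, sca


if __name__ == "__main__":
    lines = ["P1", "P2", "P3", "P4", "P5"]  # hypothetical inbred lines
    effects = {"P1": 1.0, "P2": -1.0, "P3": 0.5, "P4": -0.5, "P5": 0.0}
    means = {frozenset((a, b)): 10 + effects[a] + effects[b]
             for a, b in combinations(lines, 2)}
    mu, gca, sca = combining_abilities(means, lines)
    print(mu, gca)
```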
Project description:Genomic imprinting is an important epigenetic process that silences one of the parentally-inherited alleles of a gene and thereby results in allele-specific expression (ASE). Detection of human imprinting events is hampered by the infeasibility of reciprocal mating designs in humans and by the difficulty of removing ASE events arising from non-imprinting factors. Here, we describe a pipeline that uses the pattern of reciprocal allele descendants (RADs), derived from genotyping and transcriptome sequencing data across independent parent-offspring trios, to discriminate between varied types of ASE (e.g., imprinting, genetic variation-dependent ASE, and random monoallelic expression (RME)). We show that the vast majority of ASE events are due to sequence-dependent genetic variants, which are evolutionarily conserved and may themselves play a cis-regulatory role. In particular, 74% of non-RAD ASE events, even though they exhibit ASE biases toward the same parentally-inherited allele across different individuals, are derived from genetic variation rather than imprinting. We further show that the RME effect may reduce the effectiveness of the population-based method for detecting imprinting events, and that our pipeline can help to distinguish between these two ASE types. Taken together, this study provides a good indicator for categorization of different types of ASE, opening up this widespread and complex mechanism for comprehensive characterization.
Project description:Motivation: Mapping bias causes preferential alignment to the reference allele, forming a major obstacle in allele-specific expression (ASE) analysis. The existing methods, such as simulation and SNP-aware alignment, are either inaccurate or relatively slow. To count allelic reads quickly and accurately for ASE analysis, we developed a novel approach, ASElux, which utilizes the personal SNP information and counts allelic reads directly from unmapped RNA-sequence (RNA-seq) data. ASElux significantly reduces runtime by disregarding reads outside single nucleotide polymorphisms (SNPs) during the alignment. Results: When compared to other tools on simulated and experimental data, ASElux achieves a higher accuracy on ASE estimation than non-SNP-aware aligners and requires a much shorter time than the benchmark SNP-aware aligner, GSNAP, with just a slight loss in performance. ASElux can process 40 million read-pairs from an RNA-seq sample and count allelic reads within 10 min, which is comparable to directly counting the allelic reads from alignments based on other tools. Furthermore, processing an RNA-seq sample using ASElux in conjunction with a general aligner, such as STAR, is more accurate and still ~4× faster than STAR + WASP, and ~33× faster than the lead SNP-aware aligner, GSNAP, making ASElux ideal for ASE analysis of large-scale transcriptomic studies. We applied ASElux to 273 lung RNA-seq samples from GTEx and identified a splice-QTL rs11078928 in lung which explains the mechanism underlying an asthma GWAS SNP rs11078927. Thus, our analysis demonstrated ASE as a highly powerful complementary tool to cis-expression quantitative trait locus (eQTL) analysis. Availability and implementation: The software can be downloaded from https://github.com/abl0719/ASElux. Contact: email@example.com or firstname.lastname@example.org. Supplementary information: Supplementary data are available at Bioinformatics online.
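The general idea of counting allelic reads without a full alignment can be sketched with a small k-mer lookup. This is not the ASElux algorithm or its data structures; it is a toy illustration in which every name, the input layout (flanking sequence around each SNP), and the choice of exact k-mer matching are assumptions. It handles single-base SNPs on the forward strand only and ignores sequencing errors and reverse complements.

```python
from collections import defaultdict


def build_allele_kmers(snps, k):
    """Index every k-mer that overlaps a SNP, separately for each allele.

    snps maps snp_id -> (left_flank, ref_base, alt_base, right_flank).
    K-mers shared by different SNPs or alleles are dropped as ambiguous.
    """
    index = {}
    ambiguous = set()
    for snp_id, (left, ref, alt, right) in snps.items():
        for allele, base in (("ref", ref), ("alt", alt)):
            seq = left + base + right
            pos = len(left)  # position of the SNP within seq
            first = max(0, pos - k + 1)
            last = min(pos, len(seq) - k)
            for start in range(first, last + 1):
                kmer = seq[start:start + k]
                if kmer in index and index[kmer] != (snp_id, allele):
                    ambiguous.add(kmer)
                index[kmer] = (snp_id, allele)
    for kmer in ambiguous:
        del index[kmer]
    return index


def count_allelic_reads(reads, index, k):
    """Count reads supporting each allele; each read counts once per SNP."""
    counts = defaultdict(lambda: {"ref": 0, "alt": 0})
    for read in reads:
        alleles_by_snp = defaultdict(set)
        for start in range(len(read) - k + 1):
            hit = index.get(read[start:start + k])
            if hit is not None:
                snp_id, allele = hit
                alleles_by_snp[snp_id].add(allele)
        for snp_id, alleles in alleles_by_snp.items():
            if len(alleles) == 1:  # skip reads matching both alleles
                counts[snp_id][next(iter(alleles))] += 1
    return {snp_id: dict(c) for snp_id, c in counts.items()}


if __name__ == "__main__":
    snps = {"snp1": ("ACGTA", "C", "T", "GGATC")}  # hypothetical SNP context
    index = build_allele_kmers(snps, 5)
    reads = ["GTACGGA", "TATGGAT", "GTACGG", "AAAAAAA"]
    print(count_allelic_reads(reads, index, 5))
```

Because reads that contain no SNP-overlapping k-mer are skipped after a dictionary lookup, most of the work a general aligner would do is avoided, which is the intuition behind restricting attention to SNP-covering reads.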