CpGislandEVO: a database and genome browser for comparative evolutionary genomics of CpG islands.
ABSTRACT: Hypomethylated, CpG-rich DNA segments (CpG islands, CGIs) are epigenome markers involved in key biological processes. Aberrant methylation is implicated in the appearance of several disorders, such as cancer, immunodeficiency, or centromere instability. Furthermore, methylation differences at promoter regions between human and chimpanzee strongly associate with genes involved in neurological/psychological disorders and cancers. Therefore, the evolutionary comparative analysis of CGIs can provide insights into the functional role of these epigenome markers in both health and disease. Given the lack of specific tools, we developed CpGislandEVO. Briefly, we first compile a database of statistically significant CGIs for the best assembled mammalian genome sequences available to date. Second, by means of a coupled browser front-end, we focus on the CGIs overlapping orthologous genes extracted from OrthoDB, thus ensuring the comparison between CGIs located on truly homologous genome segments. This allows comparing the main compositional features between homologous CGIs. Finally, to facilitate nucleotide comparisons, we lifted genome coordinates between assemblies from different species, which enables the analysis of sequence divergence by direct count of nucleotide substitutions and indels occurring between homologous CGIs. The resulting CpGislandEVO database, linking together CGIs and single-cytosine DNA methylation data from several mammalian species, is freely available at our website.
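The "direct count of nucleotide substitutions and indels" between homologous CGIs can be illustrated with a minimal sketch. It assumes the two CGI sequences have already been aligned and that gaps are written as `-`; the function name `count_differences`, the gap convention, and the choice to count a run of consecutive gaps as one indel event are illustrative assumptions, not details taken from CpGislandEVO.

```python
def count_differences(aligned_a: str, aligned_b: str) -> dict:
    """Count substitutions and indels between two aligned sequences.

    Assumptions (not from the source): both strings have equal length
    and use '-' as the gap character.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    substitutions = 0
    indel_events = 0  # a run of consecutive gaps in one sequence counts once
    indel_bases = 0   # total number of gapped columns
    compared = 0      # columns where both sequences carry a base
    prev_gap = None   # which sequence held the gap in the previous column

    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # gap in both sequences: uninformative column
        if a == "-" or b == "-":
            gap_in = "a" if a == "-" else "b"
            indel_bases += 1
            if gap_in != prev_gap:
                indel_events += 1
            prev_gap = gap_in
            continue
        prev_gap = None
        compared += 1
        if a != b:
            substitutions += 1

    return {
        "substitutions": substitutions,
        "indel_events": indel_events,
        "indel_bases": indel_bases,
        "compared": compared,
        # fraction of ungapped columns that differ
        "divergence": substitutions / compared if compared else 0.0,
    }


# Toy aligned fragments (made-up sequences, for illustration only)
result = count_differences("ACGCGT--ACGT", "ACGTGTTTAC-T")
print(result)
```

Counting indel events separately from gapped bases keeps a single long insertion from dominating the divergence estimate; whether that matches the database's own counting scheme is not stated in the abstract.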
Project description:Recent assays for individual-specific genome-wide DNA methylation profiles have enabled epigenome-wide association studies to identify specific CpG sites associated with a phenotype. Computational prediction of CpG site-specific methylation levels is critical to enable genome-wide analyses, but current approaches tackle average methylation within a locus and are often limited to specific genomic regions. We characterize genome-wide DNA methylation patterns, and show that correlation among CpG sites decays rapidly, making predictions solely based on neighboring sites challenging. We built a random forest classifier to predict methylation levels at CpG site resolution using features including neighboring CpG site methylation levels and genomic distance, co-localization with coding regions, CpG islands (CGIs), and regulatory elements from the ENCODE project. Our approach achieves 92% prediction accuracy of genome-wide methylation levels at single-CpG-site precision. The accuracy increases to 98% when restricted to CpG sites within CGIs and is robust across platform and cell-type heterogeneity. Our classifier outperforms other types of classifiers and identifies features that contribute to prediction accuracy: neighboring CpG site methylation, CGIs, co-localized DNase I hypersensitive sites, transcription factor binding sites, and histone modifications were found to be most predictive of methylation levels. Our observations of DNA methylation patterns led us to develop a classifier to predict DNA methylation levels at CpG site resolution with high accuracy. Furthermore, our method identified genomic features that interact with DNA methylation, suggesting mechanisms involved in DNA methylation modification and regulation, and linking diverse epigenetic processes.
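The neighbor-based features named above (neighboring CpG site methylation levels and genomic distance) could be assembled roughly as follows. This is only a sketch of feature construction, not the authors' pipeline and not a random forest: the function names, the use of only the single nearest neighbor on each side, and the 0.5 cutoff for turning a methylation level into a binary class are all assumptions made for illustration.

```python
def neighbor_features(positions, betas, index):
    """Build neighbor features for the CpG site at `index`.

    positions: genomic coordinates of CpG sites, sorted ascending
    betas:     methylation level (0..1) of each site, same order
    Returns the methylation level and distance of the nearest upstream
    and downstream site, or None where no neighbor exists.
    """
    features = {}
    for label, j in (("upstream", index - 1), ("downstream", index + 1)):
        if 0 <= j < len(positions):
            features[label + "_beta"] = betas[j]
            features[label + "_distance"] = abs(positions[j] - positions[index])
        else:
            features[label + "_beta"] = None
            features[label + "_distance"] = None
    return features


def methylation_label(beta, threshold=0.5):
    """Binarize a methylation level; the 0.5 threshold is an assumption."""
    return 1 if beta >= threshold else 0


# Toy data: three CpG sites on one chromosome (made-up values)
positions = [100, 130, 900]
betas = [0.10, 0.20, 0.95]

for i in range(len(positions)):
    print(positions[i], methylation_label(betas[i]),
          neighbor_features(positions, betas, i))
```

Rows like these, extended with annotation features such as CGI or DNase I hypersensitive site overlap, would form the input table for any classifier; the distance feature matters because the abstract reports that correlation among CpG sites decays rapidly.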
Project description:Chromatin properties are regulated by complex networks of epigenome modifications. Currently, it is unclear how these modifications interact and if they control downstream effects such as gene expression. We employed promiscuous chromatin binding of a zinc finger fused catalytic domain of DNMT3A to introduce DNA methylation in HEK293 cells at many CpG islands (CGIs) and systematically investigated the dynamics of the introduced DNA methylation and the consequent changes of the epigenome network. We observed efficient methylation at thousands of CGIs, but it was unstable at about 90% of them, highlighting the power of genome-wide molecular processes that protect CGIs against DNA methylation. Partially stable methylation was observed at about 1000 CGIs, which showed enrichment in H3K27me3. Globally, the introduced DNA methylation strongly correlated with a decrease in gene expression indicating a direct effect. Similarly, global but transient reductions in H3K4me3 and H3K27ac were observed after DNA methylation but no changes were found for H3K9me3 and H3K36me3. Our data provide a global and time-resolved view on the network of epigenome modifications, their connections with DNA methylation and the responses triggered by artificial DNA methylation revealing a direct repressive effect of DNA methylation in CGIs on H3K4me3, histone acetylation, and gene expression.
Project description:CpG islands (CGIs) are primarily promoter-associated genomic regions and are mostly unmethylated within highly methylated mammalian genomes. The mechanisms by which CGIs are protected from de novo methylation remain elusive. Here we show that insertion of CpG-free DNA into targeted CGIs induces de novo methylation of the entire CGI in human pluripotent stem cells (PSCs). The methylation status is stably maintained even after CpG-free DNA removal, extensive passaging, and differentiation. By targeting the DNA mismatch repair gene MLH1 CGI, we could generate a PSC model of a cancer-related epimutation. Furthermore, we successfully corrected aberrant imprinting in induced PSCs derived from an Angelman syndrome patient. Our results provide insights into how CpG-free DNA induces de novo CGI methylation and broaden the application of targeted epigenome editing for a better understanding of human development and disease.
Project description:Regulatory change has long been hypothesized to drive the delineation of the human phenotype from other closely related primates. Here we provide evidence that CpG dinucleotides play a special role in this process. CpGs enable epigenome variability via DNA methylation, and this epigenetic mark functions as a regulatory mechanism. Therefore, species-specific CpGs may influence species-specific regulation. We report non-polymorphic species-specific CpG dinucleotides (termed "CpG beacons") as a distinct genomic feature associated with CpG island (CGI) evolution, human traits and disease. Using an inter-primate comparison, we identified 21 extreme CpG beacon clusters (≥ 20/kb peaks, empirical p < 1.0 × 10^-3) in humans, which include associations with four monogenic developmental and neurological disease related genes (Benjamini-Hochberg corrected p = 6.03 × 10^-3). We also demonstrate that beacon-mediated CpG density gain in CGIs correlates with reduced methylation in these species in orthologous CGIs over time, via human, chimpanzee and macaque MeDIP-seq. Therefore mapping into both the genomic and epigenomic space the identified CpG beacon clusters define points of intersection where a substantial two-way interaction between genetic sequence and epigenetic state has occurred. Taken together, our data support a model for CpG beacons to contribute to CGI evolution from genesis to tissue-specific to constitutively active CGIs.
Project description:We applied a solution hybrid selection approach to the enrichment of CpG islands (CGIs) and promoter sequences from the human genome for targeted high-throughput bisulfite sequencing. A single lane of Illumina sequences allowed accurate and quantitative analysis of ~1 million CpGs in more than 21,408 CGIs and more than 15,946 transcriptional regulatory regions. Of the CpGs analyzed, 77-84% fell on or near capture probe sequences; 69-75% fell within CGIs. More than 85% of capture probes successfully yielded quantitative DNA methylation information of targeted regions. Differentially methylated regions (DMRs) were identified in the 5'-end regulatory regions, as well as the intra- and intergenic regions, particularly in the X-chromosome among the three breast cancer cell lines analyzed. We chose 46 candidate loci (762 CpGs) for confirmation with PCR-based bisulfite sequencing and demonstrated excellent correlation between two data sets. Targeted bisulfite sequencing of three DNA methyltransferase (DNMT) knockout cell lines and the wild-type HCT116 colon cancer cell line revealed a significant decrease in CpG methylation for the DNMT1 knockout and DNMT1, 3B double knockout cell lines, but not in DNMT3B knockout cell line. We demonstrated the targeted bisulfite sequencing approach to be a powerful method to uncover novel aberrant methylation in the cancer epigenome. Since all targets were captured and sequenced as a pool through a series of single-tube reactions, this method can be easily scaled up to deal with a large number of samples.
Project description:Gametogenesis in mammals entails profound re-patterning of the epigenome. In the female germline, DNA methylation is acquired late in oogenesis from an essentially unmethylated baseline and is established largely as a consequence of transcription events. Molecular and functional studies have shown that imprinted genes become methylated at different times during oocyte growth; however, little is known about the kinetics of methylation gain genome wide and the reasons for asynchrony in methylation at imprinted loci. Given the predominant role of transcription, we sought to investigate whether transcription timing is rate limiting for de novo methylation and determines the asynchrony of methylation events. Therefore, we generated genome-wide methylation and transcriptome maps of size-selected, growing oocytes to capture the onset and progression of methylation. We find that most sequence elements, including most classes of transposable elements, acquire methylation at similar rates overall. However, methylation of CpG islands (CGIs) is delayed compared with the genome average and there are reproducible differences amongst CGIs in onset of methylation. Although more highly transcribed genes acquire methylation earlier, the major transitions in the oocyte transcriptome occur well before the de novo methylation phase, indicating that transcription is generally not rate limiting in conferring permissiveness to DNA methylation. Instead, CGI methylation timing negatively correlates with enrichment for histone 3 lysine 4 (H3K4) methylation and dependence on the H3K4 demethylases KDM1A and KDM1B, implicating chromatin remodelling as a major determinant of methylation timing. We also identified differential enrichment of transcription factor binding motifs in CGIs acquiring methylation early or late in oocyte growth. By combining these parameters into multiple regression models, we were able to account for about a fifth of the variation in methylation timing of CGIs. Finally, we show that establishment of non-CpG methylation, which is prevalent in fully grown oocytes, and methylation over non-transcribed regions, are later events in oogenesis. These results do not support a major role for transcriptional transitions in the time of onset of DNA methylation in the oocyte, but suggest a model in which sequences least dependent on chromatin remodelling are the earliest to become permissive for methylation.
Project description:CpG islands (CGIs) are dense clusters of CpG sequences that punctuate the CpG-deficient human genome and associate with many gene promoters. As CGIs also differ from bulk chromosomal DNA by their frequent lack of cytosine methylation, we devised a CGI enrichment method based on nonmethylated CpG affinity chromatography. The resulting library was sequenced to define a novel human blood CGI set that includes many that are not detected by current algorithms. Approximately half of CGIs were associated with annotated gene transcription start sites, the remainder being intra- or intergenic. Using an array representing over 17,000 CGIs, we established that 6%-8% of CGIs are methylated in genomic DNA of human blood, brain, muscle, and spleen. Inter- and intragenic CGIs are preferentially susceptible to methylation. CGIs showing tissue-specific methylation were overrepresented at numerous genetic loci that are essential for development, including HOX and PAX family members. The findings enable a comprehensive analysis of the roles played by CGI methylation in normal and diseased human tissues.
Project description:The genetic regulation of the human epigenome is not fully appreciated. Here we describe the effects of genetic variants on the DNA methylome in human lung based on methylation-quantitative trait loci (meQTL) analyses. We report 34,304 cis- and 585 trans-meQTLs, a genetic-epigenetic interaction of surprising magnitude, including a regulatory hotspot. These findings are replicated in both breast and kidney tissues and show distinct patterns: cis-meQTLs mostly localize to CpG sites outside of genes, promoters and CpG islands (CGIs), while trans-meQTLs are over-represented in promoter CGIs. meQTL SNPs are enriched in CTCF-binding sites, DNaseI hypersensitivity regions and histone marks. Importantly, four of the five established lung cancer risk loci in European ancestry are cis-meQTLs and, in aggregate, cis-meQTLs are enriched for lung cancer risk in a genome-wide analysis of 11,587 subjects. Thus, inherited genetic variation may affect lung carcinogenesis by regulating the human methylome.
Project description:The mammalian genome is punctuated by CpG islands (CGIs), which differ sharply from the bulk genome by being rich in G + C and the dinucleotide CpG. CGIs often include transcription initiation sites and display 'active' histone marks, notably histone H3 lysine 4 methylation. In embryonic stem cells (ESCs) some CGIs adopt a 'bivalent' chromatin state bearing simultaneous 'active' and 'inactive' chromatin marks. To determine whether CGI chromatin is developmentally programmed at specific genes or is imposed by shared features of CGI DNA, we integrated artificial CGI-like DNA sequences into the ESC genome. We found that bivalency is the default chromatin structure for CpG-rich, G + C-rich DNA. A high CpG density alone is not sufficient for this effect, as A + T-rich sequence settings invariably provoke de novo DNA methylation leading to loss of CGI signature chromatin. We conclude that both CpG-richness and G + C-richness are required for induction of signature chromatin structures at CGIs.
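The two compositional properties this abstract relies on, G + C richness and CpG richness, can be computed with a short sketch. It uses the widely used observed/expected CpG ratio (number of CpG dinucleotides × sequence length / (number of C × number of G)); the abstract itself does not say which measure was used, and the function name `cgi_composition` and the example sequences are hypothetical.

```python
def cgi_composition(seq: str) -> dict:
    """Return G+C content and the observed/expected CpG ratio of a sequence.

    Observed/expected CpG = (#CpG * N) / (#C * #G), where N is the sequence
    length. This is a common convention, assumed here rather than taken
    from the source.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must not be empty")

    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so this count is exact

    expected = c * g / n
    return {
        "gc_content": (c + g) / n,
        "cpg_obs_exp": cpg / expected if expected else 0.0,
    }


# A CpG-rich, G+C-rich toy sequence versus an A+T-rich one (made-up examples)
print(cgi_composition("CGCGCGAT"))
print(cgi_composition("ATATCGATATAT"))
```

Reporting the two values separately mirrors the abstract's conclusion that CpG richness and G + C richness are distinct requirements: a sequence can contain CpGs at a high observed/expected ratio while still sitting in an A + T-rich setting.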
Project description:DNA methylation of regulatory and growth-related genes contributes to fetal programming, which is important for maintaining the correct development of the three germ layers of the embryo that develop into different tissues and organs, and which persists into adult life. In this study, a preliminary epigenetic screen was performed to define genomic regions that are involved in fetal epigenome remodeling. Embryonic ectodermic tissues (origin of nervous tissue), mesenchymal tissues (origin of connective and muscular tissues), and foregut endoderm tissues (origin of epithelial tissue) from day 28 sheep fetuses were collected, and the distribution of methylated CpGs was analyzed using whole-genome bisulfite sequencing. Patterns of methylation among the three tissues showed a high level of conservation of hypo-methylated CpG islands (CGIs), and a consistent level of methylation in regulatory genetic elements. Analysis of tissue-specific differentially methylated regions revealed that 20% of the total CGIs differed between tissues. A proportion of the methylome was remodeled in gene bodies, 5' UTRs and 3' UTRs (7, 11, and 11%, respectively). Genes with overlapping differentially methylated regions in gene bodies and CGIs showed a significant enrichment for tissue morphogenesis and development pathways. The data presented here provide a "reference" for the epigenetic status of genes potentially involved in the maintenance and regulation of fetal development during early life, a period expected to be particularly prone to epigenetic alterations induced by environmental and nutritional stressors.