Project description:DNA methylation is increasingly recognized to play a role in the regulation of hepatitis B virus (HBV) gene expression. The aim of this study was to compare the CpG island distribution among different HBV genotypes. We analyzed 176 full-length HBV genomic sequences obtained from the GenBank database, belonging to genotypes A through J, to identify the CpG islands in the HBV genomes. Our results showed that while 79 out of 176 sequences contained the three conventional CpG islands (I-III) as previously described, 83 HBV sequences harbored only two of the three known islands. Novel CpG islands were identified in the remaining 14 HBV isolates and named CpG islands IV, V, and VI. Among the eight known HBV genotypes and two putative genotypes, HBV genomes containing three CpG islands were predominant in genotypes A, B, D, E, and I, whereas genotypes C, F, G, and H tended to contain only two CpG islands (II and III). In conclusion, the CpG islands, which are potential targets for DNA methylation mediated by host functions, differ among HBV genotypes, and these genotype-specific differences in CpG island distribution could provide new insights into the epigenetic regulation of HBV gene expression and hepatitis B disease outcome.
Project description:Hypomethylated, CpG-rich DNA segments (CpG islands, CGIs) are epigenome markers involved in key biological processes. Aberrant methylation is implicated in the appearance of several disorders such as cancer, immunodeficiency, or centromere instability. Furthermore, methylation differences at promoter regions between human and chimpanzee strongly associate with genes involved in neurological/psychological disorders and cancers. Therefore, evolutionary comparative analyses of CGIs can provide insights into the functional role of these epigenome markers in both health and disease. Given the lack of specific tools, we developed CpGislandEVO. Briefly, we first compile a database of statistically significant CGIs for the best assembled mammalian genome sequences available to date. Second, by means of a coupled browser front-end, we focus on the CGIs overlapping orthologous genes extracted from OrthoDB, thus ensuring the comparison between CGIs located on truly homologous genome segments. This allows comparing the main compositional features between homologous CGIs. Finally, to facilitate nucleotide comparisons, we lifted genome coordinates between assemblies from different species, which enables the analysis of sequence divergence by direct count of nucleotide substitutions and indels occurring between homologous CGIs. The resulting CpGislandEVO database, linking together CGIs and single-cytosine DNA methylation data from several mammalian species, is freely available at our website.
Project description:We describe an analysis of the CpG islands (CGIs) of the pig. We have used both database survey and a porcine genomic library that is enriched for CGIs. Approximately half of 41 pig genomic database sequences had CGIs with an average G + C content of 65.3%, an average CpG observed/expected frequency of 0.85, and an average size of 978 bp. Of 27 CGI library clones, 16 were nonrepetitive, nonribosomal DNA and CGI-like. CGI library clones had similar average values for G + C and CpG frequency to CGIs of database genes, and an average size of 670 bp, as MseI cuts within some islands. Library clones were also shown to be low copy number and unmethylated in genomic DNA. The presence in the library of seven previously known CGI sequences was confirmed as was the absence of one nonisland sequence. The CGI library exhibits an R-band pattern for many chromosomes in FISH analysis. The pig chromosome arms that show the most dense CGI population are homologous to segments of human chromosomes that are known to be gene rich.
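The two compositional measures quoted above (G + C content and the CpG observed/expected frequency) can be illustrated with a minimal Python sketch. It assumes the commonly used ratio (number of CpG x sequence length) / (number of C x number of G); the abstract does not state which formula was applied, and the function names gc_content and cpg_obs_exp are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """CpG observed/expected ratio.

    Assumed formula: (#CpG * N) / (#C * #G), where N is the sequence length.
    Returns 0.0 when the sequence has no C or no G.
    """
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count finds every CpG dinucleotide.
    return seq.count("CG") * len(seq) / (c_count * g_count)


if __name__ == "__main__":
    example = "ACGTCGCGGGCCATCG"  # illustrative sequence, not from the study
    print(f"G+C content: {gc_content(example):.3f}")
    print(f"CpG obs/exp: {cpg_obs_exp(example):.3f}")
```

Under this formula, a sequence in which CpG occurs as often as base composition alone predicts gives a ratio near 1.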
Project description:Information about mtDNA methylation is still limited, and thus this epigenetic modification remains unclear. The lack of comprehensive information on the comparative epigenomics of mtDNA prompts comprehensive investigations of the epigenomic modification of mtDNA in different species. This is the first study in which the theoretical CpG localization in the mtDNA reference sequences from various species (12) was compared. The aim of the study was to determine the localization of CpG sites and islands in mtDNA of model organisms and to compare their distribution. The results are suitable for further investigations of mtDNA methylation. The analysis involved both strands of mtDNA sequences of animal model organisms representing different taxonomic groups of invertebrates and vertebrates. For each sequence, parameters such as the number, length, and localization of CpG islands were determined with the use of EMBOSS (European Molecular Biology Open Software Suite) software. The number of CpG sites for each sequence was indicated using the newcpgseek algorithm. The results showed that methylation of mtDNA in the analysed species involved mitochondrial gene expression. Our analyses showed that the CpG sites were commonly present in genomic regions including the D-loop, CYTB, ND6, ND5, ND4, ND3, ND2, ND1, COX3, COX2, COX1, ATP6, 16S rRNA, and 12S rRNA. The CpG distribution differed among animals from different species. Generally, the number of observed CpG sites in the mitochondrial genome was higher in the vertebrates than in the invertebrates. However, there was no relationship between the frequency of the CpG sites in the mitochondrial genome and the complexity of the analysed organisms. Interestingly, the CpG sites for tRNA coding genes were usually clustered in a larger CpG region in vertebrates. This paper may be a starting point for further research, since the collected information indicates possible methylation regions localized in mtDNA among different species including invertebrates and vertebrates.
Project description:ASC-G4 is an algorithm for the calculation of the advanced structural characteristics of G-quadruplexes (G4). It allows the unambiguous determination of the intramolecular G4 topology, based on the oriented strand numbering. It also resolves the ambiguity in the determination of the guanine glycosidic configuration. With this algorithm, we showed that the use of the C3' or C5' atoms to calculate the groove width in G4 is more appropriate than the P atoms and that the groove width does not always reflect the space available within the groove. For the latter, the minimum groove width is more appropriate. The application of ASC-G4 to 207 G4 structures guided the choices made for the calculations. A website based on ASC-G4 (http://tiny.cc/ASC-G4) was created, where the user uploads his G4 structure and gets its topology, the types of its loops and their lengths, the presence of snapbacks and bulges, the distribution of guanines in the tetrads and strands, the glycosidic configuration of these guanines, their rise, the groove widths, the minimum groove widths, the tilt and twist angles, the backbone dihedral angles, etc. It also provides a large number of atom-atom and atom-plane distances that are relevant to evaluating the quality of the structure.
Project description:Background: Mammalian CpG islands (CGIs) normally escape DNA methylation in all adult tissues and developmental stages. However, in our previous study we unexpectedly identified many methylated CGIs in human peripheral blood leukocytes. Methylated CpG dinucleotides convert to TpG dinucleotides through deamination of their cytosine bases more frequently than hypomethylated CpG dinucleotides. Therefore, we wondered how methylated CGIs in germline or non-germline cells maintain their CpG-rich sequences. It is known that events such as germline hypomethylation, CpG selection, biased gene conversion (BGC), and frequent CpG fixation can contribute to the maintenance of CpG-rich sequences in methylated CGIs in germline or non-germline cells. However, it has not been investigated which of these processes maintain the CpG-rich sequences of methylated CGIs in each genomic position. Results: In this study, we comprehensively examined the contribution of the processes described above to the maintenance of CpG-rich sequences in methylated CGIs in germline and non-germline cells, which were classified by genomic position. Approximately 60-80% of CGIs with high methylation in the H1 cell line (H1-HM) in all the genomic positions showed a low average CpG→TpG/CpA substitution rate. In contrast, fewer than half of the CGIs with H1-HM in all the genomic positions showed a low average CpG→TpG/CpA substitution rate and low levels of methylation in sperm cells (SPM-LM). Furthermore, a small fraction of CGIs with a low average CpG→TpG/CpA substitution rate and high levels of methylation in sperm cells (SPM-HM) showed CpG selection. On the other hand, independent of the positions in genes, most CGIs with SPM-HM showed a slightly higher average TpG/CpA→CpG substitution rate compared with those with SPM-LM. Conclusions: Relatively high numbers (approximately 60-80%) of CGIs with H1-HM in all the genomic positions preserve their CpG-rich sequences by a low CpG→TpG/CpA substitution rate caused mainly by their SPM-LM, and for those with SPM-HM partly by CpG selection and TpG/CpA→CpG fixation. BGC contributes little to the maintenance of CpG-rich sequences of CGIs with SPM-HM, which were classified by genomic position.
Project description:DNA methylation is a highly studied epigenetic mechanism involved in the regulation of various cellular processes, such as chromatin structure modification, chromosomal inactivation, gene expression patterns, embryonic development, and transcriptional modification. Various high-throughput techniques have evolved for the direct detection of methylation as well as for obtaining information across the entire region. However, despite high-throughput technological advances in the experimental field, the development of software tools dedicated to the prediction of epigenetic information from specific genome sequences is warranted. To this end, we developed a tissue-specific classifier, MethFinder, based on the frequency of novel sequence patterns across nine human tissues, which was capable of discriminating methylation-prone and methylation-resistant CpG islands with an overall accuracy of 93%. Availability: MethFinder is freely available at www.rgcb.res.in/methfinder.
Project description:CpG islands (CGIs) are prominent in the mammalian genome owing to their GC-rich base composition and high density of CpG dinucleotides. Most human gene promoters are embedded within CGIs that lack DNA methylation and coincide with sites of histone H3 lysine 4 trimethylation (H3K4me3), irrespective of transcriptional activity. In spite of these intriguing correlations, the functional significance of non-methylated CGI sequences with respect to chromatin structure and transcription is unknown. By performing a search for proteins that are common to all CGIs, here we show high enrichment for Cfp1, which selectively binds to non-methylated CpGs in vitro. Chromatin immunoprecipitation of a mono-allelically methylated CGI confirmed that Cfp1 specifically associates with non-methylated CpG sites in vivo. High throughput sequencing of Cfp1-bound chromatin identified a notable concordance with non-methylated CGIs and sites of H3K4me3 in the mouse brain. Levels of H3K4me3 at CGIs were markedly reduced in Cfp1-depleted cells, consistent with the finding that Cfp1 associates with the H3K4 methyltransferase Setd1 (refs 7, 8). To test whether non-methylated CpG-dense sequences are sufficient to establish domains of H3K4me3, we analysed artificial CpG clusters that were integrated into the mouse genome. Despite the absence of promoters, the insertions recruited Cfp1 and created new peaks of H3K4me3. The data indicate that a primary function of non-methylated CGIs is to genetically influence the local chromatin modification state by interaction with Cfp1 and perhaps other CpG-binding proteins.
Project description:Background: Regions with abundant GC nucleotides, a high CpG number, and a length greater than 200 bp in a genome are often referred to as CpG islands. These islands are usually located in the 5' end of genes. Recently, several algorithms for the prediction of CpG islands have been proposed. Methodology/principal findings: We propose here a new method called CPSORL, which combines a complement particle swarm optimization algorithm with reinforcement learning to predict CpG islands more reliably. Several CpG island prediction tools equipped with the sliding window technique have been developed previously. However, the quality of the results seems to rely too much on the choices that are made for the window sizes, and thus these methods leave room for improvement. Conclusions/significance: Experimental results indicate that CPSORL provides results of a higher sensitivity and a higher correlation coefficient in all selected experimental contigs than the other methods it was compared to (CpGIS, CpGcluster, CpGProd and CpGPlot). A higher number of CpG islands were identified in chromosomes 21 and 22 of the human genome than with the other methods from the literature. CPSORL also achieved the highest coverage rate (3.4%). CPSORL is an application for identifying promoter and TSS regions associated with CpG islands in the entire human genome. When compared to CpGcluster, the islands predicted by CPSORL covered a larger region in the TSS (12.2%) and promoter (26.1%) region. If Alu sequences are considered, the islands predicted by CPSORL (Alu) covered a larger TSS (40.5%) and promoter (67.8%) region than CpGIS. Furthermore, CPSORL was used to verify that the average methylation density was 5.33% for CpG islands in the entire human genome.
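The sliding window technique mentioned above, and its dependence on the chosen window size, can be sketched in Python as follows. This is not CPSORL and not the implementation of any of the named tools; it is a minimal illustration. The thresholds (a 200 bp window, G + C of at least 50%, CpG observed/expected of at least 0.6) are assumptions following commonly cited criteria, and the function names are hypothetical.

```python
from typing import List, Tuple


def window_passes(window: str, min_gc: float, min_oe: float) -> bool:
    """Check one window against the assumed G+C and CpG obs/exp thresholds."""
    n = len(window)
    c_count = window.count("C")
    g_count = window.count("G")
    if c_count == 0 or g_count == 0:
        return False
    gc = (c_count + g_count) / n
    obs_exp = window.count("CG") * n / (c_count * g_count)
    return gc >= min_gc and obs_exp >= min_oe


def find_cpg_islands(seq: str, window: int = 200, step: int = 1,
                     min_gc: float = 0.5, min_oe: float = 0.6
                     ) -> List[Tuple[int, int]]:
    """Return merged (start, end) half-open intervals of passing windows.

    Note: a merged interval is a union of passing windows; it is not
    re-tested as a whole, which a production tool would likely do.
    """
    seq = seq.upper()
    islands: List[Tuple[int, int]] = []
    for start in range(0, len(seq) - window + 1, step):
        end = start + window
        if window_passes(seq[start:end], min_gc, min_oe):
            if islands and start <= islands[-1][1]:
                # Overlaps or touches the previous interval: extend it.
                islands[-1] = (islands[-1][0], end)
            else:
                islands.append((start, end))
    return islands


if __name__ == "__main__":
    # Synthetic sequence: AT-rich flanks around a CpG-dense core.
    synthetic = "AT" * 150 + "CG" * 150 + "AT" * 150
    print(find_cpg_islands(synthetic))
```

Changing the window or step arguments shifts the reported boundaries, which is the sensitivity to window size that the abstract criticizes.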
Project description:The DNA of most vertebrates is depleted in CpG dinucleotides: a C followed by a G in the 5' to 3' direction. CpGs are the target for DNA methylation, a chemical modification of cytosine (C) heritable during cell division and the most well-characterized epigenetic mechanism. The remaining CpGs tend to cluster in regions referred to as CpG islands (CGI). Knowing CGI locations is important because they mark functionally relevant epigenetic loci in development and disease. For various mammals, including human, a widely used list of CGI is readily available from the UCSC Genome Browser. This list was derived using algorithms that search for regions satisfying a definition of CGI proposed by Gardiner-Garden and Frommer more than 20 years ago. Recent findings, enabled by advances in technology that permit direct measurement of epigenetic endpoints at a whole-genome scale, motivate the need to adapt the current CGI definition. In this paper, we propose a procedure, guided by hidden Markov models, that permits an extensible approach to detecting CGI. The main advantage of our approach over others is that it summarizes the evidence for CGI status as probability scores. This provides flexibility in the definition of a CGI and facilitates the creation of CGI lists for other species. The utility of this approach is demonstrated by generating the first CGI lists for invertebrates, and by the fact that we can create CGI lists that substantially increase overlap with recently discovered epigenetic marks. A CGI list and the probability scores, as a function of genome location, for each species are available at http://www.rafalab.org.
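The idea of summarizing CGI evidence as probability scores can be illustrated with a generic two-state hidden Markov model and the forward-backward algorithm. The Python sketch below is not the authors' model: the states, transition probabilities, and emission probabilities are hypothetical values chosen for illustration, and the model emits single nucleotides, so it captures G + C richness only and not CpG dinucleotide counts.

```python
from typing import Dict, List

STATES = ("background", "island")
# All probabilities below are illustrative assumptions, not fitted values.
START = {"background": 0.5, "island": 0.5}
TRANS = {
    "background": {"background": 0.99, "island": 0.01},
    "island": {"background": 0.01, "island": 0.99},
}
EMIT = {
    "background": {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30},
    "island": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
}


def emission(state: str, base: str) -> float:
    """Emission probability; ambiguous bases (e.g. N) are uninformative."""
    return EMIT[state].get(base, 0.25)


def island_posteriors(seq: str) -> List[float]:
    """Posterior probability of the 'island' state at each position."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return []

    # Forward pass, rescaled at every position to avoid underflow.
    fwd: List[Dict[str, float]] = []
    for i, base in enumerate(seq):
        cur = {}
        for s in STATES:
            prior = START[s] if i == 0 else sum(
                fwd[i - 1][r] * TRANS[r][s] for r in STATES)
            cur[s] = prior * emission(s, base)
        total = sum(cur.values())
        fwd.append({s: v / total for s, v in cur.items()})

    # Backward pass, rescaled in the same way.
    bwd: List[Dict[str, float]] = [{s: 1.0 for s in STATES} for _ in range(n)]
    for i in range(n - 2, -1, -1):
        cur = {
            s: sum(TRANS[s][r] * emission(r, seq[i + 1]) * bwd[i + 1][r]
                   for r in STATES)
            for s in STATES
        }
        total = sum(cur.values())
        bwd[i] = {s: v / total for s, v in cur.items()}

    # Per-position scaling constants cancel when normalizing over states.
    posteriors = []
    for f, b in zip(fwd, bwd):
        denom = sum(f[s] * b[s] for s in STATES)
        posteriors.append(f["island"] * b["island"] / denom)
    return posteriors


if __name__ == "__main__":
    synthetic = "AT" * 100 + "CG" * 100 + "AT" * 100
    scores = island_posteriors(synthetic)
    print(f"flank: {scores[50]:.4f}  core: {scores[300]:.4f}")
```

Thresholding the per-position scores at different cutoffs yields stricter or looser island lists, which is the kind of flexibility the abstract attributes to a probabilistic definition.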