Deciphering a TB-related DNA methylation biomarker and constructing a TB diagnostic classifier.
ABSTRACT: We systematically identified tuberculosis (TB)-related DNA methylation biomarkers and further constructed classifiers for TB diagnosis. TB-related DNA methylation datasets were searched through October 3, 2020. Limma and DMRcate were employed to identify differentially methylated probes (DMPs) and regions (DMRs). Machine learning methods were used to construct classifiers. The performance of the classifiers was evaluated in discovery datasets and a prospective independent cohort. Eighty-nine DMPs and 24 DMRs were identified based on 67 TB patients and 45 healthy controls from 4 datasets. Nine and three DMRs were selected by elastic net regression and logistic regression, respectively. Among the selected DMRs, two regions (chr3: 195635643-195636243 and chr6: 29691631-29692475) were differentially methylated in the independent cohort (p = 4.19 × 10⁻⁵ and 0.024, respectively). Among the ten classifiers, the 3-DMR logistic regression classifier exhibited the strongest performance. The sensitivity, specificity, and area under the curve were, respectively, 79.1%, 84.4%, and 0.888 in the discovery datasets and 64.5%, 90.3%, and 0.838 in the independent cohort. The differential diagnostic ability of this classifier was also assessed. Collectively, these data showed that DNA methylation might be a promising TB diagnostic biomarker. The 3-DMR logistic regression classifier is a potential clinical tool for TB diagnosis, and further validation is needed.
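The three performance measures reported above (sensitivity, specificity, and area under the curve) can be illustrated with a minimal sketch. It assumes a classifier that outputs one score per sample, with label 1 for TB and 0 for control; the function names, the 0.5 threshold, and the toy data are hypothetical and are not taken from the study.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when score >= threshold is called positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random case outscores a random control.

    Ties count as half a win (the Mann-Whitney formulation of the AUC).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 4 cases (label 1) and 4 controls (label 0).
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
sens, spec = sensitivity_specificity(toy_labels, toy_scores)
auc = roc_auc(toy_labels, toy_scores)
```

The pairwise AUC is quadratic in the number of samples, which is acceptable for cohorts of the size described here; a rank-based version would be preferable for large datasets.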
Project description:One of the main goals of large-scale methylation studies is to detect differentially methylated loci. One way is to approach this problem sitewise, i.e. to find differentially methylated positions (DMPs). However, it has been shown that methylation is regulated in longer genomic regions, so it is more desirable to identify differentially methylated regions (DMRs) instead of DMPs. The new high-coverage arrays, like Illumina's 450k platform, make this possible at a reasonable cost. A few tools exist for DMR identification from this type of data, but there is no standard approach. We propose a novel method for DMR identification that detects the region boundaries according to the minimum description length (MDL) principle, essentially solving the problem of model selection. The significance of the regions is established using linear mixed models. Using both simulated and large publicly available methylation datasets, we compare seqlm performance to alternative approaches. We demonstrate that it is both more sensitive and more specific than competing methods. This is achieved with minimal parameter tuning and, surprisingly, the quickest running time of all the tried methods. Finally, we show that the regional differential methylation patterns identified on sparse array data are confirmed by higher-resolution sequencing approaches. The methods have been implemented in the R package seqlm, which is available through GitHub: https://firstname.lastname@example.org Supplementary data are available at Bioinformatics online.
Project description:BACKGROUND: The development of whole genome bisulfite sequencing has made it possible to identify methylation differences at single base resolution throughout an entire genome. However, a persistent challenge in DNA methylome analysis is the accurate identification of differentially methylated regions (DMRs) between samples. Sensitive and specific identification of DMRs among different conditions requires accurate and efficient algorithms, and while various tools have been developed to tackle this problem, they frequently suffer from inaccurate DMR boundary identification and high false positive rates. RESULTS: We present a novel Histogram Of MEthylation (HOME) based method that takes into account the inherent difference in the distribution of methylation levels between DMRs and non-DMRs to discriminate between the two using a Support Vector Machine. We show that generated features used by HOME are dataset-independent such that a classifier trained on, for example, a mouse methylome training set of regions of differentially accessible chromatin, can be applied to any other organism's dataset and identify accurate DMRs. We demonstrate that DMRs identified by HOME exhibit higher association with biologically relevant genes, processes, and regulatory events compared to the existing methods. Moreover, HOME provides additional functionalities lacking in most of the current DMR finders such as DMR identification in non-CG context and time series analysis. HOME is freely available at https://github.com/ListerLab/HOME. CONCLUSION: HOME produces more accurate DMRs than the current state-of-the-art methods on both simulated and biological datasets. The broad applicability of HOME to identify accurate DMRs in genomic data from any organism will have a significant impact upon expanding our knowledge of how DNA methylation dynamics affect cell development and differentiation.
Project description:Sub-Saharan African (SSA) migrants in Europe experience psychosocial stressors, such as perceived discrimination (PD). The effect of such a stressor on health could potentially be mediated via epigenetics. In this study we performed an epigenome-wide association study (EWAS) to assess the association between levels of PD and genome-wide DNA methylation profiles in SSA migrants. The Illumina 450K DNA methylation array was used on whole blood samples of 340 Ghanaian adults residing in three European cities from the cross-sectional Research on Obesity and Diabetes among African Migrants (RODAM) study. PD was assessed using sum scores of the Everyday Discrimination Scale (EDS). Differentially methylated positions and regions (DMPs and DMRs) were identified through linear regression analysis. Two hypo-methylated DMPs, namely cg13986138 (CYFIP1) and cg10316525 (ANKRD63), were found to be associated with PD. DMR analysis identified 47 regions associated with PD. To the best of our knowledge, this survey is the first EWAS for PD in first-generation SSA migrants. We identified two DMPs associated with PD. Whether these associations reflect a consequence or a causal effect within the scope of biological functionality needs additional research.
Project description:Many psychiatric disorders are characterized by a strong sex difference, but the mechanisms behind sex-bias are not fully understood. DNA methylation plays important roles in regulating gene expression, ultimately impacting sexually different characteristics of the human brain. Most previous literature focused on DNA methylation alone without considering the regulatory network and its contribution to sex-bias of psychiatric disorders. Since DNA methylation acts in a complex regulatory network to connect genetic and environmental factors with high-order brain functions, we investigated the regulatory networks associated with different DNA methylation and assessed their contribution to the risks of psychiatric disorders. We compiled data from 1408 postmortem brain samples in 3 collections to identify sex-differentially methylated positions (DMPs) and regions (DMRs). We identified and replicated thousands of DMPs and DMRs. The DMR genes were enriched in neuronal related pathways. We extended the regulatory networks related to sex-differential methylation and psychiatric disorders by integrating methylation quantitative trait loci (meQTLs), gene expression, and protein-protein interaction data. We observed significant enrichment of sex-associated genes in psychiatric disorder-associated gene sets. We prioritized 2080 genes that were sex-biased and associated with psychiatric disorders, such as NRXN1, NRXN2, NRXN3, FDE4A, and SHANK2. These genes are enriched in synapse-related pathways and signaling pathways, suggesting that sex-differential genes of these neuronal pathways may cause the sex-bias of psychiatric disorders.
Project description:DNA methylation profiling reveals important differentially methylated regions (DMRs) of the genome that are altered during development or that are perturbed by disease. To date, few programs exist for regional analysis of enriched or whole-genome bisulfite conversion sequencing data, even though such data are increasingly common. Here, we describe an open-source, optimized method for determining empirically based DMRs (eDMR) from high-throughput sequence data that is applicable to enriched whole-genome methylation profiling datasets, as well as other globally enriched epigenetic modification data. Here we show that our bimodal distribution model and weighted cost function for optimized regional methylation analysis provide accurate boundaries of regions harboring significant epigenetic modifications. Our algorithm takes the spatial distribution of CpGs into account for the enrichment assay, allowing for optimization of the definition of empirical regions for differential methylation. Combined with the dependent adjustment for regional p-value combination and DMR annotation, we provide a method that may be applied to a variety of datasets for rapid DMR analysis. Our method classifies both the directionality of DMRs and their genome-wide distribution, and we have observed that it shows clinical relevance through correct stratification of two Acute Myeloid Leukemia (AML) tumor sub-types. Our weighted optimization algorithm eDMR for calling DMRs extends an established DMR R pipeline (methylKit) and provides a needed resource in epigenomics. Our method enables an accurate and scalable way of finding DMRs in high-throughput methylation sequencing experiments. eDMR is available for download at http://code.google.com/p/edmr/.
Project description:BACKGROUND: Ambient air pollution is associated with numerous adverse health outcomes, but the underlying mechanisms are not well understood; epigenetic effects including altered DNA methylation could play a role. To evaluate associations of long-term air pollution exposure with DNA methylation in blood, we conducted an epigenome-wide association study in a Korean chronic obstructive pulmonary disease cohort (N = 100 including 60 cases) using Illumina's Infinium HumanMethylation450K BeadChip. Annual average concentrations of particulate matter ≤ 10 μm in diameter (PM10) and nitrogen dioxide (NO2) were estimated at participants' residential addresses using exposure prediction models. We used robust linear regression to identify differentially methylated probes (DMPs) and two different approaches, DMRcate and comb-p, to identify differentially methylated regions (DMRs). RESULTS: After multiple testing correction (false discovery rate < 0.05), there were 12 DMPs and 27 DMRs associated with PM10 and 45 DMPs and 57 DMRs related to NO2. DMP cg06992688 (OTUB2) and several DMRs were associated with both exposures. Eleven DMPs in relation to NO2 confirmed previous findings in Europeans; the remainder were novel. Methylation levels of 39 DMPs were associated with expression levels of nearby genes in a separate dataset of 3075 individuals. Enriched networks were related to outcomes associated with air pollution including cardiovascular and respiratory diseases as well as inflammatory and immune responses. CONCLUSIONS: This study provides evidence that long-term ambient air pollution exposure impacts DNA methylation. The differential methylation signals can serve as potential air pollution biomarkers. These results may help better understand the influences of ambient air pollution on human health.
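The multiple testing correction mentioned above (false discovery rate < 0.05) can be sketched as follows. The abstract does not say which FDR procedure was used; this sketch assumes the widely used Benjamini-Hochberg step-up procedure, and the function name and example p-values are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: BH is used here for illustration only; the source text
    reports an FDR threshold without naming the procedure.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical probe-level p-values; probes with adjusted p < 0.05 are DMPs.
example_p = [0.001, 0.008, 0.039, 0.041, 0.6]
example_adj = benjamini_hochberg(example_p)
significant = [i for i, q in enumerate(example_adj) if q < 0.05]
```

In this toy example only the first two probes pass the 0.05 threshold, because the third and fourth share an adjusted value just above it.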
Project description:Background: Arsenic (As) exposure through drinking water is a global public health concern. Epigenetic dysregulation, including changes in DNA methylation (DNAm), may be involved in arsenic toxicity. Epigenome-wide association studies (EWAS) of arsenic exposure have been restricted to single populations, and comparison across EWAS has been limited by methodological differences. Leveraging data from epidemiological studies conducted in Chile and Bangladesh, we use a harmonized data processing and analysis pipeline and meta-analysis to combine results from four EWAS. Methods: DNAm was measured among adults in Chile with and without prenatal and early-life As exposure in PBMCs and buccal cells (N = 40, 850K array) and among men in Bangladesh with high and low As exposure in PBMCs (N = 32, 850K array; N = 48, 450K array). Linear models were used to identify differentially methylated positions (DMPs) and differentially variable positions (DVPs) adjusting for age, smoking, cell type, and sex in the Chile cohort. Probes common across EWAS were meta-analyzed using METAL, and differentially methylated and variable regions (DMRs and DVRs, respectively) were identified using comb-p. KEGG pathway analysis was used to understand biological functions of DMPs and DVPs. Results: In a meta-analysis restricted to PBMCs, we identified one DMP and 23 DVPs associated with arsenic exposure; including buccal cells, we identified 3 DMPs and 19 DVPs (FDR < 0.05). Using meta-analyzed results, we identified 11 DMRs and 11 DVRs in PBMC samples, and 16 DMRs and 19 DVRs in PBMC and buccal cell samples. One region annotated to LRRC27 was identified as a DMR and DVR. Arsenic-associated KEGG pathways included lysosome, autophagy, mTOR signaling, AMPK signaling, and one carbon pool by folate. Conclusions: Using a two-step process of (1) harmonized data processing and analysis and (2) meta-analysis, we leverage four DNAm datasets from two continents of individuals exposed to high levels of As prenatally and during adulthood to identify DMPs and DVPs associated with arsenic exposure. Our approach suggests that standardizing analytical pipelines can aid in identifying biologically meaningful signals.
Project description:Background: Prenatal exposure to essential and non-essential metals impacts birth and child health, including fetal growth and neurodevelopment. DNA methylation (DNAm) may be involved in pathways linking prenatal metal exposure and health. In the Project Viva cohort, we analyzed the extent to which metals (As, Ba, Cd, Cr, Cs, Cu, Hg, Mg, Mn, Pb, Se, and Zn) measured in maternal erythrocytes were associated with differentially methylated positions (DMPs) and regions (DMRs) in cord blood and tested if associations persisted in blood collected in mid-childhood. We measured metal concentrations in first-trimester maternal erythrocytes, and DNAm in cord blood (N = 361) and mid-childhood blood (N = 333, 6–10 years) with the Illumina HumanMethylation450 BeadChip. For each metal individually, we tested for DMPs using linear models (considered significant at FDR < 0.05), and for DMRs using comb-p (Sidak p < 0.05). Covariates included biologically relevant variables and estimated cell-type composition. We also performed sex-stratified analyses. Results: Pb was associated with decreased methylation of cg20608990 (CASP8) (FDR = 0.04), and Mn was associated with increased methylation of cg02042823 (A2BP1) in cord blood (FDR = 9.73 × 10⁻⁶). Both associations remained significant but attenuated in blood DNAm collected at mid-childhood (p < 0.01). Two and nine Mn-associated DMPs were identified in male and female infants, respectively (FDR < 0.05), with two and six persisting in mid-childhood (p < 0.05). All metals except Ba and Pb were associated with ≥ 1 DMR among all infants (Sidak p < 0.05). Overlapping DMRs annotated to genes in the human leukocyte antigen (HLA) region were identified for Cr, Cs, Cu, Hg, Mg, and Mn. Conclusions: Prenatal metal exposure is associated with DNAm, including DMRs annotated to genes involved in neurodevelopment. Future research is needed to determine if DNAm partially explains the relationship between prenatal metal exposures and health outcomes. Supplementary Information: The online version contains supplementary material available at 10.1186/s13148-021-01198-z.
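The Sidak-corrected p-values used as the DMR threshold above follow the general Šidák formula, 1 − (1 − p)^n for n independent tests. The sketch below shows only that formula; how comb-p chooses the number of tests for a region is not described in this abstract, so `n_tests` is a hypothetical parameter and the function name is an assumption.

```python
import math


def sidak_adjust(p: float, n_tests: int) -> float:
    """Return the Sidak-adjusted p-value 1 - (1 - p) ** n_tests.

    Assumption: n_tests is supplied by the caller; the rule a specific
    tool uses to derive it is outside the scope of this sketch.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    if p == 1.0:
        return 1.0
    # log1p/expm1 keep precision when p is very small.
    return -math.expm1(n_tests * math.log1p(-p))


# Hypothetical example: a regional p-value of 0.01 corrected for 5 tests.
example_sidak = sidak_adjust(0.01, 5)
```

Under these assumptions a region would be reported when its adjusted value falls below 0.05, matching the threshold quoted in the abstract.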
Project description:Background: DNA methylation is essential for regulating gene expression, and changes in DNA methylation status are commonly discovered in disease. Therefore, identification of differential methylation patterns, especially differentially methylated regions (DMRs), between two groups is important for understanding the mechanism of complex diseases. A few tools identify DMRs by considering features of methylation data, but current methods do not comprehensively integrate the characteristics of DNA methylation data. Results: Accounting for the characteristics of methylation data, such as the correlation of neighboring CpG sites and the high heterogeneity of DNA methylation data, we propose a data-driven approach for DMR identification that evaluates the energy of each single site using a modified 1D Ising model. Applied to both simulated and publicly available datasets, our approach is compared with other popular methods in terms of performance. Simulation results show that our method is more sensitive than competing methods. Applied to the real data, our method can identify more common DMRs than DMRcate, ProbeLasso, and Wang's methods with a high overlapping ratio. The necessity of integrating the heterogeneity and correlation characteristics in identifying DMRs is shown by comparing against results that consider only mean or variance signals and results that ignore the relationship between neighboring CpG sites, respectively. By analyzing the genomic locations of the DMRs identified in real data, we find that about 90% of DMRs are located in CpG islands (CGIs), which regulate the expression of genes. This may help us understand the functional effect of DNA methylation on disease.
Project description:Several small studies have shown associations between breastfeeding and genome-wide DNA methylation (DNAm). We performed a comprehensive Epigenome-Wide Association Study (EWAS) to identify associations between breastfeeding and DNAm patterns in childhood. We analysed DNAm data from the Isle of Wight Birth Cohort at birth, 10, 18 and 26 years. The feeding method was categorized as breastfeeding duration >3 months and >6 months, and exclusive breastfeeding duration >3 months. EWASs using robust linear regression were performed to identify differentially methylated positions (DMPs) in breastfed and non-breastfed children at age 10 (false discovery rate of 5%). Differentially methylated regions (DMRs) were identified using comb-p. The persistence of significant associations was evaluated in neonates and individuals at 18 and 26 years. Two DMPs, in genes SNX25 and LINC00840, were significantly associated with breastfeeding duration >6 months at 10 years and were replicated for >3 months of exclusive breastfeeding. Additionally, a significant DMR spanning the gene FDFT1 was identified in 10-year-old children who were exposed to a breastfeeding duration >3 months. None of these signals persisted to 18 or 26 years. This study lends further support for a suggestive role of DNAm in the known benefits of breastfeeding on a child's future health.