High-quality methylome-wide investigations through next-generation sequencing of DNA from a single archived dry blood spot.
ABSTRACT: The potential importance of DNA methylation in the etiology of complex diseases has led to interest in the development of methylome-wide association studies (MWAS) aimed at interrogating all methylation sites in the human genome. When blood is used as the biomaterial for an MWAS, DNA is typically extracted directly from fresh or frozen whole blood collected by venipuncture. However, DNA extracted from dry blood spots may also be an alternative starting material. In the present study, we apply a methyl-CpG binding domain (MBD) protein enrichment-based technique combined with next-generation sequencing (MBD-seq) to assess the methylation status of the ~27 million CpGs in the human autosomal reference genome. We investigate eight methylomes using DNA from blood spots. These data are compared with 1,500 methylomes previously assayed with the same MBD-seq approach using DNA from whole blood. When investigating sequence quality and the enrichment profile across biological features, we find that DNA extracted from blood spots gives results comparable to those from DNA extracted from whole blood. We observe a slight decrease in assay performance only when the amount of starting material is ≤ 0.5 µg DNA. In conclusion, we show that high-quality methylome-wide investigations using MBD-seq can be conducted in DNA extracted from archived dry blood spots without sacrificing quality and without bias in the enrichment profile, as long as the amount of starting material is sufficient. In general, the amount of DNA extracted from a single blood spot is sufficient for methylome-wide investigations with the MBD-seq approach.
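The ~27 million CpGs referred to above are positions where a cytosine is immediately followed by a guanine. As a minimal sketch (not part of the MBD-seq pipeline described here), CpG sites in a sequence string can be located as follows; the function names are hypothetical, and a genome-wide count would additionally require reading the autosomal FASTA files chromosome by chromosome.

```python
def cpg_positions(seq: str) -> list[int]:
    """Return 0-based positions of the C in every CpG dinucleotide (hypothetical helper)."""
    seq = seq.upper()  # soft-masked (lowercase) bases are still CpGs
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def count_cpgs(seq: str) -> int:
    """Count CpG sites; CG is palindromic, so scanning one strand counts each site once."""
    return len(cpg_positions(seq))
```

For example, `count_cpgs("ACGTTCGCG")` returns 3 (CpGs starting at positions 1, 5 and 7).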
Project description: Mutated CpG sites (CpG-SNPs) are potential hotspots for human diseases because, in addition to the sequence variation, they may show individual differences in DNA methylation. We performed methylome-wide association studies (MWAS) to test whether methylation differences at these sites were associated with schizophrenia. We assayed all common CpG-SNPs with methyl-CpG binding domain protein-enriched genome sequencing (MBD-seq) using DNA extracted from 1,408 blood samples and 66 postmortem brain samples (BA10) of schizophrenia cases and controls. Seven CpG-SNPs passed our false discovery rate (FDR) threshold of 0.1 in the blood MWAS. Of the CpG-SNPs methylated in brain, 94% were also methylated in blood. This significantly exceeded the 46.2% overlap expected by chance (P < 1.0 × 10^-8) and justified replicating findings from blood in brain tissue. CpG-SNP rs3796293 in IL1RAP replicated (P = 0.003) with the same direction of effect. This site was further validated through targeted bisulfite pyrosequencing in 736 independent case-control blood samples (P < 9.5 × 10^-4). Our top result in the brain MWAS (P = 8.8 × 10^-7) was CpG-SNP rs16872141, located in the potential promoter of ENC1. Overall, our results suggest that CpG-SNP methylation may reflect effects of environmental insults and can provide blood biomarkers that could potentially improve disease management.
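The abstract does not state which FDR procedure was used. Assuming the common Benjamini-Hochberg step-up procedure, an FDR threshold of 0.1 could be applied to per-site P-values as in the following sketch; `benjamini_hochberg` is a hypothetical helper, not code from the study.

```python
def benjamini_hochberg(pvalues, fdr=0.1):
    """Return indices of tests rejected by the Benjamini-Hochberg step-up procedure.

    Assumption: the study's FDR control is BH; other procedures (e.g. q-values) differ.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ranks by ascending P-value
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank k with p_(k) <= (k / m) * fdr (step-up rule)
        if pvalues[idx] <= rank / m * fdr:
            k_max = rank
    # Every test ranked at or below k_max is rejected, even if its own check failed
    return sorted(order[:k_max])
```

With P-values `[0.001, 0.02, 0.03, 0.5, 0.8]`, the first three tests are declared significant at FDR 0.1.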
Project description: We recently showed that, after optimization, our methyl-CpG binding domain sequencing (MBD-seq) application approximates the methylome-wide coverage obtained with whole-genome bisulfite sequencing (WGB-seq), but at a cost that enables adequately powered large-scale association studies. A drawback of MBD-seq has been the relatively large amount of genomic DNA (ideally >1 µg) required to obtain high-quality data. Biomaterials are typically expensive to collect, provide a finite amount of DNA, and may simply not yield sufficient starting material. The ability to use low amounts of DNA will increase the breadth and number of studies that can be conducted. Therefore, we further optimized the enrichment step. With this low-starting-material protocol, MBD-seq performed as well as, or better than, the protocol requiring ample starting material (>1 µg). Using only 15 ng of DNA as input, there is minimal loss in data quality: MBD-seq achieves 93% of the coverage of WGB-seq (with standard amounts of input DNA) at similar false-positive rates. Furthermore, across a large number of genomic features, the MBD-seq methylation profiles closely tracked those observed for WGB-seq, with even slightly larger effect sizes. This suggests that MBD-seq provides similar information about the methylome and classifies methylation status somewhat more accurately. Performance decreases with <15 ng DNA as starting material but, even with as little as 5 ng, MBD-seq still achieves 90% of the coverage of WGB-seq with comparable genome-wide methylation profiles. Thus, the proposed protocol is an attractive option for adequately powered and cost-effective methylome-wide investigations using (very) low amounts of DNA.
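The abstract reports MBD-seq performance as a fraction of WGB-seq coverage together with false-positive rates. One plausible way to compute such numbers, assuming WGB-seq calls are treated as the reference and each CpG receives a binary methylated/unmethylated call from both methods, is sketched below; the exact definitions used in the study may differ, and the function name is hypothetical.

```python
def compare_calls(mbd_calls, wgb_calls):
    """Compare per-CpG calls (True = methylated) from MBD-seq against WGB-seq.

    Returns (coverage, false_positive_rate) under the assumed definitions:
    coverage = share of WGB-seq-methylated CpGs also called by MBD-seq,
    false_positive_rate = share of WGB-seq-unmethylated CpGs called by MBD-seq.
    """
    if len(mbd_calls) != len(wgb_calls):
        raise ValueError("call vectors must cover the same CpGs")
    pairs = list(zip(mbd_calls, wgb_calls))
    true_pos = sum(1 for m, w in pairs if m and w)
    false_pos = sum(1 for m, w in pairs if m and not w)
    n_meth = sum(1 for w in wgb_calls if w)
    n_unmeth = len(wgb_calls) - n_meth
    coverage = true_pos / n_meth if n_meth else float("nan")
    fpr = false_pos / n_unmeth if n_unmeth else float("nan")
    return coverage, fpr
```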
Project description: We studied the use of methyl-CpG binding domain (MBD) protein-enriched genome sequencing (MBD-seq) as a cost-effective screening tool for methylome-wide association studies (MWAS). Because MBD-seq has not yet been applied at a large scale, we first developed and tested a pipeline for data processing using 1,500 schizophrenia cases and controls plus 75 technical replicates, with an average of 68 million reads per sample. This involved the use of technical replicates to optimize quality control for multi-reads and duplicate reads, an in silico experiment to identify CpGs in loci with alignment problems, CpG coverage calculations based on multiparametric estimates of the fragment size distribution, a two-stage adaptive algorithm to combine data from correlated adjacent CpG sites, principal component analyses to control for confounders, and new software tailored to handle the large data set. We replicated MWAS findings in independent samples using a different technology that provided single-base resolution. In an MWAS of age-related methylation changes, one of our top findings was a previously reported robust association involving GRIA2. Our results also suggested that, owing to the many confounding effects, a considerable challenge in MWAS is to identify those effects that are informative about disease processes. This study showed the potential of MBD-seq as a cost-effective tool in large-scale disease studies.
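The CpG coverage step depends on the fragment size distribution because a sequenced read marks only one end of an enriched fragment. The sketch below illustrates the idea under a simplifying assumption: it uses an empirical fragment-length distribution (instead of the multiparametric estimate described in the study) and computes, for each CpG, the expected number of fragments covering it. All names are hypothetical, and the nested loop favors clarity over genome-scale speed.

```python
from collections import Counter


def survival_from_fragments(fragment_lengths):
    """Empirical P(L > d) for offsets d = 0..max(L), from observed fragment lengths."""
    n = len(fragment_lengths)
    counts = Counter(fragment_lengths)
    remaining = n
    surv = []
    for d in range(max(fragment_lengths) + 1):
        remaining -= counts.get(d, 0)  # fragments of length <= d cannot reach offset d
        surv.append(remaining / n)
    return surv


def cpg_coverage(cpg_positions, reads, surv):
    """Expected number of fragments covering each CpG.

    reads: (five_prime_position, strand) pairs; '+' fragments extend rightwards,
    '-' fragments extend leftwards from the read's 5' end.
    """
    coverage = {}
    for pos in cpg_positions:
        total = 0.0
        for start, strand in reads:
            offset = pos - start if strand == "+" else start - pos
            if 0 <= offset < len(surv):
                total += surv[offset]  # probability the fragment reaches this CpG
        coverage[pos] = total
    return coverage
```

For fragment lengths `[2, 3, 3, 4]` the survival curve is `[1.0, 1.0, 0.75, 0.25, 0.0]`, so a '+' read starting at 10 and a '-' read ending at 13 give an expected coverage of 1.75 at a CpG on position 12.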
Project description: Methyl-binding domain (MBD) enrichment followed by deep sequencing (MBD-seq) is a robust and cost-efficient approach for methylome-wide association studies (MWAS). MBD-seq has been shown to identify differentially methylated regions, detect previously reported robust associations, and produce findings that replicate with other technologies such as targeted pyrosequencing of bisulfite-converted DNA. Several commercially available kits can be used for MBD enrichment. Our previous work has used MethylMiner (Life Technologies, Foster City, CA, USA), which we chose after careful investigation of its properties. However, in a recent evaluation of five commercially available MBD-enrichment kits, the performance of MethylMiner was deemed poor. Given our positive experience with MethylMiner, we were surprised by this report. In an attempt to reproduce these findings, we performed a direct comparison of MethylMiner with MethylCap (Diagenode Inc, Denville, NJ, USA), the best-performing kit in that study. We find that MethylMiner and MethylCap both perform well as MBD-enrichment kits. However, MethylMiner shows somewhat better enrichment efficiency and lower levels of background "noise". In addition, for the purpose of MWAS, where we want to investigate the majority of CpGs, we find MethylMiner to be superior because it allows tailoring the enrichment to the regions where most CpGs are located. Using targeted bisulfite sequencing, we confirmed that sites where methylation was detected by either MethylMiner or MethylCap were indeed methylated.
Project description: In methylome-wide association studies (MWAS), there are many possible differences between cases and controls (e.g., related to lifestyle, diet, and medication use) that may affect the methylome and produce false-positive findings. An effective approach to control for these confounders is to first capture the major sources of variation in the methylation data and then regress out these components in the association analyses. This approach is, however, computationally very challenging due to the extremely large number of methylation sites in the human genome. We introduce MethylPCA, which is specifically designed to control for potential confounders in studies where the number of methylation sites is extremely large. MethylPCA offers a complete and flexible data analysis including 1) an adaptive method that performs data reduction prior to PCA by empirically combining methylation data of neighboring sites, 2) an efficient algorithm that performs a principal component analysis (PCA) on the ultra-high-dimensional data matrix, and 3) association tests. To accomplish this, MethylPCA allows for parallel execution of tasks, uses C++ for CPU- and I/O-intensive calculations, and stores intermediate results to avoid computing the same statistics multiple times or keeping results in memory. Through simulations and an analysis of a real whole-methylome MBD-seq study of 1,500 subjects, we show that MethylPCA effectively controls for potential confounders. MethylPCA provides users a convenient tool to perform MWAS. The software effectively handles the memory and speed demands of tasks that would be impossible to accomplish with existing software when millions of sites are interrogated with the sample sizes required for MWAS.
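To illustrate the confounder-control idea (capture the main axis of variation, then regress it out before testing), here is a small pure-Python sketch. It is not MethylPCA: it skips the adaptive data reduction, extracts only the first principal component by power iteration, and uses a Pearson correlation of residuals as the association test. All function names are hypothetical.

```python
import math
import random


def center_columns(X):
    """Subtract each site's (column's) mean across samples; X is samples x sites."""
    n, p = len(X), len(X[0])
    means = [sum(X[i][j] for i in range(n)) / n for j in range(p)]
    return [[X[i][j] - means[j] for j in range(p)] for i in range(n)]


def top_pc_scores(Xc, iters=200, seed=0):
    """Unit-norm sample scores of the first PC via power iteration on the n x n Gram matrix."""
    n = len(Xc)
    gram = [[sum(a * b for a, b in zip(Xc[i], Xc[k])) for k in range(n)] for i in range(n)]
    rng = random.Random(seed)
    u = [rng.random() for _ in range(n)]
    for _ in range(iters):
        v = [sum(gram[i][k] * u[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            return [0.0] * n  # no variation to capture, so nothing to regress out
        u = [x / norm for x in v]
    return u


def residualize(y, u):
    """Remove the component of centered vector y along unit vector u (regression on u)."""
    coef = sum(a * b for a, b in zip(y, u))
    return [a - coef * b for a, b in zip(y, u)]


def pc_adjusted_correlations(X, phenotype):
    """Per-site correlation with phenotype after regressing the first PC out of both."""
    Xc = center_columns(X)
    u = top_pc_scores(Xc)
    mean_ph = sum(phenotype) / len(phenotype)
    ph = residualize([v - mean_ph for v in phenotype], u)
    results = []
    for j in range(len(Xc[0])):
        site = residualize([row[j] for row in Xc], u)
        num = sum(a * b for a, b in zip(site, ph))
        den = math.sqrt(sum(a * a for a in site) * sum(b * b for b in ph))
        results.append(num / den if den > 0 else 0.0)
    return results
```

Working with the n × n Gram matrix rather than a sites × sites covariance matrix is what keeps this tractable when sites vastly outnumber samples; the first PC's sample scores are the same either way.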
Project description: Organisms and cells, in response to environmental influences or during development, undergo considerable genome-wide changes in DNA methylation, which are linked to a variety of biological processes. Using MethylC-seq to decipher the DNA methylome at single-base resolution is prohibitively costly. In this study, we develop a novel approach, named MBRidge, to detect the methylation levels of the CpG repertoire by introducing C-hydroxymethylated adapters and bisulfite treatment into the MeDIP-seq protocol and employing ridge regression in data analysis. A systematic evaluation of the DNA methylome in the human ovarian cell line T29 showed that MBRidge achieved high correlation (R > 0.90) with MethylC-seq at much lower cost (~10%). We further applied MBRidge to profile the DNA methylome in T29H, an oncogenic counterpart of T29. By comparing the methylomes of T29H and T29, we identified 131,790 differentially methylated regions (DMRs), which are mainly enriched in carcinogenesis-related pathways. These differ substantially from the 7,567 DMRs obtained by RRBS, which were related to cell development or differentiation. Integrated analysis of promoter DMRs and the expression of the corresponding genes revealed that DNA methylation exerted reverse regulation of gene expression, depending on the distance from the proximal DMR to the transcription start site, for both mRNA and lncRNA. Taken together, our results demonstrate that MBRidge is an efficient and cost-effective method that can be widely applied to profiling DNA methylomes.
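The abstract does not specify the MBRidge model, so the following is only a generic ridge regression sketch: it solves (X^T X + λI)β = X^T y by Gaussian elimination, where the rows of X would hold hypothetical per-CpG features derived from the enrichment and bisulfite data, and y would hold training methylation levels (for example from MethylC-seq). The features, names, and the absence of an intercept are all assumptions.

```python
def ridge_fit(X, y, lam):
    """Solve (X^T X + lam * I) beta = X^T y with Gaussian elimination (no intercept)."""
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0) for j in range(p)]
         for i in range(p)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    # Forward elimination with partial pivoting for numerical stability
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution on the upper-triangular system
    beta = [0.0] * p
    for i in range(p - 1, -1, -1):
        beta[i] = (b[i] - sum(A[i][j] * beta[j] for j in range(i + 1, p))) / A[i][i]
    return beta


def ridge_predict(X, beta):
    """Predicted methylation level for each feature row."""
    return [sum(x * w for x, w in zip(row, beta)) for row in X]
```

Increasing `lam` shrinks the coefficients toward zero, which stabilizes the fit when features are highly correlated, as adjacent CpG signals often are.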
Project description: Methyl-CpG binding domain protein sequencing (MBD-seq) is widely used to survey DNA methylation patterns. However, the optimal experimental parameters for MBD-seq remain unclear, and the data analysis remains challenging. In this study, we generated high-depth MBD-seq data in MCF-7 cells and developed a bi-asymmetric-Laplace model (BALM) for data analysis. We found that optimal efficiency of MBD-seq experiments was achieved by sequencing ~100 million uniquely mapped tags from a combination of 500 mM and 1000 mM salt concentration elutions in MCF-7 cells. Clonal bisulfite sequencing showed that the methylation status of each CpG dinucleotide in the tested regions was accurately detected at high resolution using the proposed model. These results demonstrate that the combination of MBD-seq and BALM can serve as a useful tool to investigate the DNA methylome owing to its low cost, high specificity, efficiency, and resolution.
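Assuming "bi-asymmetric-Laplace" refers to a two-component mixture of asymmetric Laplace distributions (the abstract does not give the model's variables or fitting procedure), the densities such a model would combine can be sketched as follows. The parameterization (location m, scale λ, asymmetry κ) is a standard one for the asymmetric Laplace distribution; the component interpretation and function names are hypothetical.

```python
import math


def asym_laplace_pdf(x, m, lam, kappa):
    """Asymmetric Laplace density; kappa = 1 gives the symmetric Laplace distribution."""
    norm = lam / (kappa + 1.0 / kappa)  # makes the two exponential tails integrate to 1
    if x >= m:
        return norm * math.exp(-lam * kappa * (x - m))
    return norm * math.exp(lam / kappa * (x - m))


def bi_asym_laplace_pdf(x, w, comp1, comp2):
    """Two-component mixture w * f1 + (1 - w) * f2; each comp is (m, lam, kappa)."""
    return w * asym_laplace_pdf(x, *comp1) + (1 - w) * asym_laplace_pdf(x, *comp2)


def posterior_component1(x, w, comp1, comp2):
    """Posterior probability that x came from component 1 (e.g. a hypothetical 'methylated' class)."""
    a = w * asym_laplace_pdf(x, *comp1)
    b = (1 - w) * asym_laplace_pdf(x, *comp2)
    total = a + b
    return a / total if total > 0 else 0.5  # both densities underflowed: uninformative
```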
Project description: DNA methylation is a frequently studied epigenetic modification owing to its role in regulating gene expression, and hence in biological processes and in determining phenotypic plasticity in organisms. Rudimentary DNA methylation patterns for some livestock species are publicly available; among these, the goat methylome deserves further exploration. Genome-wide DNA methylation maps of the hypothalamus and ovary from Saanen goats were generated using methyl-CpG binding domain protein sequencing (MBD-seq). Analysis of DNA methylation patterns indicates that, in both organs, the majority of methylation peaks found within genes are located in gene body regions. Analysis of the distribution of methylated sites per chromosome showed that chromosome X had the lowest number of methylation peaks. However, the X chromosome has one of the highest percentages of methylated CpG islands in both organs, and approximately 50% of the CpG islands in the goat epigenome are methylated in hypothalamus and ovary. Organ-specific differentially methylated genes (DMGs) were correlated with expression levels. The comparison between transcriptome and methylome in hypothalamus and ovary showed that a higher level of methylation is not accompanied by stronger gene suppression. The genome-wide DNA methylation map for two goat organs produced here is a valuable starting point for studying the involvement of epigenetic modifications in regulating goat reproductive performance.
Project description: High-throughput bisulfite sequencing (BS-seq) is an important technology for generating single-base DNA methylomes in both plants and animals. To accelerate the analysis of BS-seq data, visualization toolkits are required. ViewBS, an open-source toolkit, can extract and visualize DNA methylome data easily and flexibly. By using Tabix, ViewBS can quickly visualize BS-seq data for large datasets. ViewBS can generate publication-quality figures, such as meta-plots, heat maps and violin-boxplots, which can help users answer biological questions. We illustrate its application using BS-seq data from Arabidopsis thaliana. ViewBS is freely available at: https://github.com/xie186/ViewBS. Supplementary data are available at Bioinformatics online.
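A gene-body meta-plot of the kind ViewBS produces averages methylation levels over relative positions along many genes. The following is a minimal sketch of that aggregation step only, not ViewBS code; the gene and site inputs are hypothetical in-memory structures rather than Tabix-indexed files, and flanking regions are omitted.

```python
def gene_body_metaplot(genes, sites, n_bins=10):
    """Average methylation level in n_bins equal-width bins of relative gene-body position.

    genes: list of (chrom, start, end, strand) with 1-based inclusive coordinates.
    sites: dict mapping chrom -> list of (position, methylation_level in [0, 1]).
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for chrom, start, end, strand in genes:
        length = end - start + 1
        for pos, level in sites.get(chrom, []):
            if start <= pos <= end:
                # Relative position measured from the TSS, so minus-strand genes are flipped
                rel = (end - pos) / length if strand == "-" else (pos - start) / length
                b = min(int(rel * n_bins), n_bins - 1)
                sums[b] += level
                counts[b] += 1
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]
```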