Performances of Different Fragment Sizes for Reduced Representation Bisulfite Sequencing in Pigs.
ABSTRACT: BACKGROUND: Reduced representation bisulfite sequencing (RRBS) has been widely used to profile genome-scale DNA methylation in mammalian genomes. However, the applications and technical performance of RRBS with different fragment sizes have not been systematically reported in pigs, which serve as an important biomedical model for humans. The aim of this study was to evaluate the capacity of RRBS libraries with different fragment sizes to characterize the porcine genome. RESULTS: We found that MspI-digested segments between 40 and 220 bp showed a high distribution peak at 74 bp. These segments overlapped heavily with repetitive elements and might reduce unique mapping alignment. The RRBS library with the 110-220 bp fragment size had the highest unique mapping alignment and the lowest multiple alignment. The cost-effectiveness of the 40-110 bp, 110-220 bp and 40-220 bp fragment sizes might decrease once the dataset exceeded 70, 50 and 110 million reads, respectively. Given a dataset of 50 million reads, the average sequencing depth of the detected CpG sites appeared to be greater in the 110-220 bp fragment size than in the 40-110 bp and 40-220 bp fragment sizes, and the detected CpG sites were distributed differently across gene- and CpG island-related regions. CONCLUSIONS: Our results demonstrated that the choice of fragment size could affect the number and sequencing depth of detected CpG sites as well as cost-efficiency. No single RRBS configuration is optimal in all circumstances for investigating genome-scale DNA methylation. This work provides useful knowledge for designing and executing RRBS to investigate genome-wide DNA methylation in pig tissues.
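The fragment-size windows compared above (40-110, 110-220 and 40-220 bp) can be explored in silico before a library is built. What follows is a minimal sketch and does not reproduce the authors' pipeline. It makes several assumptions. The genome is a single forward-strand string. The MspI cut falls between the two Cs of CCGG (C^CGG). Fragment length is the distance between consecutive cut positions, and CpG dinucleotides are counted per fragment. The function names are hypothetical. The bin bounds are treated as inclusive, so a 110 bp fragment falls in both the 40-110 and 110-220 windows; this is also an assumption.

```python
import re

# Hypothetical size windows from the text; bounds treated as inclusive (assumption).
SIZE_WINDOWS = {"40-110": (40, 110), "110-220": (110, 220), "40-220": (40, 220)}


def mspi_fragments(genome):
    """Return (start, end, length, n_cpg) for fragments flanked by two MspI sites.

    MspI recognises CCGG and cuts C^CGG, so each cut lies one base after the
    start of a recognition site. The pieces at either end of the sequence are
    skipped because they are not bounded by MspI sites on both sides.
    """
    genome = genome.upper()
    cuts = [m.start() + 1 for m in re.finditer("CCGG", genome)]
    fragments = []
    for start, end in zip(cuts, cuts[1:]):
        piece = genome[start:end]
        fragments.append((start, end, end - start, piece.count("CG")))
    return fragments


def bin_fragments(fragments, windows=SIZE_WINDOWS):
    """Count fragments, and the CpGs they carry, in each size window."""
    summary = {}
    for name, (lo, hi) in windows.items():
        chosen = [f for f in fragments if lo <= f[2] <= hi]
        summary[name] = {"fragments": len(chosen), "cpgs": sum(f[3] for f in chosen)}
    return summary
```

The per-window CpG totals from a real assembly give only a rough upper bound on what each library could capture. The sketch does not model mapping uniqueness, and the abstract links the repeat-rich 74 bp peak to reduced uniqueness.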
Project description: BACKGROUND: Plasmodium ovale has two subspecies, P. ovale curtisi and P. ovale wallikeri, which may be distinguished by the gene potra, encoding the P. ovale tryptophan-rich antigen. The sequence and size of the potra gene were variable between the two P. ovale spp., and more fragment sizes were found than in previous studies. Further information about the diversity of potra genes in these two P. ovale spp. is needed. METHODS: A total of 110 dried blood samples were collected from clinical patients infected with P. ovale, all of whom had returned from Africa to Henan Province in 2011-2016. Fragments of potra were amplified by nested PCR. The sizes and species of the potra gene were analysed after sequencing, and differences between the isolates were analysed by aligning the amino acid sequences. A phylogenetic tree was constructed by neighbour-joining to determine the genetic relationships among all the isolates. The distribution of the isolates was analysed by country of origin. RESULTS: In total, 67 samples were infected with P. o. wallikeri, comprising 8 genotypes of potra, while 43 samples were infected with P. o. curtisi, comprising 3 genotypes of potra. Combined with previous studies, P. o. wallikeri had six sizes (227, 245, 263, 281, 299 and 335 bp) and P. o. curtisi had four sizes (299, 317, 335 and 353 bp). The 299 and 335 bp fragment sizes were shared by the two species. A unit of six amino acids was used for the first time to analyse the amino acid sequence of potra. Amino acid sequence alignment revealed that potra of P. o. wallikeri differed in two amino acid units, MANPIN and AITPIN, while potra of P. o. curtisi differed in the units TINPIN and TITPIS. Combined with previous studies, ten subtypes of potra exist for P. o. wallikeri and four subtypes for P. o. curtisi. The phylogenetic tree showed that 11 isolates were divided into two clusters. The P. o. wallikeri cluster was further divided into five sub-clusters, and the P. o. curtisi cluster formed two sub-clusters, each with its respective reference sequences. The genetic relationships of the P. ovale spp. were mainly based on the number of dominant amino acid repeats, that is, the number of MANPIN, AITPIN, TINPIN or TITPIS units. The 245 bp genotype of P. o. wallikeri and the 299 and 317 bp genotypes of P. o. curtisi were common in Africa. CONCLUSION: This study further showed that more fragment sizes exist than previously reported: P. o. wallikeri had six sizes and P. o. curtisi had four. There are ten subtypes of potra for P. o. wallikeri and four for P. o. curtisi. The genetic polymorphisms of potra provide complementary information for gene-based tracing of P. ovale spp. in the malaria elimination era.
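Every reported potra size differs from the others by a multiple of 18 bp. That is exactly one six-amino-acid unit (6 codons x 3 nt). This sketch only illustrates the arithmetic and is not the authors' method. It expresses each size as the number of extra units relative to the smallest reported size (227 bp), and it counts the four named units in an amino acid sequence. The function names are hypothetical. Counting units with non-overlapping `str.count` is also an assumption about how units are tallied.

```python
UNIT_BP = 6 * 3  # one six-amino-acid unit = 6 codons x 3 nt = 18 bp
UNITS = ("MANPIN", "AITPIN", "TINPIN", "TITPIS")  # units named in the text

WALLIKERI_BP = [227, 245, 263, 281, 299, 335]
CURTISI_BP = [299, 317, 335, 353]


def extra_units(sizes, reference=227):
    """Express each fragment size as whole 18 bp units beyond `reference`."""
    result = {}
    for size in sizes:
        diff = size - reference
        if diff % UNIT_BP != 0:
            raise ValueError(f"{size} bp is not a whole number of units from {reference} bp")
        result[size] = diff // UNIT_BP
    return result


def count_units(protein, units=UNITS):
    """Count non-overlapping occurrences of each named unit (hypothetical helper)."""
    return {unit: protein.count(unit) for unit in units}


# Sizes reported for both subspecies.
SHARED_BP = sorted(set(WALLIKERI_BP) & set(CURTISI_BP))
```

Under this reading, the shared 299 and 335 bp sizes correspond to 4 and 6 extra units in both subspecies.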
Project description: UNLABELLED: Reduced representation bisulfite sequencing (RRBS) is a cost-effective approach for genome-wide methylation pattern profiling. Analyzing RRBS sequencing data is challenging, and specialized alignment/mapping programs are needed. Although such programs have been developed, a comprehensive solution that provides researchers with good-quality, analyzable data is still lacking. To address this need, we have developed a Streamlined Analysis and Annotation Pipeline for RRBS data (SAAP-RRBS) that integrates read quality assessment/clean-up, alignment, methylation data extraction, annotation, reporting and visualization. This package facilitates a rapid transition from sequencing reads to a fully annotated CpG methylation report and on to biological interpretation. AVAILABILITY AND IMPLEMENTATION: SAAP-RRBS is freely available to non-commercial users at the web site http://ndc.mayo.edu/mayo/research/biostat/stand-alone-packages.cfm.
Project description: DNA methylation can control some CpG-poor genes, but unbiased studies have not found a consistent genome-wide association with gene activity outside of CpG islands or shores. This is possibly due to the use of cell lines or limited bioinformatics analyses. We performed reduced representation bisulfite sequencing (RRBS) of rat dorsal root ganglia encompassing postmitotic primary sensory neurons (n = 5, r > 0.99; orthogonal validation p < 10^-19). The rat genome suggested the same dichotomy of genes previously reported in other mammals: low CpG content (< 3.2%) promoter (LCP) genes and high CpG content (≥ 3.2%) promoter (HCP) genes. A genome-wide integrated methylome-transcriptome analysis showed that LCP genes were markedly hypermethylated when repressed and hypomethylated when active. The difference was 40% across a broad region 5' of the transcription start site (TSS) (p < 10^-87 for -6000 bp to -2000 bp; p < 10^-73 for -2000 bp to +2000 bp; no difference in the gene body, p = 0.42). HCP genes had minimal TSS-associated methylation regardless of transcription status, but gene body methylation appeared to be lost in repressed HCP genes. Therefore, diametrically opposite methylome-transcriptome associations characterize LCP and HCP genes in postmitotic neural tissue in vivo.
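The LCP/HCP split is a simple threshold on promoter CpG content. The abstract does not say how CpG content is computed. This sketch therefore assumes it is the percentage of dinucleotide positions in the promoter sequence that read CG. Only the 3.2% cut-off comes from the text, and the function names are hypothetical.

```python
CPG_THRESHOLD = 3.2  # percent; cut-off taken from the text


def cpg_content(seq):
    """Percentage of dinucleotide positions that are CpG (assumed definition)."""
    seq = seq.upper()
    positions = len(seq) - 1
    if positions <= 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count sees every CpG.
    return 100.0 * seq.count("CG") / positions


def promoter_class(seq, threshold=CPG_THRESHOLD):
    """Label a promoter sequence as high (HCP) or low (LCP) CpG content."""
    return "HCP" if cpg_content(seq) >= threshold else "LCP"
```

Which promoter window to extract around each TSS is a separate choice. The text reports regions from -6000 bp to +2000 bp for the methylation analysis, not for the LCP/HCP definition.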
Project description: BACKGROUND: Protein pulldown using methyl-CpG binding domain (MBD) proteins followed by high-throughput sequencing is a common method to determine DNA methylation. Algorithms have been developed to estimate absolute methylation levels from the read coverage generated by affinity enrichment-based techniques. However, the most accurate algorithm for MBD-seq data requires additional data from an SssI-treated control experiment. RESULTS: Using our previous characterizations of methyl-CpG/MBD2 binding in the context of an MBD pulldown experiment, we build a model of the MBD pulldown reads expected from SssI-treated DNA. We use the program BayMeth to evaluate the effectiveness of this model by substituting calculated SssI control data for the observed SssI control data. We compare methylation predictions against those from an RRBS data set, on both 100 bp and 10 bp windows. BayMeth run with our modeled SssI control data performs better than BayMeth run with observed SssI control data. We also adapt the model to an external data set solely by changing the average fragment length. There, our calculated data still inform BayMeth to a level similar to observed data when predicting methylation state on a pulldown data set with matching WGBS estimates. CONCLUSION: We tested both internal and external MBD pulldown data sets in this study. In both, BayMeth used with our modeled pulldown coverage performs better than BayMeth run without any estimate of SssI control pulldown. It is comparable to, and in some cases better than, BayMeth run with observed SssI control data. Thus, our MBD pulldown alignment model can improve methylation predictions without the need to perform additional control experiments.
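A window-level comparison against RRBS can be sketched as follows. The abstract does not state which agreement metric was used. This sketch assumes that per-CpG RRBS methylation fractions are averaged within fixed-width windows (for example 100 bp or 10 bp) and then compared with per-window predictions by Pearson correlation. The data layout and function names are hypothetical, and nothing here reproduces BayMeth itself.

```python
import math
from collections import defaultdict


def window_means(cpg_calls, width):
    """Average (position, methylation_fraction) calls within fixed-width windows."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for position, fraction in cpg_calls:
        window = position // width  # window index on a single chromosome
        sums[window] += fraction
        counts[window] += 1
    return {w: sums[w] / counts[w] for w in sums}


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two windows")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("constant input")
    return sxy / math.sqrt(sxx * syy)


def compare_to_rrbs(predicted, rrbs_calls, width=100):
    """Correlate {window: predicted_level} with RRBS window means on shared windows."""
    observed = window_means(rrbs_calls, width)
    shared = sorted(set(predicted) & set(observed))
    return pearson([predicted[w] for w in shared], [observed[w] for w in shared])
```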
Project description: SUMMARY: Reduced representation bisulfite sequencing (RRBS) is a powerful yet cost-efficient method for studying DNA methylation on a genomic scale. RRBS involves restriction-enzyme digestion, bisulfite conversion and size selection, resulting in DNA sequencing data that require special bioinformatic handling. Here, we describe RRBSMAP, a short-read alignment tool designed to handle RRBS data in a user-friendly and scalable way. RRBSMAP uses wildcard alignment and avoids the need for any preprocessing or post-processing steps. We benchmarked RRBSMAP against a well-validated MAQ-based pipeline for RRBS read alignment. We observed similar accuracy but much better runtime performance, easier handling and better scaling to large sample sets. In summary, RRBSMAP removes bioinformatic hurdles and reduces the computational burden of large-scale epigenome association studies performed with RRBS. AVAILABILITY: http://rrbsmap.computational-epigenetics.org/ http://code.google.com/p/bsmap/ CONTACT: firstname.lastname@example.org SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
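The wildcard idea can be illustrated with a toy aligner. Bisulfite treatment turns unmethylated C into T. A T in a read is therefore allowed to match either C or T in the reference, while a C in the read must still match a C. The sketch below is a brute-force illustration under that assumption and covers the forward (C-to-T) strand only. It is not RRBSMAP's implementation or interface, which relies on indexing for speed. The function names and the mismatch limit are hypothetical.

```python
def bs_mismatches(read, ref_window):
    """Count mismatches, letting a read T match a reference C (bisulfite wildcard)."""
    mismatches = 0
    for r, g in zip(read, ref_window):
        if r == g or (r == "T" and g == "C"):
            continue
        mismatches += 1
    return mismatches


def align_unique(read, reference, max_mismatches=1):
    """Return the single best-scoring start position, or None if absent or tied."""
    hits = []
    for i in range(len(reference) - len(read) + 1):
        mm = bs_mismatches(read, reference[i:i + len(read)])
        if mm <= max_mismatches:
            hits.append((mm, i))
    if not hits:
        return None
    best = min(mm for mm, _ in hits)
    best_positions = [i for mm, i in hits if mm == best]
    # Report only reads with one best hit, mirroring a "unique mapping" filter.
    return best_positions[0] if len(best_positions) == 1 else None
```

In this example, `align_unique("ACGTTG", "AAAACGTCGAAAA")` places the read at position 3. The first CpG reads as methylated (C), and the second reads as unmethylated (T against a reference C).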
S-EPMC3268241 | BioStudies
Project description: RRBS data of different fragment sizes in pigs
Project description: We report the generation and analysis of genome-scale DNA methylation profiles at nucleotide resolution in mammalian cells. We used high-throughput Reduced Representation Bisulfite Sequencing (RRBS) and single-molecule-based sequencing to generate DNA methylation maps. These maps cover the vast majority of CpG islands and a representative sampling of conserved non-coding elements, transposons and other genomic features. They span murine embryonic stem (ES) cells, ES-derived and primary neural cells, and eight other primary tissues. Several key findings emerge from the data. First, DNA methylation patterns are better correlated with histone methylation patterns than with the underlying genome sequence context. Second, CpG methylation is a dynamic epigenetic mark that undergoes extensive changes during cellular differentiation, particularly in regulatory regions outside of core promoters. Third, analysis of ES-derived and primary cells reveals that 'weak' CpG islands associated with a specific set of developmentally regulated genes undergo aberrant hypermethylation during extended proliferation in vitro. This pattern is reminiscent of that reported in some primary tumors. More generally, the results establish RRBS as a powerful technology for epigenetic profiling of cell populations relevant to developmental biology, cancer and regenerative medicine.
Keywords: High-throughput Reduced Representation Bisulfite Sequencing (RRBS), Illumina, cell type comparison
Reduced representation bisulfite sequencing (MspI, ~40-220 bp size fraction) of 18 murine cell types. Raw sequence data files for this study are available for download from the SRA FTP site at ftp://ftp.ncbi.nlm.nih.gov/sra/Studies/SRP000/SRP000179
Project description: The cryIIIA gene, encoding a coleopteran-specific toxin, is poorly expressed in Bacillus thuringiensis when cloned in a low-copy-number plasmid. This weak expression is observed when the gene is cloned with only its promoter and its putative terminator. cryIIIA gene expression was analyzed using deletion derivatives of a larger DNA fragment carrying the toxin gene and additional adjacent sequences. The results indicate that a 1-kb DNA fragment located 400 bp upstream of the promoter strongly enhances CryIIIA production in sporulating B. thuringiensis cells. Similar results were obtained with the low-copy-number plasmid pHT304 carrying transcriptional fusions between upstream regions of cryIIIA and the lacZ gene. The start sites, sizes and amounts of cryIIIA-specific mRNAs were analyzed. The results show that the enhancement occurs at the transcriptional level: it increases the number of cryIIIA-specific transcripts from the onset of sporulation until about 6 h after the onset. The nucleotide sequences of the 1-kb activating fragment and of the 700-bp region containing the promoter and the 5' end of cryIIIA were determined. No potential protein-coding sequences were found upstream of the promoter. The main feature of the 1-kb activating fragment is a 220-bp A+T-rich region.
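An A+T-rich stretch such as the 220-bp region described above is usually located by sliding a fixed-width window along the sequence. The sketch below is a generic sliding-window scan and not the analysis used in the study. The window width and minimum A+T fraction are parameters chosen by the user, and the function name is hypothetical.

```python
def at_rich_windows(seq, window, min_at):
    """Return start positions of windows whose A+T fraction is at least `min_at`."""
    if window <= 0:
        raise ValueError("window must be positive")
    seq = seq.upper()
    if len(seq) < window:
        return []
    at = sum(base in "AT" for base in seq[:window])  # A+T count in the first window
    hits = []
    for start in range(len(seq) - window + 1):
        if start > 0:
            # Slide by one base: add the entering base, drop the leaving one.
            at += (seq[start + window - 1] in "AT") - (seq[start - 1] in "AT")
        if at / window >= min_at:
            hits.append(start)
    return hits
```

Overlapping hits can then be merged into regions. For example, `at_rich_windows(fragment, 220, 0.75)` would flag 220-bp windows that are at least 75% A+T, with the 0.75 cut-off chosen purely for illustration.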
Project description: This data was generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly (Florencia Pauli, email@example.com). If you have questions about the Genome Browser track associated with this data, contact ENCODE (firstname.lastname@example.org). This track is produced as part of the ENCODE project.
The track reports the percentage of DNA molecules that exhibit cytosine methylation at specific CpG dinucleotides. In general, DNA methylation within a gene's promoter is associated with gene silencing, and DNA methylation within the exons and introns of a gene is associated with gene expression. Proper regulation of DNA methylation is essential during development, and aberrant DNA methylation is a hallmark of cancer.
DNA methylation status is assayed at more than 500,000 CpG dinucleotides in the genome using Reduced Representation Bisulfite Sequencing (RRBS). Genomic DNA is digested with the methylation-insensitive restriction enzyme MspI. Small genomic DNA fragments are then purified by gel electrophoresis and used to construct an Illumina sequencing library. The library fragments are treated with sodium bisulfite and amplified by PCR, converting every unmethylated cytosine to a thymine while leaving methylated cytosines intact. The sequenced fragments are aligned to a customized reference genome sequence. For each assayed CpG, we report the number of sequencing reads covering that CpG and the percentage of those reads that are methylated.
For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf
DNA methylation at CpG sites was assayed with a modified version of Reduced Representation Bisulfite Sequencing (RRBS; Meissner et al., 2008). RRBS was performed on cell lines grown by many ENCODE production groups.
The production group that grew the cells and isolated genomic DNA is indicated in the "obtainedBy" field of the metadata. When a cell type was provided by more than one lab, the data for the cells from only one lab are displayed in the table above. However, the data for every cell type from every lab are available from the Downloads page. RRBS was carried out by the Myers production group at the HudsonAlpha Institute for Biotechnology.
Isolation of genomic DNA: Genomic DNA is isolated from biological replicates of each cell line using the QIAGEN DNeasy Blood & Tissue Kit according to the manufacturer's instructions. DNA concentrations for each genomic DNA preparation are determined using a fluorescent DNA-binding dye and a fluorometer (Invitrogen Quant-iT dsDNA High Sensitivity Kit and Qubit Fluorometer). Typically, 1 µg of DNA is used to make an RRBS library. However, we have also successfully made libraries from 200 ng of genomic DNA from rare or precious samples.
RRBS library construction and sequencing: RRBS library construction starts with MspI digestion of genomic DNA, which cuts at every CCGG regardless of methylation status. Klenow exo- DNA polymerase is then used to fill in the recessed ends of the genomic DNA fragments and to add an adenosine as a 3' overhang. Next, a methylated version of the Illumina paired-end adapters is ligated onto the DNA. Adapter-ligated genomic DNA fragments between 105 and 185 base pairs are selected using agarose gel electrophoresis and the Qiagen QIAquick Gel Extraction Kit. The selected adapter-ligated fragments are treated with sodium bisulfite using the Zymo Research EZ DNA Methylation-Gold Kit, which converts unmethylated cytosines to uracils and leaves methylated cytosines unchanged. Bisulfite-treated DNA is amplified in a final PCR reaction that has been optimized to amplify diverse fragment sizes and sequence contexts uniformly in the same reaction.
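MspI cuts C^CGG, and the recessed end is then filled in. Reads sequenced from the MspI end of a fragment should therefore begin with CGG when the CpG at the site is methylated, or with TGG after bisulfite conversion when it is unmethylated. A quick check of that expectation is sketched below. It assumes single reads that start at the MspI end (for example, read 1 of a directional library). The function name and output keys are hypothetical.

```python
from collections import Counter


def read_start_summary(reads):
    """Tally the first three bases of reads expected to start at an MspI cut."""
    starts = Counter(read[:3].upper() for read in reads)
    total = len(reads)
    cgg, tgg = starts["CGG"], starts["TGG"]
    return {
        "CGG": cgg,  # CpG at the MspI site read as methylated
        "TGG": tgg,  # CpG at the MspI site read as unmethylated (C -> U -> T)
        # Share of reads with the expected YGG start; a low value hints at problems.
        "fraction_YGG": (cgg + tgg) / total if total else 0.0,
    }
```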
During the final PCR, uracils are copied as thymines, producing a thymine in the PCR products wherever an unmethylated cytosine existed in the genomic DNA. The sample is then ready for sequencing on the Illumina platform. These libraries were sequenced on an Illumina Genome Analyzer IIx according to the manufacturer's recommendations.
Data analysis: To analyze the sequence data, a reference genome is created that contains only the 36 base pairs adjacent to every MspI site, and every C in those sequences is changed to T. A converted sequence read file is then created by changing each C in the original sequence reads to a T. The converted sequence reads are aligned to the converted reference genome, and only reads that map uniquely to the reference genome are kept. Once reads are aligned, the percent methylation is calculated for each CpG using the original sequence reads. The percent methylation and number of reads are reported for each CpG.
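The data-analysis steps above can be sketched as a small in-memory pipeline. This is a simplified illustration and not the ENCODE production code. It rests on four assumptions:

- the genome is a single forward-strand string;
- each tag is the block of bases immediately downstream of a C^CGG cut;
- converted reads are matched exactly against converted tags in a dictionary, in place of a real aligner;
- reads are kept only if their converted sequence matches exactly one tag.

The 36 bp tag length follows the text. The function names are hypothetical.

```python
import re
from collections import defaultdict

TAG_LEN = 36  # bases kept next to each MspI site, as described in the text


def build_converted_reference(genome, tag_len=TAG_LEN):
    """Map C->T converted tags to the genome start positions they came from."""
    genome = genome.upper()
    index = defaultdict(list)
    for m in re.finditer("CCGG", genome):
        start = m.start() + 1  # first base after the C^CGG cut (forward strand only)
        tag = genome[start:start + tag_len]
        if len(tag) == tag_len:
            index[tag.replace("C", "T")].append(start)
    return index


def call_methylation(reads, genome, tag_len=TAG_LEN):
    """Return {cpg_position: (n_reads, percent_methylated)} from uniquely placed reads."""
    genome = genome.upper()
    index = build_converted_reference(genome, tag_len)
    covered = defaultdict(int)
    methylated = defaultdict(int)
    for read in reads:
        read = read.upper()[:tag_len]
        hits = index.get(read.replace("C", "T"), [])
        if len(hits) != 1:  # unmapped or non-unique: discard
            continue
        start = hits[0]
        ref = genome[start:start + tag_len]
        for i in range(tag_len - 1):
            if ref[i:i + 2] == "CG":
                covered[start + i] += 1
                # The original, unconverted read keeps C only where the CpG was methylated.
                if read[i] == "C":
                    methylated[start + i] += 1
    return {pos: (n, 100.0 * methylated[pos] / n) for pos, n in sorted(covered.items())}
```

Tags on the reverse strand, sequencing errors and mismatches are all ignored here. A real pipeline would need to handle each of them.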
Project description: Existing methods to improve detection of circulating tumor DNA (ctDNA) have focused on genomic alterations but have rarely considered the biological properties of plasma cell-free DNA (cfDNA). We hypothesized that differences in fragment lengths of circulating DNA could be exploited to enhance sensitivity for detecting the presence of ctDNA and for noninvasive genomic analysis of cancer. We surveyed ctDNA fragment sizes in 344 plasma samples from 200 patients with cancer using low-pass whole-genome sequencing (0.4×). To establish the size distribution of mutant ctDNA, tumor-guided personalized deep sequencing was performed in 19 patients. We detected enrichment of ctDNA in fragment sizes between 90 and 150 bp and developed methods for in vitro and in silico size selection of these fragments. Selecting fragments between 90 and 150 bp improved detection of tumor DNA, with more than twofold median enrichment in >95% of cases and more than fourfold enrichment in >10% of cases. Analysis of size-selected cfDNA identified clinically actionable mutations and copy number alterations that were otherwise not detected. Identification of plasma samples from patients with advanced cancer was improved by predictive models integrating fragment length and copy number analysis of cfDNA, with area under the curve (AUC) >0.99 compared to AUC <0.80 without fragmentation features. Increased identification of cfDNA from patients with glioma, renal, and pancreatic cancer was achieved with AUC > 0.91 compared to AUC < 0.5 without fragmentation features. Fragment size analysis and selective sequencing of specific fragment sizes can boost ctDNA detection and could complement or provide an alternative to deeper sequencing of cfDNA.
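In silico size selection filters fragments by length and then checks how the tumor-derived fraction changes. The sketch below assumes each fragment already carries a length and a mutant/wild-type label (for example, from tumor-guided sequencing of known variants). The 90-150 bp bounds come from the text. The inclusive boundaries, the data layout and the function names are hypothetical. Enrichment is taken as the mutant fraction after selection divided by the mutant fraction before it.

```python
def mutant_fraction(fragments, lo=None, hi=None):
    """Fraction of (length, is_mutant) fragments labelled mutant, optionally within [lo, hi] bp."""
    selected = [
        is_mutant
        for length, is_mutant in fragments
        if (lo is None or length >= lo) and (hi is None or length <= hi)
    ]
    if not selected:
        raise ValueError("no fragments in the requested size range")
    return sum(selected) / len(selected)


def size_selection_enrichment(fragments, lo=90, hi=150):
    """Ratio of mutant fraction after in silico size selection to the fraction before it."""
    before = mutant_fraction(fragments)
    if before == 0:
        raise ValueError("no mutant fragments before selection")
    return mutant_fraction(fragments, lo, hi) / before
```

For example, one mutant 100 bp fragment among four fragments gives a fraction of 0.25 before selection. If only one wild-type fragment also falls in 90-150 bp, the fraction rises to 0.5 after selection, a twofold enrichment.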