A systematic evaluation of pattern discovery algorithms
ABSTRACT: Pattern discovery algorithms, widely used in the analysis of biological sequences, are methods for discovering recurrent, non-random motifs. Many algorithms exist, but few comparisons have been made among them. We systematically profile eight representative methods at multiple parameter settings across 174 diverse experimental datasets, including ten novel ChIP-on-chip datasets. We executed 16,777 pattern discovery analyses to assess prediction accuracy, CPU usage and memory consumption. For 144 datasets we developed a gold standard using machine-learning algorithms; cross-validation was used for the remaining datasets. Performance was highly disparate, with median accuracy ranging from 32% to 96%. Importantly, we were unable to replicate previously reported algorithm rankings, emphasizing the need to use many diverse experimental datasets. We found that deterministic algorithms such as Projection and Oligo/Dyad had the highest prediction accuracy. Computational efficiency was not linearly related to dataset size and became critical for large datasets, on which some algorithms were intractably slow. This work provides the first combined assessment of the CPU, memory, and prediction accuracies of pattern discovery algorithms on real experimental datasets.
HL60-Mnt-ChIP: ChIP-Chip with 10 biological replicates
HL60-Trrap-ChIP: ChIP-Chip with 13 biological replicates
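To make the idea of pattern discovery concrete, here is a minimal sketch of one simple deterministic strategy: count every overlapping word of length k (a k-mer) in a set of target sequences and rank words by how much more frequent they are there than in a background set. This is an illustrative word-counting scheme, not Projection, Oligo/Dyad, or any of the eight profiled tools; the function names, the pseudocount, and the default k are hypothetical choices.

```python
from collections import Counter
from typing import Iterable


def count_kmers(seqs: Iterable[str], k: int) -> Counter:
    """Count every overlapping k-mer (given strand only) across all sequences."""
    counts: Counter = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def kmer_enrichment(foreground, background, k=6, pseudocount=1.0, top=5):
    """Rank foreground k-mers by their frequency ratio versus the background.

    Counts are normalized by the total number of k-mers in each set, and a
    pseudocount gives words absent from the background a finite score.
    """
    fg = count_kmers(foreground, k)
    bg = count_kmers(background, k)
    fg_total = sum(fg.values())
    bg_total = sum(bg.values())
    scores = {}
    for kmer, n in fg.items():
        fg_freq = (n + pseudocount) / (fg_total + pseudocount)
        bg_freq = (bg.get(kmer, 0) + pseudocount) / (bg_total + pseudocount)
        scores[kmer] = fg_freq / bg_freq
    # Highest enrichment first; keep only the requested number of words.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]
```

For example, planting the E-box word CACGTG once in each of three short foreground sequences makes it the top-ranked 6-mer against a background that lacks it. Real tools additionally score reverse complements, model background composition, and assess statistical significance, all of which this sketch omits.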
Project description: Hybridization of amplified genomic DNA against unamplified genomic DNA, using three different amplification methods, to assess the bias introduced by amplification alone. Keywords: genomic DNA. Three different amplification methods were assessed, and for each method total genomic DNA was hybridized against an equal amount of amplified genomic DNA.
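One simple way to quantify amplification bias from such hybridizations is the per-probe log2 ratio of amplified to unamplified signal, which should sit near zero when amplification introduces no bias. The sketch below assumes background-corrected, positive, normalized intensities; the function name and the ±1 log2 threshold are hypothetical choices, not the analysis used for these arrays.

```python
import math
from statistics import median


def amplification_bias(amplified, unamplified, threshold=1.0):
    """Summarize per-probe log2(amplified / unamplified) intensity ratios.

    With no bias every ratio would be close to 0, so the median ratio and the
    fraction of probes beyond +/- threshold give a rough picture of how much
    the amplification step distorts the signal.
    """
    if len(amplified) != len(unamplified):
        raise ValueError("intensity vectors must have equal length")
    ratios = [math.log2(a / u) for a, u in zip(amplified, unamplified)]
    biased = sum(1 for r in ratios if abs(r) > threshold)
    return {
        "median_log2": median(ratios),
        "fraction_biased": biased / len(ratios),
    }
```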
Project description: Analysis of technical variance of ChIP-on-chip studies by characterization of Myc binding in HL60 cells. Keywords: ChIP-on-chip. Fully blocked study of technical variance in Myc binding in HL60 cells. Two different antibodies were each used to generate 6 biologically independent replicates. Each replicate was hybridized to arrays from two different batches in both dye-swap orientations, giving 48 arrays in total.
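The fully blocked design multiplies out as 2 antibodies × 6 replicates × 2 array batches × 2 dye orientations = 48 arrays. Enumerating the factors explicitly is a quick way to check that count and to build a sample sheet; the labels below are placeholders, not the names used in the study.

```python
from itertools import product

# Placeholder factor levels; only the number of levels comes from the description.
antibodies = ["antibody_1", "antibody_2"]
replicates = range(1, 7)                 # 6 biological replicates per antibody
batches = ["batch_A", "batch_B"]         # two array batches
orientations = ["ChIP_Cy5", "ChIP_Cy3"]  # the two dye-swap orientations

# Every combination of factor levels corresponds to one array.
design = list(product(antibodies, replicates, batches, orientations))
```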
Project description: Analysis of technical variance of ChIP-on-chip studies by characterization of Myc binding in HL60 cells. Keywords: ChIP-on-chip. Characterization of c-Myc binding in HL60 cells. Thirteen biologically independent replicates of growing HL60 cells were subjected to ChIP with an N-terminal c-Myc antibody and hybridized to CpG island microarrays.
Project description: The influenza polymerase cleaves host RNAs ~10–13 nucleotides downstream of their 5′ ends and uses this capped fragment to prime viral mRNA synthesis. To better understand this process of cap snatching, we used high-throughput sequencing to determine the 5′ ends of A/WSN/33 (H1N1) influenza mRNAs. The sequences provided clear evidence for nascent-chain realignment during transcription initiation and revealed a strong influence of the viral template on the frequency of realignment. After accounting for the extra nucleotides inserted through realignment, analysis of the capped fragments indicated that the different viral mRNAs were each prepended with a common set of sequences, that the polymerase often cleaved host RNAs after a purine, and that it often primed transcription on a single base pair to either the terminal or the penultimate residue of the viral template. We also developed a bioinformatic approach to identify the targeted host transcripts despite the limited information content of snatched fragments, and found that small nuclear RNAs and small nucleolar RNAs contributed the most abundant capped leaders. These results provide insight into the mechanism of viral transcription initiation and reveal the diversity of the cap-snatched repertoire, showing that noncoding transcripts as well as mRNAs are used to make influenza mRNAs. Sixteen datasets were generated, corresponding to template-switching 5′-RACE libraries of (1) each viral RNA or (2) a time course of NS1 viral RNA, or to (3) CIP-TAP libraries of NS1 and PB2. Libraries contain a random barcode of length 5 (NS1_TS, PB2_TS, NS1_5R, PB2_5R) or 9 (other datasets) at the start of each read and, in the template-switching datasets, at least three guanosine residues. See the methods of the accompanying paper for further details. Samples were sequenced on an Illumina HiSeq 2000.
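The read layout described above suggests a simple preprocessing step before mapping: remove the random barcode (5 or 9 nucleotides depending on the library) and, for template-switching libraries, the run of guanosines. This sketch assumes the G run immediately follows the barcode and flags reads lacking at least three Gs; the function and constant names are hypothetical, and the accompanying paper's methods remain the authority on the actual pipeline.

```python
import re

# Libraries listed in the description as carrying 5-nt barcodes; all others use 9 nt.
SHORT_BARCODE_LIBRARIES = {"NS1_TS", "PB2_TS", "NS1_5R", "PB2_5R"}
# Assumption: a run of at least three Gs directly after the barcode.
LEADING_G = re.compile(r"^G{3,}")


def trim_read(seq: str, library: str, template_switching: bool):
    """Split a read into (barcode, insert); insert is None if the G run is missing."""
    barcode_len = 5 if library in SHORT_BARCODE_LIBRARIES else 9
    barcode, rest = seq[:barcode_len], seq[barcode_len:]
    if template_switching:
        match = LEADING_G.match(rest)
        if match is None:
            return barcode, None  # expected G run absent: flag the read
        rest = rest[match.end():]  # strips the whole leading G run
    return barcode, rest
```

Because the pattern strips every leading G, a genuine G at the 5′ end of a host-derived fragment would also be removed; resolving that ambiguity needs additional handling not attempted here.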
Project description: We describe a refined approach to identifying new human RNA-protein interactions. In vitro transcribed, labeled RNA is bound to ~9,400 human recombinant proteins spotted on protein microarrays. This approach identified 137 RNA-protein interactions for 10 human coding and non-coding RNAs, including an interaction between Staufen 1 protein and TP53 mRNA that promoted the latter's stability. RNA hybridization to protein microarrays allows rapid identification of human RNA-protein interactions on a large scale. Sense and antisense strands of 10 RNA transcripts, representing the protein-coding RNAs TP53, HRAS, MYC and BCL2 and the non-coding sequences PWRN1, SOX2OT, OCC1, IGF2RNC, lncRBM26 and DLEU1, were in vitro transcribed, labeled with Cy5 and independently hybridized to human protein microarrays. The labeling process was optimized to achieve ~3 pmol dye per microgram of RNA, an average labeling efficiency of approximately one dye molecule per 850 bp of RNA, so as to minimally perturb native RNA structure while still yielding readily visualized signal intensities.
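The two labeling figures are linked by simple arithmetic. One microgram of RNA contains 10⁻⁶/M mol of nucleotides, where M is the average mass per nucleotide; dividing by the moles of dye (pmol × 10⁻¹²) gives the average number of nucleotides per dye, which simplifies to 10⁶/(M × pmol). The sketch assumes M ≈ 340 g/mol, an approximate value for RNA that is not stated in the description; the helper names are hypothetical.

```python
def nt_per_dye(pmol_dye_per_ug: float, mass_per_nt: float = 340.0) -> float:
    """Average RNA length (nt) per dye molecule at a given dye density.

    (1e-6 g / mass_per_nt) mol nucleotides divided by (pmol * 1e-12) mol dye
    simplifies to 1e6 / (mass_per_nt * pmol).
    """
    return 1e6 / (mass_per_nt * pmol_dye_per_ug)


def pmol_dye_per_ug(nt_per_dye_value: float, mass_per_nt: float = 340.0) -> float:
    """Inverse conversion: dye density implied by one dye per N nucleotides."""
    return 1e6 / (mass_per_nt * nt_per_dye_value)
```

At exactly 3 pmol/µg this gives roughly 980 nucleotides per dye, while one dye per 850 nucleotides corresponds to roughly 3.5 pmol/µg, so the two quoted values agree only to within their stated approximations (and the assumed M).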
Project description: Putative RNA-protein interactions with selected snoRNAs were screened by hybridizing labeled RNA to a human protein microarray. The snoRNAs SNORD50A and SNORD50B were in vitro transcribed, labeled with Cy5 and independently hybridized to human protein microarrays. The labeling process was optimized to achieve ~3 pmol dye per microgram of RNA while maintaining readily visualized signal intensities.
Project description: Despite serving as a central experimental technique in epigenetics research, chromatin immunoprecipitation (ChIP) suffers from several serious drawbacks: it is a relative measurement untethered to any external scale, which precludes fair comparison among experiments; it employs antibody reagents that have differing affinity and specificity for target epitopes, which are in turn variable in abundance; and it is frequently not reproducible. To address these problems, we developed internal standard calibrated ChIP (ICeChIP), a method of spiking a native chromatin sample with nucleosomes reconstituted from recombinant and semisynthetic histones on barcoded DNA prior to immunoprecipitation. ICeChIP measures local histone modification densities on a biologically meaningful scale, enabling unbiased trans-experimental comparisons and revealing a correlation between the apparent symmetry of H3K4me3 in promoter nucleosomes and gene expression. Direct in situ assessment of immunoprecipitation accounts for a number of experimental pitfalls and provides a critical examination of untested assumptions inherent in conventional ChIP. Examination of spiked-in semisynthetic nucleosomes in ICeChIP-seq experiments performed in HEK293, mESC E14 and DM S2 cell lines.
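The calibration idea can be sketched as a ratio of ratios: the IP/input enrichment at a genomic locus divided by the IP/input enrichment of the barcoded spike-in nucleosome carrying the modification. This sketch assumes the spike-in is uniformly modified, so its enrichment stands for 100% modification density; the function name is hypothetical, and the normalization in the original ICeChIP work may include corrections not shown here.

```python
def histone_modification_density(ip_locus, input_locus, ip_spikein, input_spikein):
    """Percent modification at a locus, calibrated against a fully modified spike-in.

    Because locus and spike-in reads come from the same IP and input libraries,
    sequencing depth cancels out of the ratio of ratios.
    """
    for value in (ip_locus, input_locus, ip_spikein, input_spikein):
        if value <= 0:
            raise ValueError("read counts must be positive")
    locus_enrichment = ip_locus / input_locus
    spikein_enrichment = ip_spikein / input_spikein
    return 100.0 * locus_enrichment / spikein_enrichment
```

For instance, a locus enriched 0.5-fold while the spike-in is enriched 2-fold would be estimated at 25% modification density.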
Project description: The ability to sense and respond to changes in extracellular phosphate is critical to the survival of most bacteria. For Caulobacter crescentus, which typically lives in phosphate-limited environments, this process is especially crucial. Like many bacteria, Caulobacter responds to phosphate limitation through a conserved two-component signaling pathway called PhoR-PhoB, but the direct regulon of PhoB in this organism is unknown. Here, we use ChIP-Seq to map the global binding patterns of the phosphate-responsive transcriptional regulator PhoB in both phosphate-limited and -replete conditions. Combined with genome-wide expression profiling, our work demonstrates that PhoB is induced to regulate nearly 50 genes in phosphate-starved conditions. The PhoB regulon is composed primarily of genes known or predicted to help Caulobacter scavenge for and import inorganic phosphate, including 15 different membrane transporters. We also investigated the regulatory role of PhoU, a widely conserved protein proposed to coordinate phosphate import with expression of the PhoB regulon by directly modulating the histidine kinase PhoR. However, our studies show that it likely does not play such a role in Caulobacter, as depleting PhoU has no significant effect on PhoB-dependent gene expression. Instead, cells lacking PhoU exhibit a striking accumulation of large polyphosphate granules, suggesting that PhoU participates in controlling intracellular phosphate metabolism. An allele of phoB bearing a C-terminal 3×FLAG tag was integrated at its native locus, and ChIP followed by deep sequencing on an Illumina MiSeq was performed on samples grown in rich medium, in phosphate-limited medium, and in a pstS::Tn5 mutant background in rich medium.
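Combining binding and expression data amounts to keeping genes that both have a nearby PhoB peak and change expression. A minimal sketch, assuming peak summit coordinates, gene 5′-end coordinates with strand, and a list of differentially expressed genes are already available; the 300 bp upstream window, the function names, and the test gene names are hypothetical, and the sketch ignores the circularity of the Caulobacter chromosome.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # coordinate of the gene's 5' end
    strand: str  # "+" or "-"


def genes_with_upstream_peak(peak_summits: Iterable[int], genes, window=300):
    """Names of genes whose 5' end lies 0..window bp downstream of some summit."""
    summits = list(peak_summits)
    hits = set()
    for gene in genes:
        for summit in summits:
            # Upstream means lower coordinates on "+", higher coordinates on "-".
            distance = gene.start - summit if gene.strand == "+" else summit - gene.start
            if 0 <= distance <= window:
                hits.add(gene.name)
                break
    return hits


def direct_regulon(peak_summits, genes, differentially_expressed, window=300):
    """Genes that are both bound upstream and differentially expressed."""
    bound = genes_with_upstream_peak(peak_summits, genes, window)
    return bound & set(differentially_expressed)
```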
Project description: MNase-Seq and ChIP-Seq have become popular techniques for studying chromatin and histone modifications. Although many tools have been developed to identify enriched regions, software tools for nucleosome positioning are still limited. We introduce a flexible and powerful open-source R package, PING 2.0, for nucleosome positioning using MNase-Seq data or MNase- or sonicated-ChIP-Seq data combined with either single-end or paired-end sequencing. PING uses a model-based approach, which enables nucleosome predictions even in the presence of low read counts. We illustrate PING using two paired-end datasets from Saccharomyces cerevisiae and compare its performance with nucleR and ChIPseqR. Identification of nucleosomes from two different mononucleosome datasets. A yeast strain (W303 background) in which the HTZ1 gene was expressed as a fusion with a Myc epitope was used to map total and Htz1-containing nucleosomes by MNase-ChIP-Seq. Cells were grown to mid-log phase, and mononucleosomes were generated by MNase treatment of isolated nuclei. Specifically, for sample SC0017_61YDGAAXX_8_TCATTC, Htz1-containing nucleosomes were enriched by immunoprecipitation using an anti-Myc antibody (3E10). DNA from both total and Htz1-enriched nucleosomes was purified and sequenced on an Illumina GA IIx using the paired-end protocol.
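With paired-end MNase data, each fragment's midpoint approximates a nucleosome dyad, and a histogram of midpoints from mononucleosome-sized fragments gives a crude positioning profile. This is a simple counting sketch, not PING's model-based method; the 120 to 180 bp size window and the 10 bp bins are assumed values, and the function names are hypothetical.

```python
from collections import Counter


def fragment_midpoint(left: int, right: int) -> int:
    """Integer midpoint of a fragment spanning [left, right] (inclusive coordinates)."""
    if right < left:
        raise ValueError("right coordinate must not precede left coordinate")
    return (left + right) // 2


def dyad_histogram(pairs, min_len=120, max_len=180, bin_size=10):
    """Bin midpoints of mononucleosome-sized fragments given as (left, right) pairs."""
    counts = Counter()
    for left, right in pairs:
        length = right - left + 1
        if min_len <= length <= max_len:  # keep only mononucleosome-sized fragments
            mid = fragment_midpoint(left, right)
            counts[(mid // bin_size) * bin_size] += 1
    return counts
```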
Project description: Identification and characterization of HP1BP3 (a human histone H1 homologue) as a novel chromatin retention factor essential for the co-transcriptional processing of pri-miRNA. We generated BAC transgenic cells; cells at 80% confluency (~1×10⁷ cells) were cross-linked with 1% formaldehyde for 10 minutes at 37°C and quenched with 125 mM glycine at room temperature for 5 minutes. The fixed cells were washed twice with cold PBS, scraped, and transferred into 1 ml PBS containing protease inhibitors (Roche). After centrifugation at 700 g for 4 minutes at 4°C, the cell pellets were resuspended in 100 μl ChIP lysis buffer (1% SDS, 10 mM EDTA, 50 mM Tris-HCl [pH 8.1] with protease inhibitors) and sonicated at 4°C with a Bioruptor (Diagenode) (30 seconds ON and 30 seconds OFF at highest power for 15 minutes). The sheared chromatin (fragment length ~200–600 bp) was centrifuged at 20,000 g for 15 minutes at 4°C, and 100 μl of the supernatant was used for ChIP or as input. A 1:10 dilution of the solubilized chromatin in ChIP dilution buffer (0.01% SDS, 1.1% Triton X-100, 1.2 mM EDTA, 167 mM NaCl, 16.7 mM Tris-HCl [pH 8.1]) was incubated overnight at 4°C with 6 μg/ml of a goat anti-GFP antibody (raised against His-tagged full-length eGFP and affinity-purified with GST-tagged full-length eGFP). Immunoprecipitation was carried out by incubating with 40 μl pre-cleared Protein G Sepharose beads (Amersham Biosciences) for 1 hour at 4°C, followed by five 10-minute washes with 1 ml of the following buffers: once each with Buffer I (0.1% SDS, 1% Triton X-100, 2 mM EDTA, 20 mM Tris-HCl [pH 8.1], 150 mM NaCl), Buffer II (0.1% SDS, 1% Triton X-100, 2 mM EDTA, 20 mM Tris-HCl [pH 8.1], 500 mM NaCl) and Buffer III (0.25 M LiCl, 1% NP-40, 1% deoxycholate, 1 mM EDTA, 10 mM Tris-HCl [pH 8.1]), then twice with TE buffer [pH 8.0]. Elution from the beads was performed twice with 100 μl ChIP elution buffer (1% SDS, 0.1 M NaHCO3) at room temperature (RT) for 15 minutes.
Protein-DNA complexes were de-crosslinked by heating at 65°C in 192 mM NaCl for 16 hours. After treatment with RNase A and Proteinase K, DNA fragments were purified using the QIAquick PCR Purification Kit (QIAGEN) according to the manufacturer's protocol and eluted into 30 μl H2O.
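The buffer concentrations in this protocol follow from volume-weighted mixing, C_final = Σ CᵢVᵢ / ΣVᵢ. The sketch below applies that rule to two steps under stated assumptions: "1:10" is read as 1 volume of chromatin plus 9 volumes of dilution buffer, and the 192 mM NaCl is modeled as 8 μl of 5 M NaCl added to the pooled 2 × 100 μl eluates, a combination that reproduces the figure but is not stated in the protocol. The helper name and dictionary keys are hypothetical.

```python
def mix(components):
    """Final solute concentrations after combining volumes.

    `components` is a list of (volume, {solute: concentration}) pairs; each
    solute must use the same concentration unit in every component.
    """
    total_volume = sum(volume for volume, _ in components)
    final = {}
    for volume, solutes in components:
        for solute, conc in solutes.items():
            final[solute] = final.get(solute, 0.0) + conc * volume / total_volume
    return final


lysis = {"SDS_%": 1.0, "EDTA_mM": 10.0, "Tris_mM": 50.0}
dilution = {"SDS_%": 0.01, "Triton_%": 1.1, "EDTA_mM": 1.2,
            "NaCl_mM": 167.0, "Tris_mM": 16.7}

# Assumption: 100 ul chromatin in lysis buffer + 900 ul dilution buffer.
ip_mix = mix([(100.0, lysis), (900.0, dilution)])

# Assumption: 8 ul of 5 M NaCl added to 200 ul of pooled eluate (no NaCl in eluate).
decrosslink = mix([(200.0, {}), (8.0, {"NaCl_mM": 5000.0})])
```

Under the first assumption the IP mixture works out to about 0.11% SDS, 0.99% Triton X-100, 2.08 mM EDTA, 20 mM Tris-HCl and 150 mM NaCl, close to the composition of wash Buffer I; that match is an observation from the arithmetic, not a claim made by the protocol.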