ABSTRACT: The ability to rapidly map millions of oligonucleotide fragments to a reference genome is crucial to many high-throughput genomic technologies. We propose an intuitive and efficient algorithm, titled extreme MApping of OligoNucleotide (xMAN), to rapidly map millions of oligonucleotide fragments to a genome of any length. By converting oligonucleotides to integers hashed in RAM, xMAN can scan through genomes using bit-shifting operations and achieve at least an order-of-magnitude speed increase over existing tools. xMAN can map the 42 million 25-mer probes on the Affymetrix whole human genome tiling arrays to the entire genome in less than 6 CPU hours. In addition to the speed advantage, we found that the probe mapping of xMAN substantially improved the final analysis results in both a spike-in experiment on ENCODE tiling arrays and an estrogen receptor ChIP-chip experiment on whole human genome tiling arrays. Those improvements were confirmed by direct ChIP and real-time PCR assay. xMAN can be further extended for application to other high-throughput genomic technologies for oligonucleotide mapping.
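The general idea of hashing oligonucleotides as integers and scanning a genome with bit shifts can be illustrated with a minimal sketch. This is not the xMAN implementation: the 2-bit base encoding, the function names (encode, build_index, scan), and the decisions to match the forward strand only and to restart the window at ambiguous bases such as N are all assumptions made for illustration.

```python
from typing import Dict, Iterator, List, Tuple

# Assumed 2-bit code per base (not taken from xMAN): a 25-mer fits in 50 bits.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(oligo: str) -> int:
    """Pack an uppercase A/C/G/T string into an integer, 2 bits per base."""
    value = 0
    for base in oligo:
        value = (value << 2) | BASE_CODE[base]
    return value


def build_index(probes: List[str]) -> Dict[int, List[int]]:
    """Hash every probe by its integer code; identical probes share a bucket."""
    index: Dict[int, List[int]] = {}
    for probe_id, probe in enumerate(probes):
        index.setdefault(encode(probe), []).append(probe_id)
    return index


def scan(genome: str, index: Dict[int, List[int]], k: int) -> Iterator[Tuple[int, int]]:
    """Yield (probe_id, start) for each exact forward-strand match of a k-mer probe."""
    mask = (1 << (2 * k)) - 1  # keeps only the lowest 2*k bits (the last k bases)
    window = 0                 # integer code of the most recent bases
    valid = 0                  # consecutive unambiguous bases seen so far
    for pos, base in enumerate(genome):
        code = BASE_CODE.get(base)
        if code is None:       # ambiguous base: restart the window
            window, valid = 0, 0
            continue
        # Shift the oldest base out and the new base in, in constant time.
        window = ((window << 2) | code) & mask
        valid += 1
        if valid >= k:
            for probe_id in index.get(window, ()):
                yield probe_id, pos - k + 1


probes = ["ACGT", "GTAC"]
hits = list(scan("ACGTACNACGT", build_index(probes), k=4))
print(hits)
```

Because each genome position updates the window with one shift, one OR, and one mask, the scan cost per base does not depend on the number of probes; only the hash lookup touches the probe index. A practical mapper would also need to handle the reverse complement strand, which this sketch omits.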
Project description:We investigate sigma factor binding regions for each sigma factor under various environmental and/or genetic conditions. To measure sigma factor binding at a genome scale, we applied a ChIP-chip method to derivative strains of E. coli K-12 MG1655 wild type and its isogenic rpoS and rpoN knock-out strains under various conditions. A 45 ChIP-chip study under 4 separate culture conditions. The high-density oligonucleotide tiling arrays used consisted of 371,034 oligonucleotide probes spaced 25 bp apart (25-bp overlap between two adjacent probes) across the E. coli genome.
Project description:Transcriptional enhancement of X-linked genes to compensate for the sex chromosome monosomy in Drosophila males is brought about by a ribonucleoprotein assembly called Male-Specific-Lethal or Dosage Compensation Complex (MSL-DCC). This machinery is formed in male flies and specifically associates with active genes on the X chromosome. After assembly at dedicated high-affinity "entry" sites (HAS) on the X chromosome, the complex distributes to the nearby active chromatin. High-resolution, genome-wide mapping of the MSL-DCC subunits by chromatin immunoprecipitation (ChIP) on oligonucleotide tiling arrays suggests a rather homogenous spreading of the intact complex onto transcribed chromatin. Coupling ChIP to deep sequencing (ChIP-seq) promises to map the chromosomal interactions of the DCC with improved resolution. We present ChIP-seq binding profiles for all complex subunits, including the first description of the RNA helicase MLE binding pattern. Exploiting the preferential representation of direct chromatin contacts upon high-energy shearing, we report a surprising functional and topological separation of MSL protein contacts at three classes of chromosomal binding sites. Furthermore, precise determination of DNA fragment lengths by paired-end ChIP-seq allows decrypting of the local complex architecture. Primary contacts of MSL-2 and MLE define HAS for the DCC. In contrast, association of the DCC with actively transcribed gene bodies is mediated by MSL-3 binding to nucleosomes. We identify robust MSL-1/MOF binding at a fraction of active promoters genome-wide. Correlation analyses suggest that this association reflects a function outside dosage compensation. Our comprehensive analysis provides a new level of information on different interaction modes of a multiprotein complex at distinct regions within the genome.
Project description:Investigation of whole-genome gene expression levels in E. coli K-12 MG1655 in glucose M9 minimal media with/without heat shock. A six-chip study using total RNA recovered from E. coli K-12 MG1655 grown up to OD600nm 0.6 (mid-exponential phase) in M9 minimal media supplemented with 0.2% glucose with/without heat shock at 42 °C. The high-density oligonucleotide tiling arrays used consisted of 371,034 oligonucleotide probes of 50-bp length that are spaced 25 bp apart across the E. coli genome (NimbleGen). Experiments were performed with three biological replicates.
Project description:Genomic tiling microarrays have become a popular tool for interrogating the transcriptional activity of large regions of the genome in an unbiased fashion. There are several key parameters associated with each tiling experiment (e.g., experimental protocols and genomic tiling density). Here, we assess the role of these parameters as they are manifest in different tiling-array platforms used for transcription mapping. First, we analyze how a number of published tiling-array experiments agree with established gene annotation on human chromosome 22. We observe that the transcription detected from high-density arrays correlates substantially better with annotation than that from other array types. Next, we analyze the transcription-mapping performance of the two main high-density oligonucleotide array platforms in the ENCODE regions of the human genome. We hybridize identical biological samples and develop several ways of scoring the arrays and segmenting the genome into transcribed and nontranscribed regions, with the aim of making the platforms most comparable to each other. Finally, we develop a platform comparison approach based on agreement with known annotation. Overall, we find that the performance improves with more data points per locus, coupled with statistical scoring approaches that properly take advantage of this, where this larger number of data points arises from higher genomic tiling density and the use of replicate arrays and mismatches. While we do find significant differences in the performance of the two high-density platforms, we also find that they complement each other to some extent. Finally, our experiments reveal a significant amount of novel transcription outside of known genes, and an appreciable sample of this was validated by independent experiments.
Project description:The most widely used method for detecting genome-wide protein-DNA interactions is chromatin immunoprecipitation on tiling microarrays, commonly known as ChIP-chip. Here, we conducted the first objective analysis of tiling array platforms, amplification procedures, and signal detection algorithms in a simulated ChIP-chip experiment. Mixtures of human genomic DNA and "spike-ins" comprised of nearly 100 human sequences at various concentrations were hybridized to four tiling array platforms by eight independent groups. Blind to the number of spike-ins, their locations, and the range of concentrations, each group made predictions of the spike-in locations. We found that microarray platform choice is not the primary determinant of overall performance. In fact, variation in performance between labs, protocols, and algorithms within the same array platform was greater than the variation in performance between array platforms. However, each array platform had unique performance characteristics that varied with tiling resolution and the number of replicates, which have implications for cost versus detection power. Long oligonucleotide arrays were slightly more sensitive at detecting very low enrichment. On all platforms, simple sequence repeats and genome redundancy tended to result in false positives. LM-PCR and WGA, the most popular sample amplification techniques, reproduced relative enrichment levels with high fidelity. Performance among signal detection algorithms was heavily dependent on array platform. The spike-in DNA samples and the data presented here provide a stable benchmark against which future ChIP platforms, protocol improvements, and analysis methods can be evaluated.
Project description:A single chromatin immunoprecipitation (ChIP) sample does not provide enough DNA for hybridization to a genomic tiling array. A commonly used technique for amplifying the DNA obtained from ChIP assays is ligation-mediated PCR (LM-PCR). However, using this amplification method, we could not identify Oct4 binding sites on genomic tiling arrays representing 1% of the human genome (ENCODE arrays). In contrast, hybridization of a pool of 10 ChIP samples to the arrays produced reproducible binding patterns and low background signals. However, the pooling method would greatly increase the number of ChIP reactions needed to analyze the entire human genome. Therefore, we have adapted the GenomePlex whole genome amplification (WGA) method for use in ChIP-chip assays; detailed ChIP and amplification protocols used for these analyses are provided as supplementary material. When applied to ENCODE arrays, the products prepared using this new method resulted in an Oct4 binding pattern similar to that from the pooled Oct4 ChIP samples. Importantly, the signal-to-noise ratio using the GenomePlex WGA method is superior to that of the LM-PCR amplification method.
Project description:Array comparative genomic hybridization is a fast and cost-effective method for detecting, genotyping, and comparing the genomic sequence of unknown bacterial isolates. This method, as with all microarray applications, requires adequate coverage of probes targeting the regions of interest. An unbiased tiling of probes across the entire length of the genome is the most flexible design approach. However, such a whole-genome tiling requires that the genome sequence is known in advance. For the accurate analysis of uncharacterized bacteria, an array must query a fully representative set of sequences from the species' pan-genome. Prior microarrays have included only a single strain per array or the conserved sequences of gene families. These arrays omit potentially important genes and sequence variants from the pan-genome. This paper presents a new probe selection algorithm (PanArray) that can tile multiple whole genomes using a minimal number of probes. Unlike arrays built on clustered gene families, PanArray uses an unbiased, probe-centric approach that does not rely on annotations, gene clustering, or multi-alignments. Instead, probes are evenly tiled across all sequences of the pan-genome at a consistent level of coverage. To minimize the required number of probes, probes conserved across multiple strains in the pan-genome are selected first, and additional probes are used only where necessary to span polymorphic regions of the genome. The viability of the algorithm is demonstrated by array designs for seven different bacterial pan-genomes and, in particular, the design of a 385,000 probe array that fully tiles the genomes of 20 different Listeria monocytogenes strains with overlapping probes at greater than twofold coverage. PanArray is an oligonucleotide probe selection algorithm for tiling multiple genome sequences using a minimal number of probes. It is capable of fully tiling all genomes of a species on a single microarray chip. These unique pan-genome tiling arrays provide maximum flexibility for the analysis of both known and uncharacterized strains.
Project description:OBJECTIVE: To describe a considerably advanced method of array painting, which allows the rapid, ultra-high resolution mapping of translocation breakpoints such that rearrangement junction fragments can be amplified directly and sequenced. METHOD: Ultra-high resolution array painting involves the hybridisation of probes generated by the amplification of small numbers of flow-sorted derivative chromosomes to oligonucleotide arrays designed to tile breakpoint regions at extremely high resolution. RESULTS AND DISCUSSION: We demonstrate how ultra-high resolution array painting of four balanced translocation cases rapidly and efficiently maps breakpoints to a point where junction fragments can be amplified easily and sequenced. With this new development, breakpoints can be mapped using just two array experiments: the first using whole-genome array painting to tiling-resolution large-insert clone arrays, the second using ultra-high-resolution oligonucleotide arrays targeted to the breakpoint regions. In this way, breakpoints can be mapped and then sequenced in a few weeks.
Project description:High-density tiling arrays are designed to blanket an entire genomic region of interest using tiled oligonucleotides at very high resolution and are widely used in various biological applications. Experiments are usually conducted in multiple stages, in which unwanted technical variations may be introduced. As tiling arrays become more popular and are adopted by many research labs, it is pressing to develop quality control tools as was done for expression microarrays. We propose a set of statistical quality metrics analogous to those in expression microarrays with application to tiling array data. We also develop a method to estimate the significance level of an observed quality measurement using randomization tests. These methods have been applied to multiple real data sets, including three independent ChIP-chip experiments and one transcriptome mapping study, and they have successfully identified good quality chips as well as outliers in each study.
Project description:Imprinted genes are monoallelically expressed according to parental inheritance. The maternally and paternally inherited alleles are distinguished epigenetically by DNA methylation and histone modifications. Chromosome-wide chromatin immunoprecipitation (ChIP) and MIRA analysis of MatDup.dist7 and PatDup.dist7 MEFs provided a panoramic map of reciprocal allele-specific histone modifications and DNA methylation at imprinted genes along distal chromosomes 7 and 15. ChIP-chip and MIRA-chip were performed to map histone modifications and DNA methylation along distal chr7 in the maternal allele and paternal allele in MatDup.dist7 and PatDup.dist7 MEFs, respectively, using Nimblegen tiling arrays for distal chr7.