Project description:HiCUP is a pipeline for processing sequence data generated by Hi-C, a technique used to investigate the three-dimensional organisation of a genome. The pipeline maps data to a specified reference genome and removes artefacts that would otherwise hinder subsequent analysis. HiCUP also provides an easy-to-interpret yet detailed quality control report that may be used by researchers to refine their experimental protocol for future studies. The software is freely available and has already been used for processing Hi-C data in several recently published peer-reviewed research articles. This experiment investigates the impact of using HiCUP to remove putative PCR amplification products in heavily duplicated Capture Hi-C libraries. Examination of three Capture Hi-C libraries.
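The removal of putative PCR amplification products mentioned above can be illustrated with a minimal sketch. It assumes, as a simplification and not as a description of HiCUP's actual implementation, that each read pair is summarised by the chromosome, position and strand of both ends, and that pairs sharing all six values are putative PCR duplicates. The `DiTag` type and the `pair_key` and `deduplicate` functions are hypothetical names introduced for illustration.

```python
from typing import Iterable, List, NamedTuple, Tuple


class DiTag(NamedTuple):
    """One mapped Hi-C read pair (hypothetical structure for illustration)."""
    chrom1: str
    pos1: int
    strand1: str
    chrom2: str
    pos2: int
    strand2: str


def pair_key(tag: DiTag) -> Tuple:
    """Order-independent key: the two ends are sorted so A-B equals B-A."""
    end1 = (tag.chrom1, tag.pos1, tag.strand1)
    end2 = (tag.chrom2, tag.pos2, tag.strand2)
    return tuple(sorted((end1, end2)))


def deduplicate(tags: Iterable[DiTag]) -> List[DiTag]:
    """Keep the first read pair seen for each key and drop later copies."""
    seen = set()
    unique = []
    for tag in tags:
        key = pair_key(tag)
        if key not in seen:
            seen.add(key)
            unique.append(tag)
    return unique


# Toy input (invented for illustration, not real data).
tags = [
    DiTag("chr1", 100, "+", "chr2", 500, "-"),
    DiTag("chr1", 100, "+", "chr2", 500, "-"),  # exact copy
    DiTag("chr2", 500, "-", "chr1", 100, "+"),  # same pair, ends swapped
    DiTag("chr1", 101, "+", "chr2", 500, "-"),  # differs by one base, kept
]
unique_tags = deduplicate(tags)
duplication_rate = 1 - len(unique_tags) / len(tags)
```

In a heavily duplicated library the duplication rate computed this way would be high, which is the situation this experiment examines.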
Project description:This is the validation data for candidate de novo CNV calls made in the asthma trios by Itsara et al., Genome Research 2010. In this study, de novo CNV calls in the asthma data set were initially made with Illumina 550K SNP arrays. Validation was performed with custom Nimblegen array CGH for samples for which DNA was available. De novo CNVs would be expected to validate in the child of each trio tested and not to be detected in either parent.
Project description:This is the validation data for candidate de novo CNV calls made in the CEU Hapmap by Itsara et al., Genome Research 2010. In this study, de novo CNV calls were initially made with Illumina 1M SNP arrays. Validation of CNV calls was performed with Nimblegen custom array CGH using the extended CEPH pedigrees. A truly de novo CNV would be unobserved in the first generation (CEU trio parents), validated in the second generation (CEU trio children), and, assuming no selective effects, transmitted to approximately half of the individuals in the third generation. We attempted validation of 4 de novo CNVs in 3 extended CEPH pedigrees: 1358, 1408, and 1459.
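The three-generation expectation stated above can be written as a small check. This is a sketch under stated assumptions: array CGH results are reduced to one boolean per individual (CNV detected or not), and the function name `assess_de_novo` and its arguments are hypothetical. It reports the transmitted fraction and does not apply a threshold for "approximately half", since the source gives none.

```python
from typing import Dict, List, Optional


def assess_de_novo(
    parents: List[bool],
    child: bool,
    grandchildren: List[bool],
) -> Dict[str, object]:
    """Summarise validation calls for one candidate CNV in one pedigree.

    parents:       detection calls in the first generation (trio parents)
    child:         detection call in the second generation (trio child)
    grandchildren: detection calls in the third generation
    """
    absent_in_parents = not any(parents)
    fraction: Optional[float] = (
        sum(grandchildren) / len(grandchildren) if grandchildren else None
    )
    return {
        # De novo pattern: seen in the child, seen in neither parent.
        "consistent_with_de_novo": absent_in_parents and child,
        # Expected to be near 0.5 in the absence of selective effects.
        "transmitted_fraction": fraction,
    }


# Toy calls (invented for illustration, not results from the study).
result = assess_de_novo(
    parents=[False, False],
    child=True,
    grandchildren=[True, False, True, False, False, True],
)
inherited = assess_de_novo(parents=[True, False], child=True, grandchildren=[])
```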
Project description:We present MultiEditR, the first algorithm specifically designed to detect and quantify RNA editing from Sanger sequencing (z.umn.edu/multieditr). Although RNA editing is routinely evaluated by measuring the heights of peaks in Sanger sequencing traces, the accuracy and precision of this approach have yet to be evaluated against gold-standard next-generation sequencing methods. Through a comprehensive comparison to RNA-seq and amplicon-based deep sequencing, we show that MultiEditR is accurate, precise, and reliable for detecting endogenous and programmable RNA editing.
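The routine peak-height measurement mentioned above can be sketched as a simple ratio. This is not MultiEditR's algorithm, which the description does not detail; it only illustrates the basic idea, assuming that the four peak heights at one trace position are already available as numbers. The function name `percent_edited` and the example heights are hypothetical.

```python
from typing import Dict


def percent_edited(peak_heights: Dict[str, float], edited_base: str) -> float:
    """Percentage of total signal at one position carried by the edited base."""
    total = sum(peak_heights.values())
    if total == 0:
        raise ValueError("no signal at this position")
    return 100.0 * peak_heights[edited_base] / total


# Toy position where an A is partly read as G (invented heights).
heights = {"A": 600.0, "C": 0.0, "G": 400.0, "T": 0.0}
editing = percent_edited(heights, edited_base="G")
```

A raw ratio like this takes no account of background signal in the trace, which is one reason a comparison against next-generation sequencing is informative.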
Project description:RNA-Sequencing is a transformative method that captures the quantitative dynamics of a transcriptome with exquisite sensitivity and single-base resolution. There are, however, few computational pipelines for RNA-Seq whose statistical tests have the robustness and power demanded by the difficult combination of small sample sizes and high variability in sequence read counts. To this end, we developed GENE-counter, a complete software pipeline for analyzing RNA-Seq data for genome-wide expression differences between replicated treatment groups. One important component of GENE-counter is a statistical test based on the NBP parameterization of the negative binomial distribution for identifying differentially expressed genome features. We used GENE-counter to analyze RNA-Seq data derived from Arabidopsis thaliana infected with a strain of defense-eliciting bacteria. We identified 308 genes that were differentially induced. Using alternative methods, we provided support for the induced expression and biological relevance of a substantial proportion of the genes. These results suggest that the NBP parameterization of the negative binomial distribution is well suited for explaining RNA-Seq data and that the statistical test makes GENE-counter a powerful pipeline for studying genome-wide expression changes. GENE-counter is freely available at http://changlab.cgrb.oregonstate.edu/. Our RNA-Seq data are deposited in the NCBI Short Read Archive (SRA) under accession SRA025952.
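The description names the NBP parameterization without defining it. As it is usually written in the literature, the variance of a count with mean mu is mu + phi * mu**alpha, with dispersion phi and an extra power parameter alpha. That form is an assumption brought in here for illustration and is not taken from the GENE-counter code. The function name `nbp_variance` is hypothetical.

```python
import math


def nbp_variance(mu: float, phi: float, alpha: float) -> float:
    """Variance under the assumed NBP form: mu + phi * mu**alpha."""
    return mu + phi * mu ** alpha


# alpha = 2 gives the familiar quadratic mean-variance relation,
# alpha = 1 gives variance proportional to the mean.
var_quadratic = nbp_variance(mu=10.0, phi=0.1, alpha=2.0)
var_linear = nbp_variance(mu=10.0, phi=0.1, alpha=1.0)

# With phi = 0 the extra-Poisson term vanishes and variance equals the mean.
var_poisson = nbp_variance(mu=10.0, phi=0.0, alpha=2.0)
```

Letting alpha be estimated rather than fixed is what allows the mean-variance relationship to adapt to the data under this form.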
Project description:We investigated the reported binding of the telomere-associated factors TERF1 and TERF2 to internal telomere sites using ChIP-Seq for these two factors in a lymphoblastoid cell line. We mapped over 40 million reads for each sample to a custom reference genome that incorporates our subtelomere assembly, and generated signal tracks both from uniquely mapping reads only and with a multimapping pipeline we developed. We find that peaks are misshapen and made up of reads that cannot be distinguished from true telomere sequence. Removing reads identified as telomeric removes all internal signal. Examination of TRF1 and TRF2.