Project description:Cleavage Under Targets and Release Using Nuclease (CUT&RUN) has rapidly gained prominence as an effective approach for mapping protein-DNA interactions, especially histone modifications, offering substantial improvements over conventional chromatin immunoprecipitation sequencing (ChIP-seq). However, the effectiveness of this technique is contingent upon accurate peak identification, necessitating the use of optimal peak calling methods tailored to the unique characteristics of CUT&RUN data. Here, we benchmark four prominent peak calling tools - MACS2, SEACR, GoPeaks, and LanceOtron - evaluating their performance in identifying peaks from CUT&RUN datasets. Our analysis utilizes in-house data of three histone marks (H3K4me3, H3K27ac, and H3K27me3) from mouse brain tissue, as well as samples from the 4DNucleome database. We systematically assess these tools based on parameters such as the number of peaks called, peak length distribution, signal enrichment, and reproducibility across biological replicates. Our findings reveal substantial variability in peak calling efficacy, with each method demonstrating distinct strengths in sensitivity, precision, and applicability depending on the histone mark in question. These insights provide a comprehensive evaluation that will assist in selecting the most suitable peak caller for high-confidence identification of regions of interest in CUT&RUN experiments, ultimately enhancing the study of chromatin dynamics and transcriptional regulation.
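One of the criteria named above, reproducibility across biological replicates, can be illustrated with a simple overlap fraction. This is a minimal sketch and not the benchmark's actual metric: the `Peak` tuple layout, the any-overlap rule, and the function names are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical peak representation: (chromosome, start, end), half-open interval.
Peak = Tuple[str, int, int]


def overlaps(a: Peak, b: Peak) -> bool:
    """True if two peaks lie on the same chromosome and share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def fraction_reproduced(rep1: List[Peak], rep2: List[Peak]) -> float:
    """Fraction of peaks in rep1 that overlap any peak in rep2."""
    if not rep1:
        return 0.0
    hits = sum(1 for p in rep1 if any(overlaps(p, q) for q in rep2))
    return hits / len(rep1)


# Toy replicates, invented for the example.
rep1 = [("chr1", 100, 200), ("chr1", 500, 650), ("chr2", 10, 90)]
rep2 = [("chr1", 150, 260), ("chr2", 200, 300)]

print(fraction_reproduced(rep1, rep2))
print(fraction_reproduced(rep2, rep1))
```

The measure is asymmetric, so it is reported in both directions. The quadratic scan is adequate for a toy example; genome-scale peak sets would call for sorted intervals or an interval tree.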
Project description:High-resolution methods such as 4C and Capture-C enable the study of chromatin loops, such as those formed between promoters and enhancers or between CTCF/cohesin binding sites. An important aspect of 4C/CapC analysis is the identification of robust peaks in the data, which mark candidate chromatin loops. Here we present an R package for the analysis of 4C/CapC data. To test our methods, we generated 4C data for 10 viewpoints in 2 tissues in triplicate. We developed a non-parametric peak caller based on rank products. Sampling analysis shows that template quality, not read depth, is the most important determinant of success in 4C experiments. Peak calling on single experiments gives results similar to those from replicate experiments, but performing replicates significantly reduces false positive rates.
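The rank-product idea can be sketched generically as follows. This is an illustration of the statistic under stated assumptions, not the package's implementation (which is in R): the function names, the permutation null, and the toy matrix are hypothetical, and the sketch ignores the distance-dependent background around a 4C viewpoint that a real caller would need to model.

```python
import numpy as np


def rank_product(signal: np.ndarray) -> np.ndarray:
    """Geometric mean of per-replicate ranks for each fragment.

    signal has shape (n_fragments, n_replicates); rank 1 is the strongest
    fragment within a replicate. Ties are broken arbitrarily by argsort.
    """
    # argsort of argsort yields 0-based ranks; negating puts the highest signal first.
    ranks = np.argsort(np.argsort(-signal, axis=0), axis=0) + 1
    return np.exp(np.log(ranks).mean(axis=1))


def permutation_pvalues(signal: np.ndarray, n_perm: int = 1000, seed: int = 0) -> np.ndarray:
    """Empirical p-values from shuffling each replicate column independently."""
    rng = np.random.default_rng(seed)
    observed = rank_product(signal)
    count = np.zeros(signal.shape[0])
    for _ in range(n_perm):
        shuffled = np.column_stack(
            [rng.permutation(signal[:, j]) for j in range(signal.shape[1])]
        )
        # Count permutations whose rank product is at least as extreme (small).
        count += rank_product(shuffled) <= observed
    return (count + 1) / (n_perm + 1)


# Toy data: fragment 0 is consistently the strongest in all three replicates.
signal = np.array([[10, 12, 11], [1, 2, 3], [3, 1, 2], [2, 3, 1]])
rp = rank_product(signal)
pvals = permutation_pvalues(signal)
print(rp, pvals)
```

A fragment scores well only if it ranks highly in every replicate, which is why the statistic rewards consistency rather than raw read depth.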
Project description:Background: CUT&RUN is an efficient epigenome profiling method that identifies sites of DNA-binding protein enrichment genome-wide with high signal-to-noise and low sequencing requirements. Currently, the analysis of CUT&RUN data is complicated by its exceptionally low background, which renders programs designed for analysis of ChIP-seq data vulnerable to oversensitivity in identifying sites of protein binding. Results: Here we introduce Sparse Enrichment Analysis for CUT&RUN (SEACR), an analysis strategy that uses the global distribution of background signal to calibrate a simple threshold for peak calling. SEACR discriminates between true and false-positive peaks with near-perfect specificity from "gold standard" CUT&RUN datasets and efficiently identifies enriched regions for several different protein targets. We also introduce a web server (http://seacr.fredhutch.org) for plug-and-play analysis with SEACR that facilitates maximum accessibility across users of all skill levels. Conclusions: SEACR is a highly selective peak caller that definitively validates the accuracy of CUT&RUN for datasets with known true negatives. Its ease of use and performance in comparison with existing peak calling strategies make it an ideal choice for analyzing CUT&RUN data.
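The general idea of calibrating a threshold from background signal can be illustrated with a toy example. This is a simplified sketch and should not be read as SEACR's exact procedure: the selection rule (the smallest threshold at which the share of target blocks among all blocks above the threshold is maximal), the function name, and the input values are assumptions made for illustration.

```python
import numpy as np


def calibrate_threshold(target, control):
    """Pick a signal threshold by comparing target and control signal blocks.

    target and control hold total signal per contiguous block (hypothetical
    inputs). Returns the smallest candidate threshold at which the fraction of
    above-threshold blocks that come from the target is highest.
    """
    target = np.asarray(target, dtype=float)
    control = np.asarray(control, dtype=float)
    candidates = np.unique(np.concatenate([target, control]))  # sorted ascending
    best_t, best_frac = None, -1.0
    for t in candidates:
        n_target = np.sum(target > t)
        n_control = np.sum(control > t)
        if n_target + n_control == 0:
            continue
        frac = n_target / (n_target + n_control)
        # Strict comparison keeps the smallest threshold among ties.
        if frac > best_frac:
            best_t, best_frac = t, frac
    return best_t


# Toy block signals: the target has a few strong blocks, the control only weak ones.
target_blocks = [1, 2, 3, 20, 30, 40]
control_blocks = [1, 2, 2, 3, 4]

threshold = calibrate_threshold(target_blocks, control_blocks)
peaks = [s for s in target_blocks if s > threshold]
print(threshold, peaks)
```

Because the control sets the cutoff, sparse low-level background in the target is excluded without any per-region statistical test.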
Project description:Recent advances in single cell RNA sequencing allow users to pool multiple samples into one run and demultiplex in downstream analysis, greatly increasing the experimental efficiency and cost-effectiveness. However, the expensive reagents for cell labeling, limited pooling capacity, non-ideal cell recovery rate and calling accuracy remain great challenges for this approach. To date, there are two major demultiplexing methods, antibody-based cell hashing and Single Nucleotide Polymorphism (SNP)-based genomic signature profiling, and each method has advantages and limitations. Here, we propose a hybrid demultiplexing strategy that increases calling accuracy and cell recovery at the same time. We first develop a computational algorithm that significantly increases calling accuracy of cell hashing. Next, we cluster all single cells based on their SNP profiles. Finally, we integrate results from both methods to make corrections and retrieve cells that are only identifiable in one method but not the other. By testing on several real-world datasets, we demonstrate that this hybrid strategy combines advantages of both methods, resulting in increased cell recovery and calling accuracy at lower cost.
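The integration step can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' algorithm: the majority-vote mapping from SNP clusters to hashtags, the choice to leave conflicting cells unassigned, and all names and toy inputs are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional


def integrate_calls(
    hash_calls: Dict[str, Optional[str]],
    snp_clusters: Dict[str, Optional[int]],
) -> Dict[str, Optional[str]]:
    """Combine hashing-based sample calls with SNP-based clusters.

    None means the method could not assign the cell.
    """
    # Step 1: label each SNP cluster with the majority hashtag among
    # cells that both methods were able to call.
    votes: Dict[int, Counter] = {}
    for cell, cluster in snp_clusters.items():
        sample = hash_calls.get(cell)
        if cluster is not None and sample is not None:
            votes.setdefault(cluster, Counter())[sample] += 1
    cluster_to_sample = {c: v.most_common(1)[0][0] for c, v in votes.items()}

    # Step 2: rescue cells called by only one method; cells where the two
    # methods disagree are left unassigned (an assumed policy).
    final: Dict[str, Optional[str]] = {}
    for cell in set(hash_calls) | set(snp_clusters):
        by_hash = hash_calls.get(cell)
        by_snp = cluster_to_sample.get(snp_clusters.get(cell))
        if by_hash is None:
            final[cell] = by_snp
        elif by_snp is None or by_snp == by_hash:
            final[cell] = by_hash
        else:
            final[cell] = None
    return final


# Toy inputs: c4 lacks a hashing call, c5 lacks a SNP cluster, c6 was never hashed.
hash_calls = {"c1": "A", "c2": "A", "c3": "B", "c4": None, "c5": "B"}
snp_clusters = {"c1": 0, "c2": 0, "c3": 1, "c4": 1, "c5": None, "c6": 0}

final = integrate_calls(hash_calls, snp_clusters)
print(final)
```

In this toy run, c4 and c6 are recovered through their SNP clusters and c5 through its hashtag, which mirrors the cell-recovery gain described above.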