Project description: Accurate functional annotation of regulatory elements is essential for understanding global gene regulation. Here, we report a genome-wide map of 827,000 transcription factor binding sites in human lymphoblastoid cell lines, comprising sites corresponding to 239 position weight matrices of known transcription factor binding motifs and 49 novel sequence motifs. To generate this map, we developed a probabilistic framework that integrates cell- or tissue-specific experimental data, such as histone modifications and DNaseI cleavage patterns, with genomic information such as gene annotation and evolutionary conservation. Comparison to empirical ChIP-seq data suggests that our method is highly accurate yet has the advantage of targeting many factors in a single assay. We anticipate that this approach will be a valuable tool for genome-wide studies of gene regulation in a wide variety of cell types or tissues under diverse conditions. DNaseI-seq on two YRI HapMap cell lines. Each individual was sequenced on 8 lanes of the Illumina Genome Analyzer II.
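To illustrate what scoring a candidate site against a position weight matrix involves, here is a minimal Python sketch of log-odds scoring. It is not the probabilistic framework described above, which also integrates histone modification, DNaseI cleavage, annotation, and conservation data. The matrix values, the uniform background, and the function names `log_odds_score` and `best_match` are assumptions made for illustration.

```python
import math

# Hypothetical 4-position motif with consensus ACGT (illustrative values only).
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]

# Assumed uniform background base frequencies.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def log_odds_score(site, pwm=PWM, background=BACKGROUND):
    """Sum of per-position log2(motif probability / background probability)."""
    if len(site) != len(pwm):
        raise ValueError("site length must equal motif width")
    return sum(
        math.log2(column[base] / background[base])
        for column, base in zip(pwm, site)
    )


def best_match(sequence, pwm=PWM, background=BACKGROUND):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(pwm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the motif")
    return max(
        (log_odds_score(sequence[i:i + width], pwm, background), i)
        for i in range(len(sequence) - width + 1)
    )


score, start = best_match("TTACGTTT")
print(f"best window starts at {start} with log-odds score {score:.2f}")
```

In a framework like the one described, a sequence score of this kind would be only one input, combined with experimental and genomic evidence to decide whether a site is bound.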
Project description: The use of multiple proteases has been shown to increase protein sequence coverage in proteomics experiments, but because of the additional sample preparation and analysis time required, it has not been widely adopted in routine proteomic workflows. While data-independent acquisition (DIA) has been primarily optimized for fragmenting tryptic peptides with beam-type CID (bt-CID), it has the potential to analyze multiplexed samples from different protease digests. Here we evaluate a DIA multiplexing method that combines three proteolytic digests (trypsin, AspN, and GluC) into a single sample. We first optimize DIA conditions for both resonance excitation CID (re-CID) and bt-CID to determine the optimal consensus fragmentation conditions for tryptic and non-tryptic peptides, and apply these methods to a human cell line. We demonstrate that this multiplexed approach yields protein identifications and quantitative performance similar to trypsin alone, but enables up to a 63% increase in peptide detections, resulting in up to an 8% increase in average sequence coverage. Importantly, this resulted in 100% sequence coverage for numerous proteins, suggesting the utility of this approach in applications where sequence coverage is critical, such as proteoform analysis.
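The following Python sketch shows why combining digests can raise sequence coverage: peptides too short or too long to detect in one digest may be covered by another. It is not the authors' analysis pipeline. The cleavage rules are simplified (trypsin after K/R with the proline exception ignored, AspN before D, GluC after E only, no missed cleavages), and the toy sequence, the length window, and the names `digest` and `coverage` are assumptions for illustration.

```python
# Simplified cleavage rules: (residues, side), where "C" cleaves after the
# residue and "N" cleaves before it.
ENZYMES = {
    "Trypsin": ("KR", "C"),
    "AspN": ("D", "N"),
    "GluC": ("E", "C"),
}


def digest(protein, residues, side):
    """Return half-open (start, end) spans of fully cleaved peptides."""
    cuts = {0, len(protein)}
    for i, aa in enumerate(protein):
        if aa in residues:
            cuts.add(i + 1 if side == "C" else i)
    ordered = sorted(cuts)
    return list(zip(ordered, ordered[1:]))


def coverage(protein, spans, min_len=6, max_len=30):
    """Fraction of residues covered by peptides within an assumed length window."""
    covered = [False] * len(protein)
    for start, end in spans:
        if min_len <= end - start <= max_len:
            for i in range(start, end):
                covered[i] = True
    return sum(covered) / len(protein)


# Hypothetical toy sequence, not a real protein.
toy = "MAGKDLSTEAAVKRDPLNEQWTRKGGDSAVEELLKAFDGNTYR"

tryptic = digest(toy, *ENZYMES["Trypsin"])
combined = [span for rule in ENZYMES.values() for span in digest(toy, *rule)]

print(f"trypsin only: {coverage(toy, tryptic):.0%}")
print(f"three digests combined: {coverage(toy, combined):.0%}")
```

Because the combined span list is a superset of the tryptic spans, the combined coverage in this sketch can never be lower than the trypsin-only value. Real gains depend on which peptides are actually detected.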