Long-range chromatin interactions in mouse embryonic stem cells identified by ChIA-PET [ChIA-PET]
ABSTRACT: The pluripotent state of embryonic stem cells (ESCs) is produced by active transcription of cell identity genes and repression of genes encoding lineage-specifying developmental regulators. Here we use large ESC cohesin ChIA-PET datasets and other genomic data to identify the local chromosomal structures at both active and repressed genes across the genome. The results show that super-enhancer driven cell identity genes generally occur within large loops that are connected through CTCF-CTCF interaction sites occupied by cohesin. Smc1 ChIA-PET data from wild type murine embryonic stem cells V6.5 were generated by deep sequencing using Illumina Hi-Seq 2000.
Project description: The pluripotent state of embryonic stem cells (ESCs) is produced by active transcription of cell identity genes and repression of genes encoding lineage-specifying developmental regulators. Here we use large ESC cohesin ChIA-PET datasets and other genomic data to identify the local chromosomal structures at both active and repressed genes across the genome. The results show that super-enhancer driven cell identity genes generally occur within large loops that are connected through CTCF-CTCF interaction sites occupied by cohesin. H3K27me3 ChIP-seq data from wild type murine embryonic stem cells V6.5 were generated by deep sequencing using Illumina Hi-Seq 2000.
Project description: RNA Polymerase II ChIA-PET data have revealed enhancers that are active in a profiled cell type and the genes that those enhancers regulate through chromatin interactions. The most commonly used computational method for analyzing ChIA-PET data, the ChIA-PET Tool, discovers interaction anchors at a spatial resolution that is insufficient to accurately identify individual enhancers. We introduce Germ, a computational method that estimates the likelihood that any two narrowly defined genomic locations are jointly occupied by RNA Polymerase II. Germ takes a blind deconvolution approach to simultaneously estimate the likelihood of RNA Polymerase II occupation and a model of the arrangement of read alignments relative to locations occupied by RNA Polymerase II. Both types of information are used to estimate the likelihood that RNA Polymerase II jointly occupies any two genomic locations. We apply Germ to RNA Polymerase II ChIA-PET data from embryonic stem cells to identify the genomic locations that are jointly occupied along with transcription start sites. We show that these genomic locations align more closely with features of active enhancers measured by ChIP-Seq than the locations identified using the ChIA-PET Tool. We also apply Germ to RNA Polymerase II ChIA-PET data from motor neuron progenitors. Based on the Germ results, we observe that cells use a combination of cell type specific and cell type independent regulatory interactions to regulate gene expression. Overall design: RNA Polymerase II ChIA-PET data from murine motor neuron progenitors were generated by deep sequencing.
Project description: This data was generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly (Yijun Ruan mailto:email@example.com). If you have questions about the Genome Browser track associated with this data, contact ENCODE (mailto:firstname.lastname@example.org). This track was produced as part of the ENCODE Project. It shows the locations of protein factor mediated chromatin interactions determined by Chromatin Interaction Analysis with Paired-End Tag (ChIA-PET) data (Fullwood et al., 2010) extracted from five different human cancer cell lines (K562 (chronic myeloid leukemia), HCT116 (colorectal cancer), HeLa-S3 (cervical cancer), MCF-7 (breast cancer), and NB4 (promyelocytic)). A chromatin interaction is defined as the association of two regions of the genome that are far apart in terms of genomic distance, but are spatially proximate to each other in the 3-dimensional cellular nucleus. Additionally, ChIA-PET experiments identify transcription factor binding sites. A binding site is defined as a region of the genome that is highly enriched by specific Chromatin ImmunoPrecipitation (ChIP) against a transcription factor, which indicates that the transcription factor binds specifically to this region. The protein factors displayed in the track include estrogen receptor alpha (ERa), RNA polymerase II (RNAPII), and CCCTC binding factor (CTCF). For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf. Chromatin interaction analysis with paired-end tag sequencing (ChIA-PET) is a global de novo high-throughput method for characterizing the 3-dimensional structure of chromatin in the nucleus. In the ChIA-PET protocol, samples were cross-linked and fragmented, then subjected to chromatin immunoprecipitation. The DNA fragments that were brought together by the chromatin interactions were then proximity-ligated.
During this proximity-ligation step, the half-linkers (created by the fragmentation) containing flanking MmeI sites (MmeI is a type IIS restriction enzyme) were first ligated to the DNA fragments and then ligated to each other to form full linkers. Full linkers bridge either two ends of a self-circularized fragment, or two ends of two different chromatin fragments. The material was then reverse cross-linked, purified and digested with MmeI. MmeI cuts 20 base pairs away from its recognition site. Tag-linker-tag (paired-end tag, PET) constructs were sequenced by ultra-high-throughput methods (Illumina or SOLiD paired-end sequencing). ChIA-PET reads were processed with the ChIA-PET Tool (Li et al., 2010) by the following steps: linker filtering, short-read mapping, PET classification, binding site identification, and interaction cluster identification. The high-confidence binding sites and chromatin interaction clusters were reported. Chromatin interactions identified by ChIA-PET have been validated by 3C, ChIP-3C, 4C and DNA-FISH (Fullwood et al., 2009).
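The PET classification step mentioned above can be illustrated with a minimal Python sketch. It assumes the distinction drawn in the text: a full linker bridges either the two ends of one self-circularized fragment or the ends of two different fragments. The `Pet` type, the `classify_pet` function, and the 8,000 bp cutoff are hypothetical names and values chosen for illustration; they are not taken from the ChIA-PET Tool or from this track's processing.

```python
from typing import NamedTuple


class Pet(NamedTuple):
    """One mapped paired-end tag: the genomic position of each tag."""
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


# Hypothetical cutoff (assumption): tags closer than this on the same
# chromosome are treated as the two ends of one self-circularized fragment.
SELF_LIGATION_CUTOFF_BP = 8_000


def classify_pet(pet: Pet, cutoff: int = SELF_LIGATION_CUTOFF_BP) -> str:
    """Label a PET as self-ligation or inter-ligation by tag separation."""
    if pet.chrom1 != pet.chrom2:
        # Tags on different chromosomes must come from two fragments.
        return "inter-ligation (interchromosomal)"
    span = abs(pet.pos2 - pet.pos1)
    if span < cutoff:
        return "self-ligation"
    return "inter-ligation (intrachromosomal)"


pets = [
    Pet("chr1", 10_000, "chr1", 12_500),
    Pet("chr1", 10_000, "chr1", 950_000),
    Pet("chr1", 10_000, "chr7", 42_000),
]
labels = [classify_pet(p) for p in pets]
```

In a scheme like this, self-ligation PETs would feed binding site identification and inter-ligation PETs would feed interaction cluster identification; a real pipeline would typically derive the cutoff from the data rather than fix it in advance.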
Project description: The developmentally regulated 26- to 32-nt siRNAs (scnRNAs) are loaded to the Argonaute protein Twi1p and display a strong bias for uracil at the 5' end. In this study, we used deep sequencing to analyze loaded and unloaded populations of scnRNAs. We show that the size of the scnRNA is determined during a pre-loading process, whereas their 5' uracil bias is attributed to both pre-loading and loading processes. We also demonstrate that scnRNAs have a strong bias for adenine at the third base from the 3' terminus, suggesting that most scnRNAs are direct Dicer products. Furthermore, we show that the thermodynamic asymmetry of the scnRNA duplex does not affect the guide and passenger strand decision. Finally, we show that scnRNAs frequently have templated uracil at the last base without a strong bias for adenine at the second base indicating non-sequential production of scnRNAs from substrates. These findings provide a biochemical basis for the varying attributes of scnRNAs, which should help improve our understanding of the production and turnover of scnRNAs in vivo. We compared Twi1p-loaded scnRNAs to scnRNAs before they have been loaded into Twi1p by deep sequencing to understand how the two processes, the production of siRNAs by Dicer and the loading of siRNAs into Argonaute, shape the population of siRNAs in vivo.
Project description: We prepared small RNA libraries from 29 tumor/normal pairs of human cervical tissue samples. Analysis of the resulting sequences (42 million in total) defined 64 new human microRNA (miRNA) genes. Both arms of the hairpin precursor were observed in twenty-three of the newly identified miRNA candidates. We tested several computational approaches for analysis of class differences between high throughput sequencing datasets, and describe a novel application of a log-linear model that has provided the most effective analysis for this data. This method resulted in the identification of 67 miRNAs that were differentially expressed between the tumor and normal samples at a false discovery rate less than 0.001. A total of 29 tumor/normal pairs of human cervical tissue samples were analyzed. Two samples (G699N_2 and G761T_2) were run in duplicate. No FASTQ files are available for GSM532871 to GSM532889, GSM532929, and GSM532930. Sequence files are provided as text files for these 22 Sample records in GSE20592_RAW.tar. 38 samples with quality scores are available from SRA as SRP002/SRP002326 (see Supplementary file below).
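To illustrate what selecting miRNAs "at a false discovery rate less than 0.001" involves, here is a minimal Python sketch of the Benjamini-Hochberg adjustment. This is an assumption made for illustration: the description above does not say which FDR procedure was used, and the log-linear model that produces the per-miRNA p-values is not reproduced here. The function name and the example p-values are hypothetical.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-miRNA p-values, not data from the study.
example_pvalues = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(example_pvalues)

# Indices of the features that pass a chosen FDR threshold.
fdr_threshold = 0.03
significant = [i for i, q in enumerate(adjusted) if q < fdr_threshold]
```

With an FDR threshold of 0.001, as in the description, only features whose adjusted value falls below 0.001 would be reported as differentially expressed.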
Project description: Salvia hispanica L. (chia) is a member of the mint family that is cultivated for its seeds. The majority of seed content in chia is composed of omega fatty acids. Chia seeds are also rich in fiber and minerals. The human health potential of chia seeds has driven studies of dietary effects; however, few genetic or genomic studies are available. In this study we obtained RNA from seeds, shoots, cotyledons, leaf primordia, nodes, racemes, and flower tissues from different developmental stages to generate an expression atlas for chia. RNA was sequenced on an Illumina HiSeq 2500. Sequence reads were assembled de novo to produce transcripts. Sequence reads were aligned to the chia transcriptome assembly to generate counts for each tissue type. Differentially expressed transcripts were determined for each tissue type.
Project description: Mutations such as gene fusion, translocation and focal amplification are a frequent cause of proto-oncogene activation during tumorigenesis, but such mutations do not explain all cases of proto-oncogene activation. Here we show that disruption of local chromosome conformation can also activate proto-oncogenes in human cells. We mapped chromosome structures in T-cell acute lymphoblastic leukemia (T-ALL), and found that active oncogenes and silent proto-oncogenes generally occur within insulated neighborhoods formed by the looping of two interacting CTCF sites co-occupied by cohesin. Recurrent microdeletions frequently overlap neighborhood boundary sites in T-ALL genomes, and we demonstrate that site-specific perturbation of loop boundaries is sufficient to activate the respective proto-oncogenes in non-malignant cells. We found somatic genomic rearrangements affecting loop boundaries in many cancers. These results suggest that chromosome structural organization is fundamental to identify functional somatic alterations in cancer genomes. Overall design: Two replicates of SMC1 ChIA-PET in T-ALL Jurkat Cells
Project description: There is considerable evidence that chromosome structure plays important roles in gene control, but we have limited understanding of the proteins that contribute to structural interactions between gene promoters and their enhancer elements. Large DNA loops that encompass genes and their regulatory elements depend on CTCF-CTCF interactions, but most enhancer-promoter interactions do not depend on this structural protein. Here we show that the transcription factor Yin Yang 1 (YY1) contributes to enhancer-promoter structural interactions in a manner analogous to DNA interactions mediated by CTCF. YY1 binds to active enhancers and promoter-proximal elements in all cells examined. YY1 forms dimers that can facilitate DNA interactions. Deletion of YY1 binding sites or depletion of YY1 can disrupt enhancer-promoter looping and normal gene expression. We propose that YY1-mediated enhancer-promoter interactions are a general feature of mammalian gene control. Overall design: ChIA-PET (Chromosome interaction analysis by paired-end tags) in mouse embryonic stem cells targeting CTCF- or YY1-associated interactions
Project description: The earliest recognizable stages of breast neoplasia are lesions that represent a heterogeneous collection of epithelial proliferations currently classified based on morphology. Their role in the development of breast cancer is not well understood but insight into the critical events at this early stage will improve efforts in breast cancer detection and prevention. These microscopic lesions are technically difficult to study so very little is known about their molecular alterations. To characterize the transcriptional changes of early breast neoplasia, we sequenced 3'-end-enriched RNA-seq libraries from formalin-fixed paraffin-embedded tissue of early neoplasia samples and matched normal breast and carcinoma samples from 25 patients. We find that gene expression patterns within early neoplasias are distinct from both normal and breast cancer patterns and identify a pattern of pro-oncogenic changes, including elevated transcription of ERBB2, FOXA1, and GATA3 at this early stage. We validate these findings on a second independent gene expression profile data set generated by whole transcriptome sequencing. Measurements of protein expression by immunohistochemistry on an independent set of early neoplasias confirm that ER pathway regulators FOXA1 and GATA3, as well as ER itself, are consistently upregulated at this early stage. The early neoplasia samples also demonstrate coordinated changes in long non-coding RNA expression and microenvironment stromal gene expression patterns. This study is the first examination of global gene expression in early breast neoplasia, and the genes identified here represent candidate participants in the earliest molecular events in the development of breast cancer. 3SEQ was performed on 72 FFPE human breast samples from 25 patients: 24 normal, 25 early neoplasia, 9 carcinoma in situ, and 14 invasive cancer.