Dataset Information


ENCODE Cold Spring Harbor Labs Long RNA-seq (hg19)

ABSTRACT: These data were generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly: Carrie Davis (experimental), Alex Dobin (computational), Felix Schlesinger (computational), Tom Gingeras (principal investigator), and Roderic Guigo's group at the CRG. If you have questions about the Genome Browser track associated with this data, contact ENCODE (…).

These tracks were generated by the ENCODE Consortium. They contain information about human RNAs longer than 200 nucleotides, obtained as short reads on the Illumina GAIIx platform. Data are available from biological replicates of several cell lines. In addition to profiling Poly-A+ and Poly-A- RNA from whole cells, we have also gathered data from various subcellular compartments. In many cases, Cap Analysis of Gene Expression (CAGE, RIKEN Institute), Small RNA-Seq (<200 nucleotides, CSHL), and Paired-End di-Tag RNA (PET-RNA, Genome Institute of Singapore) datasets are available from the same biological replicates. For data usage terms and conditions, please refer to … and ….

We are using the published protocol …. This protocol generates directional libraries and reports each transcript's strand of origin. Exogenous RNA spike-ins (Round 5, pool 14), in development at the National Institute of Standards and Technology (NIST), were added to each endogenous RNA isolate and carried through library construction and sequencing. The Illumina PhiX control library was also spiked in at 1% to each completed human library just prior to cluster formation.

Each RNA-Seq dataset is accompanied by a "Production Document" containing details about the RNA isolations and treatments, library construction, and spike-ins, as well as quality-control figures for individual libraries. The spike-in sequences and concentrations are available for download in the supplemental directory. The libraries are sequenced on the Illumina platform to an average depth of ~200 million reads (100 million mate-pairs).
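As a rough illustration of the read budget implied by a 1% PhiX spike-in and ~200 million reads per library, the sketch below splits a run into PhiX and human reads. It assumes (hypothetically) that the ~200 million figure counts every read including PhiX, and that PhiX reads appear in exact proportion to the spiked-in fraction; neither assumption is stated in the description above, and real cluster yields can differ.

```python
def read_budget(total_reads: float, phix_fraction: float = 0.01) -> dict:
    """Roughly split a run's reads into PhiX and human portions.

    Assumptions (not from the dataset description):
    1. `total_reads` counts every read in the run, PhiX included.
    2. PhiX reads appear in proportion to the spiked-in fraction.
    """
    phix_reads = total_reads * phix_fraction
    human_reads = total_reads - phix_reads
    return {
        "phix_reads": phix_reads,
        "human_reads": human_reads,
        # Paired-end sequencing: two reads (mates) per fragment.
        "human_pairs": human_reads / 2,
    }


budget = read_budget(200e6)
for key, value in budget.items():
    print(f"{key}: {value / 1e6:.1f} million")
```

Under these assumptions roughly 2 million reads would be PhiX, leaving about 198 million human reads (99 million pairs); the "100 million mate-pairs" quoted above is simply 200 million reads divided by two, without subtracting any control reads.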
The data are mapped against hg19 using Spliced Transcript Alignment and Reconstruction (STAR), written by Alex Dobin (CSHL). More information about STAR, including the parameters used for these data, can be found at: …. Additionally, we provide the following processed "element" data files: de novo splice junctions, de novo transcripts, and contigs. These elements are assessed for reproducibility using a nonparametric irreproducible discovery rate (IDR) script. The IDR value for each element is included in the files so that end users can apply their own threshold. An IDR value of 0.1 means that the probability of detecting that element in a third experiment, of depth equivalent to the sum of the biological replicates, is 90%. In addition, we compute expression values for annotated genes, transcripts, and exons.
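Because the IDR values are provided for end users to threshold on, a minimal filtering sketch may help. It assumes a tab-separated element file with a header row and a column named `IDR`; that column name, the other column names, and the toy records are hypothetical placeholders, not the documented file schema. The `reproducibility_probability` helper simply restates the interpretation given above (IDR 0.1 corresponds to 90%).

```python
import csv
import io
from typing import Iterable, Iterator


def reproducibility_probability(idr: float) -> float:
    """Per the description above: IDR 0.1 -> 90% chance of re-detection."""
    return 1.0 - idr


def filter_by_idr(lines: Iterable[str], threshold: float = 0.1,
                  idr_column: str = "IDR") -> Iterator[dict]:
    """Yield records whose IDR value is at or below `threshold`.

    Assumes tab-separated input with a header row; the default column
    name "IDR" is a hypothetical placeholder for the real schema.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    for record in reader:
        # Lower IDR means more reproducible; keep elements passing the cutoff.
        if float(record[idr_column]) <= threshold:
            yield record


# Toy, made-up splice-junction records for illustration only.
example = io.StringIO(
    "chrom\tstart\tend\tIDR\n"
    "chr1\t14829\t14970\t0.02\n"
    "chr1\t15038\t15796\t0.35\n"
    "chr2\t38814\t41609\t0.10\n"
)
kept = list(filter_by_idr(example, threshold=0.1))
for rec in kept:
    print(rec["chrom"], rec["start"], rec["end"], rec["IDR"])
```

A generator keeps memory use flat for large junction or contig files; stricter thresholds (for example 0.05) trade sensitivity for reproducibility.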

ORGANISM(S): Homo sapiens  

SUBMITTER: Tom Gingeras, Carrie Davis, Felix Schlesinger, Alex Dobin, ENCODE DCC

PROVIDER: E-GEOD-30567 | ArrayExpress | 2011-07-13





Many animal species use a chromosome-based mechanism of sex determination, which has led to the coordinate evolution of dosage-compensation systems. Dosage compensation not only corrects the imbalance in the number of X chromosomes between the sexes but is also hypothesized to correct dosage imbalance within cells that is due to monoallelic X-linked expression and biallelic autosomal expression, by upregulating X-linked genes twofold (termed 'Ohno's hypothesis'). Although this hypothesis is well …


Similar Datasets

2012-04-02 | E-GEOD-36025 | ArrayExpress
2010-10-08 | E-GEOD-24565 | ArrayExpress
2013-10-16 | E-GEOD-40131 | ArrayExpress
2018-01-29 | PXD006575 | Pride
2011-09-29 | E-GEOD-32465 | ArrayExpress
2007-12-12 | E-GEOD-9848 | ArrayExpress
2007-12-12 | E-GEOD-9849 | ArrayExpress
2007-04-10 | E-FPMI-8 | ArrayExpress
2012-04-08 | E-GEOD-32007 | ArrayExpress
2019-05-14 | E-GEOD-40131 | ExpressionAtlas