Dataset Information

RNA Subcellular CAGE Localization from ENCODE/RIKEN


ABSTRACT: These data were generated by ENCODE. For questions about the data, contact the submitting laboratory directly (Piero Carninci, carninci@riken.jp). For questions about the Genome Browser track associated with these data, contact ENCODE (genome@soe.ucsc.edu).

This track shows 5' cap analysis gene expression (CAGE) tags and clusters in RNA extracts (http://hgwdev.cse.ucsc.edu/cgi-bin/hgEncodeVocab?type=rnaExtract) from different subcellular localizations (http://hgwdev.cse.ucsc.edu/cgi-bin/hgEncodeVocab?type=localization) in multiple cell lines (http://hgwdev.cse.ucsc.edu/cgi-bin/hgEncodeVocab?type=cellType). A CAGE cluster is a region of overlapping tags with an assigned value that represents the expression level. The data in this track were produced as part of the ENCODE Transcriptome Project.

Release 2 adds three new download-only files per experiment (Clusters, TSS Gencode 7 and TSS HMM) and four new cell lines (A459, AG04450, BJ and SK-N-SH_RA). Release 1 on hg19 contained the original hg18 data (http://hgwdev.cse.ucsc.edu/cgi-bin/hgTrackUi?db=hg18&g=wgEncodeRikenCage), remapped and labeled in this release as Generation 0 because those data had no replicates. If both old- and new-generation data are available for a particular experiment, only the new-generation data are displayed; the older data remain available for download. The new data for this track were produced with a different process and have standard replicate numbers. The replicate label in the Genome Browser view is a counter indicating the total number of replicates submitted, whereas the producing lab's replicate numbers correspond to its internal bio-replicate numbering. Where these two numbering systems conflict, both are listed in the long label of the specific track.
For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf

Cells were grown according to the approved ENCODE cell culture protocols (http://hgwdev.cse.ucsc.edu/ENCODE/protocols/cell). RNA molecules longer than 200 nt were isolated from each subcellular compartment and then fractionated into polyA+ and polyA- fractions as described in these protocols (http://hgwdev.cse.ucsc.edu/ENCODE/protocols/general/rnaExtracts.txt). The CAGE tags were sequenced from the 5' ends of cap-trapped cDNAs produced using RIKEN CAGE technology (Kodzius et al. 2006; Valen et al. 2009). To create the tag, a linker was attached to the 5' end of polyA+ or polyA- reverse-transcribed cDNAs, which were selected by cap trapping (Carninci et al. 1996). The first 27 bp of the cDNA were cleaved using class II restriction enzymes. A linker was then attached to the 3' end of the cDNA. After PCR amplification, the tags were sequenced (36 bp single reads) using Illumina's Genome Analyzer.

Tags were mapped to the human genome (hg19) using the program Delve (T. Lassmann, manuscript in preparation). Delve is a probabilistic aligner that prioritizes the best possible alignment of reads to a genome over speed. For each alignment, it calculates the mapping accuracy (the probability that the alignment is correct). There is no set limit on the number of errors allowed, so the mapping rate is commonly 100%. For analysis, however, it is recommended to discard alignments with low mapping qualities. Exceptions to the above protocol are the polyA- RNA samples from K562 cytosol, K562 nucleus, and prostate whole cell, which were sequenced using ABI SOLiD (http://www.appliedbiosystems.com/absite/us/en/home/applications-technologies/solid-next-generation-sequencing.html) technology. These reads were mapped with Bowtie using default parameters.
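The recommendation to discard low-quality alignments can be sketched as below. This is a minimal illustration only, assuming the mapping quality is available as a Phred-scaled MAPQ value, as in SAM-style records. The tuple layout, the function names, and the cutoff of 20 are hypothetical and are not values stated by the producing lab.

```python
def mapq_to_error_prob(mapq):
    """Convert a Phred-scaled mapping quality to the probability that the alignment is wrong."""
    return 10 ** (-mapq / 10)


def filter_alignments(alignments, min_mapq=20):
    """Keep alignments whose mapping quality is at least min_mapq.

    alignments: iterable of (read_id, chrom, pos, strand, mapq) tuples (assumed layout).
    min_mapq: hypothetical cutoff; 20 corresponds to a 1% chance of mismapping.
    """
    return [a for a in alignments if a[4] >= min_mapq]


# Toy alignments, invented for illustration.
example = [
    ("tag1", "chr1", 1000, "+", 40),
    ("tag2", "chr1", 1002, "+", 3),
    ("tag3", "chr2", 500, "-", 20),
]
kept = filter_alignments(example)
print([a[0] for a in kept])
```

A stricter or looser cutoff trades sensitivity for confidence in tag placement; the appropriate value depends on the downstream analysis.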
Clusters were defined as regions of overlapping CAGE reads. The expression level was computed as the number of reads making up the cluster, divided by the total number of reads sequenced, times 1 million.
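
The cluster definition and expression calculation above can be sketched in a few lines. This is a minimal illustration, not the producing lab's pipeline. It assumes reads are given as (chrom, strand, start, end) half-open intervals and that reads are clustered per chromosome and strand when they overlap by at least one base. The function names and toy reads are hypothetical.

```python
from collections import defaultdict


def cluster_reads(reads):
    """Merge overlapping reads into clusters, returning (chrom, strand, start, end, read_count)."""
    by_key = defaultdict(list)
    for chrom, strand, start, end in reads:
        by_key[(chrom, strand)].append((start, end))

    clusters = []
    for (chrom, strand), intervals in sorted(by_key.items()):
        intervals.sort()
        cur_start, cur_end = intervals[0]
        count = 1
        for start, end in intervals[1:]:
            if start < cur_end:
                # Read overlaps the current cluster (half-open coordinates): extend it.
                cur_end = max(cur_end, end)
                count += 1
            else:
                clusters.append((chrom, strand, cur_start, cur_end, count))
                cur_start, cur_end, count = start, end, 1
        clusters.append((chrom, strand, cur_start, cur_end, count))
    return clusters


def expression_rpm(cluster_count, total_reads):
    """Reads in the cluster, divided by total reads sequenced, times 1 million."""
    return cluster_count / total_reads * 1_000_000


# Toy reads, invented for illustration.
reads = [
    ("chr1", "+", 100, 127),
    ("chr1", "+", 110, 137),
    ("chr1", "+", 300, 327),
    ("chr1", "-", 105, 132),
]
clusters = cluster_reads(reads)
total = len(reads)  # stands in for the total number of reads sequenced
for chrom, strand, start, end, n in clusters:
    print(chrom, strand, start, end, n, expression_rpm(n, total))
```

In real data the denominator would be the library's total read count rather than the size of a toy list, so values are comparable across experiments of different sequencing depth.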

ORGANISM(S): Homo sapiens

SUBMITTER: ENCODE DCC 

PROVIDER: E-GEOD-34448 | biostudies-arrayexpress

REPOSITORIES: biostudies-arrayexpress

Publications

Landscape of transcription in human cells.

Djebali Sarah, Davis Carrie A, Merkel Angelika, Dobin Alex, Lassmann Timo, Mortazavi Ali, Tanzer Andrea, Lagarde Julien, Lin Wei, Schlesinger Felix, Xue Chenghai, Marinov Georgi K, Khatun Jainab, Williams Brian A, Zaleski Chris, Rozowsky Joel, Röder Maik, Kokocinski Felix, Abdelhamid Rehab F, Alioto Tyler, Antoshechkin Igor, Baer Michael T, Bar Nadav S, Batut Philippe, Bell Kimberly, Bell Ian, Chakrabortty Sudipto, Chen Xian, Chrast Jacqueline, Curado Joao, Derrien Thomas, Drenkow Jorg, Dumais Erica, Dumais Jacqueline, Duttagupta Radha, Falconnet Emilie, Fastuca Meagan, Fejes-Toth Kata, Ferreira Pedro, Foissac Sylvain, Fullwood Melissa J, Gao Hui, Gonzalez David, Gordon Assaf, Gunawardena Harsha, Howald Cedric, Jha Sonali, Johnson Rory, Kapranov Philipp, King Brandon, Kingswood Colin, Luo Oscar J, Park Eddie, Persaud Kimberly, Preall Jonathan B, Ribeca Paolo, Risk Brian, Robyr Daniel, Sammeth Michael, Schaffer Lorian, See Lei-Hoon, Shahab Atif, Skancke Jorgen, Suzuki Ana Maria, Takahashi Hazuki, Tilgner Hagen, Trout Diane, Walters Nathalie, Wang Huaien, Wrobel John, Yu Yanbao, Ruan Xiaoan, Hayashizaki Yoshihide, Harrow Jennifer, Gerstein Mark, Hubbard Tim, Reymond Alexandre, Antonarakis Stylianos E, Hannon Gregory, Giddings Morgan C, Ruan Yijun, Wold Barbara, Carninci Piero, Guigó Roderic, Gingeras Thomas R

Nature, 2012-09-01, issue 7414


Eukaryotic cells make many types of primary and processed RNAs that are found either in specific subcellular compartments or throughout the cells. A complete catalogue of these RNAs is not yet available and their characteristic subcellular localizations are also poorly understood. Because RNA represents the direct output of the genetic information encoded by genomes and a significant proportion of a cell's regulatory capabilities are focused on its synthesis, processing, transport, modification ...

Similar Datasets

2011-12-16 | GSE34448 | GEO
2012-05-10 | GSE37909 | GEO
2012-05-23 | E-GEOD-38163 | biostudies-arrayexpress
2011-11-10 | GSE33600 | GEO
2012-05-24 | GSE38163 | GEO
2012-06-07 | GSE35583 | GEO
2012-05-09 | E-GEOD-37909 | biostudies-arrayexpress
2012-07-20 | GSE39524 | GEO
2012-06-06 | GSE35585 | GEO
2012-04-24 | E-GEOD-35587 | biostudies-arrayexpress