Dataset Information

Efficient targeted transcript discovery via array-based normalization of RACE libraries

ABSTRACT: RACE (Rapid Amplification of cDNA Ends) is a widely used approach for transcript identification. However, the dynamic range in the population of RACE transcript isoforms may be very large, and random clone selection, the typical approach, may be ineffective in sampling the different transcript species present in the population. Here, we describe an effective RACE sampling strategy. The products of the RACE reaction are hybridized onto high-density tiling arrays, and the exons detected are then used to delineate a series of RT-PCR reactions, through which the original RACE mixture is segregated into a number of simpler RT-PCR reactions. These are independently cloned, and randomly selected clones are sequenced. This approach is superior to the direct cloning and sequencing of the RACE products: it specifically targets novel transcripts, and often leads to the overall normalization of their abundances. We show theoretically that this strategy leads to a very efficient sampling of the novel transcript species associated with annotated loci. In a pilot experiment, we used this approach to discover many novel transcripts for a few otherwise well-characterized protein-coding genes. Finally, we investigate how this strategy can be multiplexed for large-scale transcript discovery by high-density pooling of RACE reactions prior to hybridization. Our results indicate that through the interrogation of a limited number of exons per gene on a limited number of cell types, it is possible to recover a large fraction of the transcript diversity associated with protein-coding loci. These loci, however, could be occupying a much larger genomic space than previously expected, implying that efficient multiplexing requires non-trivial pooling optimization.
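
The sampling argument in the abstract can be illustrated with a small calculation. The sketch below is not the authors' theoretical model; it only assumes that each sequenced clone is drawn independently with probability proportional to isoform abundance, and it compares a hypothetical pool with a wide dynamic range against an idealized, perfectly normalized pool. The function name, the pool of 20 isoforms, and the abundance values are all illustrative assumptions.

```python
def expected_distinct(abundances, n_clones):
    """Expected number of distinct isoforms seen after picking n_clones
    clones at random, each drawn with probability proportional to abundance."""
    total = sum(abundances)
    # An isoform with relative abundance p is missed by all n picks
    # with probability (1 - p) ** n, so it is seen with 1 - (1 - p) ** n.
    return sum(1.0 - (1.0 - a / total) ** n_clones for a in abundances)


# Hypothetical pool: 20 isoforms, each half as abundant as the previous one
# (an assumed stand-in for a "very large" dynamic range).
skewed = [2.0 ** -i for i in range(20)]

# The same 20 isoforms after an idealized normalization of abundances.
uniform = [1.0] * 20

for n in (20, 100, 500):
    print(n,
          round(expected_distinct(skewed, n), 2),
          round(expected_distinct(uniform, n), 2))
```

Under these assumptions the normalized pool reveals more distinct isoforms for the same number of sequenced clones, which is the intuition behind segregating the RACE mixture into simpler reactions before cloning.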

ORGANISM(S): Homo sapiens

PROVIDER: GSE11433 | GEO | 2008/05/25

SECONDARY ACCESSION(S): PRJNA106459

REPOSITORIES: GEO


Similar Datasets

Project description: Combining or pooling individual samples when carrying out transcript profiling using microarrays is a fairly common means to reduce both the cost and complexity of data analysis. However, pooling does not allow for statistical comparison of changes between samples and can result in a loss of information. Because a rigorous comparison of the identified expression changes from the two approaches has not been reported, we compared the results for hepatic transcript profiles from pooled vs. individual samples. Hepatic transcript profiles from a single-dose time-course rat study in response to the prototypical toxicants clofibrate [CAS:637-07-0;CHEBI:3750], diethylhexyl phthalate (DEHP) [CAS:117-81-7;CHEBI:17243], and valproic acid (VPA) [CAS:1069-66-5;CHEBI:9925] were evaluated. Approximately 50% more transcript expression changes were observed in the individual (statistical) analysis compared with the pooled analysis. While the majority of these changes were less than twofold in magnitude (~80%), a substantial number were greater than twofold (~20%). Transcript changes unique to the individual analysis were confirmed by quantitative RT-PCR, whereas none of the changes unique to the pooled analysis were confirmed. The individual analysis identified more hits per biological pathway than the pooled approach. Many of the transcripts identified by the individual analysis were novel findings and may contribute to a better understanding of molecular mechanisms of these compounds. Furthermore, having individual animal data provided the opportunity to correlate changes in transcript expression to phenotypes (i.e., histology) observed in toxicology studies. The two approaches were similar when clustering methods were used despite the large difference in the absolute number of transcripts changed.
In summary, pooling reduced resource requirements substantially, but the individual approach enabled statistical analysis that identified more gene expression changes to evaluate mechanisms of toxicity. An individual animal approach becomes more valuable when the overall expression response is subtle and/or when associating expression data to variable phenotypic responses.

Project description: Background: RNA-seq is revolutionizing the way we study transcriptomes. mRNA can be surveyed without prior knowledge of gene transcripts. Alternative splicing of transcript isoforms and the identification of previously unknown exons are being reported. Initial reports of differences in exon usage and splicing between samples, as well as quantitative differences among samples, are beginning to surface. Biological variation has been reported to be larger than technical variation. In addition, technical variation has been reported to be in line with expectations due to random sampling. However, strategies for dealing with technical variation will differ depending on the magnitude. The size of technical variance and the role of sampling are examined in this manuscript. Results: Independent Solexa/Illumina experiments containing technical replicates are analyzed. When coverage is low, large disagreements between technical replicates are apparent. Exon detection between technical replicates is highly variable when the coverage is less than 5 reads per nucleotide, and estimates of gene expression are more likely to disagree when coverage is low, although large disagreements in the estimates of expression are observed at all levels of coverage. Conclusions: Technical variability is too high to ignore. Technical variability results in inconsistent detection of exons at low levels of coverage. Further, the estimate of the relative abundance of a transcript can substantially disagree, even when coverage levels are high. This may be due to the low sampling fraction and, if so, it will persist as an issue needing to be addressed in experimental design even as the next wave of technology produces larger numbers of reads. We provide practical recommendations for dealing with the technical variability, without dramatic cost increases. Three independent samples of D. simulans male heads were collected, with each sample representing a unique pool of biological material. Each sample was prepared according to the manufacturer's instructions, and then the same library was run on two lanes of a Solexa/Illumina flow cell, resulting in two technical replicates for each biological replicate; runs were 36 base-pair paired-end.

Project description: Background: RNA-seq is revolutionizing the way we study transcriptomes. mRNA can be surveyed without prior knowledge of gene transcripts. Alternative splicing of transcript isoforms and the identification of previously unknown exons are being reported. Initial reports of differences in exon usage and splicing between samples, as well as quantitative differences among samples, are beginning to surface. Biological variation has been reported to be larger than technical variation. In addition, technical variation has been reported to be in line with expectations due to random sampling. However, strategies for dealing with technical variation will differ depending on the magnitude. The size of technical variance and the role of sampling are examined in this manuscript. Results: Independent Solexa/Illumina experiments containing technical replicates are analyzed. When coverage is low, large disagreements between technical replicates are apparent. Exon detection between technical replicates is highly variable when the coverage is less than 5 reads per nucleotide, and estimates of gene expression are more likely to disagree when coverage is low, although large disagreements in the estimates of expression are observed at all levels of coverage. Conclusions: Technical variability is too high to ignore. Technical variability results in inconsistent detection of exons at low levels of coverage. Further, the estimate of the relative abundance of a transcript can substantially disagree, even when coverage levels are high. This may be due to the low sampling fraction and, if so, it will persist as an issue needing to be addressed in experimental design even as the next wave of technology produces larger numbers of reads. We provide practical recommendations for dealing with the technical variability, without dramatic cost increases. Three independent samples of D. melanogaster female heads were collected, with each sample representing a unique pool of biological material. Each sample was prepared according to the manufacturer's instructions, and then the same library was run on two lanes of a Solexa/Illumina flow cell, resulting in two technical replicates for each biological replicate; runs were 36 base-pair paired-end.
