
Dataset Information


Mining SAGE data allows large-scale, sensitive screening of antisense transcript expression.


ABSTRACT: As a growing number of complementary transcripts capable of exerting various regulatory functions are being found in eukaryotes, high-throughput analytical methods are needed to investigate their expression in multiple biological samples. Serial Analysis of Gene Expression (SAGE), which is based on the enumeration of directionally reliable short cDNA sequences (tags), is capable of revealing antisense transcripts. We initially detected them by observing tags that mapped onto the reverse complement of known mRNAs. The presence of such tags in individual SAGE libraries suggested that SAGE datasets contain latent information on antisense transcripts. To mine these data, we generated a collection of virtual tags. Tag pairs were assembled by searching for complementarity between 24-nt sequences centered on the potential SAGE anchoring sites of well-annotated human expressed sequences. An analysis of their presence in a large collection of published SAGE libraries revealed transcripts expressed at high levels from both strands of two adjacent, oppositely oriented transcription units. In other cases, the respective transcripts of such cis-oriented genes displayed a mutually exclusive expression pattern or were co-expressed in a small number of libraries. Other tag pairs revealed overlapping transcripts of trans-encoded unique genes. Finally, we isolated a group of tags shared by multiple transcripts. Most of them mapped onto retroelements, essentially represented in humans by Alu sequences inserted in opposite orientations in the 3'UTRs of otherwise different mRNAs. Registering these tags in separate files makes it possible to focus computational searches on unique sense-antisense pairs.
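The tag-pairing and filtering steps described in the abstract can be illustrated with a minimal sketch. It assumes classic NlaIII-anchored SAGE (anchoring site CATG) and reads "24-nt sequences centered on the anchoring site" as 10 nt + CATG + 10 nt; the function names and toy sequences are hypothetical and this is not the authors' pipeline. Because CATG is its own reverse complement, the reverse complement of such a window is again a window centered on CATG, so complementary pairs can be found by a dictionary lookup. Windows seen in more than one transcript (for example, repeat-derived ones) are set aside, loosely mirroring the idea of filing shared tags separately; the sketch only counts multiplicity and does not identify Alu elements.

```python
from collections import defaultdict

ANCHOR = "CATG"  # assumed NlaIII anchoring site (classic SAGE)
FLANK = 10       # assumed flank length: 10 + 4 + 10 = 24-nt window
_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMP)[::-1]


def anchored_windows(seq: str, flank: int = FLANK):
    """Yield every full-length window centered on an anchoring site in seq."""
    seq = seq.upper()
    start = seq.find(ANCHOR)
    while start != -1:
        lo, hi = start - flank, start + len(ANCHOR) + flank
        if lo >= 0 and hi <= len(seq):
            yield seq[lo:hi]
        start = seq.find(ANCHOR, start + 1)


def build_index(transcripts: dict[str, str]) -> dict[str, set[str]]:
    """Map each anchored window to the set of transcript IDs containing it."""
    index = defaultdict(set)
    for tid, seq in transcripts.items():
        for win in anchored_windows(seq):
            index[win].add(tid)
    return index


def split_shared(index):
    """Separate windows found in exactly one transcript from shared ones."""
    unique = {w: next(iter(ids)) for w, ids in index.items() if len(ids) == 1}
    shared = {w: ids for w, ids in index.items() if len(ids) > 1}
    return unique, shared


def unique_complementary_pairs(transcripts: dict[str, str]):
    """Return sorted (id_a, id_b, window_of_a) triples where the reverse
    complement of a unique window of id_a is a unique window of id_b."""
    unique, _shared = split_shared(build_index(transcripts))
    pairs = []
    for win, a in unique.items():
        b = unique.get(revcomp(win))
        if b is not None and a < b:  # report each pair once
            pairs.append((a, b, win))
    return sorted(pairs)


# Hypothetical toy transcripts: B carries the reverse complement of A's window.
win = "ACGTACGTAC" + "CATG" + "GGTTCCAAGT"
a = "TTTT" + win + "TTTT"
b = "GGGG" + revcomp(win) + "GGGG"
print(unique_complementary_pairs({"A": a, "B": b}))
# prints [('A', 'B', 'ACGTACGTACCATGGGTTCCAAGT')]
```

If a third transcript identical to A is added, A's window becomes shared and the pair is no longer reported, which is the intended effect of keeping multi-copy tags out of the unique sense-antisense search.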
The method developed in the present work shows that SAGE datasets constitute a major resource for rapidly investigating the expression of antisense transcripts with high sensitivity, such that a single tag can be detected in one library when a large number of biological samples are screened.

SUBMITTER: Quere R 

PROVIDER: S-EPMC534641 | BioStudies | 2004-01-01T00:00:00Z

REPOSITORIES: biostudies

Similar Datasets

2007-01-01 | S-EPMC2094080 | BioStudies
2009-01-01 | S-EPMC2709617 | BioStudies
2004-01-01 | S-EPMC520878 | BioStudies
2004-01-01 | S-EPMC535903 | BioStudies
2010-01-01 | S-EPMC2879516 | BioStudies
2006-01-01 | S-EPMC1676023 | BioStudies
2007-01-01 | S-EPMC1884178 | BioStudies
2006-01-01 | S-EPMC1791009 | BioStudies
2007-01-01 | S-EPMC2034470 | BioStudies
2007-01-01 | S-EPMC2104538 | BioStudies