Transcriptomics

Dataset Information

Baiting out a full length sequence from unmapped RNA-seq data


ABSTRACT: Unmapped reads are usually regarded as useless and are discarded or ignored. Here, we develop a strategy for recovering a full-length sequence from unmapped reads by combining the design of specific reverse transcription primers with high-throughput sequencing. In this study, we salvage 36 unmapped reads from standard RNA-seq data (GSM3188619) and randomly select one 149 bp read as a model (CTGGTGCCATAATTCAGGGAACTGTGTTCTTGATGTACTATCTGAGACATTTGTGCTTCCCCCCATCCAGCTATCAGGCTGTTAGGCAATGCACTTCTAGGAATTAGAATTCTATAAGGAATCTCATGCTGGAAGAACAAAAAGACCCA). Specific reverse transcription primers (5' end: CTGGTGCCATAATTCAGGGA, 3' end: GGATCTTCACGTAACGGATTGT) are designed to amplify both ends of the read, followed by next-generation sequencing. We then use a statistical model based on a power-law distribution to estimate the integrity and significance of the recovered sequence. Further, we validate it by Sanger sequencing. The results show that the full-length sequence is 1,556 bp and carries an InDel mutation in a microsatellite structure. This strategy should be useful for extracting sequence information from unmapped RNA-seq data.
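As a quick sanity check on the sequences quoted in the abstract, the following minimal Python sketch locates each primer on the model read, on either strand. The read and primer strings are copied from the abstract; the helper names (`reverse_complement`, `primer_position`) are illustrative assumptions and are not part of the submitters' pipeline. The sketch only reports where a primer is found. It makes no claim about how the 3' primer was designed.

```python
# Sequences copied verbatim from the abstract.
READ = (
    "CTGGTGCCATAATTCAGGGAACTGTGTTCTTGATGTACTATCTGAGACATTTGTGCTTCCCCCCATCCAG"
    "CTATCAGGCTGTTAGGCAATGCACTTCTAGGAATTAGAATTCTATAAGGAATCTCATGCTGGAAGAACAA"
    "AAAGACCCA"
)
PRIMER_5 = "CTGGTGCCATAATTCAGGGA"
PRIMER_3 = "GGATCTTCACGTAACGGATTGT"

# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def primer_position(primer: str, template: str):
    """Hypothetical helper: find a primer on either strand of a template.

    Returns ("+", offset) if the primer occurs as written, ("-", offset) if
    its reverse complement occurs, or None if neither is present. The offset
    is 0-based and measured on the template as given.
    """
    pos = template.find(primer)
    if pos != -1:
        return ("+", pos)
    pos = template.find(reverse_complement(primer))
    if pos != -1:
        return ("-", pos)
    return None


if __name__ == "__main__":
    print("read length:", len(READ))
    print("5' primer:", primer_position(PRIMER_5, READ))
    print("3' primer:", primer_position(PRIMER_3, READ))
```

The 5' primer is identical to the first 20 bases of the model read, so it is reported on the forward strand at offset 0.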

ORGANISM(S): Mus musculus

PROVIDER: GSE172487 | GEO | 2021/04/22

REPOSITORIES: GEO

Similar Datasets

2011-11-23 | E-GEOD-33905 | biostudies-arrayexpress
2017-04-19 | GSE93749 | GEO
2018-10-23 | GSE74771 | GEO
2011-11-10 | E-GEOD-33600 | biostudies-arrayexpress
2020-01-15 | GSE115803 | GEO
2016-07-28 | E-MTAB-4527 | biostudies-arrayexpress
| PRJNA723548 | ENA
2011-02-14 | E-GEOD-27221 | biostudies-arrayexpress
2022-12-12 | GSE186152 | GEO
2012-05-09 | E-GEOD-37909 | biostudies-arrayexpress