Remapping the SRA: Drosophila melanogaster RNA-Seq data from the Sequence Read Archive

ABSTRACT: The Sequence Read Archive (SRA) contains over 52 terabases, or 482 billion reads, from Drosophila melanogaster (as of June 2018). These data are massively underused by the community and include 14,423 RNA-Seq samples, roughly 7 times the size of modENCODE. Currently, the major challenge is finding high-quality datasets that are suitable for inclusion in new studies. To help the community overcome this hurdle, we reprocessed all D. melanogaster RNA-Seq SRA experiments (SRXs) using an identical workflow. This workflow uses a data-driven approach to identify technical metadata (i.e., strandedness and layout) for each sample in order to optimize mapping parameters. The workflow generates QC metrics and coverage tracks based on the dm6 assembly, and calculates gene-level, junction-level, and intergenic counts against FlyBase r6.11. This resource will allow any researcher to visualize browser tracks for any publicly available dataset, quickly identify high-quality datasets for use in their own research, and download identically processed count tables. There is a treasure trove of underused data sitting in the SRA, and this work addresses the first challenge in making data integration a common laboratory practice.

Overall design: Published Drosophila melanogaster RNA-Seq data were remapped to dm6 and processed with an identical workflow.

INSTRUMENT(S): Illumina Genome Analyzer (Drosophila melanogaster)

SUBMITTER: Brian Oliver

PROVIDER: GSE117217 | GEO | 2018-07-18