Genomics

Dataset Information

Modeling bias and variation in the stochastic processes of small RNA sequencing


ABSTRACT: The use of RNA-seq as the preferred method for the discovery and validation of small RNA biomarkers is hindered by high variability and biased sequence counts. In this paper, we develop a statistical model for sequence counts that accounts for ligase bias, stochastic variation in the library amplification steps, and variation in sequencing depth. Our analytical contributions are the description of the Linear Quadratic (LQ) relation between the mean and variance of the sequence counts in an RNA-seq experiment and the derivation of the Poisson truncated mixture as the underlying probability distribution for RNA-seq data. Using a large number of sequencing datasets, we demonstrate how this modeling framework can be used to calculate empirical correction factors for ligase bias while accounting for random variation in sequence counts. Bias correction may remove the majority of bias in the absence of differential expression and more than 40% of the bias in the presence of variable expression of miRNAs. Empirical bias correction factors appear to be nearly constant over at least one and up to four orders of magnitude of total RNA input and independent of sample composition.
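
As a rough illustration of the Linear Quadratic (LQ) mean-variance relation named in the abstract, the sketch below fits variance ≈ a·mean + b·mean² to per-sequence summary statistics by ordinary least squares. The functional form without an intercept, the function name `fit_lq`, and the example numbers are assumptions made for illustration only; they are not taken from the dataset or from the authors' model, which may be parameterized and estimated differently.

```python
def fit_lq(means, variances):
    """Least-squares fit of variance ~ a*mean + b*mean**2 (no intercept).

    Hypothetical helper for illustration; not the authors' estimator.
    Returns the pair (a, b).
    """
    if len(means) != len(variances) or len(means) < 2:
        raise ValueError("need at least two (mean, variance) pairs")

    # Entries of the 2x2 normal equations for the basis functions [m, m^2].
    s11 = sum(m ** 2 for m in means)
    s12 = sum(m ** 3 for m in means)
    s22 = sum(m ** 4 for m in means)
    t1 = sum(m * v for m, v in zip(means, variances))
    t2 = sum(m ** 2 * v for m, v in zip(means, variances))

    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("means do not determine both coefficients")

    # Solve the 2x2 system with Cramer's rule.
    a = (t1 * s22 - t2 * s12) / det
    b = (s11 * t2 - s12 * t1) / det
    return a, b


# Made-up per-sequence means, with variances generated from a known LQ curve.
example_means = [1.0, 5.0, 20.0, 100.0, 500.0]
example_vars = [1.5 * m + 0.02 * m ** 2 for m in example_means]
a_hat, b_hat = fit_lq(example_means, example_vars)
```

Under this assumed form, a pure Poisson count would give a = 1 and b = 0, so a positive quadratic coefficient indicates overdispersion that grows with the mean count.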

ORGANISM(S): synthetic construct

PROVIDER: GSE93399 | GEO | 2017/03/15

SECONDARY ACCESSION(S): PRJNA360871

REPOSITORIES: GEO

Similar Datasets

2012-07-16 | E-GEOD-29022 | biostudies-arrayexpress
2012-07-17 | GSE29022 | GEO
| PRJNA358315 | ENA
2018-07-09 | GSE94584 | GEO
2014-01-25 | GSE54375 | GEO
| PRJNA430884 | ENA
2014-06-24 | E-MTAB-2566 | biostudies-arrayexpress
2009-10-22 | GSE18156 | GEO
| 62369 | ecrin-mdr-crc
2018-07-09 | GSE94586 | GEO