Dataset Information

Modeling and cleaning RNA-seq data significantly improve detection of differentially expressed genes.


ABSTRACT:

Background

RNA-seq has become a standard technology to quantify mRNA. The measured values usually vary by several orders of magnitude, and while the detection of differences at high values is statistically well grounded, the significance of the differences for rare mRNAs can be weakened by the presence of biological and technical noise.

Results

We have developed a method for cleaning RNA-seq data that improves the detection of differentially expressed genes, specifically genes with low to moderate transcription. Using a data modeling approach, the parameters of randomly distributed mRNA counts are identified, and reads that most probably originate from technical noise are removed. We demonstrate that removing this random component leads to a significant increase in the number of detected differentially expressed genes and to more significant p-values, without introducing a bias towards low-count genes.
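The cleaning idea described above can be illustrated with a small sketch. It is not the RNAdeNoise implementation, which is available in the authors' repository. It assumes, purely for illustration, that the random component is exponentially distributed, that its rate can be estimated crudely from the low-count genes, and that a fixed quantile of that distribution is subtracted from every count. The names `estimate_noise_rate`, `noise_threshold`, `denoise`, `low_cutoff`, and `quantile` are hypothetical.

```python
import math


def estimate_noise_rate(counts, low_cutoff=10):
    """Crude rate estimate for an ASSUMED exponential noise component.

    Uses only positive counts at or below low_cutoff (hypothetical parameter)
    and takes 1/mean, ignoring the truncation at low_cutoff.
    Returns None when no count falls in that range.
    """
    low = [c for c in counts if 0 < c <= low_cutoff]
    if not low:
        return None
    return 1.0 / (sum(low) / len(low))


def noise_threshold(rate, quantile=0.9):
    """Count below which the given fraction of exponential noise falls.

    Inverse CDF of the exponential distribution, rounded up to a whole read.
    """
    return math.ceil(-math.log(1.0 - quantile) / rate)


def denoise(counts, low_cutoff=10, quantile=0.9):
    """Subtract the estimated noise threshold from every count, clamping at 0."""
    rate = estimate_noise_rate(counts, low_cutoff)
    if rate is None:
        # No low counts to model; return the data unchanged.
        return list(counts)
    threshold = noise_threshold(rate, quantile)
    return [max(0, c - threshold) for c in counts]


if __name__ == "__main__":
    sample = [0, 1, 2, 1, 3, 2, 500, 1200]  # made-up counts for one sample
    print(denoise(sample))
```

In this sketch the threshold is derived from the data rather than set by hand, which mirrors the stated goal of avoiding a subjective minimum-count cutoff. Highly expressed genes lose only a few reads, while genes whose counts are consistent with noise drop to zero.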

Conclusion

Application of RNAdeNoise to our RNA-seq data on polysome profiling and to several published RNA-seq datasets demonstrates its suitability for different organisms and sequencing technologies such as Illumina and BGI, shows improved detection of differentially expressed genes, and removes the need for subjective thresholds on minimal RNA counts. The program, the RNA-seq data, the resulting gene lists, and examples of use are available in the supplementary data and at https://github.com/Deyneko/RNAdeNoise .

SUBMITTER: Deyneko IV 

PROVIDER: S-EPMC9670425 | biostudies-literature | 2022 Nov

REPOSITORIES: biostudies-literature

Publications

Modeling and cleaning RNA-seq data significantly improve detection of differentially expressed genes.

Deyneko Igor V, Mustafaev Orkhan N, Tyurin Alexander A, Zhukova Ksenya V, Varzari Alexander, Goldenkova-Pavlova Irina V

BMC Bioinformatics, 2022-11-16, 1


Similar Datasets

| S-EPMC8582999 | biostudies-literature
| S-EPMC3488134 | biostudies-literature
| S-EPMC10316566 | biostudies-literature
| S-EPMC5592911 | biostudies-literature
| S-EPMC6284200 | biostudies-literature
| S-EPMC5349981 | biostudies-literature
| S-EPMC10940175 | biostudies-literature
| S-EPMC5307323 | biostudies-literature
| S-EPMC10137460 | biostudies-literature
| S-EPMC4670531 | biostudies-literature