Dataset Information

Modeling bias and variation in the stochastic processes of small RNA sequencing

ABSTRACT: The use of RNA-seq as the preferred method for the discovery and validation of small RNA biomarkers is hindered by high variability and biased sequence counts. In this paper we develop a statistical model for sequence counts that accounts for ligase bias and stochastic variation in library amplification steps and sequencing depth variation. Our analytical contributions are the description of the Linear Quadratic (LQ) relation between the mean and variance of the sequence counts in an RNA-seq experiment and the derivation of the Poisson truncated mixture as the underlying probability distribution for RNA-seq data. Using a large number of sequencing datasets, we demonstrate here how one can use this modeling framework to calculate empirical correction factors for ligase bias, while accounting for random variation in sequence counts. Bias correction may remove the majority of bias in the absence of differential expression and more than 40% of the bias in the presence of variable expression of miRNAs. Empirical bias correction factors appear to be nearly constant over at least one and up to four orders of magnitude of total RNA input and independent of sample composition.
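The Linear Quadratic relation mentioned in the abstract can be illustrated with a short sketch. It assumes the relation takes the form variance = a*mean + b*mean^2 with no intercept, which is one common parameterization and may differ from the exact form used in the paper. The function name `fit_lq`, the simulated Gamma-Poisson counts, and the dispersion value are all hypothetical choices made for illustration; they are not taken from the dataset.

```python
import numpy as np


def fit_lq(means, variances):
    """Ordinary least-squares fit of variance = a*mean + b*mean**2 (no intercept).

    Returns the pair (a, b). The functional form is an assumption of this sketch.
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    # Design matrix with a linear and a quadratic column in the mean.
    design = np.column_stack([means, means ** 2])
    coef, *_ = np.linalg.lstsq(design, variances, rcond=None)
    return float(coef[0]), float(coef[1])


# Simulated counts (hypothetical): a Gamma-Poisson mixture has
# variance = mean + dispersion * mean**2, i.e. an LQ relation with a = 1.
rng = np.random.default_rng(0)
true_means = np.array([5.0, 20.0, 100.0, 500.0, 2000.0])
dispersion = 0.05
shape = 1.0 / dispersion
rates = rng.gamma(shape, true_means / shape, size=(2000, true_means.size))
counts = rng.poisson(rates)

# Per-sequence sample mean and variance across the simulated replicates.
a, b = fit_lq(counts.mean(axis=0), counts.var(axis=0, ddof=1))
print(f"estimated linear term a = {a:.3f}, quadratic term b = {b:.4f}")
```

An unweighted fit like this is dominated by the most abundant sequences, so the quadratic term is usually estimated more stably than the linear one; a weighted fit would be a reasonable refinement.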

ORGANISM(S): synthetic construct

PROVIDER: GSE93399 | GEO | 2017/03/15

SECONDARY ACCESSION(S): PRJNA360871

REPOSITORIES: GEO


Similar Datasets

Homo sapiens

Project description:Universal correction of enzymatic sequence bias

| PRJNA358315 | ENA

Genome-wide mapping of the Galleria mellonella larvae transcription start sites during infection with the Madurella mycetomatis pathogen

Project description:Using Low Quantity single strand CAGE (LQ-ssCAGE), we mapped the transcription start sites (TSS) and annotated 5' ends in the invertebrate Galleria mellonella, which has become increasingly popular in recent years as an experimental model in infectious disease and immunology research. The current genome annotation of this model lacks 5' end annotation and TSS information. To map TSS under healthy and infection conditions, G. mellonella larvae were infected with the fungal pathogen Madurella mycetomatis. After 4, 30 and 52 hours of the infection, larvae were treated with itraconazole or ravuconazole. RNA-seq and LQ-ssCAGE libraries were prepared and sequenced. The LQ-ssCAGE data were processed to identify CAGE transcription start sites (CTSS) and uni- and bi-directional clusters. LQ-ssCAGE enabled us to precisely identify 39,410 TSS and 249 active enhancers. We assigned genomic features to the resulting TSSs and enhancers. The majority of the TSS peaks are annotated as promoter regions, while the enhancers were annotated as intergenic and genic. Furthermore, we confirmed the quality of the TSS by promoter shapes and GC bias. We identified a set of super-enhancers and predicted de novo motifs. In this study we report the first atlas of TSS and active enhancers of G. mellonella.

2024-12-11 | GSE282923 | GEO

Systematic assessment of next-generation sequencing for quantitative small RNA profiling: synthetic equimolar pool

Project description:Small RNA-seq is increasingly being used for profiling of small RNAs. Quantitative characteristics of long RNA-seq have been extensively described, but small RNA-seq involves fundamentally different methods for library preparation, with distinct protocols and technical variations that have not been fully and systematically studied. Using common sets of reference samples, we evaluated the accuracy, reproducibility and bias of small RNA-seq library preparation for five distinct protocols and across nine different laboratories. As part of this larger study, we assessed sequencing bias and reproducibility using an equimolar pool of 1,152 small RNA sequences ranging from 15-90 nt, and primarily comprised of annotated human microRNAs. We observed extensive protocol-specific and sequence-specific bias that was largely mitigated in protocols employing sequencing adapters with randomized end-nucleotides. We find that sequencing bias is highly reproducible across labs using the same library preparation technologies, and use the data to calculate inter-protocol bias correction factors. These results provide strong evidence for the feasibility of reproducible cross-laboratory small RNA-seq studies, even those involving analysis of data generated using different protocols.

2018-07-09 | GSE94584 | GEO

BayMeth: improved DNA methylation quantification for affinity capture sequencing data using a flexible Bayesian approach

Project description:Affinity capture of DNA methylation combined with high-throughput sequencing strikes a good balance between the high cost of whole genome bisulfite sequencing and the low coverage of methylation arrays. We present BayMeth, an empirical Bayes approach that uses a fully methylated control sample to transform observed read counts into regional methylation levels. In our model, inefficient capture can readily be distinguished from low methylation levels. BayMeth improves on existing methods, allows explicit modeling of copy number variation, and offers computationally-efficient analytical mean and variance estimators. BayMeth is available in the Repitools Bioconductor package.

2014-01-25 | GSE54375 | GEO

Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data.

Project description:Next-generation sequencing has become an important tool for genome-wide quantification of DNA and RNA. However, a major technical hurdle lies in the need to map short sequence reads back to their correct locations in a reference genome. Here we investigate the impact of SNP variation on the reliability of read-mapping in the context of detecting allele-specific expression (ASE). We generated sixteen million 35 bp reads from mRNA of each of two HapMap Yoruba individuals. When we mapped these reads to the human genome we found that, at heterozygous SNPs, there was a significant bias towards higher mapping rates of the allele in the reference sequence, compared to the alternative allele. Masking known SNP positions in the genome sequence eliminated the reference bias but, surprisingly, did not lead to more reliable results overall. We find that even after masking, ~5-10% of SNPs still have an inherent bias towards more effective mapping of one allele. Filtering out inherently biased SNPs removes 40% of the top signals of ASE. The remaining SNPs showing ASE are enriched in genes previously known to harbor cis-regulatory variation or known to show uniparental imprinting. Our results have implications for a variety of applications involving detection of alternate alleles from short-read sequence data. Scripts, written in Perl and R, for simulating short reads, masking SNP variation in a reference genome, and analyzing the simulation output are available upon request from JFD.

2009-10-22 | GSE18156 | GEO

Project description:Profiling of DNA ligase fidelity and bias

| PRJNA430884 | ENA

Evaluating bias-reducing protocols for RNA sequencing library preparation

Project description:The ligation step in RNA sequencing library generation is a known source of bias. We present the first comparison of the standard duplex adaptor protocol supplied by Life Technologies for use on the Ion Torrent PGM with an alternate single adaptor approach involving CircLigase (CircLig). We also investigate whether using the thermostable ligase Methanobacterium thermoautotrophicum RNA ligase K97A (Mth K97A) for the initial ligation step in the CircLigase protocol reduces bias. A pool of small RNA fragments of known composition was converted into a sequencing library using one of three protocols and sequenced on an Ion Torrent PGM. The single adaptor CircLigase-based approach significantly reduces, but does not eliminate, bias in Ion Torrent data. Using Mth K97A as part of the CircLig method does not further reduce bias.

2014-06-24 | E-MTAB-2566 | biostudies-arrayexpress

Systematic assessment of next-generation sequencing for quantitative small RNA profiling: a multiple protocol study across multiple laboratories

Project description:Small RNA-seq is increasingly being used for profiling of small RNAs. Quantitative characteristics of long RNA-seq have been extensively described, but small RNA-seq involves fundamentally different methods for library preparation, with distinct protocols and technical variations that have not been fully and systematically studied. We report here the results of a study using common references (synthetic RNA pools of defined composition, as well as plasma-derived RNA) to evaluate the accuracy, reproducibility and bias of small RNA-seq library preparation for five distinct protocols and across nine different laboratories. We observed protocol-specific and sequence-specific bias, which was ameliorated using adapters for ligation with randomized end-nucleotides, and computational correction factors. Despite this technical bias, relative quantification using small RNA-seq was remarkably accurate and reproducible, even across multiple laboratories using different methods. These results provide strong evidence for the feasibility of reproducible cross-laboratory small RNA-seq studies, even those involving analysis of data generated using different protocols. This SuperSeries is composed of the SubSeries listed below.

2018-07-09 | GSE94586 | GEO

AUD Biomarkers Study (Proteomic and Genomic Analysis of Biospecimens)

Project description:Study purpose: to explore the entire spectrum of proteomic and genomic changes (amongst others) involved in diseases and in healthy/control populations. The Study is designed to discover biomarkers, develop and validate diagnostic assays, instruments and therapeutics as well as other medical research. Specifically, researchers may analyze proteins, RNA, DNA copy number changes, including large and small (1,000-100,000 kb) scale rearrangements, transcription profiles, epigenetic modifications, sequence variation, and sequence in both diseased tissue and case-matched germline DNA from Subjects.

| 62369 | ecrin-mdr-crc

Batch effects and the effective design of single-cell gene expression studies

Project description:Single cell RNA sequencing (scRNA-seq) can be used to characterize variation in gene expression levels at high resolution. However, the sources of experimental noise in scRNA-seq are not yet well understood. We investigated the technical variation associated with sample processing using the single cell Fluidigm C1 platform. To do so, we processed three C1 replicates from three human induced pluripotent stem cell (iPSC) lines. We added unique molecular identifiers (UMIs) to all samples, to account for amplification bias. We found that the major source of variation in the gene expression data was driven by genotype, but we also observed substantial variation between the technical replicates. We observed that the conversion of reads to molecules using the UMIs was impacted by both biological and technical variation, indicating that UMI counts are not an unbiased estimator of gene expression levels. Based on our results, we suggest a framework for effective scRNA-seq studies.

2016-07-08 | GSE77288 | GEO
