Unknown,Transcriptomics,Genomics,Proteomics

Dataset Information

Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences

ABSTRACT: High-throughput sequencing of cDNA (RNA-seq) is used extensively to characterize the transcriptome of cells. Many transcriptomics studies aim at comparing either abundance levels or the transcriptome composition between given conditions, and as a first step, the sequencing reads must be used as the basis for abundance quantification of transcriptomic features of interest, such as genes or transcripts. Several different quantification approaches have been proposed, ranging from simple counting of reads overlapping given genomic regions to more complex estimation of underlying transcript abundances. In this paper, we show that gene-level abundance estimates and statistical inference offer advantages over transcript-level analyses, in terms of both performance and interpretability. We also illustrate that while the presence of differential isoform usage can lead to inflated false discovery rates in differential expression analyses on simple count matrices, and incorporation of transcript-level abundance estimates improves the performance in simulated data, the difference is relatively minor in several real data sets. Finally, we provide an R package (tximport) to help users integrate transcript-level abundance estimates from common quantification pipelines into count-based statistical inference engines.
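The idea of carrying transcript-level estimates into a gene-level, count-based analysis can be illustrated with a small sketch. The Python below is not the tximport API (tximport is an R package); it is a minimal illustration, under stated assumptions, of one way to collapse per-transcript estimates to genes: summing estimated counts and abundances per gene, and computing an abundance-weighted average transcript length that could serve as a gene-level offset. The function name `summarize_to_gene`, the input layout, and the toy values are all hypothetical.

```python
from collections import defaultdict


def summarize_to_gene(transcripts, tx2gene):
    """Collapse transcript-level estimates to gene level.

    transcripts: dict mapping transcript id -> (estimated_count, tpm, length)
    tx2gene:     dict mapping transcript id -> gene id
    (Both layouts are assumptions made for this sketch.)
    """
    counts = defaultdict(float)
    tpm = defaultdict(float)
    weighted_len = defaultdict(float)

    for tx_id, (est_count, abundance, length) in transcripts.items():
        gene = tx2gene[tx_id]
        counts[gene] += est_count                 # gene count = sum of transcript counts
        tpm[gene] += abundance                    # gene abundance = sum of transcript TPMs
        weighted_len[gene] += abundance * length  # numerator of the weighted mean length

    result = {}
    for gene in counts:
        # Abundance-weighted average transcript length; undefined if the gene
        # has zero total abundance.
        avg_len = weighted_len[gene] / tpm[gene] if tpm[gene] > 0 else None
        result[gene] = {
            "count": counts[gene],
            "tpm": tpm[gene],
            "avg_length": avg_len,
        }
    return result


# Toy data (made up for illustration only).
toy_transcripts = {
    "tx1": (100.0, 10.0, 1000.0),
    "tx2": (50.0, 30.0, 2000.0),
    "tx3": (20.0, 5.0, 500.0),
}
toy_tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}

gene_level = summarize_to_gene(toy_transcripts, toy_tx2gene)
```

In this toy input, the average length for `geneA` is pulled toward the longer isoform `tx2` because it carries most of the abundance. This shows why a change in isoform usage between conditions can shift the effective gene length even when total gene abundance is unchanged.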

INSTRUMENT(S): Illumina HiSeq 2000

ORGANISM(S): synthetic construct

SUBMITTER: Mark Robinson

PROVIDER: E-MTAB-4119 | biostudies-arrayexpress

REPOSITORIES: biostudies-arrayexpress

Similar Datasets

Simulation-based assessment of differential transcript usage using RNA-seq data: a matter of counting

Project description:Background: Large-scale sequencing of cDNA (RNA-seq) has been a boon to the quantitative analysis of transcriptomes. A notable application of significant biomedical relevance is the detection of changes in transcript usage between experimental conditions. For example, discovery of pathological alternative splicing may allow the development of new treatments or better management of patients. From an analysis perspective, there are several ways to represent RNA-seq data to unravel differential transcript usage, such as annotation-based exon-level counting, differential analysis of the 'percent spliced in' measure or quantitative analysis of assembled transcripts. The goal of this research is to compare and contrast current state-of-the-art methods, as well as to suggest improvements to commonly used workflows. Results: We assess the performance of representative workflows using synthetic data, and explore the effect of using non-standard counting bin definitions as input to a state-of-the-art inference engine (DEXSeq). Although the canonical counting provided the best results overall, several non-canonical approaches were as good or better in specific aspects, and most counting approaches outperformed the evaluated event- and assembly-based methods. We show that an incomplete annotation catalog can have a detrimental effect on the ability to detect differential transcript usage in transcriptomes with few isoforms per gene, and that isoform-level pre-filtering can considerably improve the false discovery rate (FDR) control. Conclusion: Count-based methods generally perform well in detection of differential transcript usage. Controlling the FDR at the imposed threshold is difficult, mainly in complex organisms, but can be improved by pre-filtering of the annotation catalog.

2015-07-23 | E-MTAB-3766 | biostudies-arrayexpress

Mass Dynamics 1.0: A streamlined, web-based environment for analyzing, sharing and integrating Label-Free Data

Project description:Label Free Quantification (LFQ) of shotgun proteomics data is a popular and robust method for the characterization of relative protein abundance between samples. Many analytical pipelines exist for the automation of this analysis and some tools exist for the subsequent representation and inspection of the results of these pipelines. Mass Dynamics 1.0 (MD 1.0) is a web-based analysis environment that can analyse and visualize LFQ data produced by software such as MaxQuant. Unlike other tools, MD 1.0 utilizes cloud-based architecture to enable researchers to store their data, enabling researchers to not only automatically process and visualize their LFQ data but annotate and share their findings with collaborators and, if chosen, to easily publish results to the community. With a view toward increased reproducibility and standardisation in proteomics data analysis and streamlining collaboration between researchers, MD 1.0 requires minimal parameter choices and automatically generates quality control reports to verify experiment integrity. Here, we demonstrate that MD 1.0 provides reliable results for protein expression quantification, emulating Perseus on benchmark datasets over a wide dynamic range.

2022-04-13 | PXD028038 | Pride

Leukocyte gene regulation and patterns of natural language use.

Project description:Analysis of transcript abundance estimates as a function of demographic, psychometric, and language use features.

2017-12-05 | GSE87656 | GEO

Peripheral blood transcriptome profiles in Nepali child soldiers and civilians.

Project description:Analysis of transcript abundance estimates as a function of child soldier status, PTSD symptoms, and psychological resilience.

2016-07-27 | GSE77164 | GEO

Peripheral blood transcriptome profiles in adult male Japanese IT workers.

Project description:Analysis of transcript abundance estimates as a function of generalized purpose in life and work-related dimensions of meaning.

2016-07-25 | GSE79092 | GEO

Transcriptome profiles derived from parallel venipuncture, dried blood spot (DBS), and saliva samples

Project description:Comparison of transcript abundance estimates derived from DBS vs saliva vs gold standard peripheral blood mononuclear cell (PBMC) samples.

2016-07-27 | GSE79269 | GEO

Peripheral blood transcriptome profiles derived from parallel venipuncture (PAXgene and mononuclear cell; PBMC) and dried blood spot (DBS) samples.

Project description:Comparison of transcript abundance estimates and bioinformatic inferences derived from DBS vs PAXgene vs peripheral blood mononuclear cell (PBMC) samples.

2016-05-18 | GSE75511 | GEO

RNAseq of a yeast diallel hybrid population

Project description:Sequencing and transcript abundance quantification in a population of ~300 Saccharomyces cerevisiae hybrids

| PRJEB64466 | ENA

Transcription profiling by array of Sulfolobus acidocaldarius and Sulfolobus solfataricus to study replication-biased genome organisation.

Project description:Determination of transcript abundance as a function of gene position on the chromosome

2010-07-15 | E-MEXP-2770 | biostudies-arrayexpress

Isolator: accurate and consistent analysis of isoform-level expression in RNA-Seq experiments

Project description:While RNA-Seq has enabled great progress towards the goal of wide-scale isoform-level mRNA quantification, short reads have limitations when resolving complex or similar sets of isoforms. As a result, estimates of isoform abundance carry far more uncertainty than those made at the gene level. When confronted with this uncertainty, commonly used methods produce estimates that are unstable: small perturbations in the data often produce dramatically different results, confounding downstream analysis. We introduce a new method, Isolator, which analyzes all samples in an experiment in unison using a simple Bayesian hierarchical model. Combined with aggressive bias correction, it produces estimates that are simultaneously accurate and stable. In a comprehensive comparison of accuracy and stability, we show that this property is unique to Isolator. We further demonstrate that the approach of modeling an entire experiment enables new analyses, which we demonstrate by examining splicing monotonicity across several time points in the development of human cardiomyocyte cells.

2020-03-17 | GSE79439 | GEO
