Dataset Information

High-resolution transcriptome analysis with long-read RNA sequencing

ABSTRACT: Ongoing improvements to next-generation sequencing technologies are leading to longer sequencing read lengths, but a thorough understanding of the impact of longer reads on RNA sequencing analyses is lacking. To address this issue, we generated and compared two RNA sequencing datasets of differing read lengths, 2x75 bp (L75) and 2x262 bp (L262), and investigated the impact of read length on various aspects of analysis, including the performance of currently available read-mapping tools, gene and transcript quantification, and detection of allele-specific expression patterns. Our results indicate that, while the scalability of read-mapping tools and the cost-effectiveness of long-read protocols are issues that require further attention, longer reads enable more accurate quantification of diverse aspects of gene expression, including individual-specific patterns of allele-specific expression and alternative splicing.

ORGANISM(S): Homo sapiens

PROVIDER: GSE57862 | GEO | 2014/09/25

SECONDARY ACCESSION(S): PRJNA248285

REPOSITORIES: GEO

Similar Datasets

High-resolution transcriptome analysis with long-read RNA sequencing

Project description:Ongoing improvements to next-generation sequencing technologies are leading to longer sequencing read lengths, but a thorough understanding of the impact of longer reads on RNA sequencing analyses is lacking. To address this issue, we generated and compared two RNA sequencing datasets of differing read lengths, 2x75 bp (L75) and 2x262 bp (L262), and investigated the impact of read length on various aspects of analysis, including the performance of currently available read-mapping tools, gene and transcript quantification, and detection of allele-specific expression patterns. Our results indicate that, while the scalability of read-mapping tools and the cost-effectiveness of long-read protocols are issues that require further attention, longer reads enable more accurate quantification of diverse aspects of gene expression, including individual-specific patterns of allele-specific expression and alternative splicing. Two RNA-Seq datasets of differing read lengths (2x262 bp and 2x75 bp).

2014-09-25 | E-GEOD-57862 | biostudies-arrayexpress

Enhanced whole exome sequencing by higher DNA insert lengths

Project description:Background: Whole exome sequencing (WES) has proven to be a valuable basis for various applications such as variant calling and copy number variation (CNV) analyses. For those analyses, read coverage should be optimally balanced throughout protein-coding regions at sufficient read depth. Unfortunately, WES is known for its uneven coverage within coding regions due to GC-rich regions or off-target enrichment. Results: To examine the irregularities of WES within genes, we applied Agilent SureSelectXT exome capture to human samples and sequenced them on Illumina in 2x101 paired-end mode. As we suspected the sequenced insert length to be crucial to the uneven coverage of exome-captured samples, we sheared 12 genomic DNA samples to two different DNA insert lengths, namely 130 and 170 bp. Interestingly, although mean coverage of target regions was clearly higher in samples of 130 bp insert length, coverage was more even in the 170 bp samples. Moreover, merging overlapping paired-end reads had a positive effect on evenness, indicating overlapping reads as another reason for the unevenness. In addition, mutation analysis was performed on a subset of the samples. In these isogenic subclones, almost twice as many mutations failed in the 130 bp samples as in the 170 bp samples. Visual inspection of the discarded mutation sites revealed low coverage at the sites, embedded in high amplitudes of coverage depth in the affected region. Conclusions: Producing longer insert reads could be a good strategy to achieve more uniform read coverage in coding regions, thereby enhancing the effective sequencing yield and providing an improved basis for further variant calling and CNV analyses.

2016-07-28 | E-MTAB-4527 | biostudies-arrayexpress

LocusMasterTE: long-read assisted short-read RNA-seq TE quantification [long]

Project description:With an ability to compromise genome integrity, transposable elements (TEs) have significant associations with human diseases. Short-read sequencing has been used to study the expression of TEs; however, the highly repetitive nature of these elements makes multimapping a critical issue. Here we implement LocusMasterTE, an improved quantification method that integrates long-read sequencing. By introducing transcripts per million (TPM) counts computed from long-read sequencing as a prior distribution in the Expectation-Maximization (EM) model used for short-read TE quantification, multi-mapped reads are re-assigned to correct expression values. Based on simulated short reads, LocusMasterTE outperforms current quantitative approaches and is significantly favorable in capturing newly inserted TEs. We also verified that TEs quantified by LocusMasterTE are clearly related to euchromatin and heterochromatin in cell line samples. With LocusMasterTE we anticipate that more accurate quantification can be performed, allowing novel functions of TEs to be uncovered.

2023-09-01 | GSE225377 | GEO
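
The idea of using long-read abundances as a prior when re-assigning multi-mapped short reads can be illustrated with a toy EM loop. This is only a minimal sketch under stated assumptions, not the LocusMasterTE implementation: the function name `em_with_prior`, the `prior_weight` parameter, and the pseudo-count form of the prior are all hypothetical choices made for illustration.

```python
def em_with_prior(read_candidates, prior, prior_weight=1.0, n_iter=100):
    """Toy EM for multi-mapped reads with an abundance prior.

    read_candidates: one list of candidate loci per read (each non-empty).
    prior: dict locus -> non-negative weight, e.g. long-read TPM (assumed input).
    prior_weight: how many pseudo-reads the prior is worth (hypothetical knob).
    Returns expected read counts per locus from the final E-step.
    """
    loci = sorted({locus for cands in read_candidates for locus in cands})
    eps = 1e-9  # keeps loci missing from the prior at a non-zero weight
    raw = {locus: prior.get(locus, 0.0) + eps for locus in loci}
    total = sum(raw.values())
    p = {locus: raw[locus] / total for locus in loci}  # normalized prior

    theta = dict(p)  # start from the prior
    n_reads = len(read_candidates)
    counts = {locus: 0.0 for locus in loci}
    for _ in range(n_iter):
        # E-step: split each read across its candidate loci in proportion to theta
        counts = {locus: 0.0 for locus in loci}
        for cands in read_candidates:
            z = sum(theta[locus] for locus in cands)
            for locus in cands:
                counts[locus] += theta[locus] / z
        # M-step: update abundances, with the prior acting as pseudo-counts
        theta = {
            locus: (counts[locus] + prior_weight * p[locus]) / (n_reads + prior_weight)
            for locus in loci
        }
    return counts


# Two uniquely mapped reads and two reads shared between loci A and B.
reads = [["A"], ["A", "B"], ["A", "B"], ["B"]]
with_prior = em_with_prior(reads, {"A": 90.0, "B": 10.0})
no_prior = em_with_prior(reads, {})
```

In this toy input the short-read evidence is symmetric, so without a prior the shared reads split evenly, while a prior favoring locus A shifts the shared reads toward A.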

LocusMasterTE: long-read assisted short-read TE quantification [short]

2023-09-01 | GSE225380 | GEO

Sequencing-based quantitative mapping of the cellular small RNA landscape

Project description:We report the development of an RNA sequencing method – AQRNA-seq – that minimizes biases and enables absolute quantification of all small RNA species in a sample mixture. Validation of AQRNA-seq library preparation and data mining algorithms using a 963-member microRNA reference library, RNA oligonucleotide standards of varying lengths, and northern blots demonstrated a direct, linear correlation between sequencing read count and RNA abundance.

2020-01-01 | GSE139936 | GEO

Long-read transcriptome sequencing reveals abundant promoter diversity in distinct molecular subtypes of gastric cancer

Project description:Deregulated gene expression is a hallmark of cancer, however most studies to date have analyzed short-read RNA-sequencing data with inherent limitations. Here, we combine PacBio long-read isoform sequencing (Iso-Seq) and Illumina paired-end short read RNA sequencing to comprehensively survey the transcriptome of gastric cancer (GC), a leading cause of global cancer mortality. We performed full-length transcriptome analysis across 10 GC cell lines covering four major GC molecular subtypes (chromosomal unstable, Epstein-Barr positive, genome stable and microsatellite unstable). We identify 60,239 non-redundant full-length transcripts, of which >66% are novel compared to current transcriptome databases. Novel isoforms are more likely to be cell-line and subtype specific, expressed at lower levels with larger number of exons, with longer isoform/coding sequence lengths. Most novel isoforms utilize an alternate first exon, and compared to other alternative splicing categories are expressed at higher levels and exhibit higher variability. Collectively, we observe alternate promoter usage in 25% of detected genes, with the majority (84.2%) of known/novel promoter pairs exhibiting potential changes in their coding sequences. Mapping these alternate promoters to TCGA GC samples, we identify several cancer-associated isoforms, including novel variants of oncogenes. Tumor-specific transcript isoforms tend to alter protein coding sequences to a larger extent than other isoforms. Analysis of outcome data suggests that novel isoforms may impart additional prognostic information. Our results provide a rich resource of full-length transcriptome data for deeper studies of GC and other gastrointestinal malignancies.

2021-02-02 | PXD023373 | Pride

Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data.

Project description:Next-generation sequencing has become an important tool for genome-wide quantification of DNA and RNA. However, a major technical hurdle lies in the need to map short sequence reads back to their correct locations in a reference genome. Here we investigate the impact of SNP variation on the reliability of read-mapping in the context of detecting allele-specific expression (ASE). We generated sixteen million 35 bp reads from mRNA of each of two HapMap Yoruba individuals. When we mapped these reads to the human genome we found that, at heterozygous SNPs, there was a significant bias towards higher mapping rates of the allele in the reference sequence, compared to the alternative allele. Masking known SNP positions in the genome sequence eliminated the reference bias but, surprisingly, did not lead to more reliable results overall. We find that even after masking, ~5-10% of SNPs still have an inherent bias towards more effective mapping of one allele. Filtering out inherently biased SNPs removes 40% of the top signals of ASE. The remaining SNPs showing ASE are enriched in genes previously known to harbor cis-regulatory variation or known to show uniparental imprinting. Our results have implications for a variety of applications involving detection of alternate alleles from short-read sequence data. Scripts, written in Perl and R, for simulating short reads, masking SNP variation in a reference genome, and analyzing the simulation output are available upon request from JFD. RNA-Seq on two YRI HapMap cell lines. Each individual was sequenced on two lanes of the Illumina Genome Analyzer.

2009-10-22 | E-GEOD-18156 | biostudies-arrayexpress
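
As a rough illustration of what testing for allelic imbalance at a heterozygous SNP involves, the sketch below computes the reference-allele fraction and an exact two-sided binomial p-value against an expected 50:50 ratio. It is not the authors' pipeline (their scripts are in Perl and R and are not reproduced here); the function name `allelic_imbalance` and the example read counts are hypothetical.

```python
from math import comb


def allelic_imbalance(ref_reads, alt_reads):
    """Return (reference fraction, two-sided exact binomial p-value vs. 0.5).

    Intended for modest read counts; very deep sites would need a
    log-space or library implementation instead of this direct sum.
    """
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("no reads cover this SNP")
    # Probability of each possible reference-read count under a fair 50:50 split
    pmf = [comb(n, k) * 0.5 ** n for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Two-sided p-value: total probability of outcomes no more likely than observed
    p_value = sum(x for x in pmf if x <= observed * (1 + 1e-12))
    return ref_reads / n, min(p_value, 1.0)


# Hypothetical counts at two heterozygous SNPs.
skewed = allelic_imbalance(10, 0)    # all reads carry the reference allele
balanced = allelic_imbalance(5, 5)   # reads split evenly
```

A significant excess of reference reads on its own does not distinguish true ASE from the mapping bias described above, which is why the project compares results before and after masking SNP positions.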

Identifying Hepatocyte Growth Factor dose response functions in Madin-Darby Canine Kidney Epithelial Cells

Project description:Two biological replicates of Madin-Darby Canine Kidney Epithelial Cells grown as 3D cysts in Collagen Type I (7 days old) were exposed to six different concentrations of Hepatocyte Growth Factor (HGF) (0, 1.03, 2.07, 4.15, 8.33 and 16.67 ng/ml). Total RNA was isolated from the cysts after 12 hours of HGF induction. The data submitted here are the raw sequence files of the single read lengths of 50 bp for the 12 samples (2 replicates X 6 conditions) after RNA sequencing experiment using Illumina HiSeq 2000.15

2016-01-17 | E-MTAB-4959 | biostudies-arrayexpress

Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data.

2009-10-22 | GSE18156 | GEO

Stockholm-Tartu Atherosclerosis Reverse Network Engineering Task (STARNET)

Project description:STARNET is a genetics of RNA expression study of multiple disease-relevant tissues obtained from living patients with cardiovascular disease. Tissue samples are obtained from blood, atherosclerotic-lesion-free internal mammary artery (MAM), atherosclerotic aortic root (AOR), subcutaneous fat (SF), visceral abdominal fat (VAF), skeletal muscle (SKLM), and liver (LIV) during open thorax surgery of 600 coronary artery disease (CAD) patients. All patients gave written informed consent. The inclusion criterion was eligibility for coronary artery by-pass graft (CABG) surgery. Patients with other severe systemic diseases, such as active systemic inflammatory disease or cancer, were excluded. The primary clinical end point was the SYNTAX score, based on the extent of coronary atherosclerosis assessed from preoperative angiograms. The STARNET patients are Caucasians (31% females); 32% had diabetes, 75% had hypertension, and 67% had hyperlipidemia; and 33% had an MI before age 60. By New York Heart Association criteria, 45% were class I, 42% class II, 9% class III, and 1% class IV. TYPES AND RNA SEQUENCING: 566 DNA genotype and 3577 RNA-seq profiles from seven tissues from 600 STARNET CABG patients passed quality control (on average 511 RNA-seq profiles/tissue). DNA was genotyped with the OmniExpress Exome array (Illumina, ~900k SNPs) and imputed to a total of 14,098,063 DNA variant calls (6,245,505 with minor allele frequency >5%). The STARNET subjects mainly overlap with Caucasians of Northern European (Finnish) descent. RNA sequencing was performed on the HiSeq 2000 platform, using poly-A (LIV, SKLM, VAF, SF and blood) and ribo-zero (AOR, MAM) protocols with 50-100 bp read lengths, single end, to a read depth of 15-30 million.

| phs001203 | dbGaP
