Project description:Establishing the functional roles of genetic variants remains a significant challenge in the post-genomic era. Here, we present a method, allele-specific alternative mRNA processing (ASARP), to identify genetically influenced mRNA processing events using transcriptome sequencing (RNA-Seq) data. The method examines RNA-Seq data at both single nucleotide and whole-gene/isoform levels to identify allele-specific expression (ASE) and existence of allele-specific regulation of mRNA processing. We applied the methods to data obtained from the human glioblastoma cell line U87MG and primary breast cancer tissues and found that 26–45% of all genes with sufficient read coverage demonstrated ASE, with significant overlap between the two cell types. Our methods predicted potential mechanisms underlying ASE due to regulations affecting either whole-gene-level expression or alternative mRNA processing, including alternative splicing, alternative polyadenylation and alternative transcriptional initiation. Allele-specific alternative splicing and alternative polyadenylation may explain ASE in hundreds of genes in each cell type. Reporter studies following these predictions identified the causal single nucleotide variants (SNVs) for several allele-specific alternative splicing events. Finally, many genes identified in our study were also reported as disease/phenotype-associated genes in genome-wide association studies. Future applications of our approach may provide ample insights for a better understanding of the genetic basis of gene regulation underlying phenotypic diversity and disease mechanisms. Examine allele-specific gene expression and alternative RNA processing in U87MG cell line
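As a rough illustration of what testing for allele-specific expression at a single heterozygous SNV can look like, the sketch below applies an exact two-sided binomial test to reference and alternative allele read counts. This is a generic textbook approach, not the ASARP procedure itself, which the description above does not spell out. The function names, the coverage cutoff of 20 reads and the significance level of 0.05 are all hypothetical choices made for the example.

```python
from math import exp, lgamma, log


def binomial_two_sided_p(ref_reads: int, alt_reads: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value for an allelic imbalance.

    Null hypothesis: each read comes from the reference allele with
    probability p (0 < p < 1). Hypothetical helper, not part of ASARP.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no reads, no evidence of imbalance

    def pmf(k: int) -> float:
        # Binomial probability computed in log space to avoid overflow.
        log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
        return exp(log_choose + k * log(p) + (n - k) * log(1 - p))

    observed = pmf(ref_reads)
    # Sum the probabilities of all outcomes no more likely than the observed one.
    total = sum(q for q in (pmf(k) for k in range(n + 1)) if q <= observed * (1 + 1e-9))
    return min(1.0, total)


def shows_ase(ref_reads: int, alt_reads: int,
              min_coverage: int = 20, alpha: float = 0.05) -> bool:
    """Flag a SNV as allele-specific if coverage is sufficient and the test is significant.

    min_coverage and alpha are assumed values chosen for illustration only.
    """
    if ref_reads + alt_reads < min_coverage:
        return False
    return binomial_two_sided_p(ref_reads, alt_reads) < alpha


if __name__ == "__main__":
    print(shows_ase(45, 15))  # strongly skewed allele counts
    print(shows_ase(30, 30))  # balanced allele counts
```

In a real analysis the per-SNV p-values would also need multiple-testing correction across all tested sites, which this sketch leaves out.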
Project description:A critical task in high throughput sequencing is aligning millions of short reads to a reference genome. Alignment is especially complicated for RNA sequencing (RNA-Seq) because of RNA splicing. A number of RNA-Seq algorithms are available, and claim to align reads with high accuracy and efficiency while detecting splice junctions. RNA-Seq data is discrete in nature; therefore, with reasonable gene models and comparative metrics, RNA-Seq data can be simulated to sufficient accuracy to enable meaningful benchmarking of alignment algorithms. The exercise to rigorously compare all viable published RNA-Seq algorithms has not previously been performed. RESULTS: We developed an RNA-Seq simulator that models the main impediments to RNA alignment, including alternative splicing, insertions, deletions, substitutions, sequencing errors, and intron signal. We used this simulator to measure the accuracy and robustness of available algorithms at the base and junction levels. Additionally, we used RT-PCR and Sanger sequencing to validate the ability of the algorithms to detect novel transcript features such as novel exons and alternative splicing in RNA-Seq data from mouse retina. A pipeline based on BLAT was developed to explore the performance of established tools for this problem, and to compare it to the recently developed methods. This pipeline, the RNA-Seq Unified Mapper (RUM), performs comparably to the best current aligners and provides an advantageous combination of accuracy, speed and usability. RNA-Seq of mouse retinal RNA, as described.
Project description:The tumorigenesis of small intestinal neuroendocrine tumors (NETs) is poorly understood. Recent studies have associated alternative polyadenylation with proliferation, cell transformation and cancer. Polyadenylation is the process in which the pre-mRNA is cleaved at a polyA site and a polyA tail is added. Genes with two or more polyA sites can undergo alternative polyadenylation. This produces two or more distinct mRNA isoforms with different 3’ untranslated regions. Additionally, alternative polyadenylation can also produce mRNAs containing different 3’-terminal coding regions. Therefore, alternative polyadenylation alters both the repertoire and the expression level of proteins. Here we used high-throughput sequencing data to map polyA sites and characterize polyadenylation genome-wide in three small intestinal neuroendocrine tumors and a reference sample. In the tumors, sixteen genes showed significant changes of alternative polyadenylation pattern, which lead to the 3’ truncation of either mRNA coding regions or 3’ untranslated regions. Among these, 11 genes had been previously associated with cancer, with 4 genes being known tumor suppressors: DCC, PDZD2, MAGI1 and DACT2. We validated the alternative polyadenylation in 3 out of 3 cases with Q-RT-PCR. Our findings suggest that changes of alternative polyadenylation pattern in these 16 genes could be involved in the tumorigenesis of small intestinal neuroendocrine tumors. Furthermore, they also point to alternative polyadenylation as a new target for both diagnosis and treatment of small intestinal neuroendocrine tumors. The identified genes with alternative polyadenylation specific to the small intestinal neuroendocrine tumors could be further tested as diagnostic markers and drug targets for disease prevention and treatment.
PolyA-seq profiling of 3 human neuroendocrine tumors compared with pituitary using Direct RNA Sequencing from Helicos Biosciences Technology
Project description:Purpose: The goal of this study was to analyse RNA-seq data to determine the effect of deletion of the RNA-binding protein HuD on transcriptome-wide alternative splicing and polyadenylation in the neocortex of adult HuD KO vs. wild type littermates (controls). Methods: Cortical mRNA profiles of adult HuD KO (Elavl4 -/-) mice and Control mice were generated by RNA sequencing, in triplicate, using the Illumina NovaSeq 6000 platform. The quality of raw RNA-sequencing reads was evaluated using FastQC software (version 0.11.5) and adapters were removed using the Cutadapt (version 1.15) and Trimmomatic (version 0.38) software. Alternative splicing was evaluated using rMATS software (version 4.0.2) and BAM files were converted to BedGraph before examining alternative polyadenylation using DaPars software (version 0.9.1). Methods (cont.): RNA-seq data was aligned to the M. musculus genome (UCSC browser, mm10) using STAR (version 2.7.3a), and MultiQC (version 1.8) was used to perform a final quality check on STAR alignment files. If alignments were found to be the same read length and have >80% reads mapped to a unique location, the data was considered good quality and alternative splicing and polyadenylation analyses were performed. Sequence reads per sample were aligned to the mouse genome (build mm10). Results: HuD KO affected alternative splicing of 310 genes, including 17 validated HuD targets such as Cbx3, Cspp1, Snap25 and Gria2. In addition, deletion of HuD affected polyadenylation of 53 genes, with the majority of significantly altered mRNAs shifting towards usage of the proximal polyadenylation signal (PAS), resulting in shorter 3’ untranslated regions (3’ UTRs). Conclusions: HuD KO had a greater effect on alternative splicing than polyadenylation, with many of the affected genes implicated in several neuronal functions and neuropsychiatric disorders.
2021-03-17 | GSE169023 | GEO
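The quality gate described in the record above (alignments of the same read length, with more than 80% of reads mapped to a unique location) can be expressed as a small check. The sketch below is one possible reading of that rule: it assumes "same read length" means one common length across all samples. The AlignmentSummary class and its field names are hypothetical and do not correspond to actual STAR or MultiQC output keys.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class AlignmentSummary:
    """Per-sample alignment statistics (hypothetical container for this example)."""
    sample: str
    read_length: int
    uniquely_mapped_pct: float


def passes_qc(summaries: Sequence[AlignmentSummary],
              min_unique_pct: float = 80.0) -> bool:
    """Return True if all samples share one read length and each has
    more than min_unique_pct percent uniquely mapped reads."""
    if not summaries:
        return False
    same_length = len({s.read_length for s in summaries}) == 1
    well_mapped = all(s.uniquely_mapped_pct > min_unique_pct for s in summaries)
    return same_length and well_mapped


if __name__ == "__main__":
    # Made-up numbers, purely to show how the check is called.
    samples = [
        AlignmentSummary("KO_1", 100, 91.2),
        AlignmentSummary("WT_1", 100, 88.7),
    ]
    print(passes_qc(samples))
```

The strict comparison mirrors the ">80%" wording, so a sample at exactly 80% would not pass under this reading.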
Project description:Benchmarking Metagenomics Tools for Taxonomic Classification
Project description:Most eukaryotic genes harbor multiple cleavage and polyadenylation sites (PASs), leading to expression of alternative polyadenylation (APA) isoforms. APA regulation has been implicated in a diverse array of physiological and pathological conditions. While RNA sequencing tools that generate reads containing the PAS, named onSite reads, have been instrumental in identifying PASs, they have not been widely used. By contrast, a growing number of methods generate reads that are close to the PAS, named nearSite reads, including the 3’ end counting strategy commonly used in single cell analysis. How these nearSite reads can be used for APA analysis, however, is poorly studied. Here, we present a computational method, named model-based analysis of alternative polyadenylation using 3’ end-linked reads (MAAPER), to examine APA using nearSite reads. MAAPER uses a probabilistic model to predict PASs for nearSite reads with high accuracy and sensitivity, and examines different types of APA events, including those in 3’UTRs and introns, with robust statistics. We show usability of MAAPER with data from bulk RNA and single cell samples. Our result also highlights the importance of using well annotated PASs for nearSite read analysis.
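To make the nearSite idea concrete, the sketch below assigns each read's 3' end to the nearest annotated PAS downstream of it and then reports the fraction of assigned reads per PAS. This is a deliberately simple nearest-site heuristic, not MAAPER's probabilistic model, which is not specified in the description above. It assumes a plus-strand gene (downstream means a larger coordinate), and the function names and the 600-nt distance limit are hypothetical.

```python
from bisect import bisect_left
from collections import Counter
from typing import Dict, Iterable, List, Optional


def assign_to_pas(read_end: int, sorted_pas: List[int],
                  max_distance: int = 600) -> Optional[int]:
    """Return the closest annotated PAS at or downstream of read_end,
    or None if there is none within max_distance (an assumed cutoff)."""
    i = bisect_left(sorted_pas, read_end)
    if i == len(sorted_pas):
        return None  # read lies beyond the last annotated PAS
    pas = sorted_pas[i]
    return pas if pas - read_end <= max_distance else None


def pas_usage(read_ends: Iterable[int], pas_positions: Iterable[int],
              max_distance: int = 600) -> Dict[int, float]:
    """Fraction of assigned reads supporting each annotated PAS."""
    sorted_pas = sorted(pas_positions)
    counts: Counter = Counter()
    for end in read_ends:
        pas = assign_to_pas(end, sorted_pas, max_distance)
        if pas is not None:
            counts[pas] += 1
    total = sum(counts.values())
    return {p: c / total for p, c in counts.items()} if total else {}


if __name__ == "__main__":
    # Toy coordinates: two annotated PASs and four nearSite read ends.
    print(pas_usage([900, 950, 1900, 5000], [1000, 2000]))
```

Comparing such usage fractions between two conditions gives a crude view of a shift between proximal and distal sites. A model-based method can additionally weigh how far each read lies from each candidate PAS, which this heuristic ignores.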
Project description:Alternative cleavage and polyadenylation (APA) is emerging as an important mechanism of gene regulation in eukaryotes and plays important regulatory roles in human development and diseases. Despite the widespread application of Second Generation Sequencing (SGS) technology for polyadenylation site identification, matching each identified polyadenylation site within a gene to its derived isoform remains a major challenge. To achieve the isoform-resolved APA analysis, we developed a tool termed “IDP-APA” that constructs truly expressed isoforms and identifies polyadenylation sites by integrating the respective strengths of Third Generation Sequencing (TGS) long reads and SGS short reads. Compared to existing tools, IDP-APA demonstrated superior performance in both isoform reconstruction and polyadenylation site identification. Applications to human embryonic stem cells, breast cancer cells and brain tissue from a patient with Alzheimer’s disease revealed prevalent APA events and cell-/tissue-specific APA patterns, especially in an isoform-resolved way.
Project description:We have developed both WTSS-seq (whole transcriptome start site sequencing) and WTTS-seq (whole transcriptome termini site sequencing) methods to capture either 5’- or 3’-ends of transcripts. HATT-seq (head and tail tag sequencing) is still under development, which can be used to capture both 5’- and 3’ ends of each transcript simultaneously. Iso-seq was used to produce full-length transcripts, which can be used to validate both alternative transcription start sites and alternative polyadenylation sites. CAGE-seq was used to confirm alternative transcription start sites only.
Project description:Alternative polyadenylation has been implicated as an important regulator of gene expression. In some cases, alternative polyadenylation is known to couple with alternative splicing to influence last intron removal. However, it is unknown whether alternative polyadenylation events influence alternative splicing decisions at upstream exons. Knockdown of the polyadenylation factors CFIm25 or CstF64 was used as an approach in identifying alternative polyadenylation and alternative splicing events on a genome-wide scale. Although hundreds of alternative splicing events were found to be differentially spliced in the knockdown of CstF64, genes associated with alternative polyadenylation did not exhibit an increased incidence of alternative splicing. These results demonstrate that the coupling between alternative polyadenylation and alternative splicing is usually limited to defining the last exon. The striking influence of CstF64 knockdown on alternative splicing can be explained through its effects on UTR selection of known splicing regulators such as hnRNP A2/B1, thereby indirectly influencing splice site selection. We conclude that changes in the expression of the polyadenylation factor CstF64 influence alternative splicing through indirect effects. HeLa cell line was stably transfected with shRNA plasmids targeting CstF64. Total RNA was isolated from CstF64 KD cells and wild-type control cells using Trizol according to the manufacturer’s protocols. Samples were deep sequenced in duplicate using the Illumina GAIIx system.