A new advance in alternative splicing databases: from catalogue to detailed analysis of regulation of expression and function of human alternative splicing variants.
ABSTRACT: BACKGROUND: Most human genes produce several transcripts with different exon contents by using alternative promoters, alternative polyadenylation sites and alternative splice sites. Much effort has been devoted to describing known gene transcripts through the development of numerous databases. Nevertheless, owing to the diversity of the transcriptome, there is a need for interactive databases that provide information about the potential function of each splicing variant, as well as its expression pattern. DESCRIPTION: After setting up a database in which human and mouse splicing variants were compiled, we developed tools (1) to predict the production of protein isoforms from these transcripts, taking account of the presence of open reading frames and mechanisms that could potentially eliminate transcripts and/or inhibit their translation, i.e. nonsense-mediated mRNA decay and microRNAs; (2) to support studies of the regulation of transcript expression at multiple levels, including transcription and splicing, particularly in terms of tissue specificity; and (3) to assist in experimental analysis of the expression of splicing variants. Importantly, analyses of all features from transcript metabolism to functional protein domains were integrated in a highly interactive, user-friendly web interface that allows the functional and regulatory features of gene transcripts to be assessed rapidly and accurately. CONCLUSION: In addition to identifying the transcripts produced by human and mouse genes, fast DB http://www.fast-db.com provides tools for analyzing the putative functions of these transcripts and the regulation of their expression. Therefore, fast DB has achieved an advance in alternative splicing databases by providing resources for the functional interpretation of splicing variants for the human and mouse genomes. 
Because gene expression studies are increasingly employed in clinical analyses, our web interface has been designed to be as user-friendly as possible and to be readily searchable and intelligible at a glance by the whole biomedical community.
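One of the checks mentioned above, flagging transcripts that may be eliminated by nonsense-mediated mRNA decay, is often approximated with the "50-nucleotide rule": a stop codon lying more than about 50 nt upstream of the last exon-exon junction is treated as premature. The sketch below illustrates that heuristic only. The function name, its arguments, and the threshold are assumptions made for illustration and are not taken from fast DB.

```python
def is_nmd_candidate(exon_lengths, stop_codon_end, min_distance=50):
    """Apply the 50-nt rule heuristic to one spliced transcript.

    exon_lengths   -- exon lengths in transcript order (hypothetical input)
    stop_codon_end -- 0-based exclusive end of the stop codon, in mRNA coordinates
    min_distance   -- assumed threshold in nucleotides
    """
    if len(exon_lengths) < 2:
        # A single-exon transcript has no exon-exon junction, so the rule does not apply.
        return False
    # Position of the last exon-exon junction in mRNA coordinates.
    last_junction = sum(exon_lengths[:-1])
    return last_junction - stop_codon_end > min_distance


if __name__ == "__main__":
    # Three exons; the last junction sits at position 300.
    print(is_nmd_candidate([100, 200, 150], stop_codon_end=200))
    print(is_nmd_candidate([100, 200, 150], stop_codon_end=280))
```

A real predictor would also need to locate the open reading frame first; here the stop codon position is simply supplied by the caller.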
Project description:Alternative gene splicing is pervasive in metazoa, particularly in humans, where the majority of genes generate splice variant transcripts. Characterizing the biological significance of alternative transcripts is methodologically difficult since it is impractical to assess thousands of splice variants as to whether they actually encode proteins, whether these proteins are functional, or whether transcripts have a function independent of protein synthesis. Consequently, to elucidate the functional significance of splice variants and to investigate mechanisms underlying the fidelity of mRNA splicing, we used an indirect approach based on analyzing the evolutionary conservation of splice variants among species. Using DNA polymerase β as an indicator locus, we cloned and characterized the types and frequencies of transcripts generated in primary cell lines of five primate species. Overall, we found that in addition to the canonical DNA polymerase β transcript, there were 25 alternative transcripts generated, most containing premature terminating codons. We used a statistical method borrowed from community ecology to show that there is significant diversity and little conservation in alternative splicing patterns among species, despite high sequence similarity in the underlying genomic (exonic) sequences. However, the frequency of alternative splicing at this locus correlates well with life history parameters such as the maximal longevity of each species, indicating that the alternative splicing of unproductive splice variants may have adaptive significance, even if the specific RNA transcripts themselves have no function. These results demonstrate the validity of the phylogenetic conservation approach in elucidating the biological significance of alternative splicing.
Project description:Over 60% of protein-coding genes in vertebrates express mRNAs that undergo alternative splicing. The resulting collection of transcript isoforms poses significant challenges for contemporary biological assays. For example, RT-PCR validation of gene expression microarray results may be unsuccessful if the two technologies target different splice variants. Effective use of sequence-based technologies requires knowledge of the specific splice variant(s) that are targeted. In addition, the critical roles of alternative splice forms in biological function and in disease suggest that assay results may be more informative if analyzed in the context of the targeted splice variant. A number of contemporary technologies are used for analyzing transcripts or proteins. To enable investigation of the impact of splice variation on the interpretation of data derived from those technologies, we have developed SpliceCenter. SpliceCenter is a suite of user-friendly, web-based applications that includes programs for analysis of RT-PCR primer/probe sets, effectors of RNAi, microarrays, and protein-targeting technologies. Both interactive and high-throughput implementations of the tools are provided. The interactive versions of SpliceCenter tools provide visualizations of a gene's alternative transcripts and probe target positions, enabling the user to identify which splice variants are or are not targeted. The high-throughput batch versions accept user query files and provide results in tabular form. 
When, for example, we used SpliceCenter's batch siRNA-Check to process the Cancer Genome Anatomy Project's large-scale shRNA library, we found that only 59% of the 50,766 shRNAs in the library target all known splice variants of the target gene, 32% target some but not all, and 9% do not target any currently annotated transcript. SpliceCenter http://discover.nci.nih.gov/splicecenter provides unique, user-friendly applications for assessing the impact of transcript variation on the design and interpretation of RT-PCR, RNAi, gene expression microarrays, antibody-based detection, and mass spectrometry proteomics. The tools are intended for use by bench biologists as well as bioinformaticists.
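The all / some / none breakdown reported for the shRNA library amounts to a simple set comparison per reagent. The following is a minimal sketch of that bookkeeping, assuming each reagent has already been mapped to the transcripts it matches. The function names and the input layout are hypothetical and do not reflect SpliceCenter's own code or file formats.

```python
from collections import Counter


def classify_coverage(targeted, annotated):
    """Label a reagent by how many annotated variants of its gene it targets."""
    annotated = set(annotated)
    # Ignore hits to transcripts that are not annotated for this gene.
    hits = set(targeted) & annotated
    if not hits:
        return "none"
    if hits == annotated:
        return "all"
    return "some"


def summarize(reagents):
    """Return the fraction of reagents in each coverage class.

    reagents -- iterable of (targeted_transcripts, annotated_transcripts) pairs
    """
    counts = Counter(classify_coverage(t, a) for t, a in reagents)
    total = sum(counts.values())
    return {label: counts[label] / total for label in ("all", "some", "none")}


if __name__ == "__main__":
    # Toy data with made-up transcript identifiers.
    library = [
        (["T1", "T2"], ["T1", "T2"]),
        (["T1"], ["T1"]),
        (["T1"], ["T1", "T2", "T3"]),
        ([], ["T1", "T2"]),
    ]
    print(summarize(library))
```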
Project description:Alternative pre-mRNA splicing increases proteomic diversity and provides a potential mechanism underlying both phenotypic diversity and susceptibility to genetic disorders in human populations. To investigate the variation in splicing among humans on a genome-wide scale, we use a comprehensive exon-targeted microarray to examine alternative splicing in lymphoblastoid cell lines (LCLs) derived from the CEPH HapMap population. We show the identification of transcripts containing sequence-verified exon skipping, intron retention, and cryptic splice site usage that differ between individuals. A number of novel alternative splicing events with no previous annotations in either the RefSeq or EST databases were identified, indicating that we are able to discover de novo splicing events. Using family-based linkage analysis, we demonstrate Mendelian inheritance and segregation of specific splice isoforms with regulatory haplotypes for three genes: OAS1, CAST, and CRTAP. Allelic association was further used to identify individual SNPs or regulatory haplotype blocks linked to the alternative splicing event, taking advantage of the high-resolution genotype information from the CEPH HapMap population. In one candidate, we identified a regulatory polymorphism that disrupts a 5' splice site of an exon in the CAST gene, resulting in its exclusion in the mutant allele. This report illustrates that our approach can detect both annotated and novel alternatively spliced variants, and that such variation among individuals is heritable and genetically controlled.
Project description:Alternative pre-mRNA splicing generates functionally distinct transcripts from the same gene and is involved in the control of multiple cellular processes, with its dysregulation being associated with a variety of pathologies. The advent of next-generation sequencing has enabled global studies of alternative splicing in different physiological and disease contexts. However, current bioinformatics tools for alternative splicing analysis from RNA-seq data are not user-friendly, disregard available exon-exon junction quantification or have limited downstream analysis features. To overcome such limitations, we have developed psichomics, an R package with an intuitive graphical interface for alternative splicing quantification and for downstream dimensionality reduction, differential splicing and gene expression analysis, and survival analysis based on The Cancer Genome Atlas, the Genotype-Tissue Expression project, the Sequence Read Archive project and user-provided data. These integrative analyses can also incorporate clinical and molecular sample-associated features. We successfully used psichomics on a laptop to reveal alternative splicing signatures specific to stage I breast cancer and associated novel putative prognostic factors.
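Alternative splicing quantification from exon-exon junction reads is commonly expressed as a percent-spliced-in (PSI) value. The sketch below shows the usual calculation for a single exon-skipping event, written in Python for illustration. The function name, argument names, and the minimum-read threshold are assumptions, and the code is not psichomics' own implementation (psichomics itself is an R package).

```python
def psi_exon_skipping(c1_a, a_c2, c1_c2, min_reads=10):
    """Estimate PSI for a cassette exon A flanked by constitutive exons C1 and C2.

    c1_a, a_c2 -- reads on the two inclusion junctions (C1-A and A-C2)
    c1_c2      -- reads on the exclusion junction (C1-C2)
    min_reads  -- assumed coverage threshold below which no estimate is returned
    """
    # Average the two inclusion junctions so inclusion is not counted twice.
    inclusion = (c1_a + a_c2) / 2
    total = inclusion + c1_c2
    if total < min_reads:
        return None  # too little evidence to quantify the event
    return inclusion / total


if __name__ == "__main__":
    print(psi_exon_skipping(30, 30, 10))  # exon mostly included
    print(psi_exon_skipping(2, 2, 1))     # insufficient coverage
```

Returning `None` for low-coverage events keeps unreliable ratios out of downstream analyses such as dimensionality reduction.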
Project description:BACKGROUND: More and more experiments have shown that transcription and mRNA processing are not two independent events but are tightly coupled to each other. Both promoter and transcription rate were found to influence alternative splicing. More than half of human genes have alternative promoters, but it is still not clear why there are so many alternative promoters and what their biological roles are. METHODOLOGY/PRINCIPAL FINDINGS: In this study, we explored whether there is a functional correlation between alternative promoters and alternative splicing by a genome-wide analysis of human and mouse genes. We constructed a large data set of genes with alternative promoter and alternative splicing annotations. By analyzing these genes, we showed that genes with alternative promoters were more likely to undergo alternative splicing than genes with a single promoter, and that genes with more alternative promoters tended to have more alternative splicing variants. Furthermore, transcripts from different alternative promoters tended to splice differently. CONCLUSIONS/SIGNIFICANCE: Thus at the genomic level, alternative promoters are positively correlated with alternative splicing.
Project description:Alternative splicing essentially increases the diversity of the transcriptome and has important implications for physiology, development and the genesis of diseases. Conventionally, alternative splicing is investigated in a case-by-case fashion, but this becomes cumbersome and error prone if genes show a huge abundance of different splice variants. We use a different approach and integrate all transcripts derived from a gene into a single splicing graph. Each transcript corresponds to a path in the graph, and alternative splicing is displayed by bifurcations. This representation preserves the relationships between different splicing variants and allows us to investigate systematically all possible putative transcripts. We built a database of splicing graphs for human genes, using transcript information from various major sources (Ensembl, RefSeq, STACK, TIGR and UniGene). A Web interface allows users to display the splicing graphs, to interactively assemble transcripts and to access their sequences as well as neighboring genomic regions. We also provide for each gene an exhaustive pre-computed catalog of putative transcripts--in total more than 1.2 million sequences. We found that approximately 65% of the investigated genes show evidence for alternative splicing, and in 5% of the cases, a single gene might produce over 100 transcripts.
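The splicing graph idea described above (each transcript is a path, alternative splicing appears as bifurcations, and putative transcripts are all paths through the graph) can be sketched in a few lines. This is an illustrative toy, assuming exons are identified by labels listed in genomic order so that the graph is acyclic. The function names and exon labels are hypothetical and not taken from the database.

```python
def build_splicing_graph(transcripts):
    """Merge observed transcripts (non-empty lists of exon labels) into one graph."""
    edges = {}
    starts, ends = set(), set()
    for exons in transcripts:
        starts.add(exons[0])
        ends.add(exons[-1])
        # Each pair of consecutive exons is one observed splice junction.
        for upstream, downstream in zip(exons, exons[1:]):
            edges.setdefault(upstream, set()).add(downstream)
    return edges, starts, ends


def enumerate_paths(edges, starts, ends):
    """List every path from an observed first exon to an observed last exon."""
    paths = []

    def walk(node, path):
        if node in ends:
            paths.append(tuple(path))
        for nxt in sorted(edges.get(node, ())):
            walk(nxt, path + [nxt])

    for start in sorted(starts):
        walk(start, [start])
    return paths


if __name__ == "__main__":
    observed = [["E1", "E2", "E4", "E5"], ["E1", "E3", "E4", "E6"]]
    graph = build_splicing_graph(observed)
    for path in enumerate_paths(*graph):
        status = "observed" if list(path) in observed else "putative"
        print("-".join(path), status)
```

With two observed transcripts that share exon E4, the graph yields four paths, two of which were never observed directly. This combinatorial growth is why a gene with many bifurcations can yield a very large catalog of putative transcripts.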
Project description:BioInformatics Pipeline Alternative Splicing Services (BIPASS) offer support to scientists interested in gathering information related to alternative splicing (AS) events. The service BIPAS-SpliceDB provides access to AS information that has been extracted a priori from various public databases and stored in a data warehouse. In contrast, the BIPAS-Align&Splice service allows scientists to submit their own sequences and genome to compute AS analysis results. BIPAS services offer various user-friendly ways to navigate through the results. AS results are organized at different conceptual levels (clusters and sequences), and are displayed in graphs or summarized in tables that can be downloaded in XML or text format. The two BIPAS services SpliceDB and Align&Splice are available online at http://bip.umiacs.umd.edu:8080/.
Project description:Exon arrays are regularly used to analyze differential splicing events. GeneChip Gene 1.0 ST Arrays (gene arrays) manufactured by Affymetrix, Inc. are primarily used to determine expression levels of transcripts, although their basic design is rather similar to GeneChip Exon 1.0 ST Arrays (exon arrays). Here, we show that the newly developed Gene Array Analyzer (GAA), which evolved from our previously published Exon Array Analyzer (EAA), enables economic and user-friendly analysis of alternative splicing events using gene arrays. To demonstrate the applicability of GAA, we profiled alternative splicing events during embryonic heart development. In addition, we found that numerous developmental splicing events are also activated under pathological conditions. We reason that the usage of GAA considerably expands the analysis of gene expression based on gene arrays and supplies an additional level of information without further costs and with only little effort.
Project description:In eukaryotes, different combinations of exons lead to multiple transcripts with various functions at the protein level, in a process called alternative splicing (AS). Unfolding the complexity of functional genomics through genome-wide profiling of AS and determining the altered end products provides new insights into many biological processes and disease progression, as well as into drug development programs that target harmful splicing variants. Currently available alternative splicing tools work with raw data and require heavy computation. In particular, there is a shortage of tools that discover AS events directly from transcripts. Here, we developed a Windows-based, user-friendly tool for identifying AS events from transcripts without the need for advanced computer skills or database downloads. Because it works online, our application uses up-to-date SpliceGraphs without requiring any local resource updates. First, a SpliceGraph is formed based on the frequency of active splice sites in pre-mRNA. Then, the presented approach compares query transcript exons to SpliceGraph exons. The tool supports statistical analysis of AS events as well as visualization of AS relative to the SpliceGraph. The developed application works for transcript sets in human and model organisms.
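The comparison step described above, matching a query transcript's exons against the exons of a reference SpliceGraph, can be illustrated for the simplest event type, exon skipping. The sketch below is a deliberately simplified stand-in that treats the reference as a flat, genomically ordered list of exon labels. The function name and inputs are hypothetical, and the tool's actual algorithm is not described in enough detail here to reproduce.

```python
def skipped_exons(reference_exons, query_exons):
    """Return reference exons that lie inside the query's span but are absent from it.

    reference_exons -- exon labels in genomic order (assumed simplification)
    query_exons     -- exon labels found in the query transcript
    """
    present = set(query_exons)
    positions = [i for i, exon in enumerate(reference_exons) if exon in present]
    if not positions:
        return []  # the query shares no exon with the reference
    first, last = positions[0], positions[-1]
    # Only exons between the query's first and last exon can count as skipped.
    return [e for e in reference_exons[first:last + 1] if e not in present]


if __name__ == "__main__":
    reference = ["E1", "E2", "E3", "E4", "E5"]
    print(skipped_exons(reference, ["E1", "E2", "E4", "E5"]))
```

Other event types, such as alternative 5' or 3' splice sites, would require exon coordinates and not just labels.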
Project description:BACKGROUND: Alternative splicing (AS) is a central mechanism in the generation of genomic complexity and is a major contributor to transcriptome and proteome diversity. Alterations of the splicing process can lead to deregulation of crucial cellular processes and have been associated with a large spectrum of human diseases. Cancer-associated transcripts are potential molecular markers and may contribute to the development of more accurate diagnostic and prognostic methods and also serve as therapeutic targets. Alternative splicing-enriched cDNA libraries have been used to explore the variability generated by alternative splicing. In this study, by combining the use of trapping heteroduplexes and RNA amplification, we developed a powerful approach that enables transcriptome-wide exploration of the AS repertoire for identifying AS variants associated with breast tumor cells modulated by ERBB2 (HER-2/neu) oncogene expression. RESULTS: The human breast cell line (C5.2) and a pool of 5 ERBB2 over-expressing breast tumor samples were used independently for the construction of two AS-enriched libraries. In total, 2,048 partial cDNA sequences were obtained, revealing 214 alternative splicing sequence-enriched tags (ASSETs). A subset of 79 multi-exon ASSETs was compared to public databases, revealing 138 different AS events. A high success rate of RT-PCR validation (94.5%) was obtained, and 2 novel AS events were identified. The influence of ERBB2-mediated expression on AS regulation was evaluated by capillary electrophoresis and probe-ligation approaches in two mammary cell lines (Hb4a and C5.2) expressing different levels of ERBB2. The relative expression balance between AS variants from 3 genes was differentially modulated by ERBB2 in this model system. CONCLUSIONS: In this study, we presented a method for exploring AS from any RNA source in a transcriptome-wide format, which can be directly and easily adapted to next-generation sequencers. 
We identified AS transcripts that were differently modulated by ERBB2-mediated expression and that can be tested as molecular markers for breast cancer. Such a methodology will be useful for completely deciphering the cancer cell transcriptome diversity resulting from AS and for finding more precise molecular markers.