Interplay between coding and exonic splicing regulatory sequences.
ABSTRACT: The inclusion of exons during the splicing process depends on the binding of splicing factors to short, low-complexity regulatory sequences. The relationship between exonic splicing regulatory sequences and coding sequences is still poorly understood. We demonstrate that exons coregulated by a given splicing factor share a similar nucleotide composition bias and, because the genetic code is nonrandom, preferentially code for amino acids with similar physicochemical properties. Indeed, amino acids sharing similar physicochemical properties correspond to codons with the same nucleotide composition bias. In particular, we uncover that the TRA2A and TRA2B splicing factors, which bind adenine-rich motifs, promote the inclusion of adenine-rich exons that preferentially encode hydrophilic amino acids corresponding to adenine-rich codons. SRSF2, which binds guanine/cytosine-rich motifs, promotes the inclusion of GC-rich exons that preferentially encode small amino acids, whereas SRSF3, which binds cytosine-rich motifs, promotes the inclusion of exons that preferentially encode uncharged amino acids, such as serine and threonine, which can be phosphorylated. Finally, coregulated exons encoding amino acids with similar physicochemical properties correspond to specific protein features. In conclusion, because the regulation of an exon by a splicing factor relies on the factor's affinity for specific nucleotide(s), it is tightly interconnected with the physicochemical properties the exon encodes. We therefore uncover an unanticipated bidirectional interplay between the splicing regulatory process and its biological functional outcome.
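The claim that amino acids with similar physicochemical properties share codon composition biases can be checked directly against the standard genetic code. The Python sketch below computes the mean adenine fraction of the synonymous codons of each amino acid, then averages that fraction over two groups. The `HYDROPHILIC` and `HYDROPHOBIC` groupings are a simplified assumption made for illustration. They are not the classification used in the study, and the average is not weighted by codon usage.

```python
from itertools import product

# Standard genetic code: codons enumerated in TCAG order at each of the three positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Assumed, simplified groupings used only for this illustration.
HYDROPHILIC = "KNDEQ"
HYDROPHOBIC = "FLIVM"


def adenine_fraction(aa: str) -> float:
    """Mean fraction of A over all synonymous codons of one amino acid."""
    codons = [c for c, a in CODON_TABLE.items() if a == aa]
    return sum(c.count("A") for c in codons) / (3 * len(codons))


def group_mean(group: str) -> float:
    """Unweighted mean adenine fraction over a set of amino acids."""
    return sum(adenine_fraction(aa) for aa in group) / len(group)


for name, group in (("hydrophilic", HYDROPHILIC), ("hydrophobic", HYDROPHOBIC)):
    print(f"{name}: mean codon A fraction = {group_mean(group):.3f}")
```

With these groupings, the hydrophilic set averages about 0.57 adenine per codon position and the hydrophobic set about 0.19. Weighting each codon by a real codon-usage table would be a natural refinement.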
Project description: Invertases are responsible for the breakdown of sucrose to fructose and glucose. In all but one plant invertase gene, the second exon is only 9 nt long and encodes three amino acids of a five-amino-acid sequence that is highly conserved in all invertases of plant origin. Sequences responsible for normal splicing (inclusion) of exon 2 have been investigated in vivo using the potato invertase gene invGF. The upstream intron 1 is required for inclusion, whereas the downstream intron 2 is not. Mutations within intron 1 have identified two sequence elements needed for inclusion: a putative branchpoint sequence and an adjacent U-rich region. Both are recognized plant intron splicing signals. The branchpoint sequence lies farther upstream of the 3' splice site of intron 1 than is usual in plant introns. All dicotyledonous plant invertase genes contain this arrangement of sequence elements: a distal branchpoint sequence and an adjacent, downstream U-rich region. Intron 1 sequences upstream of the branchpoint and sequences in exons 1, 2, or 3 do not determine inclusion, suggesting that the intronic or exonic splicing enhancer elements seen in vertebrate mini-exon systems are absent. In addition, mutation of the 3' and 5' splice sites flanking the mini-exon causes skipping of the mini-exon, suggesting that both splice sites are required. The branchpoint/U-rich sequence can promote splicing of mini-exons of 6, 3, and 1 nt and of a 6-nt chicken cTNT mini-exon. These sequence elements therefore act as a splicing enhancer and appear to function via interactions between factors bound at the branchpoint/U-rich region and at the 5' splice site of intron 2, activating removal of this intron followed by removal of intron 1. This first analysis of plant mini-exon splicing demonstrates that a particular arrangement of standard plant intron splicing signals can drive constitutive splicing of a mini-exon.
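The arrangement described above, a distal branchpoint followed by a U-rich region, lends itself to a simple sequence scan. The sketch below is hypothetical. It assumes the commonly cited CURAY branchpoint consensus (written CTRAY in DNA letters) and uses arbitrary thresholds for what counts as U-rich (`window`, `min_u`). It is not the method used in the study, which relied on mutational analysis in vivo.

```python
import re

# Assumed branchpoint consensus CURAY in DNA letters (U -> T, R = A/G, Y = C/T).
# The lookahead lets overlapping matches be reported.
BRANCHPOINT = re.compile(r"(?=(CT[AG]A[CT]))")


def u_fraction(seq: str) -> float:
    """Fraction of U (T in DNA) in a sequence; 0.0 for an empty sequence."""
    return seq.count("T") / len(seq) if seq else 0.0


def branchpoint_u_rich_sites(intron: str, window: int = 20, min_u: float = 0.5):
    """Return (start, nt from branchpoint A to intron end, U fraction) for each
    CURAY-like match whose downstream `window` nt are at least `min_u` U.

    `window` and `min_u` are illustrative thresholds, not values from the study.
    """
    intron = intron.upper()
    hits = []
    for match in BRANCHPOINT.finditer(intron):
        start = match.start(1)
        downstream = intron[start + 5 : start + 5 + window]
        frac = u_fraction(downstream)
        if frac >= min_u:
            branch_a = start + 3  # the A is the 4th base of CURAY
            hits.append((start, len(intron) - branch_a, round(frac, 2)))
    return hits
```

Reporting the distance from the branchpoint A to the end of the intron makes it straightforward to flag candidates that sit unusually far upstream of the 3' splice site, as in invGF intron 1.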
Project description: In mammals, splice-regulatory domains impose marked trends on the relative abundance of certain amino acids near exon-intron boundaries. Is this a mammalian particularity or symptomatic of exonic splicing regulation across taxa? Are such trends more common in species that a priori have a harder time identifying exon ends, that is, those whose pre-mRNA is rich in intronic sequence? We address these questions by surveying exon composition in a sample of phylogenetically diverse genomes. Biased amino acid usage near exon-intron boundaries is common throughout, but not restricted to, the metazoa. There is extensive cross-species concordance as to which amino acids are affected, and reduced or elevated abundances are well predicted by knowledge of splice enhancers. Species expected to rely on exon definition for splicing, that is, those with a higher ratio of intronic to coding sequence, more introns per gene and longer introns, exhibit more amino acid skews. Notably, this includes the intron-rich basidiomycete Cryptococcus neoformans, which, unlike intron-poor ascomycetes (Schizosaccharomyces pombe, Saccharomyces cerevisiae), exhibits compositional biases reminiscent of the metazoa. Strikingly, the 5' ends of nematode exons deviate radically from the norm: amino acids strongly preferred near boundaries are strongly avoided in other species, and vice versa. We suggest that this is a measure to avoid attracting the trans-splicing machinery. Constraints on amino acid composition near exon-intron boundaries are phylogenetically widespread and characteristic of species in which exon localization should be problematic. That compositional biases accord with the sequence preferences of splice-regulatory proteins and are absent in ascomycetes is consistent with selection on exonic splicing regulation.
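A boundary skew of the kind surveyed above can be expressed as the frequency of an amino acid near an exon end divided by its frequency in the exon core. This Python sketch looks only at exon 5' ends. For simplicity it assumes that each exon sequence starts at the first position of a codon. Real exons have variable phase, so a full analysis would need the reading frame from the annotation. `boundary_skew` and `flank` are hypothetical names chosen for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq: str) -> str:
    """Translate an in-frame DNA sequence, ignoring a trailing partial codon."""
    end = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, end, 3))


def boundary_skew(exons, aa: str, flank: int = 3) -> float:
    """Frequency of `aa` in the first `flank` codons of the exons divided by its
    frequency in the exon cores (all codons except `flank` at each end).

    Assumes every exon starts in frame (phase 0) and flank >= 1.
    Returns float('nan') when the ratio is undefined.
    """
    edge_hits = edge_total = core_hits = core_total = 0
    for exon in exons:
        protein = translate(exon.upper())
        edge, core = protein[:flank], protein[flank:-flank]
        edge_hits += edge.count(aa)
        edge_total += len(edge)
        core_hits += core.count(aa)
        core_total += len(core)
    if edge_total == 0 or core_total == 0 or core_hits == 0:
        return float("nan")
    return (edge_hits / edge_total) / (core_hits / core_total)
```

A ratio below 1 indicates depletion near the boundary and a ratio above 1 indicates enrichment. Mirroring the slicing at the other end of each exon gives the 3'-end equivalent.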
Project description: The Down syndrome cell adhesion molecule (Dscam) gene has essential roles in neural wiring and pathogen recognition in Drosophila melanogaster. Dscam encodes 38,016 distinct isoforms via extensive alternative splicing. The 95 alternative exons in Dscam are organized into clusters that are spliced in a mutually exclusive manner. The exon 6 cluster contains 48 variable exons and uses a complex system of competing RNA structures to ensure that only one variable exon is included. Here we show that the heterogeneous nuclear ribonucleoprotein hrp36 acts specifically within, and throughout, the exon 6 cluster to prevent the inclusion of multiple exons. Moreover, hrp36 prevents serine/arginine-rich proteins from promoting the ectopic inclusion of multiple exon 6 variants. Thus, the fidelity of mutually exclusive splicing in the exon 6 cluster is governed by an intricate combination of alternative RNA structures and a globally acting splicing repressor.
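Because exactly one variable exon is chosen from each mutually exclusive cluster, the number of possible Dscam isoforms is the product of the cluster sizes, and the number of alternative exons is their sum. The description gives only the exon 6 cluster size (48). The sizes of the exon 4 (12), exon 9 (33) and exon 17 (2) clusters used here come from the published Dscam gene structure.

```python
from math import prod

# Variable exons per mutually exclusive cluster in Drosophila Dscam.
# Only the exon 6 value (48) is stated in the description above; the others
# are taken from the published Dscam gene structure.
DSCAM_CLUSTERS = {"exon 4": 12, "exon 6": 48, "exon 9": 33, "exon 17": 2}

# One exon is chosen per cluster, so the isoform count is the product of sizes.
isoforms = prod(DSCAM_CLUSTERS.values())
alternative_exons = sum(DSCAM_CLUSTERS.values())

print(f"{isoforms:,} isoforms from {alternative_exons} alternative exons")
# 38,016 isoforms from 95 alternative exons
```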
Project description: Mining massive amounts of transcript data for alternative splicing information is paramount to understanding how RNA maturation regulates gene expression. We developed an algorithm that clusters transcript data onto annotated genes to detect unannotated splice variants. More alternatively spliced genes and isoforms were found than in other alternative splicing databases. Comparison of human and mouse data revealed a marked increase, in human, of splice variants incorporating novel exons and retained introns. Previously unannotated exons were validated by tiling array expression data and shown to correspond preferentially to novel first exons. Retained introns were validated by tiling array and deep sequencing data. The majority of retained introns were shorter than 500 nt and had weak polypyrimidine tracts. A subset of retained introns that match small RNAs and display a high GC content suggests a possible coordination between splicing regulation and the production of noncoding RNAs. Conservation of unannotated exons and retained introns was higher in horse, dog and cow than in rodents, and 64% of exon sequences were found only in primates. This analysis highlights previously overlooked alternative splice variants, which may be crucial to deciphering more complex pathways of gene regulation in human.
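One step in detecting unannotated variants, flagging retained introns, can be illustrated with interval logic. An annotated intron is treated as retained when a single exon of a transcript spans it entirely, with some exonic sequence on both sides. This is a minimal sketch assuming same-strand, same-chromosome, 1-based inclusive coordinates. `retained_introns` is a hypothetical helper, not the authors' clustering algorithm.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive genomic coordinates


def retained_introns(transcript_exons: List[Interval],
                     annotated_introns: List[Interval]) -> List[Tuple[int, int, int]]:
    """Return (start, end, length) for each annotated intron lying strictly
    inside a single transcript exon, i.e. a candidate retained intron."""
    hits = []
    for i_start, i_end in annotated_introns:
        # Strict inequalities require flanking exonic sequence on both sides.
        if any(e_start < i_start and i_end < e_end for e_start, e_end in transcript_exons):
            hits.append((i_start, i_end, i_end - i_start + 1))
    return hits
```

The returned lengths can then be compared with the 500-nt threshold mentioned above.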
Project description: BACKGROUND: Focal adhesion kinase (FAK) is a non-receptor tyrosine kinase critical for processes ranging from embryonic development to cancer progression. Although isoforms with specific molecular and functional properties have been characterized in rodents and chicken, the organization of the FAK gene across phylogeny and its potential to generate multiple isoforms are not well understood. Here, we study the phylogeny of FAK, the organization of its gene, and its post-transcriptional processing in rodents and human. RESULTS: A single orthologue of FAK and the related PYK2 was found in non-vertebrate species. Gene duplication probably occurred in deuterostomes after the divergence of echinoderms, leading to the evolution of PYK2 with distinct properties. The amino acid sequences of FAK and PYK2 are conserved in their functional domains but not in their linker regions, and the autophosphorylation site is absent in C. elegans. Comparison of mouse and human FAK genes revealed multiple combinations of conserved and non-conserved 5'-untranslated exons in FAK transcripts, suggesting a complex regulation of their expression. Four alternatively spliced coding exons (13, 14, 16, and 31), previously described in rodents, are highly conserved in vertebrates. Cis-regulatory elements known to regulate alternative splicing were found in conserved alternative exons of FAK or in the flanking introns. In contrast, other reported human variant exons were restricted to Homo sapiens and, in some cases, other primates. Several of these non-conserved exons may correspond to transposable elements. The inclusion of conserved alternative exons was examined by RT-PCR in mouse and human brain during development. Inclusion of exons 14 and 16 peaked at the end of embryonic life, whereas inclusion of exon 13 increased steadily until adulthood.
Study of various tissues showed that these exons were also included in a tissue-specific fashion, independently of each other. CONCLUSION: The alternative coding exons 13, 14, 16, and 31 are highly conserved in vertebrates, and their inclusion in mRNA is tightly but independently regulated. These exons may therefore be crucial for FAK function in specific tissues or during development. Conversely, pathological disturbance of the expression of FAK and its isoforms could lead to abnormal cellular regulation.
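Exon inclusion measured by RT-PCR is often summarized as a percent spliced in (PSI) value: the share of the inclusion product among both products. The description does not say how bands were quantified, so this Python sketch is a generic assumption. It optionally divides each band intensity by its product length, on the premise that dye-stained band intensity scales with DNA length. `percent_spliced_in` is a hypothetical name.

```python
def percent_spliced_in(inclusion: float, skipping: float,
                       inclusion_len: int = 1, skipping_len: int = 1) -> float:
    """PSI (%) = inclusion / (inclusion + skipping) * 100.

    Each band intensity is divided by its product length (in bp) so that the
    comparison is roughly molar; leave both lengths at 1 to skip that correction.
    """
    inc = inclusion / inclusion_len
    skip = skipping / skipping_len
    total = inc + skip
    if total == 0:
        raise ValueError("no signal in either isoform")
    return 100.0 * inc / total
```

Computing PSI separately for exons 13, 14, 16 and 31 at each developmental stage or in each tissue would make their independent regulation directly comparable.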
Project description: Splicing factor SRSF10 is known to function as a sequence-specific splicing activator capable of regulating alternative splicing both in vitro and in vivo. We recently used an RNA-seq approach coupled with bioinformatics analysis to identify the extensive splicing network regulated by SRSF10 in chicken cells. We found that SRSF10 promoted both exon inclusion and exclusion. Functionally, many of the verified SRSF10-regulated alternative exons are linked to pathways of response to external stimulus. Here we describe in detail the experimental design, the bioinformatics analysis and the GO/pathway enrichment analysis of SRSF10-regulated genes corresponding to our data deposited in the Gene Expression Omnibus under accession number GSE53354. Our data thus provide a resource for studying the in vivo regulation of alternative splicing that underlies the biological functions of splicing regulatory proteins in cells.
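GO/pathway enrichment of a regulated gene set is commonly assessed with a one-sided hypergeometric test. The description does not state which test or multiple-testing correction was applied in the GSE53354 analysis. The Python sketch below therefore shows only that generic test, and `enrichment_p_value` and its argument names are illustrative.

```python
from math import comb


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric P(X >= k) for over-representation of a term.

    N: genes in the background set
    K: background genes annotated with the term
    n: regulated genes tested
    k: regulated genes annotated with the term
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)
```

In practice the p-values for all tested terms would then be corrected for multiple testing, for example with the Benjamini-Hochberg procedure.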
Project description: We have discovered that the positions of splice junctions in genes are constrained by the tolerance for disorder-promoting amino acids in the translated protein region. It is known that efficient splicing requires a nucleotide bias at the splice junction; this preferred usage produces a distribution of amino acids that is disorder-promoting. We observe that splicing efficiency, as reflected in the amino acid distribution, is not compromised to accommodate globular structure. We therefore infer that it is the positions of splice junctions in the gene that must be constrained by the local protein environment. Examining exonic splicing enhancers (short DNA motifs) found near splice junctions reveals that they are more prevalent in exons encoding disordered protein regions than in exons encoding structured regions. We therefore also conclude that local protein features constrain efficient splicing more in structured regions than in disordered ones.
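Comparing exonic splicing enhancer prevalence between exon classes comes down to counting motif occurrences near the splice junctions. In this hypothetical Python sketch, `EXAMPLE_ESE_HEXAMERS` is a placeholder set chosen for illustration, not a published ESE list. `window`, the number of nucleotides examined at each exon end, is an assumed parameter.

```python
# Placeholder motifs for illustration only; substitute a published ESE set.
EXAMPLE_ESE_HEXAMERS = ("GAAGAA", "GGAAGA")


def ese_density(exon: str, motifs=EXAMPLE_ESE_HEXAMERS, window: int = 50) -> dict:
    """Count motif hits (overlaps allowed) in the first and last `window` nt of
    an exon and return hits per nucleotide for each end.

    For exons shorter than 2 * window the two end regions overlap.
    """
    exon = exon.upper()
    ends = {"5_end": exon[:window], "3_end": exon[-window:]}
    result = {}
    for name, region in ends.items():
        hits = 0
        for motif in motifs:
            k = len(motif)
            hits += sum(1 for i in range(len(region) - k + 1) if region[i:i + k] == motif)
        result[name] = hits / len(region) if region else 0.0
    return result
```

Averaging these densities separately over exons that encode disordered regions and exons that encode structured regions gives the comparison described above.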
Project description: Alternative splicing (AS) is a robust generator of mammalian transcriptome complexity. Splice site specification is controlled by interactions of cis-acting determinants on a transcript with specific RNA binding proteins. These interactions are frequently localized to the intronic U-rich polypyrimidine tracts (PPT) located 5' to the majority of splice acceptor junctions. αCPs (also referred to as polyC-binding proteins (PCBPs) and hnRNPEs) comprise a subset of KH-domain proteins with high affinity and specificity for C-rich polypyrimidine motifs. Here, we demonstrate that αCPs promote the splicing of a defined subset of cassette exons via binding to a C-rich subset of polypyrimidine tracts located 5' to the αCP-enhanced exonic segments. This enhancement of splice acceptor activity is linked to interactions of αCPs with the U2 snRNP complex and may be mediated by cooperative interactions with the canonical polypyrimidine tract binding protein, U2AF65. Analysis of αCP-targeted exons predicts a substantial impact on fundamental cell functions. These findings lead us to conclude that the αCPs play a direct and global role in modulating the splicing activity and inclusion of an array of cassette exons, thus driving a novel pathway of splice site regulation within the mammalian transcriptome.
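Distinguishing the C-rich polypyrimidine tracts bound by αCPs from canonical U-rich tracts can be sketched as a composition test on the intronic sequence immediately upstream of the 3' splice site. The window size and both thresholds (`min_pyr`, `min_c_share`) are arbitrary assumptions for illustration, not criteria reported in the study.

```python
def classify_ppt(intron_3prime: str, window: int = 30, min_pyr: float = 0.7,
                 min_c_share: float = 0.5) -> str:
    """Classify the polypyrimidine tract in the last `window` nt of an intron.

    `intron_3prime` is intronic sequence (DNA or RNA) ending at the 3' splice
    site AG. Returns "C-rich PPT", "U-rich PPT" or "no PPT".
    Thresholds are illustrative assumptions.
    """
    region = intron_3prime.upper().replace("U", "T")[-window:]
    if not region:
        return "no PPT"
    c, t = region.count("C"), region.count("T")
    # Too few pyrimidines in the window: no tract at all.
    if c + t == 0 or (c + t) / len(region) < min_pyr:
        return "no PPT"
    # Among pyrimidines, decide whether C or U dominates.
    return "C-rich PPT" if c / (c + t) >= min_c_share else "U-rich PPT"
```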
Project description: TDP-43 is a predominantly nuclear RNA-binding protein that forms inclusion bodies in frontotemporal lobar degeneration (FTLD) and amyotrophic lateral sclerosis (ALS). The mRNA targets of TDP-43 in the human brain and its role in RNA processing are largely unknown. Using individual-nucleotide resolution ultraviolet cross-linking and immunoprecipitation (iCLIP), we found that TDP-43 preferentially bound long clusters of UG-rich sequences in vivo. Analysis of RNA binding by TDP-43 in brains from subjects with FTLD revealed that the greatest increases in binding were to the MALAT1 and NEAT1 noncoding RNAs. We also found that binding of TDP-43 to pre-mRNAs influenced alternative splicing in a position-dependent manner similar to that of Nova proteins. In addition, we identified unusually long clusters of TDP-43 binding at deep intronic positions downstream of silenced exons. A substantial proportion of the alternative mRNA isoforms regulated by TDP-43 encode proteins that regulate neuronal development or have been implicated in neurological diseases, highlighting the importance of TDP-43 for the regulation of splicing in the brain.
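The UG-rich binding clusters can be approximated, in a simplified way, by uninterrupted (UG)n runs. iCLIP clusters are defined from cross-link sites rather than from sequence, and UG-rich regions need not be perfect repeats. This Python sketch and its `min_repeats` threshold are therefore only an illustrative assumption.

```python
import re


def ug_clusters(rna: str, min_repeats: int = 6):
    """Return (start, end, length) of uninterrupted UG dinucleotide runs with at
    least `min_repeats` repeats (0-based, end-exclusive coordinates).

    DNA input (TG) is accepted. The threshold is an illustrative assumption.
    """
    seq = rna.upper().replace("T", "U")
    pattern = re.compile(f"(?:UG){{{min_repeats},}}")  # e.g. "(?:UG){6,}"
    return [(m.start(), m.end(), m.end() - m.start()) for m in pattern.finditer(seq)]
```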
Project description: To characterize the rules governing exon recognition during splicing, we analyzed RNA-seq datasets and identified ~4,000 GC-rich and ~5,000 AT-rich exons, labelled GC-exons and AT-exons, respectively, whose inclusion depends on different sets of splicing factors. We show that a high GC-load is associated with predicted RNA secondary structures at the 5'ss and that GC-exons are dependent on U1 snRNP-associated proteins. Meanwhile, a high AT-load is associated with a large number of decoy splicing-related signals upstream of exons, such as branchpoints and SF1- or U2AF65-binding sites, and AT-exons are dependent on U2 snRNP-associated proteins. Nucleotide composition bias also influences local chromatin organization. Since the GC content of exons correlates with that of their host genes, isochores and topologically associated domains, we propose that regional nucleotide composition bias leaves a footprint locally, at the exon level, inducing constraints during splicing that are alleviated by the local chromatin organization and specific splicing factors. Overall design: Samples siFUS, siGL2, siHNRNPC, siHNRNPH1, siHNRNPK and siTRA2A-B were made together and are part of the same experiment (single replicates). Samples siPP-1, siPP-2, siPP-3, siGL2-1, siGL2-2 and siGL2-3 were made together from different cell batches and are part of the same experiment (triplicates).
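The GC-/AT-exon distinction and the correlation between exons and their host genes both rest on two simple computations: GC content and a Pearson correlation coefficient. In this Python sketch the cut-offs `gc_high` and `gc_low` are hypothetical, because the description does not give the criteria used to define GC-exons and AT-exons.

```python
from math import sqrt


def gc_content(seq: str) -> float:
    """Fraction of G + C in a sequence; 0.0 for an empty sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def classify_exon(seq: str, gc_high: float = 0.6, gc_low: float = 0.4) -> str:
    """Label an exon by GC content using hypothetical cut-offs."""
    gc = gc_content(seq)
    if gc >= gc_high:
        return "GC-exon"
    if gc <= gc_low:
        return "AT-exon"
    return "intermediate"


def pearson(xs, ys) -> float:
    """Pearson correlation, e.g. between exon GC and host-gene GC values."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / sqrt(vx * vy)
```

Applying `gc_content` to each exon and to its host gene, then passing the two lists to `pearson`, reproduces the kind of exon-versus-gene comparison summarized above.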