G protein-coupled receptor genes in the FANTOM2 database.
ABSTRACT: G protein-coupled receptors (GPCRs) comprise the largest family of receptor proteins in mammals and play important roles in many physiological and pathological processes. Gene expression of GPCRs is temporally and spatially regulated, and many splicing variants have also been described. In many instances, different expression profiles of a GPCR gene account for changes in its biological function. Therefore, it is of interest to assess the complexity of the GPCR transcriptome in various mammalian organs. In this study, we took advantage of the FANTOM2 (Functional Annotation Meeting of Mouse cDNA 2) project, which aimed to collect full-length cDNAs comprehensively from mouse tissues, and found 410 candidate GPCR cDNAs. Clustering of these clones into transcriptional units (TUs) reduced this number to 213. Of these, 165 genes were represented within the 308 known GPCRs in the Mouse Genome Informatics (MGI) resource. The remaining 48 genes were new to mouse, and 14 of them had no clear mammalian ortholog. To characterize each transcript in detail, tissue distribution patterns and alternative splicing were also ascertained. We found many splicing variants of GPCRs that may be relevant to disease occurrence. In addition, the difficulty of cloning tissue-specific and infrequently transcribed GPCRs is discussed.
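The clone-to-TU reduction described in the abstract can be pictured with a minimal Python sketch. It is an illustration only, not the authors' pipeline: the clone and TU identifiers, the `known_gpcr_tus` set, and both helper functions are hypothetical stand-ins. Only the counts in the final check (165, 48, 213) come from the abstract.

```python
from collections import defaultdict

# Hypothetical clone-to-TU assignments; all identifiers are invented for illustration.
clone_to_tu = {
    "cloneA": "TU1",
    "cloneB": "TU1",
    "cloneC": "TU2",
    "cloneD": "TU3",
    "cloneE": "TU3",
}

# Hypothetical stand-in for TUs that match a GPCR already recorded in MGI.
known_gpcr_tus = {"TU1", "TU3"}


def group_by_tu(mapping):
    """Collapse clones into transcriptional units: one entry per TU."""
    tus = defaultdict(list)
    for clone, tu in mapping.items():
        tus[tu].append(clone)
    return dict(tus)


def split_known_new(tus, known):
    """Partition TUs into those already known and those new to the reference set."""
    known_tus = sorted(tu for tu in tus if tu in known)
    new_tus = sorted(tu for tu in tus if tu not in known)
    return known_tus, new_tus


tus = group_by_tu(clone_to_tu)
known, new = split_known_new(tus, known_gpcr_tus)
print(len(clone_to_tu), "clones ->", len(tus), "TUs")
print("known:", known, "new:", new)

# Counts reported in the abstract: 165 known + 48 new TUs account for all 213 TUs.
assert 165 + 48 == 213
```

Grouping by TU before comparing against the reference set means that several clones of one gene, including splice variants, are counted once.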
Project description: The Mouse Genome Sequencing Consortium and the RIKEN Genome Exploration Research group have generated large sets of sequence data representing the mouse genome and transcriptome, respectively. These data provide a valuable foundation for genomic research. The challenges for the informatics community are how to integrate these data with the ever-expanding knowledge about the roles of genes and gene products in biological processes, and how to provide useful views to the scientific community. Public resources, such as the National Center for Biotechnology Information (NCBI; http://www.ncbi.nih.gov), and model organism databases, such as the Mouse Genome Informatics database (MGI; http://www.informatics.jax.org), maintain the primary data and provide connections between sequence and biology. In this paper, we describe how the partnership of MGI and NCBI LocusLink contributes to the integration of sequence and biology, especially in the context of the large-scale genome and transcriptome data now available for the laboratory mouse. In particular, we describe the methods and results of integration of 60,770 FANTOM2 mouse cDNAs with gene records in the databases of MGI and LocusLink.
Project description: The FANTOM2 cDNA sequence data set is an excellent model to demonstrate the power of large-scale cDNA sequencing, with the goal of providing a full-length transcript sequence for each mouse gene. This data set enhances the use of the mouse as a model for human disease. Here we identify mouse cDNA sequences in the FANTOM2 data set for a set of 67 human disease genes that as of May 2002 had no corresponding mouse cDNA annotated in the Mouse Genome Informatics (MGI) database. These 67 human disease genes include genes related to neurological and eye disorders and cancer. We also present a list of the human disease genes and their cloned mouse orthologs found in two public databases, LocusLink and MGI. Allelic variant and gene functional information available in MGI provides additional information relative to these mouse models, whereas computed sequence-based connections at NCBI support facile navigation through multiple genomes.
Project description: The number of known mRNA transcripts in the mouse has been greatly expanded by the RIKEN Mouse Gene Encyclopedia project. Validation of their reproducible expression in a tissue is an important contribution to the study of functional genomics. In this report, we determine the expression profile of 57,931 clones on 20 mouse tissues using cDNA microarrays. Of these 57,931 clones, 22,928 clones correspond to the FANTOM2 clone set. The set represents 20,234 transcriptional units (TUs) out of 33,409 TUs in the FANTOM2 set. We identified 7206 separate clones that satisfied stringent criteria for tissue-specific expression. Gene Ontology terms were assigned for these 7206 clones, and the proportion of 'molecular function' ontology for each tissue-specific clone was examined. These data will provide insights into the function of each tissue. Tissue-specific gene expression profiles obtained using our cDNA microarrays were also compared with the data extracted from the GNF Expression Atlas based on Affymetrix microarrays. One major outcome of the RIKEN transcriptome analysis is the identification of numerous nonprotein-coding mRNAs. The expression profile was also used to obtain evidence of expression for putative noncoding RNAs. In addition, 1926 clones (70%) of 2768 clones that were categorized as "unknown EST," and 1969 (58%) clones of 3388 clones that were categorized as "unclassifiable" were also shown to be reproducibly expressed.
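The percentages quoted in the abstract above can be re-derived from the counts it reports. The short Python check below is only an arithmetic illustration; the helper name `percent` and the variable names are hypothetical, and the last two ratios are derived here from the reported counts and are not stated in the abstract.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


# Counts taken from the abstract.
unknown_est = percent(1926, 2768)      # reproducibly expressed "unknown EST" clones
unclassifiable = percent(1969, 3388)   # reproducibly expressed "unclassifiable" clones

# Ratios derived here from the reported counts (not stated in the abstract).
fantom2_clone_share = percent(22928, 57931)  # arrayed clones that are in the FANTOM2 set
tu_coverage = percent(20234, 33409)          # FANTOM2 TUs represented on the array

print(unknown_est, unclassifiable, fantom2_clone_share, tu_coverage)
```

The first two values agree with the 70% and 58% given in the text.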
Project description: The recent publication of the FANTOM mouse transcriptome has provided a unique opportunity to study the diversity of transcripts arising from a single gene locus. We have focused on the Gnas complex, as imprinting loci themselves provide unique insights into transcriptional regulation. Thirteen full-length cDNAs from the FANTOM2 set were mapped to the Gnas locus. These represented one previously described transcript and 12 putative new transcripts. Of these, eight were found to be differentially expressed from either the maternal or paternal allele. Two clones extended Nespas in the 3' direction, providing evidence of antisense transcription spanning a 30-kb genomic region from a single allele. The transcripts were summarized into six transcriptional units, Nespas, Nesp, Gnasxl, F7, exon 1A, and Gnas. The resolution of the Gnas transcript map by the FANTOM2 clones revealed a pattern of alternate splicing. In addition to the transcripts described previously as splicing onto exon 2 of Gnas, each new sense transcript had an alternate short 3'UTR independent of Gnas. Both spliced and unspliced variants of the new imprinted sense transcripts were found. Whereas the functional significance of these alternate transcripts is not known, the availability of the FANTOM clones has provided remarkable insights into the repertoire of transcripts in the Gnas complex locus.
Project description: BACKGROUND: The superfamily of G protein-coupled receptors (GPCRs) is one of the largest within most mammals. GPCRs are important targets for pharmaceuticals and the rat is one of the most widely used model organisms in biological research. Accurate comparisons of protein families in rat, mouse and human are thus important for interpretation of many physiological and pharmacological studies. However, current automated protein predictions and annotations are limited and error prone. RESULTS: We searched the rat genome for GPCRs and obtained 1867 full-length genes and 739 pseudogenes. We identified 1277 new full-length rat GPCRs, of which 1235 belong to the large group of olfactory receptors. Moreover, we updated the datasets of GPCRs from the human and mouse genomes with 1 and 43 new genes, respectively. The total numbers of full-length genes (and pseudogenes) identified were 799 (583) for human and 1783 (702) for mouse. The rat, human and mouse GPCRs were classified into 7 families named the Glutamate, Rhodopsin, Adhesion, Frizzled, Secretin, Taste2 and Vomeronasal1 families. We performed comprehensive phylogenetic analyses of these families and provide detailed information about orthologues and species-specific receptors. We found that 65 human Rhodopsin family GPCRs are orphans and 56 of these have an orthologue in rat. CONCLUSION: Interestingly, we found that the proportion of one-to-one GPCR orthologues was only 58% between rat and human and only 70% between rat and mouse, which is much lower than stated for the entire set of all genes. This is mainly related to the sensory GPCRs. The average protein sequence identity of the GPCR orthologue pairs is also lower than for the whole genomes. We found these to be 80% for the rat and human pairs and 90% for the rat and mouse pairs. However, the proportions of orthologous and species-specific genes vary significantly between the different GPCR families. The largest diversification is seen for GPCRs that respond to exogenous stimuli, indicating that the variation in their repertoires reflects to a large extent the adaptation of the species to their environment. This report provides the first overall roadmap of the GPCR repertoire in rat and detailed comparisons with the mouse and human repertoires.
Project description: G protein-coupled receptors (GPCRs) are the largest signaling family in the genome, serve an expansive array of functions, and are targets for approximately 50% of current therapeutics. In many tissues, such as airway smooth muscle (ASM), complex, unexpected, or paradoxical responses to agonists/antagonists occur without known mechanisms. We hypothesized that ASM express many more GPCRs than predicted, and that these undergo substantial alternative splicing, creating a highly diversified receptor milieu. Transcript arrays were designed detecting 434 GPCRs and their predicted splice variants. In this cell type, 353 GPCRs were detected (including 111 orphans), with expression levels varying by approximately 900-fold. Receptors used for treating airway disease were expressed lower than others with similar signaling properties, indicating potentially more effective targets. A disproportionate number of Class-A peptide-group receptors, and those coupling to G(q)/(11) or G(s) (vs. G(i)), was found. Importantly, 192 GPCRs had, on average, five different expressed receptor isoforms because of splicing events, including alternative splice donors and acceptors, novel introns, intron retentions, exon(s) skips, and novel exons, with the latter two events being most prevalent. The consequences of splicing were further investigated with the leukotriene B4 receptor, known for its aberrant responsiveness in lung. We found transcript expression of three variants because of alternative donor and acceptor splice sites, representing in-frame deletions of 38 and 100 aa, with protein expression of all three isoforms. Thus, alternative splicing, subject to conditional, temporal, and cell-type regulation, is a major mechanism that diversifies the GPCR superfamily, creating local recepteromes with specialized environments.
Project description: The cell cycle is one of the most fundamental processes within a cell. Phase-dependent expression and cell-cycle checkpoints require a high level of control. A large number of genes with varying functions and modes of action are responsible for this biology. In a targeted exploration of the FANTOM2-Variable Protein Set, a number of mouse homologs to known cell-cycle regulators as well as novel members of cell-cycle families were identified. Focusing on two prototype cell-cycle families, the cyclins and the NIMA-related kinases (NEKs), we believe we have identified all of the mouse members of these families, 24 cyclins and 10 NEKs, and mapped them to ENSEMBL transcripts. To attempt to globally identify all potential cell cycle-related genes within mouse, the MGI (Mouse Genome Database) assignments for the RIKEN Representative Set (RPS) and the results from two homology-based queries were merged. We identified 1415 genes with possible cell-cycle roles, and 1758 potential paralogs. We comment on the genes identified in this screen and evaluate the merits of each approach.
Project description: BACKGROUND: G protein-coupled receptors (GPCRs) constitute a large family of integral transmembrane receptor proteins that play a central role in signal transduction in eukaryotes. The genome of the protochordate Ciona intestinalis has a compact size with an ancestral complement of many diversified gene families of vertebrates and is a good model system for studying protochordate to vertebrate diversification. An analysis of the Ciona repertoire of GPCRs from a comparative genomic perspective provides insight into the evolutionary origins of the GPCR signalling system in vertebrates. RESULTS: We have identified 169 gene products in the Ciona genome that code for putative GPCRs. Phylogenetic analyses reveal that Ciona GPCRs have homologous representatives from the five major GRAFS (Glutamate, Rhodopsin, Adhesion, Frizzled and Secretin) families concomitant with other vertebrate GPCR repertoires. Nearly 39% of Ciona GPCRs have unambiguous orthologs of vertebrate GPCR families, as defined for the human, mouse, puffer fish and chicken genomes. The Rhodopsin family accounts for ~68% of the Ciona GPCR repertoire wherein the LGR-like subfamily exhibits a lineage specific gene expansion of a group of receptors that possess a novel domain organisation hitherto unobserved in metazoan genomes. CONCLUSION: Comparison of GPCRs in Ciona to those in human reveals a high level of orthology of a protochordate repertoire with that of vertebrate GPCRs. Our studies suggest that the ascidians contain the basic ancestral complement of vertebrate GPCR genes. This is evident at the subfamily level comparisons since Ciona GPCR sequences are significantly analogous to vertebrate GPCR subfamilies even while exhibiting Ciona specific genes. Our analysis provides a framework to perform future experimental and comparative studies to understand the roles of the ancestral chordate versions of GPCRs that predated the divergence of the urochordates and the vertebrates.
Project description: G protein-coupled receptors (GPCRs) comprise the largest family of transmembrane signaling molecules and regulate a host of physiological and disease processes. To better understand the functions of GPCRs in vivo, we quantified transcript levels of 353 nonodorant GPCRs in 41 adult mouse tissues. Cluster analysis placed many GPCRs into anticipated anatomical and functional groups and predicted previously unidentified roles for less-studied receptors. From one such prediction, we showed that the Gpr91 ligand succinate can regulate lipolysis in white adipose tissue, suggesting that signaling by this citric acid cycle intermediate may regulate energy homeostasis. We also showed that pairwise analysis of GPCR expression across tissues may help predict drug side effects. This resource will aid studies to understand GPCR function in vivo and may assist in the identification of therapeutic targets.
Project description: BACKGROUND: The dog is an important model organism and it is considered to be closer to humans than rodents regarding metabolism and responses to drugs. The close relationship between humans and dogs over many centuries has led to the diversity of the canine species, important genetic discoveries and an appreciation of the effects of old age in another species. The superfamily of G protein-coupled receptors (GPCRs) is one of the largest gene families in most mammals and the most exploited in terms of drug discovery. An accurate comparison of the GPCR repertoires in dog and human is valuable for the prediction of functional similarities and differences between the species. RESULTS: We searched the dog genome for non-olfactory GPCRs and obtained 353 full-length GPCR gene sequences, 18 incomplete sequences and 13 pseudogenes. We established relationships between human, dog, rat and mouse GPCRs resolving orthologous pairs and species-specific duplicates. We found that 12 dog GPCR genes are missing in humans while 24 human GPCR genes are not part of the dog GPCR repertoire. There is a higher number of orthologous pairs between dog and human that are conserved as compared with either mouse or rat. In almost all cases the differences observed between the dog and human genomes coincide with other variations in the rodent species. Several GPCR gene expansions characteristic for rodents are not found in dog. CONCLUSION: The repertoire of dog non-olfactory GPCRs is more similar to the repertoire in humans as compared with the one in rodents. The comparison of the dog, human and rodent repertoires revealed several examples of species-specific gene duplications and deletions. This information is useful in the selection of model organisms for pharmacological experiments.