ABSTRACT: Human Proteinpedia (http://www.humanproteinpedia.org) is a publicly available proteome repository for sharing human protein data derived from multiple experimental platforms. It incorporates diverse features of the human proteome including protein-protein interactions, enzyme-substrate relationships, PTMs, subcellular localization, and expression of proteins in various human tissues and cell lines in diverse biological conditions including diseases. Through a publicly distributed annotation system developed especially for proteomic data, investigators across the globe can upload, view, and edit proteomic data even before they are published. Inclusion of information on investigators and laboratories that generated the data, as well as visualization of tandem mass spectra, stained tissue sections, protein/peptide microarrays, fluorescent micrographs, and western blots, ensures quality of proteomic data assimilated in Human Proteinpedia. Many of the protein annotations submitted to Human Proteinpedia have also been made available to the scientific community through Human Protein Reference Database (http://www.hprd.org), another resource developed by our group. In this protocol, we describe how to submit, edit, and retrieve proteomic data in Human Proteinpedia.
Project description:Sharing proteomic data with the biomedical community through a unified proteomic resource, especially in the context of individual proteins, is a challenging prospect. We have developed a community portal, designated as Human Proteinpedia (http://www.humanproteinpedia.org/), for sharing both unpublished and published human proteomic data through the use of a distributed annotation system designed specifically for this purpose. This system allows laboratories to contribute and maintain protein annotations, which are also mapped to the corresponding proteins through the Human Protein Reference Database (HPRD; http://www.hprd.org/). Thus, it is possible to visualize data pertaining to experimentally validated posttranslational modifications (PTMs), protein isoforms, protein-protein interactions (PPIs), tissue expression, expression in cell lines, subcellular localization and enzyme substrates in the context of individual proteins. With enthusiastic participation of the proteomics community, the past 15 months have witnessed data contributions from more than 75 labs around the world including 2710 distinct experiments, >1.9 million peptides, >4.8 million MS/MS spectra, 150,368 protein expression annotations, 17,410 PTMs, 34,624 PPIs and 2906 subcellular localization annotations. Human Proteinpedia should serve as an integrated platform to store, integrate and disseminate such proteomic data and is inching towards evolving into a unified human proteomics resource.
Project description:Human Protein Reference Database (HPRD) is a rich resource of experimentally proven features of human proteins. Protein information in HPRD includes protein-protein interactions, post-translational modifications, enzyme/substrate relationships, disease associations, tissue expression, and subcellular localization of human proteins. Although protein-protein interaction data from HPRD has been widely used by the scientific community, its phosphoproteome data has not been exploited to its full potential. HPRD is one of the largest collections of human phosphoprotein data in the public domain. Currently, phosphorylation data in HPRD comprises 95,016 phosphosites mapped onto 13,041 proteins. Additionally, enzyme-substrate reactions responsible for 5930 phosphorylation events have been documented. Significant improvements in technologies and high-throughput platforms in biomedical investigations have led to an exponential increase in biological data, including phosphoproteomic data, in recent years. Human Proteinpedia, a community annotation portal developed by us, has also contributed to the significant increase in phosphoproteomic data in HPRD. A large number of phosphorylation events have been mapped onto reference sequences available in HPRD and Human Proteinpedia along with associated protein features. This will provide a platform for systems biology approaches to determine the role of protein phosphorylation in protein function, cell signaling, and biological processes, and its implications in human diseases. This review aims to provide a composite view of phosphoproteomic data pertaining to human proteins in HPRD and Human Proteinpedia.
Project description:PhyreRisk is an open-access web application for interactively bridging genomic, proteomic and structural data, facilitating the mapping of human variants onto protein structures. A major advance over other tools for sequence-structure variant mapping is that PhyreRisk provides information on 20,214 human canonical proteins and an additional 22,271 alternative protein sequences (isoforms). Specifically, PhyreRisk provides structural coverage (partial or complete) for 70% (14,035 of 20,214 canonical proteins) of the human proteome, by storing 18,874 experimental structures and 84,818 pre-built models of canonical proteins and their isoforms generated using our in-house Phyre2. PhyreRisk reports 55,732 experimentally multi-validated protein interactions from IntAct and 24,260 experimental structures of protein complexes. Another major feature of PhyreRisk is that, rather than presenting a limited set of precomputed variant-structure mappings of known genetic variants, it allows the user to explore novel variants using, as input, genomic coordinate formats (Ensembl, VCF, reference SNP ID and HGVS notations) on human genome builds GRCh37 and GRCh38. PhyreRisk also supports mapping variants using amino acid coordinates and searching for genes or proteins of interest. PhyreRisk is designed to empower researchers to translate genetic data into protein structural information, thereby providing a more comprehensive appreciation of the functional impact of variants. PhyreRisk is freely available at http://phyrerisk.bc.ic.ac.uk.
Project description:Normal human urine contains large numbers of exosomes, which are 40- to 100-nm vesicles that originate as the internal vesicles in multivesicular bodies from every renal epithelial cell type facing the urinary space. Here, we used LC-MS/MS to profile the proteome of human urinary exosomes. Overall, the analysis identified 1132 proteins unambiguously, including 177 that are represented on the Online Mendelian Inheritance in Man database of disease-related genes, suggesting that exosome analysis is a potential approach to discover urinary biomarkers. We extended the proteomic analysis to phosphoproteomic profiling using neutral loss scanning, and this yielded multiple novel phosphorylation sites, including serine-811 in the thiazide-sensitive Na-Cl co-transporter, NCC. To demonstrate the potential use of exosome analysis to identify a genetic renal disease, we carried out immunoblotting of exosomes from urine samples of patients with a clinical diagnosis of Bartter syndrome type I, showing an absence of the sodium-potassium-chloride co-transporter 2, NKCC2. The proteomic data are publicly accessible at http://dir.nhlbi.nih.gov/papers/lkem/exosome/.
Project description:Deconvolution of targets and action mechanisms of anticancer compounds is fundamental in drug development. Here, we report on ProTargetMiner as a publicly available expandable proteome signature library of anticancer molecules in cancer cell lines. Based on 287 A549 adenocarcinoma proteomes affected by 56 compounds, the main dataset contains 7,328 proteins and 1,307,859 refined protein-drug pairs. These proteomic signatures cluster by compound targets and action mechanisms. The targets and mechanistic proteins are deconvoluted by partial least square modeling, provided through the website http://protargetminer.genexplain.com. For 9 molecules representing the most diverse mechanisms and the common cancer cell lines MCF-7, RKO and A549, deep proteome datasets are obtained. Combining data from the three cell lines highlights common drug targets and cell-specific differences. The database can be easily extended and merged with new compound signatures. ProTargetMiner serves as a chemical proteomics resource for the cancer research community, and can become a valuable tool in drug discovery.
Project description:Remarkable progress continues on the annotation of the proteins identified in the Human Proteome and on finding credible proteomic evidence for the expression of "missing proteins". Missing proteins are those with no previous protein-level evidence or insufficient evidence to make a confident identification upon reanalysis in PeptideAtlas and curation in neXtProt. Enhanced with several major new data sets published in 2014, the human proteome presented as neXtProt, version 2014-09-19, has 16,491 unique confident proteins (PE level 1), up from 13,664 at 2012-12 and 15,646 at 2013-09. That leaves 2948 missing proteins from genes classified having protein existence level PE 2, 3, or 4, as well as 616 dubious proteins at PE 5. Here, we document the progress of the HPP and discuss the importance of assessing the quality of evidence, confirming automated findings and considering alternative protein matches for spectra and peptides. We provide guidelines for proteomics investigators to apply in reporting newly identified proteins.
Project description:The availability of human genome sequence has transformed biomedical research over the past decade. However, an equivalent map for the human proteome with direct measurements of proteins and peptides does not exist yet. Here we present a draft map of the human proteome using high-resolution Fourier-transform mass spectrometry. In-depth proteomic profiling of 30 histologically normal human samples, including 17 adult tissues, 7 fetal tissues and 6 purified primary haematopoietic cells, resulted in identification of proteins encoded by 17,294 genes accounting for approximately 84% of the total annotated protein-coding genes in humans. A unique and comprehensive strategy for proteogenomic analysis enabled us to discover a number of novel protein-coding regions, which includes translated pseudogenes, non-coding RNAs and upstream open reading frames. This large human proteome catalogue (available as an interactive web-based resource at http://www.humanproteomemap.org) will complement available human genome and transcriptome data to accelerate biomedical research in health and disease.
Project description:The availability of human genome sequence has transformed biomedical research over the past decade. However, an equivalent map for the human proteome with direct measurements of proteins and peptides was lacking. To this end, Akhilesh Pandey's lab reported a draft map of the human proteome based on high resolution Fourier transform mass spectrometry-based proteomics technology, which included an in-depth proteomic profiling of 30 histologically normal human samples including 17 adult tissues, 7 fetal tissues and 6 purified primary hematopoietic cells ( http://dx.doi.org/10.1038/nature13302 ). The profiling resulted in identification of proteins encoded by greater than 17,000 genes accounting for ~84% of the total annotated protein-coding genes in humans. This large human proteome catalog (available as an interactive web-based resource at http://www.humanproteomemap.org) complements available human genome and transcriptome data to accelerate biomedical research in health and disease. Pandey's lab and collaborators request that those considering use of this primary dataset for commercial purposes contact email@example.com. The full details of this study can be found in the PRIDE database: www.ebi.ac.uk/pride/archive/projects/PXD000561/. This ArrayExpress entry represents a top level summary of the metadata only which formed the basis of the reanalysis performed by Joyti Choudhary's team ( firstname.lastname@example.org ), results of which are presented in the Expression Atlas at EMBL-EBI : http://www.ebi.ac.uk/gxa/experiments/E-PROT-1.
Project description:BACKGROUND: Unlike for humans, there is currently no publicly available reference mass spectrometry-based circulating acellular proteome data for sheep, limiting the analysis and interpretation of a range of physiological changes and disease states. The objective of this study was to develop a robust and comprehensive method to characterise the circulating acellular proteome in ovine serum. METHODS: Serum samples from healthy sheep were subjected to shotgun proteomic analysis using nano liquid chromatography nano electrospray ionisation tandem mass spectrometry (nanoLC-nanoESI-MS/MS) on a quadrupole time-of-flight instrument (TripleTOF® 5600+, SCIEX). Proteins were identified using ProteinPilot™ (SCIEX) and Mascot (Matrix Science) software by searching a subset of the Universal Protein Resource Knowledgebase (UniProtKB) database (http://www.uniprot.org), requiring a minimum of two unmodified, highly scoring unique peptides per protein at a false discovery rate (FDR) of 1%. PeptideShaker (CompOmics, VIB-UGent) searches were used to validate protein identifications from ProteinPilot™ and Mascot. RESULTS: ProteinPilot™ and Mascot identified 245 and 379 protein groups (IDs), respectively, and PeptideShaker validated 133 protein IDs from the entire dataset. Since Mascot software is considered the industry standard and identified the most proteins, its identifications were analysed using the Protein ANalysis THrough Evolutionary Relationships (PANTHER) classification tool, revealing the association of 349 genes with 127 protein pathway hits. These data are available via ProteomeXchange with identifier PXD004989. CONCLUSIONS: These results demonstrated for the first time the feasibility of characterising the ovine circulating acellular proteome using nanoLC-nanoESI-MS/MS. These peptide spectral data contribute to a protein library that can be used to identify a wide range of proteins in ovine serum.
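The identification criterion in the ovine serum methods above (at least two unmodified, unique peptides per protein at 1% FDR) can be illustrated with a minimal Python sketch. This is not the ProteinPilot™ or Mascot implementation. The `PeptideHit` record, its field names, the `confident_proteins` function and the example accessions are all hypothetical. The sketch also assumes that each peptide already carries a q-value precomputed by the search engine, which the abstract does not state.

```python
from collections import defaultdict
from typing import NamedTuple


class PeptideHit(NamedTuple):
    """Hypothetical record for one peptide identification."""
    protein: str     # hypothetical protein accession
    sequence: str    # peptide sequence
    q_value: float   # assumed to be precomputed by the search engine
    modified: bool   # True if the peptide carries a modification
    unique: bool     # True if the peptide maps to a single protein


def confident_proteins(hits, max_fdr=0.01, min_peptides=2):
    """Return proteins supported by enough unmodified, unique peptides."""
    peptides_by_protein = defaultdict(set)
    for hit in hits:
        # Keep only peptides that pass the FDR cut and are unmodified and unique.
        if hit.q_value <= max_fdr and hit.unique and not hit.modified:
            peptides_by_protein[hit.protein].add(hit.sequence)
    # A protein is accepted when it has enough distinct qualifying peptides.
    return sorted(
        protein
        for protein, peptides in peptides_by_protein.items()
        if len(peptides) >= min_peptides
    )


example_hits = [
    PeptideHit("P1", "AAAK", 0.001, False, True),
    PeptideHit("P1", "CCCR", 0.005, False, True),
    PeptideHit("P2", "DDDK", 0.002, False, True),
    PeptideHit("P2", "EEEK", 0.200, False, True),  # fails the FDR cut
    PeptideHit("P3", "FFFK", 0.001, True, True),   # modified, so ignored
    PeptideHit("P3", "GGGR", 0.001, False, True),
]
print(confident_proteins(example_hits))  # prints ['P1']
```

In this toy input only P1 keeps two qualifying peptides. P2 and P3 each retain a single one, so the two-peptide rule excludes them.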
Project description:Omics sciences enable a systems-level perspective in characterizing cardiovascular biology. Integration of diverse proteomics data via a computational strategy will catalyze the assembly of contextualized knowledge, foster discoveries through multidisciplinary investigations, and minimize unnecessary redundancy in research efforts. The goal of this project is to develop a consolidated cardiac proteome knowledgebase with a novel bioinformatics pipeline and Web portals, thereby serving as a new resource to advance cardiovascular biology and medicine. We created Cardiac Organellar Protein Atlas Knowledgebase (COPaKB; www.HeartProteome.org), a centralized platform of high-quality cardiac proteomic data, bioinformatics tools, and relevant cardiovascular phenotypes. Currently, COPaKB features 8 organellar modules, comprising 4203 LC-MS/MS experiments from human, mouse, Drosophila, and Caenorhabditis elegans, as well as expression images of 10,924 proteins in human myocardium. In addition, the Java-coded bioinformatics tools provided by COPaKB enable cardiovascular investigators in all disciplines to retrieve and analyze pertinent organellar protein properties of interest. COPaKB provides an innovative and interactive resource that connects research interests with the new biological discoveries in protein sciences. With an array of intuitive tools in this unified Web server, nonproteomics investigators can conveniently collaborate with proteomics specialists to dissect the molecular signatures of cardiovascular phenotypes.