Local network topology in human protein interaction data predicts functional association.
ABSTRACT: The use of high-throughput techniques to generate large volumes of protein-protein interaction (PPI) data has increased the need for methods that systematically and automatically suggest functional relationships among proteins. In a yeast PPI network, previous work has shown that the local connection topology, particularly for two proteins sharing an unusually large number of neighbors, can predict functional association. In this study we improved the prediction scheme by developing a new algorithm and applied it to a human PPI network to make a genome-wide functional inference. We used the new algorithm to measure and reduce the influence of hub proteins on detecting function-associated protein pairs. We used the annotations of the Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) as benchmarks to compare and evaluate the functional relevance. The application of our algorithms to human PPI data yielded 4,233 significant functional associations among 1,754 proteins. Further functional comparisons between them allowed us to assign 466 KEGG pathway annotations to 274 proteins and 123 GO annotations to 114 proteins with estimated false discovery rates of <21% for KEGG and <30% for GO. We clustered 1,729 proteins by their functional associations and made functional inferences from detailed analysis of one subcluster highly enriched in the TGF-beta signaling pathway (P < 10^-50). Analysis of another four subclusters also suggested potential new players in six signaling pathways worthy of further experimental investigation. Our study gives clear insight into the common neighbor-based prediction scheme and provides a reliable method for large-scale functional annotation in this post-genomic era.
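The common-neighbor idea in the abstract above can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not specify; it is a plain hypergeometric tail test, assuming that the pool of possible neighbors is every protein in the network except the pair being scored. The function names and the toy edge list are hypothetical.

```python
from math import comb


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) edge tuples."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def shared_neighbor_pvalue(n_total, deg_a, deg_b, shared):
    """Hypergeometric tail P(X >= shared).

    n_total: size of the pool of possible neighbors (an assumption of this sketch)
    deg_a, deg_b: number of neighbors of each protein within that pool
    shared: observed number of common neighbors
    """
    upper = min(deg_a, deg_b)
    tail = sum(
        comb(deg_a, k) * comb(n_total - deg_a, deg_b - k)
        for k in range(shared, upper + 1)
    )
    return tail / comb(n_total, deg_b)


# Toy network (hypothetical): A and B share all three of their neighbors.
edges = [
    ("A", "X"), ("A", "Y"), ("A", "Z"),
    ("B", "X"), ("B", "Y"), ("B", "Z"),
    ("C", "W"), ("D", "W"), ("E", "V"),
]
adj = build_adjacency(edges)
n_total = len(adj) - 2  # all proteins except the pair itself
shared = len(adj["A"] & adj["B"])
p = shared_neighbor_pvalue(n_total, len(adj["A"]), len(adj["B"]), shared)
print(f"shared neighbors = {shared}, p = {p:.4f}")
```

Under this test, two high-degree proteins need many more shared neighbors to reach the same p-value, which is one simple way to see why hub proteins have to be handled explicitly.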
Project description:Protein-protein interaction (PPI) plays an essential role in the growth, reproduction, and metabolism of all living organisms. A thorough investigation of PPI can help uncover how proteins carry out their functions. In this study, we used gene ontology (GO) terms and biological pathways to study an extended version of PPI (protein-protein functional associations) and subsequently identify essential GO terms and pathways that indicate the difference between protein pairs with and without functional associations. The protein-protein functional associations validated by experiments were retrieved from STRING, a well-known database of associations between proteins collected from multiple sources, and were termed positive samples. The negative samples were constructed by randomly pairing two proteins. Each sample was represented by several features based on the GO and KEGG pathway information of the two proteins. Then, mutual information was adopted to evaluate the importance of all features, and the most important ones were extracted, from which a number of essential GO terms and KEGG pathways were identified. The final analysis of some important GO terms and one KEGG pathway can partly uncover the difference between proteins with and without functional associations.
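The feature-ranking step described above can be sketched as follows. This is a minimal illustration, assuming binary features (for example, whether both proteins in a pair carry a given GO term) and binary labels (1 for a positive sample, 0 for a randomly paired negative). The feature names and values are hypothetical and are not taken from the study.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information, in bits, between two discrete sequences of equal length."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


# Hypothetical toy data: 4 positive and 4 negative protein pairs.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "share_go_term": [1, 1, 1, 1, 0, 0, 0, 0],
    "share_kegg_pathway": [1, 1, 1, 0, 1, 0, 0, 0],
    "uninformative": [1, 1, 0, 0, 1, 1, 0, 0],
}

scores = {name: mutual_information(values, labels) for name, values in features.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
for name in ranked:
    print(f"{name}: {scores[name]:.3f} bits")
```

A feature that perfectly separates the two classes scores 1 bit here, and a feature that is independent of the label scores 0.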
Project description:COFECO is a web-based tool for composite annotation of protein complexes, KEGG pathways and Gene Ontology (GO) terms within a class of genes and their orthologs under study. Widely used functional enrichment tools based on GO and KEGG pathways create large lists of annotations that make it difficult to derive consolidated information and often include over-generalized terms. The interrelationship of annotation terms can be more clearly delineated by integrating the information of physically interacting proteins with biological pathways and GO terms. COFECO has the following advanced characteristics: (i) The composite annotation sets of correlated functions and cellular processes for a given gene set can be identified in a more comprehensive and specific way by employing protein complex data together with GO and KEGG pathways as annotation resources. (ii) Orthology-based integrative annotations among different species complement the deficient annotations in an individual genome and provide information on evolutionarily conserved correlations. (iii) A term filtering feature enables users to collect the specified annotations enriched with selected function terms. (iv) A cross-comparison of annotation results between two different datasets is possible. In addition, COFECO provides a web-based GO hierarchical viewer and KEGG pathway viewer where the enrichment results can be summarized and further explored. COFECO is freely accessible at http://piech.kaist.ac.kr/cofeco.
Project description:BACKGROUND:3D domain swapping is a novel structural phenomenon observed in a diverse set of protein structures in oligomeric conformations. A distinct structural feature, where structural segments in a protein dimer or higher oligomer are shared between two or more chains of a protein structure, characterizes 3D domain swapping. 3D domain swapping has been observed as a key mediator of numerous functional mechanisms and plays a pathogenic role in various diseases, including conformational diseases such as amyloidosis, Alzheimer's disease, Parkinson's disease and prion diseases. We report the first study with a focus on identifying functional classes, pathways and diseases mediated by 3D domain swapping in the human proteome. METHODS:We used a panel of four enrichment tools with two different ontologies and two annotation databases to derive biologically and clinically relevant information associated with 3D domain swapping. Protein domain enrichment analysis followed by Gene Ontology (GO) term enrichment analysis revealed the functional repertoire of proteins involved in swapping. Pathway analysis using KEGG annotations revealed diverse pathway associations of human proteins involved in 3D domain swapping. Disease Ontology was used to find statistically significant associations between proteins in swapped conformation and various disease categories (P-value < 0.05). RESULTS:We report meta-analysis results of a literature-curated dataset of human gene products involved in 3D domain swapping and discuss new insights about the functional repertoire, pathway associations and disease implications of proteins involved in 3D domain swapping. CONCLUSIONS:Our integrated bioinformatics pipeline comprising four different enrichment tools, two ontologies and two annotations revealed new insights into the functional and disease correlations with 3D domain swapping. GO term enrichment was used to infer terms associated with three different GO categories. 
Protein domain enrichment was used to identify conserved domains enriched in swapped proteins. Pathway enrichment analysis using KEGG annotations revealed that proteins with swapped conformations are present in all six classes of KEGG BRITE hierarchy and significantly enriched KEGG pathways were observed in five classes. Five major classes of disease were found to be associated with 3D domain swapping using functional disease ontology based enrichment analysis. Five classes of human diseases: cancer, diseases of the respiratory or pulmonary system, degenerative diseases of the central nervous system, vascular disease and encephalitis were found to be significant. In conclusion, our study shows that bioinformatics based analytical approaches using curated data can enhance the understanding of functional and disease implications of 3D domain swapping.
Project description:A number of tools for the alignment of protein-protein interaction (PPI) networks have laid the foundation for PPI network analysis. Most alignment tools focus on finding conserved interaction regions across PPI networks through either local or global mapping of similar sequences. Researchers are still trying to improve the speed, scalability, and accuracy of network alignment. In view of this, we introduce a fast connected-components-based algorithm, HopeMap, for network alignment. Observing that the number of true orthologs across species is small compared to the total number of proteins in all species, we take a different approach based on a precompiled list of homologs identified by KO terms. Applying this approach to three species pairs, S. cerevisiae (yeast) and D. melanogaster (fly); E. coli K12 and S. typhimurium; and E. coli K12 and C. crescentus, we analyze all clusters identified in the alignment. The results are evaluated using up-to-date known gene annotations, gene ontology (GO), and KEGG ortholog groups (KO). Compared to existing tools, our approach is fast with linear computational cost, highly accurate in terms of KO and GO term specificity and sensitivity, and can be extended to multiple alignments easily.
Project description:BACKGROUND:Pulpitis is an inflammatory disease, the grade of which is classified according to the level of inflammation. Traditional methods of evaluating the status of dental pulp tissue in clinical practice have limitations. The rapid and accurate diagnosis of pulpitis is essential for determining the appropriate treatment. By integrating different datasets from the Gene Expression Omnibus (GEO) database, we analysed a merged expression matrix of pulpitis, aiming to identify biological pathways and diagnostic biomarkers of pulpitis. METHODS:By integrating two datasets (GSE77459 and GSE92681) in the GEO database using the sva and limma packages of R, differentially expressed genes (DEGs) of pulpitis were identified. Then, the DEGs were analysed to identify biological pathways of dental pulp inflammation with Gene Ontology (GO) analysis, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis and Gene Set Enrichment Analysis (GSEA). Protein-protein interaction (PPI) networks and modules were constructed to identify hub genes with the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) and Cytoscape. RESULTS:A total of 470 DEGs comprising 394 upregulated and 76 downregulated genes were found in pulpitis tissue. GO analysis revealed that the DEGs were enriched in biological processes related to inflammation, and the enriched pathways in the KEGG pathway analysis were cytokine-cytokine receptor interaction, chemokine signalling pathway and NF-κB signalling pathway. The GSEA results provided further functional annotations, including complement system, IL6/JAK/STAT3 signalling pathway and inflammatory response pathways. According to the degrees of nodes in the PPI network, 10 hub genes were identified, and 8 diagnostic biomarker candidates were screened: PTPRC, CD86, CCL2, IL6, TLR8, MMP9, CXCL8 and ICAM1. 
CONCLUSIONS:Through bioinformatics analysis of merged datasets, biomarker candidates for pulpitis were screened, and the findings may serve as a reference for developing a new method of pulpitis diagnosis.
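Selecting hub genes by node degree, as in the pulpitis abstract above, can be sketched in a few lines. The study itself used STRING and Cytoscape; the code below is only an illustration of degree ranking on an edge list, and the gene identifiers G1 to G5 are placeholders rather than real data.

```python
from collections import Counter


def top_hubs(edges, k):
    """Return the k highest-degree nodes as (node, degree) tuples.

    Self-loops and duplicate edges are ignored; ties are broken alphabetically.
    """
    degree = Counter()
    seen = set()
    for u, v in edges:
        if u == v:
            continue  # skip self-interactions
        key = frozenset((u, v))
        if key in seen:
            continue  # skip edges already counted in either direction
        seen.add(key)
        degree[u] += 1
        degree[v] += 1
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))[:k]


# Placeholder edge list (hypothetical identifiers, not genes from the study).
edges = [
    ("G1", "G2"), ("G1", "G3"), ("G1", "G4"),
    ("G2", "G3"), ("G3", "G1"), ("G5", "G5"),
]
hubs = top_hubs(edges, 2)
print(hubs)
```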
Project description:BACKGROUND:Studying the large-scale protein-protein interaction (PPI) network is important in understanding biological processes. The current research presents the first PPI map of swine, which aims to give new insights into their biological processes. RESULTS:We used three methods, Interolog-based prediction of porcine PPI network, domain-motif interactions from structural topology-based prediction of porcine PPI network and motif-motif interactions from structural topology-based prediction of porcine PPI network, to predict porcine protein interactions among 25,767 porcine proteins. We predicted 20,213, 331,484, and 218,705 porcine PPIs respectively, merged the three results into 567,441 PPIs, constructed four PPI networks, and analyzed the topological properties of the porcine PPI networks. Our predictions were validated with Pfam domain annotations and GO annotations. Averages of 70, 10,495, and 863 interactions were related to the Pfam domain-interacting pairs in the iPfam database. For comparison, randomized networks were generated, and averages of only 4.24, 66.79, and 44.26 interactions were associated with Pfam domain-interacting pairs in the iPfam database. For GO annotations, 52.68%, 75.54%, and 27.20% of the predicted PPIs, respectively, shared GO terms. In contrast, none of the 10,000 randomized networks reached these proportions of PPI pairs sharing GO terms. Finally, we determined the accuracy and precision of the methods. The methods yielded accuracies of 0.92, 0.53, and 0.50 at precisions of about 0.93, 0.74, and 0.75, respectively. CONCLUSION:The results reveal that the predicted PPI networks are considerably reliable. The present research is an important pioneering work on protein function research. The porcine PPI data set, the confidence score of each interaction and a list of related data are available at (http://pppid.biositemap.com/).
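The comparison against randomized networks in the porcine abstract above can be sketched as a simple resampling check. The abstract does not say how the randomized networks were generated, so the scheme here (drawing the same number of pairs uniformly at random from the annotated proteins) is an assumption, and all protein and term names are hypothetical.

```python
import random


def shared_fraction(pairs, annotations):
    """Fraction of pairs whose two proteins share at least one annotation term."""
    hits = sum(
        1 for a, b in pairs
        if annotations.get(a, set()) & annotations.get(b, set())
    )
    return hits / len(pairs)


def count_random_reaching(pairs, annotations, n_random, seed=0):
    """Count randomized networks whose shared fraction reaches the observed one.

    Assumption of this sketch: a randomized network has the same number of
    pairs, each drawn as two distinct proteins chosen uniformly at random.
    """
    rng = random.Random(seed)
    proteins = sorted(annotations)
    observed = shared_fraction(pairs, annotations)
    reached = 0
    for _ in range(n_random):
        random_pairs = [tuple(rng.sample(proteins, 2)) for _ in pairs]
        if shared_fraction(random_pairs, annotations) >= observed:
            reached += 1
    return observed, reached


# Hypothetical annotations: each predicted pair shares exactly one term.
annotations = {
    "P1": {"T1"}, "P2": {"T1"},
    "P3": {"T2"}, "P4": {"T2"},
    "P5": {"T3"}, "P6": {"T3"},
}
predicted = [("P1", "P2"), ("P3", "P4"), ("P5", "P6")]
observed, reached = count_random_reaching(predicted, annotations, n_random=1000)
print(f"observed fraction = {observed:.2f}, randomized networks reaching it = {reached}/1000")
```

A count of zero, as reported in the abstract, means the observed proportion was never matched by chance under the chosen randomization.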
Project description:MOTIVATION:The availability of large-scale curated protein interaction datasets has given rise to the opportunity to investigate higher level organization and modularity within the protein-protein interaction (PPI) network using graph theoretic analysis. Despite recent progress, systems level analysis of high-throughput PPIs remains a daunting task because of the amount of data they present. In this article, we propose a novel PPI network decomposition algorithm called FACETS in order to make sense of the deluge of interaction data using Gene Ontology (GO) annotations. FACETS finds not just a single functional decomposition of the PPI network, but a multi-faceted atlas of functional decompositions that portray alternative perspectives of the functional landscape of the underlying PPI network. Each facet in the atlas represents a distinct interpretation of how the network can be functionally decomposed and organized. Our algorithm maximizes the interpretative value of the atlas by optimizing inter-facet orthogonality and intra-facet cluster modularity. RESULTS:We tested our algorithm on the global networks from IntAct, and compared it with gold standard datasets from MIPS and KEGG. We demonstrated the performance of FACETS. We also performed a case study that illustrates the utility of our approach. SUPPLEMENTARY INFORMATION:Supplementary data are available at Bioinformatics online. AVAILABILITY:Our software is available freely for non-commercial purposes from: http://www.cais.ntu.edu.sg/~assourav/Facets/
Project description:Chemoresistance is a significant factor associated with poor outcomes of osteosarcoma patients. The present study aims to identify chemoresistance-regulated gene signatures and microRNAs (miRNAs) in the Gene Expression Omnibus (GEO) database. The enriched Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways included positive regulation of transcription, DNA-templated, tryptophan metabolism, and the like. Then differentially expressed genes (DEGs) were uploaded to the Search Tool for the Retrieval of Interacting Genes (STRING) to construct protein-protein interaction (PPI) networks, and 9 hub genes were screened, such as fucosyltransferase 3 (Lewis blood group) (FUT3), whose expression was high in chemoresistant samples but was associated with a better prognosis in osteosarcoma patients. Furthermore, the connection between DEGs and differentially expressed miRNAs (DEMs) was explored. GEO2R was utilized to screen out DEGs and DEMs. A total of 668 DEGs and 5 DEMs were extracted from GSE7437 and GSE30934, which distinguish samples from patients with poor and good responses to chemotherapy. The Database for Annotation, Visualization, and Integrated Discovery (DAVID) was used to perform GO and KEGG pathway enrichment analysis to identify potential pathways and functional annotations linked with osteosarcoma chemoresistance. The present study may provide a deeper understanding of the regulatory genes of osteosarcoma chemoresistance and identify potential therapeutic targets for osteosarcoma.
Project description:Semantic similarity measures are useful to assess the physiological relevance of protein-protein interactions (PPIs). They quantify similarity between proteins based on their function using annotation systems like the Gene Ontology (GO). Proteins that interact in the cell are likely to be in similar locations or involved in similar biological processes compared to proteins that do not interact. Thus the more semantically similar the gene function annotations are among the interacting proteins, the more likely the interaction is physiologically relevant. However, most semantic similarity measures used for PPI confidence assessment do not consider the unequal depth of term hierarchies in different classes of the cellular location, molecular function, and biological process ontologies of GO and thus may over- or under-estimate similarity. We describe an improved algorithm, Topological Clustering Semantic Similarity (TCSS), to compute semantic similarity between GO terms annotated to proteins in interaction datasets. Our algorithm considers the unequal depth of biological knowledge representation in different branches of the GO graph. The central idea is to divide the GO graph into sub-graphs and score PPIs higher if the participating proteins belong to the same sub-graph than if they belong to different sub-graphs. The TCSS algorithm performs better than the other semantic similarity measurement techniques that we evaluated in terms of their performance on distinguishing true from false protein interactions, and correlation with gene expression and protein families. We show an average improvement of 4.6 times the F1 score over Resnik, the next best method, on our Saccharomyces cerevisiae PPI dataset and 2 times on our Homo sapiens PPI dataset using cellular component, biological process and molecular function GO annotations.
Project description:BACKGROUND: Gallstones and gallbladder polyps (GPs) are two major types of gallbladder diseases that share multiple common symptoms. However, their pathological mechanisms remain largely unknown. The aim of our study is to identify genes related to gallstones and GPs and to gain insight into the underlying genetic basis of these diseases. METHODS: We enrolled 7 patients with gallstones and 2 patients with GPs for RNA-Seq, and we conducted functional enrichment analysis and protein-protein interaction (PPI) network analysis for the identified differentially expressed genes (DEGs). RESULTS: RNA-Seq produced 41.7 million pairs in gallstones and 32.1 million pairs in GPs. A total of 147 DEGs were identified between gallstones and GPs. For molecular function, the significantly enriched GO term was antigen binding (GO:0003823, P=5.9E-11); for biological process, it was immune response (GO:0006955, P=2.6E-15); and for cellular component, it was extracellular region (GO:0005576, P=2.7E-15). To further evaluate the biological significance of the DEGs, we also performed KEGG pathway enrichment analysis. The most significant pathway in our KEGG analysis was Cytokine-cytokine receptor interaction (P=7.5E-06). PPI network analysis indicated that the significant hub proteins included S100A9 (S100 calcium binding protein A9, Degree=94) and CR2 (complement component receptor 2, Degree=8). CONCLUSION: The present study suggests some promising genes and may provide a clue to the role these genes play in the development of gallstones and GPs.