MDD-Palm: Identification of protein S-palmitoylation sites with substrate motifs based on maximal dependence decomposition.
ABSTRACT: S-palmitoylation, the covalent attachment of a 16-carbon palmitic acid to a cysteine residue via a thioester linkage, is an important reversible lipid modification that plays a regulatory role in a variety of physiological and biological processes. As the number of experimentally identified S-palmitoylated peptides increases, it is imperative to investigate substrate motifs to facilitate the study of protein S-palmitoylation. Based on 710 non-homologous S-palmitoylation sites obtained from published databases and the literature, we carried out a bioinformatics investigation of S-palmitoylation sites based on amino acid composition. Two Sample Logo analysis indicates that positively charged and polar amino acids surrounding S-palmitoylated sites may be associated with the substrate site specificity of protein S-palmitoylation. Additionally, maximal dependence decomposition (MDD) was applied to explore the motif signatures of S-palmitoylation sites by partitioning a large-scale dataset into subgroups with statistically significant conservation of amino acids. Single features, namely amino acid composition (AAC), amino acid pair composition (AAPC), position-specific scoring matrix (PSSM), position weight matrix (PWM), amino acid substitution matrix (BLOSUM62), and accessible surface area (ASA), were evaluated, along with the effectiveness of incorporating MDD-identified substrate motifs into a two-layered prediction model. Evaluation by five-fold cross-validation showed that, with a support vector machine (SVM) classifier, a hybrid of AAC and PSSM performs best at discriminating between S-palmitoylation and non-S-palmitoylation sites. The two-layered SVM model integrating MDD-identified substrate motifs performed well, with a sensitivity of 0.79, specificity of 0.80, accuracy of 0.80, and Matthews correlation coefficient (MCC) of 0.45. 
Using an independent testing dataset (613 S-palmitoylated and 5412 non-S-palmitoylated sites) obtained from the literature, we demonstrated that the two-layered SVM model could outperform other prediction tools, yielding a balanced sensitivity and specificity of 0.690 and 0.694, respectively. This two-layered SVM model has been implemented as a web-based system (MDD-Palm), which is now freely available at http://csb.cse.yzu.edu.tw/MDDPalm/.
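The MDD step described above can be illustrated with a simplified sketch. It scores each position i by summing Pearson chi-square statistics between "consensus residue present at i" (yes/no) and the residue observed at every other position j, splits the aligned windows on the highest-scoring position, and recurses on both subsets until a group is too small or no dependence exceeds a threshold. The function names, the `min_size` and `threshold` defaults, and the use of raw residues (rather than grouped amino acid classes) are assumptions for illustration, not MDD-Palm's actual settings.

```python
from collections import Counter


def chi_square(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for r, row in enumerate(table):
        for c, observed in enumerate(row):
            expected = row_totals[r] * col_totals[c] / n
            if expected > 0:  # empty rows/columns contribute nothing
                stat += (observed - expected) ** 2 / expected
    return stat


def dependence_score(windows, i):
    """Return the consensus residue at position i and the summed chi-square
    between 'consensus at i' (yes/no) and the residue at every other position j."""
    consensus = Counter(w[i] for w in windows).most_common(1)[0][0]
    total = 0.0
    for j in range(len(windows[0])):
        if j == i:
            continue
        residues = sorted({w[j] for w in windows})
        present = Counter(w[j] for w in windows if w[i] == consensus)
        absent = Counter(w[j] for w in windows if w[i] != consensus)
        table = [[present[a] for a in residues], [absent[a] for a in residues]]
        total += chi_square(table)
    return consensus, total


def mdd(windows, min_size=40, threshold=30.0, excluded=frozenset(), path=()):
    """Recursively partition aligned, equal-length windows into motif subgroups.

    Returns a list of (path, windows) leaves; each path step is
    (position, residue, present?) describing a split that led to the leaf.
    """
    if len(windows) < min_size:
        return [(path, windows)]
    best = None
    for i in range(len(windows[0])):
        if i in excluded:  # e.g. the central modified residue or positions already used
            continue
        consensus, score = dependence_score(windows, i)
        if best is None or score > best[2]:
            best = (i, consensus, score)
    if best is None or best[2] < threshold:
        return [(path, windows)]
    i, aa, _ = best
    with_aa = [w for w in windows if w[i] == aa]
    without_aa = [w for w in windows if w[i] != aa]
    if not with_aa or not without_aa:
        return [(path, windows)]
    used = excluded | {i}
    return (mdd(with_aa, min_size, threshold, used, path + ((i, aa, True),))
            + mdd(without_aa, min_size, threshold, used, path + ((i, aa, False),)))
```

A typical call would be `mdd(windows, excluded=frozenset({center}))`, where `center` is the index of the modified cysteine in each window. Grouping residues into physicochemical classes before building the contingency tables would be a natural refinement when subgroups become sparse.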
Project description:BACKGROUND:Glutarylation, the addition of a glutaryl group (five carbons) to a lysine residue of a protein molecule, is an important post-translational modification and plays a regulatory role in a variety of physiological and biological processes. As the number of experimentally identified glutarylated peptides increases, it becomes imperative to investigate substrate motifs to enhance the study of protein glutarylation. We carried out a bioinformatics investigation of glutarylation sites based on amino acid composition using a public database containing information on 430 non-homologous glutarylation sites. RESULTS:The TwoSampleLogo analysis indicates that positively charged and polar amino acids surrounding glutarylated sites may be associated with the substrate site specificity of protein glutarylation. Additionally, the chi-squared test was used to explore the intrinsic interdependence between pairs of positions around glutarylation sites. Further, maximal dependence decomposition (MDD), which partitions a large-scale dataset into subgroups with statistically significant amino acid conservation, was used to capture motif signatures of glutarylation sites. We considered single features, such as amino acid composition (AAC), amino acid pair composition (AAPC), and composition of k-spaced amino acid pairs (CKSAAP), as well as the effectiveness of incorporating MDD-identified substrate motifs into an integrated prediction model. Evaluation by five-fold cross-validation showed that, using a support vector machine (SVM), AAC was the most effective feature for discriminating between glutarylation and non-glutarylation sites. CONCLUSIONS:The SVM model integrating MDD-identified substrate motifs performed well, with a sensitivity of 0.677, a specificity of 0.619, an accuracy of 0.638, and a Matthews correlation coefficient (MCC) of 0.28. 
Using an independent testing dataset (46 glutarylated and 92 non-glutarylated sites) obtained from the literature, we demonstrated that the integrated SVM model could improve the predictive performance effectively, yielding a balanced sensitivity and specificity of 0.652 and 0.739, respectively. This integrated SVM model has been implemented as a web-based system (MDDGlutar), which is now freely available at http://csb.cse.yzu.edu.tw/MDDGlutar/ .
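AAC and CKSAAP, two of the composition features compared in this study, are straightforward to compute from a fixed-length window. The sketch below assumes windows over the 20 standard amino acids, skips pairs that contain any other symbol (for example, a hypothetical "-" padding character at protein termini), and normalizes each k-spaced block by the number of pairs examined; the function names and the `k_max=3` default are illustrative choices, not the study's settings.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def aac(window):
    """Amino acid composition: fraction of each standard residue in the window."""
    return [window.count(aa) / len(window) for aa in AMINO_ACIDS]


def cksaap(window, k_max=3):
    """Composition of k-spaced amino acid pairs for k = 0..k_max (400 values per k)."""
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        n_pairs = len(window) - k - 1  # residue pairs separated by exactly k residues
        for i in range(n_pairs):
            pair = window[i] + window[i + k + 1]
            if pair in counts:  # ignore pairs with gaps or non-standard residues
                counts[pair] += 1
        features.extend(counts[p] / n_pairs if n_pairs > 0 else 0.0 for p in PAIRS)
    return features
```

With `k_max=3` each window yields 1,600 CKSAAP values, which is why feature selection or a kernel method such as an SVM is usually paired with this encoding.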
Project description:BACKGROUND:Carbonylation, which occurs through the oxidation of specific residues by reactive oxygen species (ROS), is an irreversible oxidative modification of proteins. Carbonylation has been reported to be associated with a number of metabolic and aging-related diseases, including diabetes, chronic lung disease, Parkinson's disease, and Alzheimer's disease. Because computational methods dedicated to exploring the motif signatures of protein carbonylation sites are lacking, we exploited an iterative statistical method to characterize and identify carbonylated sites with motif signatures. RESULTS:By manually curating experimental data from research articles, we obtained 332, 144, 135, and 140 verified substrate sites for K (lysine), R (arginine), T (threonine), and P (proline) residues, respectively, from 241 carbonylated proteins. To identify attributes that are informative for classifying carbonylated versus non-carbonylated sites, we investigated multiple features, including amino acid composition (AAC), amino acid pair composition (AAPC), position-specific scoring matrix (PSSM), and position weight matrix (PWM). Additionally, to explore the motif signatures of carbonylation sites, an iterative statistical method was adopted to detect statistically significant dependencies of amino acid composition between specific positions around substrate sites. A profile hidden Markov model (HMM) was then trained on each motif signature. Moreover, a support vector machine (SVM) was used to construct an integrative model combining the bit scores obtained from the profile HMMs. The combined model provided enhanced performance, with balanced sensitivity and specificity in both cross-validation and independent testing. 
CONCLUSION:This study provides a new scheme for exploring potential motif signatures at substrate sites of protein carbonylation. The usefulness of the revealed motifs in identifying carbonylated sites is demonstrated by their effective performance in cross-validation and independent testing. Finally, these substrate motifs were used to build an online resource (MDD-Carb, http://csb.cse.yzu.edu.tw/MDDCarb/ ) and are anticipated to facilitate the study of large-scale carbonylated proteomes.
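The position weight matrix (PWM) feature listed above can be sketched as a log-odds matrix estimated from positive training windows and then used to score a candidate window. The pseudocount, the uniform background distribution, the function names, and the reduction to a single summed score (rather than, for instance, a full position-by-residue encoding) are assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pwm(windows, pseudocount=1.0, background=None):
    """Log2-odds PWM from equal-length positive windows (one dict per position)."""
    if background is None:
        # Uniform background over the 20 standard residues (assumption).
        background = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    pwm = []
    for pos in range(len(windows[0])):
        counts = Counter(w[pos] for w in windows)
        total = sum(counts[aa] for aa in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
        pwm.append({
            aa: math.log2((counts[aa] + pseudocount) / total / background[aa])
            for aa in AMINO_ACIDS
        })
    return pwm


def pwm_score(pwm, window):
    """Sum of per-position log-odds; non-standard symbols contribute 0."""
    return sum(column.get(aa, 0.0) for column, aa in zip(pwm, window))
```

Positive scores indicate windows that resemble the training motif more than the background does; in a classifier such a score would be one input among the other features.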
Project description:BACKGROUND: Protein palmitoylation, an essential and reversible post-translational modification (PTM), has been implicated in cellular dynamics and plasticity. Although numerous experimental studies have been performed to explore the molecular mechanisms underlying palmitoylation processes, the intrinsic features of substrate specificity have remained elusive. Thus, computational approaches for palmitoylation prediction are highly desirable to guide further experimental design. RESULTS: In this work, we present NBA-Palm, a novel computational method based on the Naïve Bayes algorithm for the prediction of palmitoylation sites. The training data were curated from the scientific literature (PubMed) and include 245 palmitoylated sites from 105 distinct proteins after redundancy elimination. The optimal window length for a potential palmitoylated peptide was determined to be six. To evaluate the prediction performance of NBA-Palm, 3-fold cross-validation, 8-fold cross-validation, and jackknife validation were carried out. Prediction accuracies reached 85.79% for 3-fold cross-validation, 86.72% for 8-fold cross-validation, and 86.74% for jackknife validation. Two other algorithms, an RBF network and a support vector machine (SVM), were also employed and compared with NBA-Palm. CONCLUSION: Taken together, our analyses demonstrate that NBA-Palm is a useful computational program that provides insights for further experimentation. The accuracy of NBA-Palm is comparable with that of our previously described tool, CSS-Palm. NBA-Palm is freely accessible from: http://www.bioinfo.tsinghua.edu.cn/NBA-Palm.
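Naïve Bayes, the classifier behind NBA-Palm, treats each residue position in a peptide window as conditionally independent given the class. A minimal categorical version over positional residues with Laplace smoothing might look like the following; the class name, the smoothing constant, and the 21-symbol alphabet (20 amino acids plus a padding symbol) are assumptions, and NBA-Palm's actual encoding of its six-residue window may differ.

```python
import math
from collections import Counter, defaultdict


class PositionalNaiveBayes:
    """Categorical naive Bayes in which each window position is one feature (sketch)."""

    def __init__(self, alpha=1.0, alphabet_size=21):
        self.alpha = alpha                  # Laplace smoothing constant (assumption)
        self.alphabet_size = alphabet_size  # 20 amino acids + 1 padding symbol (assumption)

    def fit(self, windows, labels):
        self.class_counts = Counter(labels)
        self.classes = sorted(self.class_counts)
        # position_counts[label][position][residue] -> number of training windows
        self.position_counts = {c: defaultdict(Counter) for c in self.classes}
        for window, label in zip(windows, labels):
            for pos, residue in enumerate(window):
                self.position_counts[label][pos][residue] += 1
        return self

    def log_posterior(self, window, label):
        """Unnormalized log P(label | window): log prior plus per-position log likelihoods."""
        n_label = self.class_counts[label]
        log_p = math.log(n_label / sum(self.class_counts.values()))
        for pos, residue in enumerate(window):
            count = self.position_counts[label][pos][residue]
            log_p += math.log((count + self.alpha) / (n_label + self.alpha * self.alphabet_size))
        return log_p

    def predict(self, window):
        return max(self.classes, key=lambda label: self.log_posterior(window, label))
```

Working in log space avoids underflow when many small per-position probabilities are multiplied, and the smoothing keeps unseen residues from zeroing out a class.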
Project description:The serotonergic system, and in particular the serotonin 1A receptor (5-HT1AR), is implicated in major depressive disorder (MDD). Here we demonstrate that 5-HT1AR is palmitoylated in human and rodent brains, and we identify ZDHHC21 as a major palmitoyl acyltransferase whose depletion reduced palmitoylation and, consequently, the signaling functions of 5-HT1AR. Two rodent models of depression-like behavior show reduced brain ZDHHC21 expression and attenuated 5-HT1AR palmitoylation. Moreover, selective knock-down of ZDHHC21 in the murine forebrain induced depression-like behavior. We also identified the microRNA miR-30e as a negative regulator of Zdhhc21 expression. Analysis of post-mortem brain samples from individuals with MDD who died by suicide showed that miR-30e expression is increased, while ZDHHC21 expression and 5-HT1AR palmitoylation are reduced, within the prefrontal cortex. Our study suggests that downregulation of 5-HT1AR palmitoylation is a mechanism involved in depression, making the restoration of 5-HT1AR palmitoylation a promising clinical strategy for the treatment of MDD.
Project description:Protein S-acylation (palmitoylation) is a reversible lipid modification that is an important regulator of dynamic membrane-protein interactions. Proteomic approaches have uncovered many putative palmitoylated proteins; however, methods for comprehensive palmitoylation site characterization are lacking. We demonstrate a quantitative site-specific acyl-biotin exchange (ssABE) method that allowed the identification of 906 putative palmitoylation sites on 641 proteins from mouse forebrain. Of these sites, 62% map to known palmitoylated proteins, and 102 individual palmitoylation sites were already known from the literature. In addition, 54% of the palmitoylation sites map to synaptic proteins, including many GPCRs, receptors/ion channels, and peripheral membrane proteins. Phosphorylation sites were also identified on a subset of the palmitoylated peptides, demonstrating for the first time the co-identification of these modifications by mass spectrometry. Palmitoylation sites were identified on over half of the family of palmitoyl-acyltransferases (PATs) that mediate protein palmitoylation, including active-site thioester-linked palmitoyl intermediates. Distinct palmitoylation motifs and site topologies were identified for integral membrane and soluble proteins, indicating potential differences in associated PAT specificity and palmitoylation function. ssABE allows the global identification of palmitoylation sites as well as measurement of the active-site modification state of PATs, enabling palmitoylation to be studied at a systems level.
Project description:Protein palmitoylation, a common post-translational lipid modification, plays an important role in protein trafficking and function. Recently developed palmitoyl-proteomic methods have identified many novel substrates. However, the full picture of palmitoyl substrates has not been clarified. Here, we performed global in silico screening using the CSS-Palm 2.0 program, free software for prediction of palmitoylation sites, and selected 17 candidates as novel palmitoyl substrates. Of the 17 candidates, 10 proteins were palmitoylated, including 6 synaptic proteins (Syd-1, transmembrane AMPA receptor regulatory protein (TARP) γ-2, TARP γ-8, cornichon-2, Ca(2+)/calmodulin-dependent protein kinase IIα, and neurochondrin (Ncdn)/norbin), one focal adhesion protein (zyxin), two ion channels (TRPM8 and TRPC1), and one G-protein-coupled receptor (orexin 2 receptor). Using the DHHC palmitoylating enzyme library, we found that all tested substrates were palmitoylated by the Golgi-localized DHHC3/7 subfamily. Ncdn, a regulator of neurite outgrowth and synaptic plasticity, was robustly palmitoylated by the DHHC1/10 (zDHHC1/11; z1/11) subfamily, whose substrates had not previously been reported. As predicted by CSS-Palm 2.0, Cys-3 and Cys-4 are the palmitoylation sites of Ncdn. Ncdn was specifically localized to somatodendritic regions, and not to the axon, in cultured rat neurons. Stimulated emission depletion microscopy revealed that Ncdn was localized to Rab5-positive early endosomes in a palmitoylation-dependent manner, where DHHC1/10 (z1/11) were also distributed. Knockdown of DHHC1, -3, or -10 (z11) resulted in the loss of Ncdn from Rab5-positive endosomes. Thus, through in silico screening, we demonstrate that Ncdn and the DHHC1/10 (z1/11) and DHHC3/7 subfamilies are novel palmitoyl substrate-enzyme pairs and that Ncdn palmitoylation plays an essential role in its specific endosomal targeting.
Project description:Palmitoylation affects membrane partitioning, trafficking and activities of membrane proteins. However, how specificity of palmitoylation and multiple palmitoylations in membrane proteins are determined is not well understood. Here, we profile palmitoylation states of three human claudins, human CD20 and cysteine-engineered prokaryotic KcsA and bacteriorhodopsin by native mass spectrometry. Cysteine scanning of claudin-3, KcsA, and bacteriorhodopsin shows that palmitoylation is independent of a sequence motif. Palmitoylations are observed for cysteines exposed on the protein surface and situated up to 8 Å into the inner leaflet of the membrane. Palmitoylation on multiple sites in claudin-3 and CD20 occurs stochastically, giving rise to a distribution of palmitoylated membrane-protein isoforms. Non-native sites in claudin-3 indicate that membrane-protein function imposed evolutionary restraints on native palmitoylation sites. These results suggest a generic, stochastic membrane-protein palmitoylation process that is determined by the accessibility of palmitoyl-acyl transferases to cysteines on membrane-embedded proteins, and not by a preferred substrate-sequence motif.
Project description:BACKGROUND: Identification of DNA-binding proteins is one of the major challenges in the field of genome annotation, as these proteins play a crucial role in gene regulation. In this paper, we developed various SVM models for predicting DNA-binding domains and proteins. All models were trained and tested on multiple datasets of non-redundant proteins. RESULTS: SVM models were developed on DNAaset, which consists of 1153 DNA-binding and an equal number of non-DNA-binding proteins, and achieved maximum accuracies of 72.42% and 71.59% using amino acid and dipeptide compositions, respectively. The performance of the SVM model improved from 72.42% to 74.22% when evolutionary information in the form of PSSM profiles was used as input instead of amino acid composition. In addition, SVM models were developed on DNAset, which consists of 146 DNA-binding and 250 non-binding chains/domains, and achieved maximum accuracies of 79.80% and 86.62% using amino acid composition and PSSM profiles, respectively. The SVM models developed in this study perform better than existing methods on a blind dataset. CONCLUSION: A highly accurate method has been developed for predicting DNA-binding proteins using SVM and PSSM profiles. This is the first study in which evolutionary information in the form of PSSM profiles has been used successfully for predicting DNA-binding proteins. A web server, DNAbinder, has been developed for identifying DNA-binding proteins and domains from query amino acid sequences: http://www.imtech.res.in/raghava/dnabinder/.
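PSSM profiles must be reduced to a fixed-length vector before they can serve as SVM input. One widely used reduction sums the (sigmoid-scaled) profile rows of all positions occupied by each residue type, giving a 20 × 20 = 400-dimensional vector. The sketch below assumes the PSSM is already parsed into L rows of 20 scores in PSI-BLAST's ASCII column order (ARNDCQEGHILKMFPSTWYV); the function name, the logistic scaling, and the division by sequence length are illustrative choices, not necessarily DNAbinder's exact normalization.

```python
import math

# Column order of PSI-BLAST's ASCII PSSM output (rows are grouped in the same order).
PSSM_COLUMNS = "ARNDCQEGHILKMFPSTWYV"


def pssm_400(sequence, pssm):
    """Collapse an L x 20 PSSM into a 400-dimensional composition vector.

    For each residue type, the logistic-scaled PSSM rows of the positions where
    that residue occurs are summed; the result is divided by the sequence length.
    """
    if not sequence or len(sequence) != len(pssm):
        raise ValueError("sequence must be non-empty and match the PSSM length")
    sums = {aa: [0.0] * 20 for aa in PSSM_COLUMNS}
    for residue, row in zip(sequence, pssm):
        if residue not in sums:  # skip non-standard residues such as X
            continue
        scaled = [1.0 / (1.0 + math.exp(-score)) for score in row]
        sums[residue] = [acc + s for acc, s in zip(sums[residue], scaled)]
    length = len(sequence)
    return [value / length for aa in PSSM_COLUMNS for value in sums[aa]]
```

Because the output length no longer depends on L, proteins of any size map to the same 400-dimensional input space, which is what makes the profile usable with a standard SVM kernel.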
Project description:BACKGROUND: Membrane transport proteins (transporters) move hydrophilic substrates across hydrophobic membranes and play vital roles in most cellular functions. Transporters represent a diverse group of proteins that differ in topology, energy coupling mechanism, and substrate specificity as well as sequence similarity. Among the functional annotations of transporters, information about their transported substrates is especially important. The experimental identification and characterization of transporters is currently costly and time-consuming. The development of robust bioinformatics-based methods for the prediction of membrane transport proteins and their substrate specificities is therefore an important and urgent task. RESULTS: Support vector machine (SVM)-based computational models, which comprehensively utilize integrative protein sequence features such as amino acid composition, dipeptide composition, physico-chemical composition, biochemical composition, and position-specific scoring matrices (PSSM), were developed to predict the substrate specificity of seven transporter classes: amino acid, anion, cation, electron, protein/mRNA, sugar, and other transporters. An additional model to differentiate transporters from non-transporters was also developed. Among the developed models, the biochemical composition and PSSM hybrid model outperformed other models and achieved an overall average prediction accuracy of 76.69% with a Matthews correlation coefficient (MCC) of 0.49 and a receiver operating characteristic area under the curve (AUC) of 0.833 on our main dataset. This model also achieved an overall average prediction accuracy of 78.88% and MCC of 0.41 on an independent dataset. CONCLUSIONS: Our analyses suggest that evolutionary information (i.e., the PSSM) and the AAIndex are key features for the substrate specificity prediction of transport proteins. 
In comparison, similarity-based methods such as BLAST, PSI-BLAST, and hidden Markov models do not provide accurate predictions for the substrate specificity of membrane transport proteins. TrSSP: The Transporter Substrate Specificity Prediction Server, a web server that implements the SVM models developed in this paper, is freely available at http://bioinfo.noble.org/TrSSP.
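The evaluation measures quoted throughout these abstracts (sensitivity, specificity, accuracy, MCC, and ROC AUC) follow standard definitions. A small standard-library sketch for the binary case, assuming labels of 1 for the positive class and 0 for the negative class, is shown below; the function names are illustrative. The AUC is computed as the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counted as one half.

```python
import math


def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, accuracy and Matthews correlation coefficient."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


def roc_auc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative), ties = 0.5."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

MCC is informative for imbalanced benchmarks such as an independent set with far more negatives than positives, because, unlike accuracy, it stays near zero for a classifier that simply predicts the majority class.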
Project description:Large-scale characterisation of cysteine modification is enabling study of the physicochemical determinants of reactivity. We find that location of cysteine at the amino terminus of an α-helix, associated with activity in thioredoxins, is under-represented in human protein structures, perhaps indicative of selection against background reactivity. An amino-terminal helix location underpins the covalent linkage for one class of kinase inhibitors. Cysteine targets for S-palmitoylation, S-glutathionylation, and S-nitrosylation show little correlation with pKa values predicted from structures, although flanking sequences of S-palmitoylated sites are enriched in positively charged amino acids, which could facilitate palmitoyl group transfer to substrate cysteine. A surprisingly large fraction of modified sites, across the three modifications, would be buried in native protein structure. Furthermore, modified cysteines are (on average) closer to ubiquitinated lysines than are unmodified cysteines, indicating that cysteine redox biology could be associated with protein degradation and degron recognition.