Investigation and identification of protein carbonylation sites based on position-specific amino acid composition and physicochemical features.
ABSTRACT: Protein carbonylation, an irreversible and non-enzymatic post-translational modification (PTM), is often used as a marker of oxidative stress. When reactive oxygen species (ROS) oxidize amino acid side chains, carbonyl (CO) groups are produced, especially on lysine (K), arginine (R), threonine (T), and proline (P) residues. However, because little is known about the substrate specificity of carbonylation, we were encouraged to develop a systematic method for a comprehensive investigation of protein carbonylation sites. After removing redundant data collected from multiple carbonylation-related articles, a total of 226 human carbonylated proteins were used as the training dataset, comprising 307, 126, 128, and 129 carbonylation sites for K, R, T, and P residues, respectively. To identify features useful for predicting carbonylation sites, the linear amino acid sequence was used not only to build predictive models from the training dataset, but also to compare their effectiveness with other types of features, including amino acid composition (AAC), amino acid pair composition (AAPC), position-specific scoring matrix (PSSM), positional weighted matrix (PWM), solvent-accessible surface area (ASA), and physicochemical properties. The investigation of position-specific amino acid composition revealed that the positively charged amino acids (K and R) are markedly enriched around carbonylated sites, which may play a functional role in discriminating between carbonylation and non-carbonylation sites. A variety of predictive models were built using various features and three different machine learning methods. Based on evaluation by five-fold cross-validation, models trained with the PWM feature provided better sensitivity on the positive training dataset, while models trained with the AAindex (physicochemical property) feature achieved higher specificity on the negative training dataset.
Additionally, the model trained using hybrid features (PWM, AAC, and AAindex) obtained the best MCC values of 0.432, 0.472, 0.443, and 0.467 for K, R, T, and P residues, respectively. When compared with an existing prediction tool, the selected models trained with hybrid features provided promising accuracy on an independent testing dataset. In short, this work not only characterized the substrate preference of carbonylation, but also demonstrated that the proposed method could provide a feasible means of accelerating the preliminary discovery of protein carbonylation sites.
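The MCC values above, like the sensitivity and specificity figures reported throughout this collection, are computed from a binary confusion matrix. A minimal sketch using the standard definitions; the helper name `evaluate` and the counts in the example are hypothetical and not taken from any of the studies.

```python
import math


def evaluate(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on positive (modified) sites
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall on negative sites
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # Matthews correlation coefficient
    return {"Sn": sensitivity, "Sp": specificity, "Acc": accuracy, "MCC": mcc}


# Hypothetical counts: 100 positive and 200 negative sites.
print(evaluate(tp=80, fp=40, tn=160, fn=20))
```

With these counts, sensitivity, specificity and accuracy are all 0.8 while MCC is about 0.577; on a balanced set with the same rates (tp=80, fp=20, tn=80, fn=20) MCC would be 0.6. Holding sensitivity and specificity fixed, MCC falls as the negative class grows, which is one reason it is reported for imbalanced site-prediction data.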
Project description:BACKGROUND: Carbonylation, which occurs through the oxidation of specific residues by reactive oxygen species (ROS), is an irreversible oxidative modification of proteins. Carbonylation has been reported to be associated with a number of metabolic and aging-related diseases, including diabetes, chronic lung disease, Parkinson's disease, and Alzheimer's disease. Because of the lack of computational methods dedicated to exploring the motif signatures of protein carbonylation sites, we were motivated to apply an iterative statistical method to characterize and identify carbonylated sites with motif signatures. RESULTS: By manually curating experimental data from research articles, we obtained 332, 144, 135, and 140 verified substrate sites for K (lysine), R (arginine), T (threonine), and P (proline) residues, respectively, from 241 carbonylated proteins. To examine which attributes are informative for classifying carbonylated versus non-carbonylated sites, multiple features were investigated, including the composition of the twenty amino acids (AAC), the composition of amino acid pairs (AAPC), the position-specific scoring matrix (PSSM), and the positional weighted matrix (PWM). Additionally, to explore the motif signatures of carbonylation sites, an iterative statistical method was adopted to detect statistically significant dependencies of amino acid composition between specific positions around substrate sites. A profile hidden Markov model (HMM) was then trained for each motif signature. Moreover, a support vector machine (SVM) was used to construct an integrative model that combines the bit scores obtained from the profile HMMs. The combined model provided enhanced performance, with balanced predictive sensitivity and specificity in both cross-validation and independent testing.
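The dependency test at the core of such an iterative (MDD-style) decomposition can be sketched as a chi-square statistic between residues at two window positions. This simplified 2x2 version, the helper name `chi_square_2x2`, and the 3.841 cut-off (p = 0.05 with one degree of freedom) are assumptions for illustration; the published method may group amino acids and use larger contingency tables.

```python
def chi_square_2x2(windows, pos_i, res_i, pos_j, res_j):
    """Chi-square statistic for association of res_i at pos_i with res_j at pos_j."""
    table = [[0, 0], [0, 0]]  # rows: res_i present/absent; columns: res_j present/absent
    for w in windows:
        a = 0 if w[pos_i] == res_i else 1
        b = 0 if w[pos_j] == res_j else 1
        table[a][b] += 1
    n = len(windows)
    if n == 0:
        return 0.0
    row_totals = [sum(r) for r in table]
    col_totals = [table[0][k] + table[1][k] for k in range(2)]
    chi2 = 0.0
    for a in range(2):
        for b in range(2):
            expected = row_totals[a] * col_totals[b] / n
            if expected > 0:
                chi2 += (table[a][b] - expected) ** 2 / expected
    return chi2


CRITICAL_1DF_P05 = 3.841  # chi-square critical value, 1 degree of freedom, p = 0.05

# Hypothetical 3-residue windows centred on a carbonylated K.
windows = ["KKR", "KKR", "KKR", "AKA", "AKA", "AKA"]
stat = chi_square_2x2(windows, 0, "K", 2, "R")
print(stat, stat > CRITICAL_1DF_P05)
```

In an MDD-style procedure, the position whose residue shows the strongest dependencies with the other positions would be used to split the dataset into subgroups, and the tests are then repeated within each subgroup.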
CONCLUSION: This study provides a new scheme for exploring potential motif signatures at substrate sites of protein carbonylation. The usefulness of the revealed motifs for identifying carbonylated sites is demonstrated by their effective performance in cross-validation and independent testing. Finally, these substrate motifs were used to build an online resource (MDD-Carb, http://csb.cse.yzu.edu.tw/MDDCarb/), and they are also anticipated to facilitate the study of large-scale carbonylated proteomes.
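Many of the studies collected here encode the sequence window around a candidate site as composition vectors. A minimal sketch of AAC (20-D) and AAPC (400-D) encoding, assuming a window of `flank` residues on each side of a 0-based site, padding with `X` at protein termini, and AAPC defined over adjacent residue pairs; these choices and the helper names are illustrative assumptions, not the MDD-Carb settings.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def extract_window(sequence: str, site: int, flank: int = 10, pad: str = "X") -> str:
    """Return the 2*flank+1 residue window centred on a 0-based site, padded at the termini."""
    padded = pad * flank + sequence + pad * flank
    return padded[site : site + 2 * flank + 1]


def aac(window: str) -> list:
    """Amino acid composition: fraction of each standard residue among non-pad positions."""
    residues = [r for r in window if r in AMINO_ACIDS]
    n = len(residues) or 1
    return [residues.count(a) / n for a in AMINO_ACIDS]


def aapc(window: str) -> list:
    """Amino acid pair composition: fraction of each adjacent residue pair."""
    pairs = [window[i : i + 2] for i in range(len(window) - 1)]
    valid = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    n = len(valid) or 1
    return [valid.count(p) / n for p in PAIRS]


# Hypothetical sequence; the K at index 1 is the candidate site.
window = extract_window("MKTAYIAKQR", 1, flank=3)
features = aac(window) + aapc(window)  # 420-D vector for a classifier
print(window, len(features))
```

Concatenating the two vectors gives one hybrid feature vector per site; the studies above combine such vectors with other features before training a classifier.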
Project description:BACKGROUND: Carbonyl derivatives are mainly formed by direct metal-catalysed oxidation (MCO) attacks on the amino-acid side chains of proline, arginine, lysine and threonine residues. For reasons unknown, only some proteins are prone to carbonylation. METHODOLOGY/PRINCIPAL FINDINGS: We used mass spectrometry analysis to identify carbonylated sites in BSA that had undergone in vitro MCO, and in 23 carbonylated proteins in Escherichia coli. The presence of a carbonylated site rendered a neighbouring carbonylatable site more prone to carbonylation. Most carbonylated sites were located within hot spots of carbonylation. These observations led us to suggest rules for identifying sites more prone to carbonylation. We used these rules to design an in silico model (available at http://www.lcb.cnrs-mrs.fr/CSPD/) that allows effective and accurate prediction of sites and proteins more prone to carbonylation in the E. coli proteome. CONCLUSIONS/SIGNIFICANCE: We observed that proteins evolve to either selectively maintain or lose predicted hot spots of carbonylation depending on their biological function. As our predictive model also allows efficient detection of carbonylated proteins in Bacillus subtilis, we believe that it may be extended to direct MCO attacks in all organisms.
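The CSPD rules themselves are not given in this abstract, so the following is only a hypothetical illustration of the "hot spot" idea: it flags sliding windows dense in the residues attacked by MCO (P, R, K, T). The window length, count threshold, and helper name `hot_spots` are arbitrary assumptions, not the published model.

```python
CARBONYLATABLE = set("PRKT")  # residues targeted by direct MCO attack


def hot_spots(sequence, window=7, min_count=4):
    """Return (start, segment, count) for windows holding at least min_count P/R/K/T residues."""
    spots = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start : start + window]
        count = sum(res in CARBONYLATABLE for res in segment)
        if count >= min_count:
            spots.append((start, segment, count))
    return spots


print(hot_spots("AAAAKRPTAAAA", window=5, min_count=4))
```

On this toy sequence the windows starting at positions 3 and 4 are flagged, since each contains the contiguous K-R-P-T cluster.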
Project description:Carbonylation is a posttranslational modification (PTM or PTLM) in which a carbonyl group is added to lysine (K), proline (P), arginine (R), or threonine (T) residues of a protein. Carbonylation plays an important role in orchestrating various biological processes, but it is also associated with many diseases such as diabetes, chronic lung disease, Parkinson's disease, Alzheimer's disease, chronic renal failure, and sepsis. Therefore, from the perspective of both basic research and drug development, we face a challenging problem: for an uncharacterized protein sequence containing many K, P, R, or T residues, which ones can be carbonylated and which ones cannot? To address this problem, we developed a predictor called iCar-PseCp by incorporating sequence-coupled information into the general pseudo amino acid composition and by balancing the skewed training dataset through Monte Carlo sampling to expand the positive subset. Rigorous target cross-validations on the same set of carbonylation-known proteins indicated that the new predictor remarkably outperformed its existing counterparts. For the convenience of experimental scientists, a user-friendly web server for iCar-PseCp has been established at http://www.jci-bioinfo.cn/iCar-PseCp, with which users can easily obtain the desired results without needing to go through the complicated mathematical equations involved. It has not escaped our notice that the formulation and approach presented here can also be used to analyze many other problems in computational proteomics.
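The details of iCar-PseCp's Monte Carlo expansion are not given in this abstract. As a stand-in, the sketch below uses plain random oversampling (positives drawn at random with replacement until both classes are the same size), a simpler, swapped-in technique that shows only the balancing goal. The helper name `oversample_positives` and the placeholder sample labels are hypothetical.

```python
import random


def oversample_positives(positives, negatives, seed=0):
    """Randomly duplicate positive samples until the classes are the same size."""
    deficit = len(negatives) - len(positives)
    if deficit <= 0:
        return list(positives)
    if not positives:
        raise ValueError("cannot oversample an empty positive set")
    rng = random.Random(seed)  # fixed seed for reproducible resampling
    return list(positives) + [rng.choice(positives) for _ in range(deficit)]


# Hypothetical encoded samples (placeholders for feature vectors).
positives = ["p1", "p2"]
negatives = ["n1", "n2", "n3", "n4", "n5"]
balanced = oversample_positives(positives, negatives, seed=1)
print(len(balanced), balanced)
```

Duplicated samples must be kept out of the test folds during cross-validation; otherwise copies of the same site appear in both training and testing and inflate the results.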
Project description:BACKGROUND: Carboxylation is a post-translational modification of glutamate (Glu) residues that is catalyzed by γ-glutamyl carboxylase in the lumen of the endoplasmic reticulum. Vitamin K is a critical co-factor in the post-translational conversion of Glu residues to γ-carboxyglutamate (Gla) residues. Carboxylation has been shown to be involved in the blood clotting cascade, bone growth, and extraosseous calcification. However, studies in this field have been limited by the difficulty of experimentally characterizing substrate site specificity in γ-glutamyl carboxylation. In silico investigations have the potential to characterize carboxylated sites before experiments are carried out. RESULTS: Because of the importance of γ-glutamyl carboxylation in biological mechanisms, this study investigates the substrate site specificity of carboxylation sites. It considers not only the composition of amino acids surrounding carboxylation sites but also the structural characteristics of these sites, including secondary structure and solvent-accessible surface area (ASA). The explored features are used to establish a predictive model for differentiating between carboxylation and non-carboxylation sites. A support vector machine (SVM) is employed to build predictive models with various features. A five-fold cross-validation evaluation reveals that the SVM model trained with the combined features of positional weighted matrix (PWM), amino acid composition (AAC), and ASA yields the highest accuracy (0.892). Furthermore, an independent testing set is constructed to evaluate whether the predictive model is over-fitted to the training set. CONCLUSIONS: Evaluation on independent testing data that did not undergo the cross-validation process shows that the proposed model can differentiate between carboxylation and non-carboxylation sites.
This investigation is the first to study carboxylation sites and to develop a system for identifying them. The proposed method is a practical means of preliminary analysis and greatly diminishes the total number of potential carboxylation sites requiring further experimental confirmation.
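Five-fold cross-validation, used above to evaluate the SVM, can be sketched as a stratified split that keeps the ratio of carboxylation to non-carboxylation sites similar in every fold. Stratification, the fixed seed, and the helper names are assumptions; the abstract does not say how the folds were drawn. The independent testing set stays outside all folds.

```python
import random


def stratified_k_fold(labels, k=5, seed=0):
    """Assign sample indices to k folds, dealing each class out round-robin after shuffling."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(labels)):
        indices = [i for i, y in enumerate(labels) if y == label]
        rng.shuffle(indices)
        for n, i in enumerate(indices):
            folds[n % k].append(i)
    return folds


def cross_validation_splits(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) once per fold."""
    folds = stratified_k_fold(labels, k, seed)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g, fold in enumerate(folds) if g != f for i in fold)
        yield train, test


labels = [1] * 10 + [0] * 40  # hypothetical: 10 carboxylation and 40 non-carboxylation sites
for train, test in cross_validation_splits(labels):
    n_pos = sum(labels[i] for i in test)
    print(len(train), len(test), n_pos)
```

Each of the five rounds trains on 40 samples and tests on 10, with 2 positives per test fold; the reported accuracy would be averaged (or pooled) over the five rounds.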
Project description:Protein secretion systems, used by almost all bacteria, are highly significant for the normal existence of bacteria and their interaction with the host. The accumulation of genome sequence data in the past few years has provided great insight into the distribution and function of these secretion systems. In this study, a support vector machine (SVM)-based method, SSPred, was developed for the automated functional annotation of proteins involved in secretion systems, further classifying them into five major sub-types (Type-I, Type-II, Type-III, Type-IV and Sec systems). The dataset used for training and testing was obtained from the KEGG and SwissProt databases and was curated to avoid redundancy. To overcome the imbalance between the positive and negative datasets, an ensemble of SVM modules, each trained on a balanced subset of the training data, was used. First, protein sequence features such as amino-acid composition (AAC), dipeptide composition (DPC) and physico-chemical composition (PCC) were used to develop SVM-based modules that achieved average accuracies of 84%, 85.17% and 82.59%, respectively. Second, a hybrid module (hybrid-I) integrating all of these features was developed, achieving an average accuracy of 86.12%. Another hybrid module (hybrid-II), developed using evolutionary information extracted from the position-specific scoring matrix together with amino-acid composition, achieved a maximum average accuracy of 89.73%. On unbiased evaluation using an independent dataset, SSPred showed good prediction performance in the identification and classification of secretion systems. SSPred is freely available as a World Wide Web server at http://www.bioinformatics.org/sspred.
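The imbalance strategy described above (an ensemble of SVM modules, each trained on a balanced subset) can be sketched by pairing all positives with successive equal-sized slices of the negatives and combining module outputs by majority vote. The slicing scheme, the voting rule, and the helper names are assumptions; the abstract does not say how SSPred partitions or combines its modules, and the SVM training itself is left out.

```python
def balanced_subsets(positives, negatives):
    """Pair the full positive set with successive equal-sized slices of the negatives."""
    if not positives:
        raise ValueError("need at least one positive sample")
    size = len(positives)
    return [
        (list(positives), negatives[start : start + size])
        for start in range(0, len(negatives), size)
    ]


def majority_vote(predictions):
    """Combine 0/1 module outputs; ties count as negative."""
    return 1 if 2 * sum(predictions) > len(predictions) else 0


positives = ["p1", "p2", "p3"]            # hypothetical encoded positive proteins
negatives = [f"n{i}" for i in range(10)]  # hypothetical encoded negative proteins
for pos, neg in balanced_subsets(positives, negatives):
    print(len(pos), len(neg))
print(majority_vote([1, 0, 1, 1]))
```

Here the last slice holds only one negative; in practice it could be merged into the previous slice or topped up by resampling so that every module sees a balanced set.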
Project description:Dose-dependent oxidative stress induced by the anthracycline doxorubicin (Dox) and other chemotherapeutic agents causes irreversible cardiac damage, restricting their clinical effectiveness. We hypothesized that the resulting protein oxidation could be monitored and correlated with physiological functional impairment. We focused on protein carbonylation as an indicator of severe oxidative damage because it is irreversible and results in proteasomal degradation. We identified and investigated a specific high-molecular-weight cardiac protein that showed a significant increase in carbonylation under Dox-induced cardiotoxic conditions in a spontaneously hypertensive rat model. We confirmed the carbonylation and degradation of this protein under oxidative stress and the prevention of this effect in the presence of the iron chelator dexrazoxane. Using MS, the Dox-induced carbonylated protein was identified as the 140-kDa cardiac myosin binding protein C (MyBPC). We confirmed the carbonylation and degradation of MyBPC using HL-1 cardiomyocytes and a purified recombinant untagged cardiac MyBPC under metal-catalyzed oxidative stress conditions. The carbonylation and degradation of MyBPC were time- and drug concentration-dependent. We demonstrated that carbonylated MyBPC undergoes proteasome-mediated degradation under Dox-induced oxidative stress. Cosedimentation, immunoprecipitation, and actin binding assays were used to study the functional consequences of MyBPC carbonylation. Carbonylation of MyBPC was associated with significant functional impairment of its actin binding properties. The dissociation constant of carbonylated recombinant MyBPC for actin was 7.35 ± 1.9 μM, compared with 2.7 ± 0.6 μM for native MyBPC. Overall, our findings indicate that MyBPC carbonylation serves as a critical determinant of cardiotoxicity and could serve as a mechanistic indicator of Dox-induced cardiotoxicity.
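A quick worked check on the reported constants, assuming simple single-site binding (a textbook model not stated in the abstract), where θ is the fraction of MyBPC bound at a given free actin concentration. The ratio uses central values only and ignores the reported uncertainties; the 2.7 μM actin concentration is an arbitrary illustrative choice.

```latex
\frac{K_d^{\mathrm{carb}}}{K_d^{\mathrm{native}}}
  = \frac{7.35\ \mu\mathrm{M}}{2.7\ \mu\mathrm{M}} \approx 2.72,
\qquad
\theta = \frac{[\mathrm{actin}]}{K_d + [\mathrm{actin}]}

\text{At } [\mathrm{actin}] = 2.7\ \mu\mathrm{M}:\quad
\theta_{\mathrm{native}} = \frac{2.7}{2.7 + 2.7} = 0.50,
\qquad
\theta_{\mathrm{carb}} = \frac{2.7}{7.35 + 2.7} \approx 0.27
```

Because a larger K_d means weaker binding, carbonylated MyBPC would bind roughly 2.7-fold less tightly to actin under this model.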
Project description:S-palmitoylation, the covalent attachment of 16-carbon palmitic acid to a cysteine residue via a thioester linkage, is an important reversible lipid modification that plays a regulatory role in a variety of physiological and biological processes. As the number of experimentally identified S-palmitoylated peptides increases, it is imperative to investigate substrate motifs to facilitate the study of protein S-palmitoylation. Based on 710 non-homologous S-palmitoylation sites obtained from published databases and the literature, we carried out a bioinformatics investigation of S-palmitoylation sites based on amino acid composition. Two Sample Logo analysis indicates that positively charged and polar amino acids surrounding S-palmitoylated sites may be associated with the substrate site specificity of protein S-palmitoylation. Additionally, maximal dependence decomposition (MDD) was applied to explore the motif signatures of S-palmitoylation sites by categorizing a large-scale dataset into subgroups with statistically significant conservation of amino acids. Single features such as amino acid composition (AAC), amino acid pair composition (AAPC), position-specific scoring matrix (PSSM), position weight matrix (PWM), amino acid substitution matrix (BLOSUM62), and accessible surface area (ASA) were considered, along with the effectiveness of incorporating MDD-identified substrate motifs into a two-layered prediction model. Evaluation by five-fold cross-validation showed that, using a support vector machine (SVM), a hybrid of AAC and PSSM performs best at discriminating between S-palmitoylation and non-S-palmitoylation sites. The two-layered SVM model integrating MDD-identified substrate motifs performed well, with a sensitivity of 0.79, specificity of 0.80, accuracy of 0.80, and Matthews Correlation Coefficient (MCC) of 0.45.
Using an independent testing dataset (613 S-palmitoylated and 5412 non-S-palmitoylated sites) obtained from the literature, we demonstrated that the two-layered SVM model could outperform other prediction tools, yielding a balanced sensitivity and specificity of 0.690 and 0.694, respectively. This two-layered SVM model has been implemented as a web-based system (MDD-Palm), which is now freely available at http://csb.cse.yzu.edu.tw/MDDPalm/.
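PWM is one of the single features listed above. A minimal sketch that builds a position weight matrix from aligned positive windows and scores a candidate window by summed log2 odds; the uniform 1/20 background, the pseudocount of 1, and the helper names are assumptions for illustration, not the MDD-Palm settings.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pwm(windows, pseudocount=1.0):
    """Per-position residue probabilities from equal-length aligned windows."""
    pwm = []
    for pos in range(len(windows[0])):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for w in windows:
            if w[pos] in counts:  # skip padding or non-standard residues
                counts[w[pos]] += 1
        total = sum(counts.values())
        pwm.append({aa: c / total for aa, c in counts.items()})
    return pwm


def pwm_score(pwm, window, background=1 / 20):
    """Sum of per-position log2 odds against a uniform background."""
    return sum(
        math.log2(pwm[pos][res] / background)
        for pos, res in enumerate(window)
        if res in pwm[pos]
    )


positives = ["KCK", "KCK", "RCR"]  # hypothetical 3-residue windows centred on C
pwm = build_pwm(positives)
print(pwm_score(pwm, "KCK"), pwm_score(pwm, "ACA"))
```

The studies above may instead encode each window position by its PWM value to form a feature vector for the SVM; the summed score is simply the most compact illustration.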
Project description:UVB oxidizes proteins through the generation of reactive oxygen species. One consequence of UVB irradiation is carbonylation, the irreversible formation of a carbonyl group on proline, lysine, arginine or threonine residues. In this study, redox proteomics was performed to identify carbonylated proteins in the UVB-resistant marine bacterium Photobacterium angustum. Mass spectrometry was performed with either biotin-labeled or dinitrophenylhydrazine (DNPH)-derivatized proteins. The DNPH redox proteomics method enabled the identification of 62 carbonylated proteins (5% of 1221 identified proteins) in cells exposed to UVB or darkness. Eleven carbonylated proteins were quantified, and the UVB/dark abundance ratio was determined at both the protein and peptide levels. As a result, we determined which functional classes of proteins were carbonylated, which residues were preferentially modified, and what the implications of carbonylation were for protein function. As the first large-scale shotgun redox proteomics analysis of carbonylation to be performed on bacteria, our study provides a new level of understanding of the effects of UVB on cellular proteins, and provides a methodology for advancing studies in other biological systems.
Project description:Background: Non-alcoholic fatty liver disease (NAFLD) is caused by excessive accumulation of fat within the liver, which can lead to more severe conditions such as non-alcoholic steatohepatitis (NASH). The progression from healthy liver to steatosis and NASH is not yet fully understood in terms of process and response. Hepatic oxidative stress is believed to be one of the factors driving steatosis to NASH. Oxidative protein modification is the major cause of protein functional impairment, and alteration of key hepatic enzymes is likely to be a crucial factor in NAFLD biology. In the present study, we aimed to discover carbonylated protein profiles involved in NAFLD biology in vitro. Methods: Steatosis was induced in a hepatocyte cell line with fatty acids (FA) in the presence and absence of menadione (an oxidative stress inducer). Two-dimensional gel electrophoresis-based proteomics and dinitrophenylhydrazine derivatization were used to identify carbonylated proteins. Subsequently, to examine changes in protein carbonylation pathways, enrichment analysis was performed using FunRich. Selected carbonylated proteins were validated by western blot, and carbonylated sites were further identified by high-resolution LC-MS/MS. Results: Proteomic results and pathway analysis revealed that carbonylated proteins are involved in NASH pathogenesis pathways, most of them playing important roles in energy metabolism. In particular, the carbonylation level of ATP synthase subunit α (ATP5A), a key protein in cellular respiration, was reduced after treatment with FA and with FA plus oxidative stress, whereas its expression was not altered. Carbonylated sites on this protein were identified and found to be located in the nucleotide-binding region. Modification of these sites may, therefore, disturb ATP5A activity.
As a consequence, the lower carbonylation level of ATP5A after FA treatment, alone or combined with oxidative stress, can increase ATP production. Conclusions: The reduction in the carbonylation level of ATP5A might occur to generate more energy in response to pathological conditions, in our case fat accumulation and oxidative stress in hepatocytes. This would imply an association between protein carbonylation and the molecular response during the development of steatosis and NASH.
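Over-representation of carbonylated proteins in a pathway is commonly assessed with a one-sided hypergeometric test. The sketch below is a generic version of that test with hypothetical argument names and numbers; it is not necessarily the exact computation or multiple-testing correction that FunRich applies.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided P(X >= k) under the hypergeometric distribution.

    N: background proteins, K: background proteins annotated to the pathway,
    n: proteins in the study list (e.g. carbonylated), k: study proteins in the pathway.
    """
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1)) / total


# Hypothetical numbers, not taken from the study.
print(hypergeom_enrichment_p(k=8, n=40, K=100, N=5000))
```

With many pathways tested at once, the resulting p-values would normally be adjusted (for example Benjamini-Hochberg) before calling a pathway enriched.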
Project description:One of the most important irreversible oxidative modifications of proteins is carbonylation, the introduction of a carbonyl group through reaction with reactive oxygen species. Notably, carbonylation increases with the age of cells and is associated with the formation of intracellular protein aggregates and the pathogenesis of age-related disorders such as neurodegenerative diseases and cancer. However, it is still largely unclear how carbonylation affects protein structure, dynamics, and aggregability at the atomic level. Here, we use classical molecular dynamics simulations to study the structure and dynamics of the carbonylated headpiece domain of villin, a key actin-organizing protein. We perform an exhaustive set of molecular dynamics simulations of a native villin headpiece together with every possible combination of carbonylated versions of its seven lysine, arginine, and proline residues, quantitatively the most important carbonylable amino acids. Surprisingly, our results suggest that high levels of carbonylation, far above those associated with cell death in vivo, may be required to destabilize and unfold protein structure through the disruption of specific stabilizing elements, such as salt bridges or proline kinks, or by tampering with the hydrophobic effect. On the other hand, using thermodynamic integration and molecular hydrophobicity potential approaches, we quantitatively show that carbonylation of hydrophilic lysine and arginine residues is equivalent to introducing hydrophobic, charge-neutral mutations in their place, and, by comparison with experimental results, we demonstrate that this by itself significantly increases the intrinsic aggregation propensity of both structured, native proteins and their unfolded states. Finally, our results provide a foundation for a novel experimental strategy to study the effects of carbonylation on protein structure, dynamics, and aggregability using site-directed mutagenesis.
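Thermodynamic integration estimates a free-energy difference as ΔG = ∫₀¹ ⟨∂U/∂λ⟩_λ dλ, where λ couples one end state (λ = 0, e.g. native) to the other (λ = 1, e.g. carbonylated). A minimal sketch, assuming ⟨∂U/∂λ⟩ has already been averaged from simulations at a few λ windows, integrates it with the trapezoidal rule. The λ grid, the values in the example, and the helper name are made up, and this is not the authors' analysis pipeline.

```python
def thermodynamic_integration(lambdas, mean_dudl):
    """Trapezoidal estimate of the integral of <dU/dlambda> over lambda from 0 to 1."""
    if len(lambdas) != len(mean_dudl) or len(lambdas) < 2:
        raise ValueError("need matching lambda and <dU/dlambda> lists with at least two points")
    delta_g = 0.0
    for i in range(len(lambdas) - 1):
        width = lambdas[i + 1] - lambdas[i]
        delta_g += 0.5 * width * (mean_dudl[i] + mean_dudl[i + 1])  # area of one trapezoid
    return delta_g


# Hypothetical <dU/dlambda> averages (kJ/mol) at five lambda windows.
lambdas = [0.0, 0.25, 0.5, 0.75, 1.0]
mean_dudl = [12.0, 8.5, 5.0, 2.5, 1.0]
print(thermodynamic_integration(lambdas, mean_dudl))
```

The trapezoidal rule is exact only when ⟨∂U/∂λ⟩ is linear in λ; strongly curved profiles usually call for more λ windows near the steep region.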