Prediction of RNA-binding proteins by voting systems.
ABSTRACT: Identifying which proteins can interact with RNA is important for protein annotation, since interactions between RNA and proteins influence the structure of the ribosome and play important roles in gene expression. This paper uses voting systems to identify proteins that can interact with RNA. First, 34 learning algorithms available in Weka are chosen for investigation. A simple majority voting system (SMVS) is then used to predict RNA-binding proteins, achieving an average ACC (overall prediction accuracy) of 79.72% and an average MCC (Matthews correlation coefficient) of 59.77% on the independent testing dataset. Next, the mRMR (minimum redundancy maximum relevance) strategy is adapted for algorithm selection, and the MCC value of each classifier is assigned as the weight of that classifier's vote. The best average MCC on the independent testing dataset, 64.70%, is attained when 22 algorithms are selected and integrated through weighted votes; the corresponding ACC is 82.04%.
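The two voting schemes named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' Weka pipeline: the function names, the 0/1 label encoding, and the rule that ties go to the negative class are assumptions made only for illustration.

```python
from typing import Sequence


def weighted_vote(predictions: Sequence[int], weights: Sequence[float]) -> int:
    """Combine binary votes (1 = RNA-binding, 0 = non-binding).

    Each classifier's vote counts with its weight, e.g. its MCC value.
    Ties are resolved in favour of class 0 (an assumption of this sketch).
    """
    if len(predictions) != len(weights):
        raise ValueError("predictions and weights must have the same length")
    score_pos = sum(w for p, w in zip(predictions, weights) if p == 1)
    score_neg = sum(w for p, w in zip(predictions, weights) if p == 0)
    return 1 if score_pos > score_neg else 0


def simple_majority_vote(predictions: Sequence[int]) -> int:
    """Simple majority voting is the special case where every weight is 1."""
    return weighted_vote(predictions, [1.0] * len(predictions))
```

With votes `[1, 0, 0]`, simple majority voting returns 0, whereas hypothetical MCC weights of `[0.9, 0.2, 0.3]` let the single strong classifier prevail and the weighted vote returns 1.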
Project description:As a result of the growing body of protein phosphorylation site data, the number of phosphoprotein databases is constantly increasing, and dozens of tools are available for fast, automatic prediction of protein phosphorylation sites. However, none of the existing tools has been developed to predict protein phosphorylation sites in rice. In this paper, the phosphorylation site predictors NetPhos 2.0, NetPhosK, Kinasephos, Scansite, Disphos and Predphosphos were integrated to construct meta-predictors of rice-specific phosphorylation sites using several methods, including unweighted voting, unreduced weighted voting, reduced unweighted voting and weighted voting strategies. PhosphoRice, the meta-predictor produced by the weighted voting strategy with parameters selected by restricted grid search and conditional random search, performed the best at predicting phosphorylation sites in rice. Its Matthews correlation coefficient (MCC) and accuracy (ACC) reached 0.474 and 73.8%, respectively. Compared with the best individual element predictor (Disphos_default), PhosphoRice achieved a significant increase in MCC of 0.071 (P < 0.01) and an increase in ACC of 4.6%. PhosphoRice is a powerful tool for predicting unidentified phosphorylation sites in rice. Compared with the existing methods, our tool showed greater robustness in ACC and MCC. PhosphoRice is available to the public at http://bioinformatics.fafu.edu.cn/PhosphoRice.
Project description:Anterior cingulate cortex (ACC) and midcingulate cortex (MCC) have been implicated in the regulation of aggressive behaviour. For instance, patients with conduct disorder (CD) show increased levels of aggression accompanied by changes in ACC and MCC volume. However, accounts of ACC/MCC changes in CD patients have been conflicting, likely due to the heterogeneity of the studied populations. Here, we address these discrepancies by studying volumetric changes of ACC/MCC in the BALB/cJ mouse, a model of aggression, compared to an age- and gender-matched control group of BALB/cByJ mice. We quantified aggression in BALB/cJ and BALB/cByJ mice using the resident-intruder test, and related this to volumetric measures of ACC/MCC based on Nissl-stained coronal brain slices of the same animals. We demonstrate that BALB/cJ behave consistently more aggressively (shorter attack latencies, more frequent attacks, anti-social biting) than the control group, while at the same time showing an increased volume of ACC and a decreased volume of MCC. Differences in ACC and MCC volume jointly predicted a high amount of variance in aggressive behaviour, while regression with only one predictor had a poor fit. This suggests that, beyond their individual contributions, the relationship between ACC and MCC plays an important role in regulating aggressive behaviour. Finally, we show the importance of switching from the classical rodent anatomical definition of ACC as cingulate area 2 and 1 to a definition that includes the MCC and is directly homologous to higher mammalian species: clear behaviour-related differences in ACC/MCC anatomy were only observed using the homologous definition.
Project description:Two broad categories of extracellular vesicles (EVs), exosomes and shed microvesicles (sMVs), which differ in size distribution as well as protein and RNA profiles, have been described. EVs are known to play key roles in cell-cell communication, acting proximally as well as systemically. This Review discusses the nature of EV subtypes, strategies for isolating EVs from both cell-culture media and body fluids, and procedures for quantifying EVs. We also discuss proteins selectively enriched in exosomes and sMVs that have the potential for use as markers to discriminate between EV subtypes, as well as various applications of EVs in clinical diagnosis.
Project description:<h4>Background</h4>Guanosine triphosphate (GTP)-binding proteins play an important role in the regulation of G-proteins. Prediction of GTP-interacting residues in a protein is therefore one of the major challenges in computational biology. In this study, an attempt has been made to develop a computational method for predicting GTP-interacting residues in a protein with high accuracy (Acc), precision (Prec) and recall (Rc).<h4>Result</h4>All the models developed in this study were trained and tested on a non-redundant (40% similarity) dataset using five-fold cross-validation. First, we developed neural network based models using single sequence and PSSM profile, which achieved a maximum Matthews correlation coefficient (MCC) of 0.24 (Acc 61.30%) and 0.39 (Acc 68.88%), respectively. Second, we developed support vector machine (SVM) based models using single sequence and PSSM profile, which achieved a maximum MCC of 0.37 (Prec 0.73, Rc 0.57, Acc 67.98%) and 0.55 (Prec 0.80, Rc 0.73, Acc 77.17%), respectively. In this work, we introduce for the first time the concept of predicting GTP-interacting dipeptides (two consecutive GTP-interacting residues) and tripeptides (three consecutive GTP-interacting residues). We developed an SVM-based model for predicting GTP-interacting dipeptides using PSSM profile, which achieved an MCC of 0.64 with precision 0.87, recall 0.74 and accuracy 81.37%. Similarly, an SVM-based model was developed for predicting GTP-interacting tripeptides using PSSM profile, which achieved an MCC of 0.70 with precision 0.93, recall 0.73 and accuracy 83.98%.<h4>Conclusion</h4>These results show that the PSSM-based method performs better than the single-sequence-based method. The prediction models based on dipeptides or tripeptides are more accurate than the traditional model based on single residues. A web server, "GTPBinder" (http://www.imtech.res.in/raghava/gtpbinder/), based on the above models has been developed for predicting GTP-interacting residues in a protein.
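The evaluation measures reported in these abstracts (accuracy, precision, recall and MCC) all derive from the counts of a binary confusion matrix. The sketch below uses the standard definitions. The function name and the convention of returning 0.0 when a denominator is zero are assumptions of this example and are not taken from any of the papers.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute accuracy, precision, recall and MCC from confusion-matrix counts."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    # Undefined ratios are reported as 0.0 (a convention chosen for this sketch).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"acc": accuracy, "prec": precision, "rec": recall, "mcc": mcc}
```

For example, 40 true positives, 10 false positives, 40 true negatives and 10 false negatives give an accuracy, precision and recall of 0.8 and an MCC of 0.6. MCC ranges from -1 to 1, so abstracts that quote it as a percentage are reporting the same quantity multiplied by 100.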
Project description:DNA-binding proteins are crucial for various cellular processes, such as recognition of specific nucleotides, regulation of transcription, and regulation of gene expression. Developing an effective model for identifying DNA-binding proteins is an urgent research problem. Many methods have been proposed, but most of them rely on a single classifier and cannot make full use of the large number of negative samples to improve prediction performance. This study proposes a predictor called enDNA-Prot for DNA-binding protein identification that employs the ensemble learning technique. Experimental results showed that enDNA-Prot was comparable with DNA-Prot and outperformed DNAbinder and iDNA-Prot, with improvements of 3.97-9.52% in ACC and 0.08-0.19 in MCC. Furthermore, when the benchmark dataset was expanded with negative samples, enDNA-Prot outperformed the three existing methods by 2.83-16.63% in ACC and 0.02-0.16 in MCC. These results indicate that enDNA-Prot is an effective method for DNA-binding protein identification and that expanding the training dataset with negative samples can improve its performance. For the convenience of experimental scientists, we developed a user-friendly web server for enDNA-Prot, which is freely accessible to the public.
Project description:α-Hemolysin (Hla) is a significant virulence factor in Staphylococcus aureus (S. aureus)-caused infectious diseases such as pneumonia. Thus, to prevent the production of Hla when treating S. aureus infection, it is necessary to choose an antibiotic with good antibacterial activity and effect. In our study, we observed that Fosfomycin (FOM) at a sub-inhibitory concentration inhibited expression of Hla. Molecular dynamics demonstrated that FOM bound to the binding sites LYS 154 and ASP 108 of Hla, potentially inhibiting Hla. Furthermore, we verified that staphylococcal membrane-derived vesicles (SMVs) contain Hla and that FOM treatment significantly reduced the production of SMVs and Hla. Based on our pharmacological inhibition analysis, ERK and p38 activated NLRP3 inflammasomes. Moreover, FOM inhibited expression of MAPKs and NLRP3 inflammasome-related proteins in S. aureus as well as SMV-infected human macrophages (MΦ) and alveolar epithelial cells. In vivo, SMVs isolated from S. aureus DU1090 (an isogenic Hla deletion mutant) or the strain itself caused weaker inflammation than that of its parent strain 8325-4. FOM also significantly reduced the phosphorylation levels of ERK and P38 and expression of NLRP3 inflammasome-related proteins. In addition, FOM decreased MPO activity, pulmonary vascular permeability and edema formation in the lungs of mice with S. aureus-caused pneumonia. Taken together, these data indicate that FOM exerts protective effects against S. aureus infection in vitro and in vivo by inhibiting Hla in SMVs and blocking ERK/P38-mediated NLRP3 inflammasome activation by Hla.
Project description:Microvesicles (MVs), which are cell-derived membrane vesicles present in body fluids, are closely associated with the development of malignant tumours. Saliva, one of the most versatile body fluids, is an important source of MVs. However, the association between salivary MVs (SMVs) and oral squamous cell carcinoma (OSCC), which is directly immersed in the salivary milieu, remains unclear. SMVs from 65 patients with OSCC, 21 patients with oral ulcer (OU), and 42 healthy donors were purified, quantified and analysed for their correlations with the clinicopathologic features and prognosis of OSCC patients. The results showed that the level of SMVs was significantly elevated in patients with OSCC compared to healthy donors and OU patients. Meanwhile, the level of SMVs showed close correlations with the lymph node status, and the clinical stage of OSCC patients. Additionally, the ratio of apoptotic to non-apoptotic SMVs was significantly decreased in OSCC patients with higher pathological grade. Consistently, poorer overall survival was observed in patients with lower ratio of apoptotic to non-apoptotic SMVs. In conclusion, the elevated level of SMVs is associated with clinicopathologic features and decreased survival in patients with OSCC, suggesting that SMVs are a potential biomarker and/or regulator of the malignant progression of OSCC.
Project description:Successfully navigating social interactions requires the precise and balanced integration of social and environmental cues. When such flexible information integration fails, maladaptive behavioral patterns arise, including excessive aggression, empathy deficits, and social withdrawal, as seen in disorders such as conduct disorder and autism spectrum disorder. One of the main hubs for the context-dependent regulation of behavior is the cingulate cortex, specifically the anterior cingulate cortex (ACC) and midcingulate cortex (MCC). While volumetric abnormalities of ACC and MCC have been demonstrated in patients, little is known about the exact structural changes responsible for the dysregulation of behaviors such as aggression and social withdrawal. Here, we demonstrate that the distribution of parvalbumin (PV) and somatostatin (SOM) interneurons across ACC and MCC differentially predicts aggression and social withdrawal in BALB/cJ mice. BALB/cJ mice were phenotyped for their social behavior (three-chamber task) and aggression (resident-intruder task) and compared with control (BALB/cByJ) mice. In line with previous studies, BALB/cJ mice behaved more aggressively than controls. The three-chamber task revealed two sub-groups of highly-sociable versus less-sociable BALB/cJ mice. Highly-sociable BALB/cJ mice were as aggressive as the less-sociable group; in fact, they committed more acts of socially acceptable aggression (threats and harmless bites). PV and SOM immunostaining revealed that a lack of specificity in the distribution of SOM and PV interneurons across cingulate cortex coincided with social withdrawal: both control mice and highly-sociable BALB/cJ mice showed a differential distribution of PV and SOM interneurons across the sub-areas of cingulate cortex, while for less-sociable BALB/cJ mice, the distributions were near-flat. In contrast, both highly-sociable and less-sociable BALB/cJ mice had a decreased concentration of PV interneurons in MCC compared to controls, which was therefore linked to aggressive behavior. Together, these results suggest that the dynamic balance of excitatory and inhibitory activity across ACC and MCC shapes both social and aggressive behavior.
Project description:Nucleosomes are the basic structural units of eukaryotic chromatin. Accurate nucleosome positioning plays a significant role in understanding many biological processes, such as transcriptional regulation mechanisms and DNA replication and repair. Here, we describe the development of a novel method, termed ZCMM, based on Z-curve theory and position weight matrix (PWM). The ZCMM was trained and tested using the nucleosomal and linker sequences determined by support vector machine (SVM) in Saccharomyces cerevisiae (S. cerevisiae), and experimental results showed that the sensitivity (Sn), specificity (Sp), accuracy (Acc), and Matthews correlation coefficient (MCC) values for ZCMM were 91.40%, 96.56%, 96.75%, and 0.88, respectively, and the average area under the receiver operating characteristic curve (AUC) value was 0.972. A ZCMM predictor was developed to predict nucleosome positioning in Homo sapiens (H. sapiens), Caenorhabditis elegans (C. elegans), and Drosophila melanogaster (D. melanogaster) genomes, and the accuracy (Acc) values were 77.72%, 85.34%, and 93.62%, respectively. The maximum AUC values of the four species were 0.982, 0.861, 0.912 and 0.911, respectively. Another independent dataset for S. cerevisiae was used to predict nucleosome positioning. Compared with the results of Wu's method, the Sn, Sp, Acc, and MCC of the ZCMM results for S. cerevisiae were all higher, reaching 96.72%, 96.54%, 94.10%, and 0.88. Compared with Guo's method 'iNuc-PseKNC', the results of ZCMM for D. melanogaster were better. In addition, the ZCMM predictions were compared with in vitro and in vivo experimental data for S. cerevisiae, and the results showed that the nucleosomes predicted by ZCMM were highly consistent with those confirmed by these experiments. These comparisons further confirm that the ZCMM method has good accuracy and reliability in predicting nucleosome positioning.
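For readers unfamiliar with Z-curve theory, the following Python sketch computes the standard cumulative Z-curve of a DNA sequence. It illustrates only the general transformation. How ZCMM derives features from it and combines them with a position weight matrix is not described in the abstract, and the function name `z_curve` is an assumption of this example.

```python
from typing import List, Tuple


def z_curve(seq: str) -> List[Tuple[int, int, int]]:
    """Return the cumulative Z-curve points (x_n, y_n, z_n) for a DNA sequence.

    x_n = (A_n + G_n) - (C_n + T_n)   purine vs. pyrimidine
    y_n = (A_n + C_n) - (G_n + T_n)   amino vs. keto
    z_n = (A_n + T_n) - (G_n + C_n)   weak vs. strong hydrogen bonding
    where A_n, C_n, G_n, T_n count each base in the first n positions.
    """
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    points = []
    for base in seq.upper():
        if base not in counts:
            raise ValueError(f"unexpected base: {base!r}")
        counts[base] += 1
        a, c, g, t = counts["A"], counts["C"], counts["G"], counts["T"]
        points.append(((a + g) - (c + t), (a + c) - (g + t), (a + t) - (g + c)))
    return points
```

For the sequence `"AG"` this yields the points (1, 1, 1) and (2, 0, 0). Any sequence with equal counts of all four bases ends at the origin.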
Project description:<h4>Background</h4>Lysine succinylation is a reversible protein post-translational modification (PTM) that regulates the structure and function of proteins. It plays a significant role in various cellular processes, including some diseases, in humans as well as many other organisms. Accurate identification of succinylation sites is essential for understanding their biological functions and for drug development.<h4>Methods</h4>In this study, we developed an improved method to predict lysine succinylation sites in <i>Homo sapiens</i> by fusing three encoding schemes, namely binary encoding, the composition of <i>k</i>-spaced amino acid pairs (CKSAAP) and amino acid composition (AAC), with a random forest (RF) classifier. The prediction performance of the proposed RF-based fusion model was compared with that of other candidates using 20-fold cross-validation (CV) and two independent test datasets collected from two different sources.<h4>Results</h4>The CV results showed that the proposed predictor achieved the highest scores: sensitivity (SN) 0.800, specificity (SP) 0.902, accuracy (ACC) 0.919, Matthews correlation coefficient (MCC) 0.766, partial AUC (pAUC) 0.163 at a false-positive rate (FPR) of 0.10, and area under the ROC curve (AUC) 0.958. It also achieved the highest performance scores on independent test protein set-1 (SN 0.811, SP 0.902, ACC 0.891, MCC 0.629, pAUC 0.139 and AUC 0.921) and on independent test protein set-2 (SN 0.772, SP 0.901, ACC 0.836, MCC 0.677, pAUC 0.141 at FPR = 0.10 and AUC 0.923). It outperformed all the other existing prediction models.<h4>Conclusion</h4>These prediction results suggest that the proposed method may be a useful computational resource for predicting lysine succinylation sites in humans.
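Two of the encoding schemes named in the Methods, AAC and CKSAAP, can be sketched in a few lines of Python using their commonly used definitions. The abstract does not state the window length, the range of k, or how gaps and non-standard residues are handled, so the choices below (20 standard amino acids, other characters ignored in pair counts, hypothetical function names) are assumptions.

```python
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(seq: str) -> List[float]:
    """Amino acid composition: frequency of each residue in the sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence is empty")
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def cksaap(seq: str, k: int) -> List[float]:
    """Composition of k-spaced amino acid pairs for a single gap size k.

    Counts residue pairs separated by exactly k positions and normalises by
    the number of such pairs, giving a 400-dimensional vector.
    """
    seq = seq.upper()
    n_pairs = len(seq) - k - 1
    if k < 0 or n_pairs <= 0:
        raise ValueError("sequence too short for this k")
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    for i in range(n_pairs):
        pair = seq[i] + seq[i + k + 1]
        if pair in counts:  # pairs with non-standard characters are skipped
            counts[pair] += 1
    return [counts[p] / n_pairs for p in pairs]
```

A fused feature vector in the spirit of the abstract could be formed by concatenating `aac(window)` with `cksaap(window, k)` for several values of k, plus a binary (one-hot) encoding, before passing it to a random forest. The exact combination used by the authors is not given here.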