Epitope prediction based on random peptide library screening: benchmark dataset and prediction tools evaluation.
ABSTRACT: Epitope prediction based on random peptide library screening has become a focus as a promising method in immunoinformatics research. Some novel software and web-based servers have been proposed in recent years and have succeeded in given test cases. However, since the number of available mimotopes with the relevant structure of template-target complex is limited, a systematic evaluation of these methods is still absent. In this study, a new benchmark dataset was defined. Using this benchmark dataset and a representative dataset, five examples of the most popular epitope prediction software products which are based on random peptide library screening have been evaluated. Using the benchmark dataset, in no method did performance exceed a 0.42 precision and 0.37 sensitivity, and the MCC scores suggest that the epitope prediction results of these software programs are greater than random prediction about 0.09-0.13; while using the representative dataset, most of the values of these performance measures are slightly improved, but the overall performance is still not satisfactory. Many test cases in the benchmark dataset cannot be applied to these pieces of software due to software limitations. Moreover chances are that these software products are overfitted to the small dataset and will fail in other cases. Therefore finding the correlation between mimotopes and genuine epitope residues is still far from resolved and much larger dataset for mimotope-based epitope prediction is desirable.
Project description:Identification of epitopes which invokes strong humoral responses is an essential issue in the field of immunology. Various computational methods that have been developed based on the antigen structures and the mimotopes these years narrow the search for experimental validation. These methods can be divided into two categories: antigen structure-based methods and mimotope-based methods. Though new methods of the two kinds have been proposed in these years, they cannot maintain a high degree of satisfaction in various circumstances. In this paper, we proposed a new conformational B-cell epitope prediction method based on antigen preprocessing and mimotopes analysis. The method classifies the antigen surface residues into "epitopes" and "nonepitopes" by six epitope propensity scales, removing the "nonepitopes" and using the preprocessed antigen for epitope prediction based on mimotope sequences. The proposed method gives out the mean F score of 0.42 on the testing dataset. When compared with other publicly available servers by using the testing dataset, the new method yields better performance. The results demonstrate the proposed method is competent for the conformational B-cell epitope prediction.
Project description:The ability to predict antibody binding sites (aka antigenic determinants or B-cell epitopes) for a given protein is a precursor to new vaccine design and diagnostics. Among the various methods of B-cell epitope identification X-ray crystallography is one of the most reliable methods. Using these experimental data computational methods exist for B-cell epitope prediction. As the number of structures of antibody-protein complexes grows, further interest in prediction methods using 3D structure is anticipated. This work aims to establish a benchmark for 3D structure-based epitope prediction methods.Two B-cell epitope benchmark datasets inferred from the 3D structures of antibody-protein complexes were defined. The first is a dataset of 62 representative 3D structures of protein antigens with inferred structural epitopes. The second is a dataset of 82 structures of antibody-protein complexes containing different structural epitopes. Using these datasets, eight web-servers developed for antibody and protein binding sites prediction have been evaluated. In no method did performance exceed a 40% precision and 46% recall. The values of the area under the receiver operating characteristic curve for the evaluated methods were about 0.6 for ConSurf, DiscoTope, and PPI-PRED methods and above 0.65 but not exceeding 0.70 for protein-protein docking methods when the best of the top ten models for the bound docking were considered; the remaining methods performed close to random. The benchmark datasets are included as a supplement to this paper.It may be possible to improve epitope prediction methods through training on datasets which include only immune epitopes and through utilizing more features characterizing epitopes, for example, the evolutionary conservation score. Notwithstanding, overall poor performance may reflect the generality of antigenicity and hence the inability to decipher B-cell epitopes as an intrinsic feature of the protein. It is an open question as to whether ultimately discriminatory features can be found.
Project description:BACKGROUND: The prediction of conformational B-cell epitopes is one of the most important goals in immunoinformatics. The solution to this problem, even if approximate, would help in designing experiments to precisely map the residues of interaction between an antigen and an antibody. Consequently, this area of research has received considerable attention from immunologists, structural biologists and computational biologists. Phage-displayed random peptide libraries are powerful tools used to obtain mimotopes that are selected by binding to a given monoclonal antibody (mAb) in a similar way to the native epitope. These mimotopes can be considered as functional epitope mimics. Mimotope analysis based methods can predict not only linear but also conformational epitopes and this has been the focus of much research in recent years. Though some algorithms based on mimotope analysis have been proposed, the precise localization of the interaction site mimicked by the mimotopes is still a challenging task. RESULTS: In this study, we propose a method for B-cell epitope prediction based on mimotope analysis called Pep-3D-Search. Given the 3D structure of an antigen and a set of mimotopes (or a motif sequence derived from the set of mimotopes), Pep-3D-Search can be used in two modes: mimotope or motif. To evaluate the performance of Pep-3D-Search to predict epitopes from a set of mimotopes, 10 epitopes defined by crystallography were compared with the predicted results from a Pep-3D-Search: the average Matthews correlation coefficient (MCC), sensitivity and precision were 0.1758, 0.3642 and 0.6948. Compared with other available prediction algorithms, Pep-3D-Search showed comparable MCC, specificity and precision, and could provide novel, rational results. To verify the capability of Pep-3D-Search to align a motif sequence to a 3D structure for predicting epitopes, 6 test cases were used. The predictive performance of Pep-3D-Search was demonstrated to be superior to that of other similar programs. Furthermore, a set of test cases with different lengths of sequences was constructed to examine Pep-3D-Search's capability in searching sequences on a 3D structure. The experimental results demonstrated the excellent search capability of Pep-3D-Search, especially when the length of the query sequence becomes longer; the iteration numbers of Pep-3D-Search to precisely localize the target paths did not obviously increase. This means that Pep-3D-Search has the potential to quickly localize the epitope regions mimicked by longer mimotopes. CONCLUSION: Our Pep-3D-Search provides a powerful approach for localizing the surface region mimicked by the mimotopes. As a publicly available tool, Pep-3D-Search can be utilized and conveniently evaluated, and it can also be used to complement other existing tools. The data sets and open source code used to obtain the results in this paper are available on-line and as supplementary material. More detailed materials may be accessed at (http://kyc.nenu.edu.cn/Pep3DSearch/).
Project description:T cell epitope candidates are commonly identified using computational prediction tools in order to enable applications such as vaccine design, cancer neoantigen identification, development of diagnostics and removal of unwanted immune responses against protein therapeutics. Most T cell epitope prediction tools are based on machine learning algorithms trained on MHC binding or naturally processed MHC ligand elution data. The ability of currently available tools to predict T cell epitopes has not been comprehensively evaluated. In this study, we used a recently published dataset that systematically defined T cell epitopes recognized in vaccinia virus (VACV) infected C57BL/6 mice (expressing H-2Db and H-2Kb), considering both peptides predicted to bind MHC or experimentally eluted from infected cells, making this the most comprehensive dataset of T cell epitopes mapped in a complex pathogen. We evaluated the performance of all currently publicly available computational T cell epitope prediction tools to identify these major epitopes from all peptides encoded in the VACV proteome. We found that all methods were able to improve epitope identification above random, with the best performance achieved by neural network-based predictions trained on both MHC binding and MHC ligand elution data (NetMHCPan-4.0 and MHCFlurry). Impressively, these methods were able to capture more than half of the major epitopes in the top N = 277 predictions within the N = 767,788 predictions made for distinct peptides of relevant lengths that can theoretically be encoded in the VACV proteome. These performance metrics provide guidance for immunologists as to which prediction methods to use, and what success rates are possible for epitope predictions when considering a highly controlled system of administered immunizations to inbred mice. In addition, this benchmark was implemented in an open and easy to reproduce format, providing developers with a framework for future comparisons against new tools.
Project description:BACKGROUND: A B-cell epitope is a group of residues on the surface of an antigen which stimulates humoral responses. Locating these epitopes on antigens is important for the purpose of effective vaccine design. In recent years, mapping affinity-selected peptides screened from a random phage display library to the native epitope has become popular in epitope prediction. These peptides, also known as mimotopes, share the similar structure and function with the corresponding native epitopes. Great effort has been made in using this similarity between such mimotopes and native epitopes in prediction, which has resulted in better outcomes than statistics-based methods can. However, it cannot maintain a high degree of satisfaction in various circumstances. RESULTS: In this study, we propose a new method that maps a group of mimotopes back to a source antigen so as to locate the interacting epitope on the antigen. The core of this method is a searching algorithm that is incorporated with both dynamic programming (DP) and branch and bound (BB) optimization and operated on a series of overlapping patches on the surface of a protein. These patches are then transformed to a number of graphs using an adaptable distance threshold (ADT) regulated by an appropriate compactness factor (CF), a novel parameter proposed in this study. Compared with both Pep-3D-Search and PepSurf, two leading graph-based search tools, on average from the results of 18 test cases, MimoPro, the Web-based implementation of our proposed method, performed better in sensitivity, precision, and Matthews correlation coefficient (MCC) than both did in epitope prediction. In addition, MimoPro is significantly faster than both Pep-3D-Search and PepSurf in processing. CONCLUSIONS: Our search algorithm designed for processing well constructed graphs using an ADT regulated by CF is more sensitive and significantly faster than other graph-based approaches in epitope prediction. MimoPro is a viable alternative to both PepSurf and Pep-3D-Search for epitope prediction in the same kind, and freely accessible through the MimoPro server located at http://informatics.nenu.edu.cn/MimoPro.
Project description:It is important to accurately determine the performance of peptide:MHC binding predictions, as this enables users to compare and choose between different prediction methods and provides estimates of the expected error rate. Two common approaches to determine prediction performance are cross-validation, in which all available data are iteratively split into training and testing data, and the use of blind sets generated separately from the data used to construct the predictive method. In the present study, we have compared cross-validated prediction performances generated on our last benchmark dataset from 2009 with prediction performances generated on data subsequently added to the Immune Epitope Database (IEDB) which served as a blind set.We found that cross-validated performances systematically overestimated performance on the blind set. This was found not to be due to the presence of similar peptides in the cross-validation dataset. Rather, we found that small size and low sequence/affinity diversity of either training or blind datasets were associated with large differences in cross-validated vs. blind prediction performances. We use these findings to derive quantitative rules of how large and diverse datasets need to be to provide generalizable performance estimates.It has long been known that cross-validated prediction performance estimates often overestimate performance on independently generated blind set data. We here identify and quantify the specific factors contributing to this effect for MHC-I binding predictions. An increasing number of peptides for which MHC binding affinities are measured experimentally have been selected based on binding predictions and thus are less diverse than historic datasets sampling the entire sequence and affinity space, making them more difficult benchmark data sets. This has to be taken into account when comparing performance metrics between different benchmarks, and when deriving error estimates for predictions based on benchmark performance.
Project description:BACKGROUND:The development of accurate epitope prediction tools is important in facilitating disease diagnostics, treatment and vaccine development. The advent of new approaches making use of antibody and TCR sequence information to predict receptor-specific epitopes have the potential to transform the epitope prediction field. Development and validation of these new generation of epitope prediction methods would benefit from regularly updated high-quality receptor-antigen complex datasets. RESULTS:To address the need for high-quality datasets to benchmark performance of these new generation of receptor-specific epitope prediction tools, a webserver called SCEptRe (Structural Complexes of Epitope-Receptor) was created. SCEptRe extracts weekly updated 3D complexes of antibody-antigen, TCR-pMHC and MHC-ligand from the Immune Epitope Database and clusters them based on antigen, receptor and epitope features to generate benchmark datasets. SCEptRe also provides annotated information such as CDR sequences and VDJ genes on the receptors. Users can generate custom datasets based by selecting thresholds for structural quality and clustering parameters (e.g. resolution, R-free factor, antigen or epitope sequence identity) based on their need. CONCLUSIONS:SCEptRe provides weekly updated, user-customized comprehensive benchmark datasets of immune receptor-epitope structural complexes. These datasets can be used to develop and benchmark performance of receptor-specific epitope prediction tools in the future. SCEptRe is freely accessible at http://tools.iedb.org/sceptre .
Project description:Next generation sequencing (NGS) is widely applied in immunological research, but has yet to become common in antibody epitope mapping. A method utilizing a 12-mer random peptide library expressed in bacteria coupled with magnetic-based cell sorting and NGS correctly identified >75% of epitope residues on the antigens of two monoclonal antibodies (trastuzumab and bevacizumab). PepSurf, a web-based computational method designed for structural epitope mapping was utilized to compare peptides in libraries enriched for monoclonal antibody (mAb) binders to antigen surfaces (HER2 and VEGF-A). Compared to mimotopes recovered from Sanger sequencing of plated colonies from the same sorting protocol, motifs derived from sets of the NGS data improved epitope prediction as defined by sensitivity and precision, from 18% to 82% and 0.27 to 0.51 for trastuzumab and 47% to 76% and 0.19 to 0.27 for bevacizumab. Specificity was similar for Sanger and NGS, 99% and 97% for trastuzumab and 66% and 67% for bevacizumab. These results indicate that combining peptide library screening with NGS yields epitope motifs that can improve prediction of structural epitopes.
Project description:BACKGROUND: Development of computational tools that can accurately predict presence and location of B-cell epitopes on pathogenic proteins has a valuable application to the field of vaccinology. Because of the highly variable yet enigmatic nature of B-cell epitopes, their prediction presents a great challenge to computational immunologists. METHODS: We propose a method, BEEPro (B-cell epitope prediction by evolutionary information and propensity scales), which adapts a linear averaging scheme on 16 properties using a support vector machine model to predict both linear and conformational B-cell epitopes. These 16 properties include position specific scoring matrix (PSSM), an amino acid ratio scale, and a set of 14 physicochemical scales obtained via a feature selection process. Finally, a three-way data split procedure is used during the validation process to prevent over-estimation of prediction performance and avoid bias in our experiment results. RESULTS: In our experiment, first we use a non-redundant linear B-cell epitope dataset curated by Sollner et al. for feature selection and parameter optimization. Evaluated by a three-way data split procedure, BEEPro achieves significant improvement with the area under the receiver operating curve (AUC) = 0.9987, accuracy = 99.29%, mathew's correlation coefficient (MCC) = 0.9281, sensitivity = 0.9604, specificity = 0.9946, positive predictive value (PPV) = 0.9042 for the Sollner dataset. In addition, the same parameters are used to evaluate performance on other independent linear B-cell epitope test datasets, BEEPro attains an AUC which ranges from 0.9874 to 0.9950 and an accuracy which ranges from 93.73% to 97.31%. Moreover, five-fold cross-validation on one benchmark conformational B-cell epitope dataset yields an accuracy of 92.14% and AUC of 0.9066. CONCLUSIONS: Compared with other current models, our method achieves a significant improvement with respect to AUC, accuracy, MCC, sensitivity, specificity, and PPV. Thus, we have shown that an appropriate combination of evolutionary information and propensity scales with a support vector machine model can significantly enhance the prediction performance of both linear and conformational B-cell epitopes.
Project description:SUMMARY:We present the first tool of gene prediction, PlasGUN, for plasmid metagenomic short-read data. The tool, developed based on deep learning algorithm of multiple input Convolutional Neural Network, demonstrates much better performance when tested on a benchmark dataset of artificial short reads and presents more reliable results for real plasmid metagenomic data than traditional gene prediction tools designed primarily for chromosome-derived short reads. AVAILABILITY AND IMPLEMENTATION:The PlasGUN software is available at http://cqb.pku.edu.cn/ZhuLab/PlasGUN/ or https://github.com/zhenchengfang/PlasGUN/. SUPPLEMENTARY INFORMATION:Supplementary data are available at Bioinformatics online.