Dataset Information

Comprehensive assessment of computational algorithms in predicting cancer driver mutations.

ABSTRACT:

Background

The initiation and subsequent evolution of cancer are largely driven by a relatively small number of somatic mutations with critical functional impacts, so-called driver mutations. Identifying driver mutations in a patient's tumor cells is a central task in the era of precision cancer medicine. Over the past decade, many computational algorithms have been developed to predict the effects of missense single-nucleotide variants, and they are frequently employed to prioritize mutation candidates. These algorithms employ diverse molecular features to build predictive models, and while some algorithms are cancer-specific, others are not. However, the relative performance of these algorithms has not been rigorously assessed.

Results

We construct five complementary benchmark datasets: mutation clustering patterns in protein 3D structures, literature annotation based on OncoKB, TP53 mutations based on their effects on target-gene transactivation, effects of cancer mutations on tumor formation in xenograft experiments, and functional annotation based on in vitro cell viability assays that we developed, including a new dataset of ~200 mutations. We evaluate the performance of 33 algorithms and find that CHASM, CTAT-cancer, DEOGEN2, and PrimateAI show consistently better performance than the other algorithms. Moreover, cancer-specific algorithms show much better performance than those designed for a general purpose.
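As a rough illustration of what scoring one algorithm against one benchmark can look like, the sketch below computes the area under the ROC curve from per-mutation prediction scores and binary driver/non-driver labels. The abstract does not state which metric the study used, so the choice of ROC AUC is an assumption, and the function name `roc_auc`, the predictor names, and all numbers are hypothetical toy values, not data from the study.

```python
def roc_auc(scores, labels):
    """ROC AUC via the Mann-Whitney U statistic (ties get average ranks).

    scores: predicted scores, higher meaning "more likely a driver".
    labels: 1 for benchmark positives, 0 for benchmark negatives.
    ROC AUC is an assumed metric; the abstract does not name one.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the run
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")

    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


if __name__ == "__main__":
    # Toy benchmark: hypothetical labels and scores, for illustration only.
    benchmark_labels = [1, 1, 0, 0, 1, 0]
    toy_predictions = {
        "predictor_a": [0.9, 0.8, 0.2, 0.1, 0.7, 0.3],
        "predictor_b": [0.6, 0.4, 0.5, 0.7, 0.3, 0.2],
    }
    for name, scores in toy_predictions.items():
        print(name, round(roc_auc(scores, benchmark_labels), 3))
```

Repeating such a calculation for each algorithm on each benchmark would give a table of scores that can be compared across tools; a rank-based metric has the practical advantage of not requiring a common score scale or threshold across algorithms.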

Conclusions

Our study is a comprehensive assessment of the performance of different algorithms in predicting cancer driver mutations and provides deep insights into the best practice of computationally prioritizing cancer mutation candidates for end-users and for the future development of new algorithms.

SUBMITTER: Chen H

PROVIDER: S-EPMC7033911 | biostudies-literature | 2020 Feb

REPOSITORIES: biostudies-literature

Publications

Comprehensive assessment of computational algorithms in predicting cancer driver mutations.

Chen Hu, Li Jun, Wang Yumeng, Ng Patrick Kwok-Shing, Tsang Yiu Huen, Shaw Kenna R, Mills Gordon B, Liang Han

Genome Biology, 2020 Feb 20, issue 1

PMID: 32079540

Similar Datasets

Project description:

Background

Recent advances in sequencing technologies have greatly increased the identification of mutations in cancer genomes. However, it remains a significant challenge to identify cancer-driving mutations, since most observed missense changes are neutral passenger mutations. Various computational methods have been developed to predict the effects of amino acid substitutions on protein function and classify mutations as deleterious or benign. These include approaches that rely on evolutionary conservation, structural constraints, or physicochemical attributes of amino acid substitutions. Here we review existing methods and further examine eight tools: SIFT, PolyPhen2, Condel, CHASM, mCluster, logRE, SNAP, and MutationAssessor, with respect to their coverage, accuracy, availability and dependence on other tools.

Results

Single nucleotide polymorphisms with high minor allele frequencies were used as a negative (neutral) set for testing, and recurrent mutations from the COSMIC database as well as novel recurrent somatic mutations identified in very recent cancer studies were used as positive (non-neutral) sets. Conservation-based methods generally had moderately high accuracy in distinguishing neutral from deleterious mutations, whereas the performance of machine-learning-based predictors with comprehensive feature spaces varied between assessments using different positive sets. MutationAssessor consistently provided the highest accuracies. For certain combinations, metapredictors slightly improved the performance of the included individual methods, but did not outperform MutationAssessor as a stand-alone tool.

Conclusions

Our independent assessment of existing tools reveals various performance disparities. Cancer-trained methods did not improve upon more general predictors. No method or combination of methods exceeds 81% accuracy, indicating there is still significant room for improvement for driver mutation prediction, and perhaps more sophisticated feature integration is needed to develop a more robust tool.
