Dataset Information

Net benefit approaches to the evaluation of prediction models, molecular markers, and diagnostic tests.

SUBMITTER: Vickers AJ

PROVIDER: S-EPMC4724785 | biostudies-literature | 2016 Jan

REPOSITORIES: biostudies-literature

Publications

Net benefit approaches to the evaluation of prediction models, molecular markers, and diagnostic tests.

Vickers AJ, Van Calster B, Steyerberg EW

BMJ (Clinical research ed.), 2016 Jan 25

PMID: 26810254

Similar Datasets

Project description: Microbiome data are becoming increasingly available in large health cohorts, yet metabolomics data are still scant. While many studies generate microbiome data, they lack matched metabolomics data or have considerable missing proportions of metabolites. Since metabolomics is key to understanding microbial and general biological activities, the possibility of imputing individual metabolites or inferring metabolomics pathways from microbial taxonomy or metagenomics is intriguing. Importantly, current metabolomics profiling methods such as the HMP Unified Metabolic Analysis Network (HUMAnN) have unknown accuracy and are limited in their ability to predict individual metabolites. To address this gap, we developed a novel metabolite prediction method, and we present its application and evaluation in an oral microbiome study. The new method for predicting metabolites using microbiome data (ENVIM) is based on the elastic net model (ENM). ENVIM introduces an extra step to ENM to consider variable importance (VI) scores, and thus achieves better predictive power. We investigate the metabolite prediction performance of ENVIM using metagenomic and metatranscriptomic data in a supragingival biofilm multi-omics dataset of 289 children aged 3-5 who participated in a community-based study of early childhood oral health (ZOE 2.0) in North Carolina, United States. We further validate ENVIM in two additional publicly available multi-omics datasets generated from studies of gut health. We select gene family sets based on variable importance scores and modify the existing ENM strategy used in the MelonnPan prediction software to accommodate the unique features of microbiome and metabolome data. We evaluate metagenomic and metatranscriptomic predictors and compare the prediction performance of ENVIM to the standard ENM employed in MelonnPan. The newly developed ENVIM method showed higher metabolite prediction accuracy than MelonnPan when trained with metatranscriptomics data only, metagenomics data only, or both. Better metabolite prediction is achieved in the gut microbiome than in the oral microbiome setting. We report the best-predicted compounds in all three datasets from two different body sites. For example, the metabolites trehalose, maltose, stachyose, and ribose are all well predicted by the supragingival microbiome.

Project description: Background: Machine learning (ML) can be an effective tool to extract information from attribute-rich molecular datasets for the generation of molecular diagnostic tests. However, the way in which the resulting scores or classifications are produced from the input data may not be transparent. Algorithmic explainability or interpretability has become a focus of ML research. Shapley values, first introduced in game theory, can provide explanations of the result generated from a specific set of input data by a complex ML algorithm. Methods: For a multivariate molecular diagnostic test in clinical use (the VeriStrat® test), we calculate and discuss the interpretation of exact Shapley values. We also employ some standard approximation techniques for Shapley value computation (local interpretable model-agnostic explanation (LIME) and Shapley Additive Explanations (SHAP) based methods) and compare the results with exact Shapley values. Results: Exact Shapley values calculated for data collected from a cohort of 256 patients showed that the relative importance of attributes for test classification varied by sample. While all eight features used in the VeriStrat® test contributed equally to classification for some samples, other samples showed more complex patterns of attribute importance for classification generation. Exact Shapley values and Shapley-based interaction metrics were able to provide interpretable classification explanations at the sample or patient level, while patient subgroups could be defined by comparing Shapley value profiles between patients. LIME and SHAP approximation approaches, even those seeking to include correlations between attributes, produced results that were quantitatively and, in some cases, qualitatively different from the exact Shapley values. Conclusions: Shapley values can be used to determine the relative importance of input attributes to the result generated by a multivariate molecular diagnostic test for an individual sample or patient. Patient subgroups defined by Shapley value profiles may motivate translational research. However, correlations inherent in molecular data and the typically small ML training sets available for molecular diagnostic test development may cause some approximation methods to produce approximate Shapley values that differ both qualitatively and quantitatively from exact Shapley values. Hence, caution is advised when using approximate methods to evaluate Shapley explanations of the results of molecular diagnostic tests.
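
As an illustration of what "exact Shapley values" means in the description above, the following minimal sketch enumerates every feature subset and averages each feature's marginal contribution with the standard Shapley weights. It is not the VeriStrat® test or the authors' code: the function names `exact_shapley` and `toy_value`, the three features "a", "b", "c", and the toy scores are all hypothetical and chosen only to keep the example small.

```python
from itertools import combinations
from math import factorial


def exact_shapley(features, value):
    """Exact Shapley values by full enumeration of feature subsets.

    `value` maps a frozenset of features to a number (the score obtained
    when only those features are "present"). Cost grows as 2**n, so this
    is only practical for a small number of features.
    """
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Standard Shapley weight for a coalition of size k
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of f when added to coalition s
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_value(subset):
    """Hypothetical score: additive terms plus an a-b interaction."""
    score = 0.0
    if "a" in subset:
        score += 1.0
    if "b" in subset:
        score += 2.0
    if "a" in subset and "b" in subset:
        score += 1.0  # interaction term, split equally between a and b
    if "c" in subset:
        score += 0.5
    return score


shapley = exact_shapley(["a", "b", "c"], toy_value)
for name, contribution in shapley.items():
    print(name, round(contribution, 3))
```

In this toy example the contributions sum to the score of the full feature set minus the score of the empty set (the efficiency property), which is a quick sanity check for any exact implementation. Approximation methods such as LIME or SHAP avoid the exponential enumeration, and as the description notes, they can diverge from the exact values.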

Project description: Background: The World Health Organization (WHO) has targeted a reduction in viral hepatitis-related mortality by 65% and incidence by 90% by 2030, necessitating enhanced hepatitis B treatment and prevention programmes in low- and middle-income countries. Hepatitis B e antigen (HBeAg) status is used in the assessment of eligibility for antiviral treatment and for prevention of mother-to-child transmission (PMTCT). Accordingly, the WHO has classified HBeAg rapid diagnostic tests (RDTs) as essential medical devices. Methods: We assessed the performance characteristics of three commercially available HBeAg RDTs (SD Bioline, Alere, South Africa; Creative Diagnostics, USA; and Biopanda Reagents, UK) in two hepatitis B surface antigen-positive cohorts in Blantyre, Malawi: participants of a community study (n = 100) and hospitalised patients with cirrhosis or hepatocellular carcinoma (n = 94). Two investigators, blinded to the reference test result, independently assessed each assay. We used an enzyme-linked immunoassay (Monolisa HBeAg, Bio-Rad, France) as a reference test and quantified HBeAg concentration using dilutions of the WHO HBeAg standard. We related the findings to HBV DNA levels, and evaluated treatment eligibility using the TREAT-B score. Results: Among 194 HBsAg positive patients, median age was 37 years, 42% were female and 26% were HIV co-infected. HBeAg prevalence was 47/194 (24%). The three RDTs showed diagnostic sensitivity of 28% (95% CI 16-43), 53% (38-68) and 72% (57-84) and specificity of 96-100% for detection of HBeAg. Overall inter-rater agreement κ statistic was high at 0.9-1.0. Sensitivity for identifying patients at the threshold where antiviral treatment is recommended for PMTCT, with HBV DNA > 200,000 IU/ml (39/194; 20%), was 22, 49 and 54% respectively. Using the RDTs in place of the reference HBeAg assay resulted in 3/43 (9%), 5/43 (12%) and 8/43 (19%) of patients meeting the TREAT-B treatment criteria being misclassified as ineligible for treatment. A relationship between HBeAg concentration and HBeAg detection by RDT was observed. A minimum HBeAg concentration of 2.2-3.1 log10 IU/ml was required to yield a reactive RDT. Conclusions: Commercially available HBeAg RDTs lack sufficient sensitivity to accurately classify hepatitis B patients in Malawi. This has implications for hepatitis B public health programs in sub-Saharan Africa. Alternative diagnostic assays are recommended.
