Dataset Information

Model averaging in calibration of near-infrared instruments with correlated high-dimensional data

ABSTRACT:

SUBMITTER: Salaki D

PROVIDER: S-EPMC10810656 | biostudies-literature | 2022 Sep

REPOSITORIES: biostudies-literature


Similar Datasets

Project description: Background: Microarray technology is increasingly used to identify potential biomarkers for cancer prognostics and diagnostics. Previously, we have developed the iterative Bayesian Model Averaging (BMA) algorithm for use in classification. Here, we extend the iterative BMA algorithm for application to survival analysis on high-dimensional microarray data. The main goal in applying survival analysis to microarray data is to determine a highly predictive model of patients' time to event (such as death, relapse, or metastasis) using a small number of selected genes. Our multivariate procedure combines the effectiveness of multiple contending models by calculating the weighted average of their posterior probability distributions. Our results demonstrate that our iterative BMA algorithm for survival analysis achieves high prediction accuracy while consistently selecting a small and cost-effective number of predictor genes. Results: We applied the iterative BMA algorithm to two cancer datasets: breast cancer and diffuse large B-cell lymphoma (DLBCL) data. On the breast cancer data, the algorithm selected a total of 15 predictor genes across 84 contending models from the training data. The maximum likelihood estimates of the selected genes and the posterior probabilities of the selected models from the training data were used to divide patients in the test (or validation) dataset into high- and low-risk categories. Using the genes and models determined from the training data, we assigned patients from the test data into highly distinct risk groups (as indicated by a p-value of 7.26e-05 from the log-rank test). Moreover, we achieved comparable results using only the 5 top selected genes with 100% posterior probabilities. On the DLBCL data, our iterative BMA procedure selected a total of 25 genes across 3 contending models from the training data. Once again, we assigned the patients in the validation set to significantly distinct risk groups (p-value = 0.00139). Conclusion: The strength of the iterative BMA algorithm for survival analysis lies in its ability to account for model uncertainty. The results from this study demonstrate that our procedure selects a small number of genes while eclipsing other methods in predictive performance, making it a highly accurate and cost-effective prognostic tool in the clinical setting.
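
The risk assignment described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes each contending model contributes a linear predictor built from its gene coefficients, that the predictors are averaged with weights equal to the models' posterior probabilities, and that the cutoff between risk groups is the median training score. The gene names, coefficients, posterior probabilities, and the helper names `bma_risk_score` and `assign_risk_groups` are all hypothetical.

```python
from statistics import median


def bma_risk_score(expression, models):
    """Posterior-weighted average of per-model linear predictors.

    expression: dict mapping gene name -> expression value for one patient
    models: list of (posterior_probability, {gene: coefficient}) pairs
    """
    total_weight = sum(posterior for posterior, _ in models)
    score = 0.0
    for posterior, coefficients in models:
        # Linear predictor of this model for this patient
        linear = sum(beta * expression[gene] for gene, beta in coefficients.items())
        # Normalise weights so they sum to 1 even if the inputs do not
        score += (posterior / total_weight) * linear
    return score


def assign_risk_groups(scores, cutoff):
    """Label each score 'high' if it exceeds the cutoff, otherwise 'low'."""
    return ["high" if s > cutoff else "low" for s in scores]


# Hypothetical toy values, for illustration only
models = [
    (0.6, {"geneA": 1.0, "geneB": -0.5}),
    (0.4, {"geneA": 0.5}),
]
train_patients = [
    {"geneA": 2.0, "geneB": 1.0},
    {"geneA": 0.0, "geneB": 2.0},
    {"geneA": 1.0, "geneB": 0.0},
]
test_patients = [
    {"geneA": 3.0, "geneB": 0.0},
    {"geneA": 0.0, "geneB": 1.0},
]

train_scores = [bma_risk_score(p, models) for p in train_patients]
cutoff = median(train_scores)  # assumed cutoff; the abstract does not state one
test_scores = [bma_risk_score(p, models) for p in test_patients]
test_groups = assign_risk_groups(test_scores, cutoff)
print(test_groups)
```

In a real analysis the two groups would then be compared with a log-rank test on the observed survival times, which is the source of the p-values quoted in the abstract.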

Project description: Optimal management of free-ranging herbivores requires the accurate assessment of an animal's nutritional status. For this purpose 'near-infrared reflectance spectroscopy' (NIRS) is very useful, especially when nutritional assessment is done through faecal indicators such as faecal nitrogen (FN). In order to perform an NIRS calibration, the default protocol recommends starting by generating an initial equation based on at least 50-75 samples from the given species. Although this protocol optimises prediction accuracy, it limits the use of NIRS with rare or endangered species where sample sizes are often small. To overcome this limitation we tested a single NIRS equation (i.e., multispecies calibration) to predict FN in herbivores. Firstly, we used five herbivore species with highly contrasting digestive physiologies to build monospecies and multispecies calibrations, namely horse, sheep, Pyrenean chamois, red deer and European rabbit. Secondly, the equation accuracy was evaluated by two procedures using: (1) an external validation with samples from the same species, which were not used in the calibration process; and (2) samples from different ungulate species, specifically Alpine ibex, domestic goat, European mouflon, roe deer and cattle. The multispecies equation was highly accurate in terms of the coefficient of determination for calibration R² = 0.98, standard error of validation SECV = 0.10, standard error of external validation SEP = 0.12, ratio of performance to deviation RPD = 5.3, and range error of prediction RER = 28.4. The accuracy of the multispecies equation to predict other herbivore species was also satisfactory (R² > 0.86, SEP < 0.27, RPD > 2.6, and RER > 8.1). Lastly, the agreement between multi- and monospecies calibrations was also confirmed by the Bland-Altman method. In conclusion, our single multispecies equation can be used as a reliable, cost-effective, easy and powerful analytical method to assess FN in a wide range of herbivore species.
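
The validation statistics quoted in this abstract follow from simple ratios, sketched below in Python. The sketch assumes SEP is computed as the root-mean-square error of prediction on an external validation set (some authors use a bias-corrected form instead), RPD as the standard deviation of the reference values divided by SEP, and RER as the range of the reference values divided by SEP. The function name `validation_stats` and the sample values are hypothetical and are not taken from the study.

```python
from math import sqrt
from statistics import stdev


def validation_stats(reference, predicted):
    """Return (SEP, RPD, RER) for paired reference and predicted values.

    SEP is taken here as the root-mean-square error of prediction,
    which is an assumption; a bias-corrected SEP is also in common use.
    """
    residuals = [p - r for r, p in zip(reference, predicted)]
    sep = sqrt(sum(e * e for e in residuals) / len(residuals))
    rpd = stdev(reference) / sep  # ratio of performance to deviation
    rer = (max(reference) - min(reference)) / sep  # range error ratio
    return sep, rpd, rer


# Hypothetical faecal nitrogen values (% dry matter), for illustration only
reference_fn = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted_fn = [1.1, 1.9, 3.2, 3.9, 5.1]

sep, rpd, rer = validation_stats(reference_fn, predicted_fn)
print(f"SEP={sep:.3f} RPD={rpd:.2f} RER={rer:.2f}")
```

Because both RPD and RER divide by SEP, a smaller prediction error raises both ratios, which is why the multispecies equation's low SEP corresponds to its high reported RPD and RER.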

