Project description:Scientific work in chemistry typically yields results in the form of materials and data. A major step toward transparency and reproducibility is achieved when scientists publish their data in research data repositories in a FAIR manner. Nevertheless, to make chemistry a sustainable discipline, FAIR data alone are insufficient; a comprehensive concept that includes the preservation of materials is needed. To offer a comprehensive infrastructure for finding and accessing the data and materials generated in chemistry projects, we combined the Chemotion repository with an archive for chemical compounds. Samples play a key role in this concept: we describe how the FAIR metadata of a virtual sample representation can be used to refer to a physically available sample in a materials archive and to link it with the FAIR research data obtained using that sample. We further describe measures to make the physically available samples not only FAIR through their metadata but also findable, accessible, and reusable.
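To make the linking concept concrete, a minimal sketch of such a virtual sample record is shown below; every field name and value here is hypothetical and illustrative, not the actual Chemotion metadata schema:

```python
import json

# Hypothetical virtual sample record linking a physical sample in a
# compound archive to the FAIR research data obtained from it.
# All field names and values are illustrative placeholders, not the
# actual Chemotion schema.
sample_record = {
    "sample_id": "SAMPLE-0001",           # identifier of the virtual sample
    "archive_location": "ARCHIVE-A-042",  # where the physical sample is stored
    "molecule_inchikey": "UHOVQNZJYSORNB-UHFFFAOYSA-N",  # benzene, as an example
    "datasets": [
        {"type": "NMR", "doi": "10.xxxx/example-nmr"},   # placeholder DOIs
        {"type": "MS", "doi": "10.xxxx/example-ms"},
    ],
}

# Serializing the record makes it machine-readable and exchangeable.
serialized = json.dumps(sample_record, indent=2)
```

The record points in two directions, to the physical sample's archive location and to the datasets derived from it, which is what makes both the material and the data findable from a single entry.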
Project description:This article presents a practical roadmap for scholarly data repositories to implement data citation in accordance with the Joint Declaration of Data Citation Principles, a synopsis and harmonization of the recommendations of major science policy bodies. The roadmap was developed by the Repositories Expert Group as part of the Data Citation Implementation Pilot (DCIP) project, an initiative of FORCE11.org and the NIH-funded BioCADDIE (https://biocaddie.org) project. The roadmap makes 11 specific recommendations, grouped into three phases of implementation: a) required steps needed to support the Joint Declaration of Data Citation Principles, b) recommended steps that facilitate article/data publication workflows, and c) optional steps that further improve the data citation support provided by data repositories. We describe the early adoption of these recommendations 18 months after their initial publication, looking specifically at implementations of machine-readable metadata on dataset landing pages.
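One widely used way to satisfy the machine-readable-metadata recommendation is to embed a schema.org Dataset description as JSON-LD on the dataset landing page. A minimal sketch, with placeholder values throughout:

```python
import json

# A minimal schema.org Dataset record of the kind recommended for
# embedding as JSON-LD on a dataset landing page. All values are
# placeholders, not a real dataset.
landing_page_metadata = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "identifier": "https://doi.org/10.xxxx/example",  # persistent identifier
    "name": "Example dataset title",
    "description": "Short human-readable description of the dataset.",
    "url": "https://repository.example.org/dataset/123",
    "publisher": {"@type": "Organization", "name": "Example Repository"},
}

# Serialized form, ready to place in a <script type="application/ld+json"> tag.
jsonld = json.dumps(landing_page_metadata, indent=2)
```

Harvesters and dataset search engines can parse this block without scraping the human-readable page, which is precisely what makes the landing page machine-actionable.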
Project description:Prior to 1950, the consensus was that biological transformations occurred in two-electron steps, thereby avoiding the generation of free radicals. Dramatic advances in spectroscopy, biochemistry, and molecular biology have led to the realization that protein-based radicals participate in a vast array of vital biological mechanisms. Redox processes involving high-potential intermediates formed in reactions with O2 are particularly susceptible to radical formation. Clusters of tyrosine (Tyr) and tryptophan (Trp) residues have been found in many O2-reactive enzymes, raising the possibility that they play an antioxidant protective role. In blue copper proteins with plastocyanin-like domains, Tyr/Trp clusters are uncommon in the low-potential single-domain electron-transfer proteins and in the two-domain copper nitrite reductases. The two-domain multicopper oxidases, however, exhibit clusters of Tyr and Trp residues near the trinuclear copper active site where O2 is reduced. These clusters may play a protective role to ensure that reactive oxygen species are not liberated during O2 reduction.
Project description:The volume of proteomics and mass spectrometry data available in public repositories continues to grow at a rapid pace as more researchers embrace open science practices. Open access to the data behind scientific discoveries has become critical to validate published findings and develop new computational tools. Here, we present ppx, a Python package that provides easy, programmatic access to the data stored in ProteomeXchange repositories, such as PRIDE and MassIVE. The ppx package can be used as either a command line tool or a Python package to retrieve the files and metadata associated with a project when provided its identifier. To demonstrate how ppx enhances reproducible research, we used ppx within a Snakemake workflow to reanalyze a published data set with the open modification search tool ANN-SoLo and compared our reanalysis to the original results. We show that ppx readily integrates into workflows, and our reanalysis produced results consistent with the original analysis. We envision that ppx will be a valuable tool for creating reproducible analyses, providing tool developers easy access to data for development, testing, and benchmarking, and enabling the use of mass spectrometry data in data-intensive analyses. The ppx package is freely available and open source under the MIT license at https://github.com/wfondrie/ppx.
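Tools like ppx resolve a project identifier to its host repository before fetching files. The toy function below illustrates that dispatch idea only; it is not ppx's actual implementation or API (PXD accessions belong to PRIDE, MSV accessions to MassIVE):

```python
import re

def classify_identifier(identifier: str) -> str:
    """Map a ProteomeXchange-style accession to its likely host repository.

    A toy illustration of the kind of dispatch a tool like ppx performs;
    this is not ppx's actual implementation.
    """
    if re.fullmatch(r"PXD\d+", identifier):
        return "PRIDE"
    if re.fullmatch(r"MSV\d+", identifier):
        return "MassIVE"
    raise ValueError(f"Unrecognized project identifier: {identifier}")
```

Given such a mapping, a retrieval tool only needs the accession to know which repository's download mechanism to use, which is what makes a single-identifier interface possible.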
Project description:Structural biology research generates large amounts of data, some deposited in public databases or repositories, but a substantial remainder never becomes available to the scientific community. In addition, some of the deposited data contain errors of varying severity that may bias the results of data mining. Thorough analysis and discussion of these problems are needed to ameliorate this situation. This perspective is an attempt to propose some solutions and encourage both further discussion and action on the part of the relevant organizations, in particular the PDB and various bodies of the International Union of Crystallography.
Project description:Structural biology, like many other areas of modern science, produces an enormous amount of primary, derived, and "meta" data, placing high demands on data storage and manipulation. Primary data come from various steps of sample preparation, diffraction experiments, and functional studies. These data are not only used to obtain tangible results, like macromolecular structural models, but also to enrich and guide our analysis and interpretation of various biomedical problems. Herein we define several categories of data resources, (a) Archives, (b) Repositories, (c) Databases, and (d) Advanced Information Systems, that can accommodate primary, derived, or reference data. Data resources may be used either as web portals or internally by structural biology software. To be useful, each resource must be maintained and curated, as well as integrated with other resources. Ideally, the system of interconnected resources should evolve toward comprehensive "hubs", or Advanced Information Systems. Such systems, exemplified by the PDB and UniProt, are indispensable not only for structural biology but for many related fields of science. The categories of data resources described herein are applicable well beyond our usual scientific endeavors.
Project description:The disposal of nuclear waste represents a paramount concern for human safety, and the corrosion resistance of containers within the disposal environment is a critical factor in ensuring the integrity of such waste containment systems. In this report, the corrosion behavior of copper canisters was monitored in simulated Olkiluoto groundwater and in groundwater procured in South Korea, at different temperatures. Exposure of copper to the procured groundwater at 70 °C revealed a 3.7-fold increase in corrosion susceptibility compared with room-temperature conditions, with a current density of 12.7 μA/cm2. During a three-week immersion test in a controlled 70 °C chamber, the canister in the Korean groundwater maintained a constant weight. In contrast, its counterpart in the simulated groundwater showed continuous weight loss, indicating heightened corrosion. X-ray diffraction (XRD) analysis identified corrosion byproducts, specifically Cu2Cl3(OH) and calcite (CaCO3), in the simulated groundwater, confirming its corrosive nature. Initial impedance analysis revealed distinct differences: the Korean groundwater exhibits high pore resistance and diffusion effects, whereas the simulated groundwater shows low pore resistance. Consequently, copper canisters in the Korean environment are deemed relatively stable against corrosion because of significant differences in ion concentrations.
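For context, a corrosion current density can be converted to a penetration rate via Faraday's law in the form standardized by ASTM G102. The sketch below assumes divalent copper dissolution (Cu → Cu²⁺), which is an assumption for illustration, not a finding of the study:

```python
def corrosion_rate_mm_per_year(i_corr_uA_cm2: float,
                               equivalent_weight: float,
                               density_g_cm3: float) -> float:
    """Penetration rate from corrosion current density (ASTM G102 form).

    CR [mm/yr] = K1 * i_corr [uA/cm^2] * EW / rho [g/cm^3],
    with K1 = 3.27e-3 mm*g/(uA*cm*yr).
    """
    K1 = 3.27e-3
    return K1 * i_corr_uA_cm2 * equivalent_weight / density_g_cm3

# Copper, assuming divalent dissolution: EW = 63.55 / 2; density 8.96 g/cm^3.
rate = corrosion_rate_mm_per_year(12.7, 63.55 / 2, 8.96)
```

Under these assumptions, the reported 12.7 μA/cm² corresponds to a penetration rate of roughly 0.15 mm per year.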
Project description:Background: Clinical data repositories (CDRs) have great potential to improve outcome prediction and risk modeling. However, most clinical studies require careful study design, dedicated data-collection efforts, and sophisticated modeling techniques before a hypothesis can be tested. We aim to bridge this gap so that clinical domain users can perform first-hand prediction on existing repository data without complicated handling, and can obtain insightful patterns of imbalanced targets before a formal study is conducted. We specifically target interpretability for domain users, so that the model can be conveniently explained and applied in clinical practice. Methods: We propose an interpretable pattern model that tolerates the noise (missing values) found in practice data. To address the challenge of imbalanced targets of interest in clinical research, e.g., death rates of only a few percent, the geometric mean of sensitivity and specificity (G-mean) is employed as the optimization criterion, for which a simple but effective heuristic algorithm is developed. Results: We compared pattern discovery to clinically interpretable methods on two retrospective clinical datasets, containing 14.9% deaths within 1 year in the thoracic dataset and 9.1% deaths in the cardiac dataset. Despite the imbalance challenge evident for the other methods, pattern discovery consistently shows competitive cross-validated prediction performance. Compared to logistic regression, naïve Bayes, and decision trees, pattern discovery achieves statistically significantly (p-values < 0.01, Wilcoxon signed-rank test) better average testing G-means and F1-scores (harmonic mean of precision and sensitivity). Without requiring sophisticated technical processing of the data or tweaking, the prediction performance of pattern discovery is consistently comparable to the best achievable performance. Conclusions: Pattern discovery has proven robust and valuable for target prediction on existing clinical data repositories with imbalance and noise. The prediction results and interpretable patterns can provide insights in an agile and inexpensive way for potential formal studies.
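The two evaluation criteria named above, G-mean and F1-score, can be computed directly from confusion-matrix counts; a minimal sketch:

```python
import math

def g_mean(tp: int, fp: int, tn: int, fn: int) -> float:
    """Geometric mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)

def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and sensitivity (recall)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Because the G-mean collapses to zero when either sensitivity or specificity is zero, optimizing it prevents the degenerate majority-class classifier that plain accuracy rewards on imbalanced targets such as rare deaths.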
Project description:Background: With the advent of inexpensive assay technologies, there has been unprecedented growth in genomics data, as well as in the number of databases in which it is stored. In these databases, sample annotation using ontologies and controlled vocabularies is becoming more common. However, the annotation is rarely available as Linked Data, in a machine-readable format, or for standardized queries using SPARQL. This makes large-scale reuse, or integration with other knowledge bases, very difficult. Methods: To address this challenge, we have developed the second generation of our eXframe platform, a reusable framework for creating online repositories of genomics experiments. This second generation now publishes Semantic Web data. To accomplish this, we created an experiment model that covers provenance, citations, external links, assays, biomaterials used in the experiment, and the data collected during the process. The elements of our model are mapped to classes and properties from established biomedical ontologies. Resource Description Framework (RDF) data is automatically produced using these mappings and indexed in an RDF store with a built-in SPARQL (SPARQL Protocol and RDF Query Language) endpoint. Conclusions: Using the open-source eXframe software, institutions and laboratories can create Semantic Web repositories of their experiments, integrate them with heterogeneous resources, and make them interoperable with the vast Semantic Web of biomedical knowledge.
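As an illustration of the standardized queries such a SPARQL endpoint enables, the sketch below builds a query over experiment records; the `ex:` prefix and its predicates are hypothetical placeholders, not eXframe's actual vocabulary:

```python
# A SPARQL query of the kind a repository's endpoint could answer.
# The ex: prefix and predicates below are hypothetical placeholders,
# not eXframe's actual vocabulary; dcterms: is the real Dublin Core
# terms namespace.
query = """
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX ex: <http://example.org/experiment#>

SELECT ?experiment ?title ?assay
WHERE {
  ?experiment a ex:Experiment ;
              dcterms:title ?title ;
              ex:hasAssay ?assay .
}
LIMIT 10
"""
```

Because the query addresses classes and properties rather than a specific database schema, the same pattern works against any endpoint that publishes its experiments with the same ontology terms, which is the interoperability payoff of Linked Data.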