Dataset Information

Evidence of questionable research practices in clinical prediction models.

ABSTRACT:

Background

Clinical prediction models are widely used in health and medical research. The area under the receiver operating characteristic curve (AUC) is a frequently used estimate to describe the discriminatory ability of a clinical prediction model. The AUC is often interpreted relative to thresholds, with "good" or "excellent" models defined at 0.7, 0.8 or 0.9. These thresholds may create targets that result in "hacking", where researchers are motivated to re-analyse their data until they achieve a "good" result.

Methods

We extracted AUC values from PubMed abstracts to look for evidence of hacking. We used histograms of the AUC values in bins of size 0.01 and compared the observed distribution to a smooth distribution from a spline.
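The comparison described above can be illustrated with a minimal sketch. It is not the authors' code. The data are simulated rather than extracted from PubMed, a centred moving average stands in for the spline, and the names `simulate_auc_hundredths`, `hack_prob` and `half_window` are hypothetical. AUC values are held as integer hundredths so that each value maps to exactly one bin of width 0.01.

```python
import random

THRESHOLDS = (70, 80, 90)  # AUC thresholds in hundredths (0.70, 0.80, 0.90)


def simulate_auc_hundredths(n, hack_prob, rng):
    """Simulate n AUCs as integer hundredths (assumed data, not PubMed values).

    Values one or two bins below a threshold are moved onto the threshold
    with probability hack_prob, mimicking the "hacking" the abstract describes.
    """
    values = []
    while len(values) < n:
        h = round(rng.gauss(0.78, 0.10) * 100)
        if not 50 <= h <= 99:
            continue  # keep only AUCs between 0.50 and 0.99
        for t in THRESHOLDS:
            if h in (t - 2, t - 1) and rng.random() < hack_prob:
                h = t
        values.append(h)
    return values


def histogram(values, lo=50, hi=99):
    """Count values in bins of width 0.01 (one bin per hundredth)."""
    counts = {h: 0 for h in range(lo, hi + 1)}
    for h in values:
        counts[h] += 1
    return counts


def moving_average(counts, half_window=3):
    """Smooth the bin counts; a simple stand-in for the spline in the paper."""
    smooth = {}
    for h in counts:
        window = [counts[k]
                  for k in range(h - half_window, h + half_window + 1)
                  if k in counts]
        smooth[h] = sum(window) / len(window)
    return smooth


rng = random.Random(1)  # fixed seed so the sketch is reproducible
counts = histogram(simulate_auc_hundredths(50_000, 0.3, rng))
smooth = moving_average(counts)

# Residual = observed count minus smoothed count in each bin.
residuals = {h: counts[h] - smooth[h] for h in counts}

for t in THRESHOLDS:
    print(f"AUC {t / 100:.2f}: residual at threshold {residuals[t]:+.0f}, "
          f"one bin below {residuals[t - 1]:+.0f}")
```

In this simulation, a positive residual at a threshold bin paired with a negative residual in the bin just below it is the kind of excess-and-shortfall pattern the study looked for. A real analysis would replace the moving average with a fitted spline and the simulated values with the extracted AUCs.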

Results

The distribution of 306,888 AUC values showed clear excesses above the thresholds of 0.7, 0.8 and 0.9 and shortfalls below the thresholds.

Conclusions

The AUCs for some models are over-inflated, which risks exposing patients to sub-optimal clinical decision-making. Greater modelling transparency is needed, including published protocols, and data and code sharing.

SUBMITTER: White N

PROVIDER: S-EPMC10478406 | biostudies-literature | 2023 Sep

REPOSITORIES: biostudies-literature

Publications

Evidence of questionable research practices in clinical prediction models.

White N, Parsons R, Collins G, Barnett A

BMC Medicine, 2023-09-04, issue 1

PMID: 37667344

Similar Datasets

Project description: INTRODUCTION: Engaging in scientific misconduct and questionable research practices (QRPs) is a noted problem across fields, including health professions education (HPE). To mitigate these practices, other disciplines have enacted strategies based on researcher characteristics and practice factors. Thus, to inform HPE, this study seeks to determine which researcher characteristics and practice factors, if any, might explain the frequency of irresponsible research practices. METHOD: In 2017, a cross-sectional survey of HPE researchers was conducted. The survey included 66 items adapted from three published surveys: two published QRP surveys and a publication pressure scale. The outcome variable was a self-reported misconduct score, which is a weighted mean score for each respondent on all misconduct and QRP items. Statistical analysis included descriptive statistics, reliability and correlation analysis, and multiple linear regression modelling. RESULTS AND DISCUSSION: In total, 590 researchers completed the survey. Results from the final regression model indicated that researcher age had a negative association with the misconduct score (b = -0.01, β = -0.22, t = -2.91, p < 0.05), suggesting that older researchers tended to report less misconduct. On the other hand, those with more publications had higher misconduct scores (b = 0.001, β = 0.17, t = 3.27, p < 0.05) and, compared with researchers in the region of North America, researchers in Asia tended to have higher misconduct scores (b = 0.21, β = 0.12, t = 2.84, p < 0.01). In addition, compared with those who defined their work role as clinician, those who defined their role as researcher tended to have higher misconduct scores (b = 0.12, β = 0.13, t = 2.15, p < 0.05). Finally, publication pressure emerged as the strongest individual predictor of misconduct (b = 0.20, β = 0.34, t = 7.82, p < 0.01); the greater the publication pressure, the greater the reported misconduct. Overall, the explanatory variables accounted for 21% of the variance in the misconduct score, with publication pressure accounting for 10% of the variance in the outcome, above and beyond the other explanatory variables. Although correlational, these findings suggest several researcher characteristics and practice factors that could be targeted to address scientific misconduct and QRPs in HPE.

Project description: We describe a method of quantifying the effect of Questionable Research Practices (QRPs) on the results of meta-analyses. As an example, we simulated a meta-analysis of a controversial telepathy protocol to assess the extent to which these experimental results could be explained by QRPs. Our simulations used the same numbers of studies and trials as the original meta-analysis, and the frequencies with which various QRPs were applied in the simulated experiments were based on surveys of experimental psychologists. Results of both the meta-analysis and simulations were characterized by 4 metrics: two describing the trial and mean experiment hit rates (HR) of around 31%, where 25% is expected by chance; one the correlation between sample size and hit rate; and one the complete P-value distribution of the database. A genetic algorithm optimized the parameters describing the QRPs, and the fitness of the simulated meta-analysis was defined as the sum of the squares of Z-scores for the 4 metrics. Assuming no anomalous effect, a good fit to the empirical meta-analysis was found only by using QRPs with unrealistic parameter values. Restricting the parameter space to ranges observed in studies of QRP occurrence, under the untested assumption that parapsychologists use comparable QRPs, the fit to the published Ganzfeld meta-analysis with no anomalous effect was poor. We allowed for a real anomalous effect, be it unidentified QRPs or a paranormal effect, where the HR ranged from 25% (chance) to 31%. With an anomalous HR of 27% the fitness became F = 1.8 (p = 0.47 where F = 0 is a perfect fit). We conclude that the very significant probability cited by the Ganzfeld meta-analysis is likely inflated by QRPs, though results are still significant (p = 0.003) with QRPs. Our study demonstrates that quantitative simulations of QRPs can assess their impact. Since meta-analyses in general might be polluted by QRPs, this method has wide applicability outside the domain of experimental parapsychology.

