Questionable research practices may have little effect on replicability.
ABSTRACT: This article examines why many studies fail to replicate statistically significant published results. We address this issue within a general statistical framework that also allows us to include various questionable research practices (QRPs) that are thought to reduce replicability. The analyses indicate that the base rate of true effects is the major factor that determines the replication rate of scientific results. Specifically, for purely statistical reasons, replicability is low in research domains where true effects are rare (e.g., search for effective drugs in pharmacology). This point is under-appreciated in current scientific and media discussions of replicability, which often attribute poor replicability mainly to QRPs.
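The base-rate argument above can be made concrete with a small calculation. This is a minimal sketch under an idealized two-test model (the power, alpha, and base-rate values are illustrative assumptions, not figures from the article): given a base rate of true effects, the probability that a significant original finding is significant again in an exact replication follows from the positive predictive value.

```python
# Illustrative sketch: expected replication rate as a function of the base
# rate of true effects, in an idealized model with fixed power and alpha.
# All numeric values are assumptions for illustration.

def replication_rate(base_rate, power=0.8, alpha=0.05):
    """P(replication significant | original significant)."""
    # Positive predictive value: P(effect is true | original was significant).
    ppv = (base_rate * power) / (base_rate * power + (1 - base_rate) * alpha)
    # A true effect replicates with probability `power`, a null one with `alpha`.
    return ppv * power + (1 - ppv) * alpha

for base_rate in (0.5, 0.1, 0.01):
    print(f"base rate {base_rate:>4}: replication rate {replication_rate(base_rate):.2f}")
```

Even with no QRPs at all, the expected replication rate drops sharply as true effects become rare, which is the article's central point.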
Project description:BACKGROUND:Health Services Research (HSR) findings reported in scientific publications may become part of the decision-making process on healthcare. This study aimed to explore associations between researchers' individual, institutional, and scientific-environment factors and the occurrence of questionable research practices (QRPs) in the reporting of messages and conclusions in scientific HSR publications. METHODS:We employed a mixed-methods study design. We identified factors possibly contributing to QRPs in the reporting of messages and conclusions through a literature review, 14 semi-structured interviews with HSR institutional leaders, and 13 focus groups among researchers. A survey corresponding with these factors was developed and shared with 172 authors of 116 scientific HSR publications produced by Dutch research institutes in 2016. We assessed the included publications for the occurrence of QRPs. An exploratory factor analysis was conducted to identify factors within the individual, institutional, and environmental domains. Next, we conducted bivariate analyses using simple Poisson regression to explore each factor's association with the number of QRPs in the assessed HSR publications. Factors related to QRPs with a p-value < .30 were included in four multivariate models tested through multiple Poisson regression. RESULTS:In total, 78 (45%) participants completed the survey (51.3% first authors and 48.7% last authors). Twelve factors were included in the multivariate analyses. In all four multivariate models, a higher score on "pressure to create societal impact" (Exp B = 1.28, 95% CI [1.11, 1.47]) was associated with a higher number of QRPs. Higher scores on the "specific training" (Exp B = 0.85, 95% CI [0.77, 0.94]) and "co-author conflict of interest" (Exp B = 0.85, 95% CI [0.75, 0.97]) factors were associated with a lower number of QRPs.
Stratification between first and last authors indicated that different factors were related to the occurrence of QRPs in these groups. CONCLUSION:Experienced pressure to create societal impact is associated with more QRPs in the reporting of messages and conclusions in HSR publications. Specific training in reporting messages and conclusions and awareness of co-authors' conflicts of interest are related to fewer QRPs. Our results should stimulate awareness within the international HSR field of opportunities to better support reporting in scientific HSR publications.
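The Exp B values reported above are rate ratios from a Poisson regression: each one-unit increase in a factor score multiplies the expected QRP count by Exp(B). A minimal sketch of that interpretation, where the Exp(B) values of 1.28 and 0.85 come from the abstract but the baseline count and score changes are hypothetical:

```python
# Sketch of interpreting Poisson-regression rate ratios (Exp B).
# Model: mu = baseline * prod(RR_i ** delta_i), where delta_i is the
# change in factor i's score. Baseline and deltas are made up.

def predicted_count(baseline, rate_ratios, score_deltas):
    """Expected QRP count after shifting each factor score by its delta."""
    mu = baseline
    for rr, delta in zip(rate_ratios, score_deltas):
        mu *= rr ** delta
    return mu

base = 6.0  # hypothetical baseline QRPs per publication
# Exp(B) = 1.28 for "pressure to create societal impact",
# Exp(B) = 0.85 for "specific training" (both from the abstract).
print(predicted_count(base, [1.28, 0.85], [1, 0]))  # one unit more pressure
print(predicted_count(base, [1.28, 0.85], [0, 1]))  # one unit more training
```

So a one-unit rise in experienced pressure raises the expected count by 28%, while a one-unit rise in training lowers it by 15%, holding other factors constant.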
Project description:OBJECTIVES:Explore the occurrence and nature of questionable research practices (QRPs) in the reporting of messages and conclusions in international scientific Health Services Research (HSR) publications authored by researchers from HSR institutions in the Netherlands. DESIGN:In a joint effort to assure the overall quality of HSR publications in the Netherlands, 13 HSR institutions participated in this study. Together with these institutions, we constructed and validated an assessment instrument covering 35 possible QRPs in the reporting of messages and conclusions. Two reviewers independently assessed a random sample of 116 HSR articles authored by researchers from these institutions and published in international peer-reviewed scientific journals in 2016. SETTING:Netherlands, 2016. SAMPLE:116 international peer-reviewed HSR publications. MAIN OUTCOME MEASURES:Median number of QRPs per publication, percentage of publications with observed QRP frequencies, occurrence of specific QRPs, and difference in total number of QRPs by methodological approach, type of research, and study design. RESULTS:We identified a median of six QRPs per publication out of 35 possible QRPs. QRPs occurred most frequently in the reporting of implications for practice, recommendations for practice, contradictory evidence, study limitations, and conclusions based on the results and in the context of the literature. We identified no differences in the total number of QRPs between papers with different methodological approaches, types of research, or study designs. CONCLUSIONS:Given the applied nature of HSR, both the severity of the identified QRPs and the recommendations for policy and practice in HSR publications warrant discussion. We recommend that the HSR field further define and establish its own scientific norms in publication practices to improve scientific reporting and strengthen the impact of HSR.
The results of our study can serve as an empirical basis for continuous critical reflection on the reporting of messages and conclusions.
Project description:We surveyed 807 researchers (494 ecologists and 313 evolutionary biologists) about their use of Questionable Research Practices (QRPs), including cherry picking statistically significant results, p hacking, and hypothesising after the results are known (HARKing). We also asked them to estimate the proportion of their colleagues that use each of these QRPs. Several of the QRPs were prevalent within the ecology and evolution research community. Across the two groups, we found 64% of surveyed researchers reported they had at least once failed to report results because they were not statistically significant (cherry picking); 42% had collected more data after inspecting whether results were statistically significant (a form of p hacking) and 51% had reported an unexpected finding as though it had been hypothesised from the start (HARKing). Such practices have been directly implicated in the low rates of reproducible results uncovered by recent large scale replication studies in psychology and other disciplines. The rates of QRPs found in this study are comparable with the rates seen in psychology, indicating that the reproducibility problems discovered in psychology are also likely to be present in ecology and evolution.
Project description:INTRODUCTION:Engaging in scientific misconduct and questionable research practices (QRPs) is a noted problem across fields, including health professions education (HPE). To mitigate these practices, other disciplines have enacted strategies based on researcher characteristics and practice factors. Thus, to inform HPE, this study seeks to determine which researcher characteristics and practice factors, if any, might explain the frequency of irresponsible research practices. METHOD:In 2017, a cross-sectional survey of HPE researchers was conducted. The survey included 66 items adapted from three published surveys: two published QRP surveys and a publication pressure scale. The outcome variable was a self-reported misconduct score, a weighted mean score for each respondent across all misconduct and QRP items. Statistical analysis included descriptive statistics, reliability and correlation analysis, and multiple linear regression modelling. RESULTS AND DISCUSSION:In total, 590 researchers completed the survey. Results from the final regression model indicated that researcher age had a negative association with the misconduct score (b = -0.01, β = -0.22, t = -2.91, p < .05), suggesting that older researchers tended to report less misconduct. On the other hand, those with more publications had higher misconduct scores (b = 0.001, β = 0.17, t = 3.27, p < .05), and, compared with researchers in North America, researchers in Asia tended to have higher misconduct scores (b = 0.21, β = 0.12, t = 2.84, p < .01). In addition, compared with those who defined their work role as clinician, those who defined their role as researcher tended to have higher misconduct scores (b = 0.12, β = 0.13, t = 2.15, p < .05). Finally, publication pressure emerged as the strongest individual predictor of misconduct (b = 0.20, β = 0.34, t = 7.82, p < .01); the greater the publication pressure, the greater the reported misconduct.
Overall, the explanatory variables accounted for 21% of the variance in the misconduct score, with publication pressure accounting for 10% of the variance in the outcome, above and beyond the other explanatory variables. Although correlational, these findings suggest several researcher characteristics and practice factors that could be targeted to address scientific misconduct and QRPs in HPE.
Project description:The replicability of research findings has recently been disputed across multiple scientific disciplines. In constructive reaction, the research culture in psychology is facing fundamental changes, but investigations of research practices that led to these improvements have almost exclusively focused on academic researchers. By contrast, we investigated the statistical reporting quality and selected indicators of questionable research practices (QRPs) in psychology students' master's theses. In a total of 250 theses, we investigated utilization and magnitude of standardized effect sizes, along with statistical power, the consistency and completeness of reported results, and possible indications of p-hacking and further testing. Effect sizes were reported for 36% of focal tests (median r = 0.19), and only a single formal power analysis was reported for sample-size determination (median observed power 1 - β = 0.67). Statcheck revealed inconsistent p-values in 18% of cases, while 2% led to decision errors. There were no clear indications of p-hacking or further testing. We discuss our findings in the light of promoting open science standards in teaching and student supervision.
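A Statcheck-style consistency check recomputes the p-value from the reported test statistic and compares it with the reported p. A minimal, dependency-free sketch of that idea for z statistics only (Statcheck itself also covers t, F, and chi-square tests; the tolerance value here is an assumption):

```python
from statistics import NormalDist

# Sketch of a Statcheck-style check: recompute a two-sided p-value from a
# reported z statistic and flag mismatches with the reported p-value.

def check_z_report(z, reported_p, tolerance=0.005):
    """Return (consistent, decision_error) for a reported z-test result."""
    recomputed = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p
    consistent = abs(recomputed - reported_p) <= tolerance
    # Decision error: the inconsistency flips significance at alpha = .05.
    decision_error = (recomputed < 0.05) != (reported_p < 0.05)
    return consistent, decision_error

print(check_z_report(2.5, 0.012))  # consistent report
print(check_z_report(2.5, 0.20))   # inconsistent, and flips significance
```

In the thesis sample above, 18% of results failed the consistency check and 2% involved decision errors, i.e., the mismatch changed the significance verdict.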
Project description:We describe a method of quantifying the effect of Questionable Research Practices (QRPs) on the results of meta-analyses. As an example, we simulated a meta-analysis of a controversial telepathy protocol to assess the extent to which its experimental results could be explained by QRPs. Our simulations used the same numbers of studies and trials as the original meta-analysis, and the frequencies with which various QRPs were applied in the simulated experiments were based on surveys of experimental psychologists. Results of both the meta-analysis and the simulations were characterized by four metrics: two describing the trial-level and mean experiment-level hit rates (HR), around 31% where 25% is expected by chance; one the correlation between sample size and hit rate; and one the complete p-value distribution of the database. A genetic algorithm optimized the parameters describing the QRPs, with the fitness of a simulated meta-analysis defined as the sum of the squares of the Z-scores for the four metrics. Assuming no anomalous effect, a good fit to the empirical meta-analysis was found only by using QRPs with unrealistic parameter values. Restricting the parameter space to ranges observed in studies of QRP occurrence, under the untested assumption that parapsychologists use comparable QRPs, the fit to the published Ganzfeld meta-analysis with no anomalous effect was poor. We then allowed for a real anomalous effect, be it unidentified QRPs or a paranormal effect, with the HR ranging from 25% (chance) to 31%. With an anomalous HR of 27%, the fitness became F = 1.8 (p = 0.47, where F = 0 is a perfect fit). We conclude that the highly significant probability cited by the Ganzfeld meta-analysis is likely inflated by QRPs, though the results remain significant (p = 0.003) even with QRPs. Our study demonstrates that quantitative simulations of QRPs can assess their impact.
Since meta-analyses in general might be polluted by QRPs, this method has wide applicability outside the domain of experimental parapsychology.
Project description:Although questionable research practices (QRPs) and p-hacking have received attention in recent years, little research has focused on their prevalence and acceptance in students. Students are the researchers of the future and will represent the field in the future. Therefore, they should not be learning to use and accept QRPs, which would reduce their ability to produce and evaluate meaningful research. 207 psychology students and fresh graduates provided self-report data on the prevalence and predictors of QRPs. Attitudes towards QRPs, belief that significant results constitute better science or lead to better grades, motivation, and stress levels were predictors. Furthermore, we assessed perceived supervisor attitudes towards QRPs as an important predictive factor. The results were in line with estimates of QRP prevalence from academia. The best predictor of QRP use was students' QRP attitudes. Perceived supervisor attitudes exerted both a direct and indirect effect via student attitudes. Motivation to write a good thesis was a protective factor, whereas stress had no effect. Students in this sample did not subscribe to beliefs that significant results were better for science or their grades. Such beliefs further did not impact QRP attitudes or use in this sample. Finally, students engaged in more QRPs pertaining to reporting and analysis than those pertaining to study design. We conclude that supervisors have an important function in shaping students' attitudes towards QRPs and can improve their research practices by motivating them well. Furthermore, this research provides some impetus towards identifying predictors of QRP use in academia.
Project description:Introduction:In this study, we tested a simple, active "ethical consistency" intervention aimed at reducing researchers' endorsement of questionable research practices (QRPs). Methods:We developed a simple, active ethical consistency intervention and tested it against a control using an established QRP survey instrument. Before responding to a survey that asked about attitudes towards each of fifteen QRPs, participants were randomly assigned to either a consistency or a control 3- to 5-min writing task. A total of 201 participants completed the survey: 121 were recruited from a database of currently funded NSF/NIH scientists, and 80 from a pool of active researchers at a large university medical center in the southeastern US. Narrative responses to the writing prompts were coded and analyzed to assist post hoc interpretation of the quantitative data. Results:We hypothesized that participants in the consistency condition would find ethically ambiguous QRPs less defensible and would indicate less willingness to engage in them than participants in the control condition. The results showed that the consistency intervention had no significant effect on respondents' ratings of the defensibility of the QRPs or their willingness to engage in them. Exploratory analyses of the narrative themes of participants' responses indicated that participants in the control condition expressed lower perceptions of QRP defensibility and willingness. Conclusion:The results did not support the main hypothesis, and the consistency intervention may have had the unwanted effect of inducing increased rationalization. These results may partially explain why responsible conduct of research (RCR) courses often seem to have little positive effect.
Project description:A survey in the United States revealed that an alarmingly large percentage of university psychologists admitted having used questionable research practices that can contaminate the research literature with false positive and biased findings. We conducted a replication of this study among Italian research psychologists to investigate whether these findings generalize to other countries. All the original materials were translated into Italian, and members of the Italian Association of Psychology were invited to participate via an online survey. The percentages of Italian psychologists who admitted to having used ten questionable research practices were similar to the results obtained in the United States although there were small but significant differences in self-admission rates for some QRPs. Nearly all researchers (88%) admitted using at least one of the practices, and researchers generally considered a practice possibly defensible if they admitted using it, but Italian researchers were much less likely than US researchers to consider a practice defensible. Participants' estimates of the percentage of researchers who have used these practices were greater than the self-admission rates, and participants estimated that researchers would be unlikely to admit it. In written responses, participants argued that some of these practices are not questionable and they have used some practices because reviewers and journals demand it. The similarity of results obtained in the United States, this study, and a related study conducted in Germany suggest that adoption of these practices is an international phenomenon and is likely due to systemic features of the international research and publication processes.
Project description:A recent study of the replicability of key psychological findings is a major contribution toward understanding the human side of the scientific process. Despite the careful and nuanced analysis reported, the simple narrative disseminated by the mass, social, and scientific media was that the original results were replicated in only 36% of the studies. In the current study, however, we showed that 77% of the reported replication effect sizes fell within a 95% prediction interval calculated from the original effect size. Our analysis suggests two critical issues in understanding replication of psychological studies. First, researchers' intuitive expectations for what a replication should show do not always match statistical estimates of replication. Second, when the results of original studies are very imprecise, they create wide prediction intervals, and hence a broad range of replication effects is consistent with the original estimates. This may lead to effects that replicate successfully, in that replication results are consistent with statistical expectations, but that do not provide much information about the size (or existence) of the true effect. In this light, the results of the Reproducibility Project: Psychology can be viewed as statistically consistent with what one might expect when performing a large-scale replication experiment.
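The prediction-interval criterion above can be sketched for correlation coefficients using the Fisher z-transformation, the standard approach for this problem (the exact formula the authors used may differ in detail, and the r and sample sizes below are hypothetical):

```python
import math

# Sketch: 95% prediction interval for a replication correlation, via the
# Fisher z-transformation. Input values are illustrative, not from the study.

def prediction_interval(r_orig, n_orig, n_rep, z_crit=1.96):
    """Interval expected to contain the replication r 95% of the time."""
    z = math.atanh(r_orig)  # Fisher z of the original correlation
    # Standard error of the difference between the two z-transformed r's.
    se = math.sqrt(1 / (n_orig - 3) + 1 / (n_rep - 3))
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

lo, hi = prediction_interval(r_orig=0.30, n_orig=50, n_rep=100)
print(f"replication r expected in [{lo:.2f}, {hi:.2f}]")
```

Note how an imprecise original (n = 50) yields an interval spanning roughly zero to a medium-large effect: a replication can "succeed" by this criterion while saying little about the true effect size, which is exactly the second issue raised above.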