Bias in the reporting of sex and age in biomedical research on mouse models.
ABSTRACT: In animal-based biomedical research, both the sex and the age of the animals studied affect disease phenotypes by modifying their susceptibility, presentation and response to treatment. The accurate reporting of experimental methods and materials, including the sex and age of animals, is essential so that other researchers can build on the results of such studies. Here we use text mining to study 15,311 research papers in which mice were the focus of the study. We find that the percentage of papers reporting the sex and age of mice has increased over the past two decades: however, only about 50% of the papers published in 2014 reported these two variables. We also compared the quality of reporting in six preclinical research areas and found evidence for different levels of sex-bias in these areas: the strongest male-bias was observed in cardiovascular disease models and the strongest female-bias was found in infectious disease models. These results demonstrate the ability of text mining to contribute to the ongoing debate about the reproducibility of research, and confirm the need to continue efforts to improve the reporting of experimental methods and materials.
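As a rough illustration of how text mining might flag whether a methods section reports sex and age, the following minimal sketch uses regular expressions. The patterns and the function name `reported_variables` are assumptions made for illustration; the abstract does not describe the actual terms or pipeline used in the study.

```python
import re

# Hypothetical patterns: the abstract does not list the terms the study matched.
SEX_PATTERN = re.compile(r"\b(?:male|female)s?\b", re.IGNORECASE)
AGE_PATTERN = re.compile(
    r"\b\d+(?:\s*(?:-|to)\s*\d+)?[\s-]*(?:day|week|month|year)s?[\s-]*(?:old|of age)\b",
    re.IGNORECASE,
)


def reported_variables(methods_text):
    """Return which of the two variables appear to be mentioned in the text."""
    return {
        "sex": bool(SEX_PATTERN.search(methods_text)),
        "age": bool(AGE_PATTERN.search(methods_text)),
    }


if __name__ == "__main__":
    print(reported_variables("Male C57BL/6 mice, 8-10 weeks old, were used."))
    print(reported_variables("Mice were housed under standard conditions."))
```

A keyword approach like this can miss unusual phrasings and can match terms that refer to something other than the study animals, so it should be read only as a starting point.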
Project description:Observer bias and other "experimenter effects" occur when researchers' expectations influence study outcome. These biases are strongest when researchers expect a particular result, are measuring subjective variables, and have an incentive to produce data that confirm predictions. To minimize bias, it is good practice to work "blind," meaning that experimenters are unaware of the identity or treatment group of their subjects while conducting research. Here, using text mining and a literature review, we find evidence that blind protocols are uncommon in the life sciences and that nonblind studies tend to report higher effect sizes and more significant p-values. We discuss methods to minimize bias and urge researchers, editors, and peer reviewers to keep blind protocols in mind.
Project description:<h4>Background</h4>Electronic medical records (EMRs) are revolutionizing health-related research. One key issue for study quality is the accurate identification of patients with the condition of interest. Information in EMRs can be entered as structured codes or unstructured free text. The majority of research studies have used only coded parts of EMRs for case-detection, which may bias findings, miss cases, and reduce study quality. This review examines whether incorporating information from text into case-detection algorithms can improve research quality.<h4>Methods</h4>A systematic search returned 9659 papers, 67 of which reported on the extraction of information from free text of EMRs with the stated purpose of detecting cases of a named clinical condition. Methods for extracting information from text and the technical accuracy of case-detection algorithms were reviewed.<h4>Results</h4>Studies mainly used US hospital-based EMRs, and extracted information from text for 41 conditions using keyword searches, rule-based algorithms, and machine learning methods. There was no clear difference in case-detection algorithm accuracy between rule-based and machine learning methods of extraction. Inclusion of information from text resulted in a significant improvement in algorithm sensitivity and area under the receiver operating characteristic in comparison to codes alone (median sensitivity 78% (codes + text) vs 62% (codes), P = .03; median area under the receiver operating characteristic 95% (codes + text) vs 88% (codes), P = .025).<h4>Conclusions</h4>Text in EMRs is accessible, especially with open source information extraction algorithms, and significantly improves case detection when combined with codes. More harmonization of reporting within EMR studies is needed, particularly standardized reporting of algorithm accuracy metrics like positive predictive value (precision) and sensitivity (recall).
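The two accuracy metrics named in the conclusions can be made concrete with a small sketch. The patient identifiers below are invented toy data, not results from the review, and the function names are hypothetical; the example only shows how sensitivity (recall) and positive predictive value (precision) are computed, and how adding text-derived cases to coded cases can change them.

```python
def sensitivity(true_cases, flagged):
    """Share of true cases that the algorithm flagged (recall)."""
    if not true_cases:
        raise ValueError("no true cases")
    return len(true_cases & flagged) / len(true_cases)


def positive_predictive_value(true_cases, flagged):
    """Share of flagged patients who are true cases (precision)."""
    if not flagged:
        raise ValueError("no flagged patients")
    return len(true_cases & flagged) / len(flagged)


# Invented toy data: integers stand in for patient identifiers.
true_cases = {1, 2, 3, 4, 5, 6, 7, 8}
flagged_by_codes = {1, 2, 3, 4, 5, 9}
flagged_by_text = {4, 5, 6, 7, 10}
flagged_combined = flagged_by_codes | flagged_by_text

if __name__ == "__main__":
    for name, flagged in [("codes", flagged_by_codes), ("codes + text", flagged_combined)]:
        print(name, sensitivity(true_cases, flagged),
              positive_predictive_value(true_cases, flagged))
```

In this toy data the combined set finds more true cases but also flags more non-cases, which is why reporting both metrics together is informative.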
Project description:BACKGROUND:Blood chemicals are routinely measured in clinical or preclinical research studies to diagnose diseases, assess risks in epidemiological research, or use metabolomic phenotyping in response to treatments. A vast volume of blood-related literature is available via the PubMed database for data mining. OBJECTIVES:We aimed to generate a comprehensive blood exposome database of endogenous and exogenous chemicals associated with the mammalian circulating system through text mining and database fusion. METHODS:Using NCBI resources, we retrieved PubMed abstracts, PubChem chemical synonyms, and PMC supplementary tables. We then employed text mining and PubChem crowdsourcing to associate phrases relating to blood with PubChem chemicals. False positives were removed by a phrase pattern and a compound exclusion list. RESULTS:A query to identify blood-related publications in the PubMed database yielded 1.1 million papers. Matching a total of 15 million synonyms from 6.5 million relevant PubChem chemicals against all blood-related publications yielded 37,514 chemicals and 851,999 publications records. Mapping PubChem compound identifiers to the PubMed database yielded 49,940 unique chemicals linked to 676,643 papers. Analysis of open-access metabolomics papers related to blood phrases in the PMC database yielded 4,039 unique compounds and 204 papers. Consolidating these three approaches summed up to a total of 41,474 achiral structures that were linked to 65,957 PubChem CIDs and to over 878,966 PubMed articles. We mapped these compounds to 50 databases such as those covering metabolites and pathways, governmental and toxicological databases, pharmacology resources, and bioassay repositories. In comparison, HMDB, the Human Metabolome Database, links 1,075 compounds to blood-related primary publications. 
CONCLUSION:This new Blood Exposome Database can be used for prioritizing chemicals for systematic reviews, developing target assays in exposome research, identifying compounds in untargeted mass spectrometry, and biological interpretation in metabolomics data. The database is available at http://bloodexposome.org. https://doi.org/10.1289/EHP4713.
Project description:Risk-of-bias assessments are now a standard component of systematic reviews. At present, reviewers need to manually identify relevant parts of research articles for a set of methodological elements that affect the risk of bias, in order to make a risk-of-bias judgement for each of these elements. We investigate the use of text mining methods to automate risk-of-bias assessments in systematic reviews. We aim to identify relevant sentences within the text of included articles, to rank articles by risk of bias and to reduce the number of risk-of-bias assessments that the reviewers need to perform by hand. We use supervised machine learning to train two types of models for each of the three risk-of-bias properties of sequence generation, allocation concealment and blinding. The first model predicts whether a sentence in a research article contains relevant information. The second model predicts a risk-of-bias value for each research article. We use logistic regression, where each independent variable is the frequency of a word in a sentence or article, respectively. We found that sentences can be successfully ranked by relevance with area under the receiver operating characteristic (ROC) curve (AUC) > 0.98. Articles can be ranked by risk of bias with AUC > 0.72. We estimate that more than 33% of articles can be assessed by just one reviewer, where two reviewers are normally required. We show that text mining can be used to assist risk-of-bias assessments.
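To make the two ingredients of this evaluation concrete, here is a minimal sketch of word-frequency features and of the AUC as a ranking measure. It does not train a logistic regression and is not the authors' implementation; the function names, the tokenization rule, and the example scores are assumptions for illustration only.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count lowercase alphabetic tokens; each count is one candidate feature."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def roc_auc(scores, labels):
    """Probability that a random positive outranks a random negative.

    Ties count as half a win. Labels are 1 (relevant) or 0 (not relevant).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    print(word_frequencies("Allocation was concealed; allocation used envelopes."))
    # Invented scores, standing in for the output of some trained model.
    print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which gives a scale for reading the values reported above.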
Project description:An estimated quarter of a trillion US dollars is invested in the biomedical research enterprise annually. There is growing alarm that a significant portion of this investment is wasted because of problems in reproducibility of research findings and in the rigor and integrity of research conduct and reporting. Recent years have seen a flurry of activities focusing on standardization and guideline development to enhance the reproducibility and rigor of biomedical research. Research activity is primarily communicated via textual artifacts, ranging from grant applications to journal publications. These artifacts can be both the source and the manifestation of practices leading to research waste. For example, an article may describe a poorly designed experiment, or the authors may reach conclusions not supported by the evidence presented. In this article, we pose the question of whether biomedical text mining techniques can assist the stakeholders in the biomedical research enterprise in doing their part toward enhancing research integrity and rigor. In particular, we identify four key areas in which text mining techniques can make a significant contribution: plagiarism/fraud detection, ensuring adherence to reporting guidelines, managing information overload and accurate citation/enhanced bibliometrics. We review the existing methods and tools for specific tasks, if they exist, or discuss relevant research that can provide guidance for future work. With the exponential increase in biomedical research output and the ability of text mining approaches to perform automatic tasks at large scale, we propose that such approaches can support tools that promote responsible research practices, providing significant benefits for the biomedical research enterprise.
Project description:Background: There is a growing research focus on temporal cognition, due to its importance in memory and planning, and links with psychological wellbeing. Researchers are increasingly using diary studies, experience sampling and social media data to study temporal thought. However, it remains unclear whether such reports can be accurately interpreted for temporal orientation. In this study, temporal orientation judgements about text reports of thoughts were compared across human coding, automatic text mining, and participant self-report. Methods: 214 participants responded to randomly timed text message prompts, categorically reporting the temporal direction of their thoughts and describing the content of their thoughts, producing a corpus of 2505 brief (1-358, M = 43 characters) descriptions. Two researchers independently, blindly coded temporal orientation of the descriptions. Four approaches to automated coding used tense to establish temporal category for each description. Concordance between temporal orientation assessments by self-report, human coding, and automatic text mining was evaluated. Results: Human coding more closely matched self-reported coding than automated methods. Accuracy for human (79.93% correct) and automated (57.44% correct) coding was diminished when multiple guesses at ambiguous temporal categories (ties) were allowed in coding (reduction to 74.95% correct for human, 49.05% automated). Conclusion: Ambiguous tense poses a challenge for both human and automated coding protocols that attempt to infer temporal orientation from text describing momentary thought. While methods can be applied to minimize bias, this study demonstrates that researchers need to be wary about attributing temporal orientation to text-reported thought processes, and emphasize the importance of eliciting self-reported judgements.
Project description:This paper uses text data mining to identify long-term developments in tourism academic research from the perspectives of thematic focus, geography, and gender of tourism authorship. Abstracts of papers published in the period 1970-2017 in high-ranking tourism journals were extracted from the Scopus database and served as the data source for the analysis. Fourteen subject areas were identified using the Latent Dirichlet Allocation (LDA) text mining approach. LDA integrated with GIS information made it possible to obtain the geographic distribution and trends of scholarly output, while probabilistic methods of gender identification based on social network data mining were used to track gender dynamics with sufficient confidence. The findings indicate that, while all 14 topics have been prominent from the inception of tourism studies to the present day, the geography of scholarship has notably expanded and the share of female authorship has increased over time and currently almost equals that of male authorship.
Project description:The association between vitamins and oral health has recently been discussed, attracting increased attention from medical and dental perspectives. The present review aimed to systematically evaluate and appraise the most recent scientific papers investigating the role of vitamins in the prevention and treatment of the main oral diseases, namely hard dental pathological processes and gum/periodontal disease. Randomized controlled trials, cross-sectional studies, cohort studies, comparative studies, validation studies and evaluation studies, following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, reporting associations between vitamins and oral diseases or the use of vitamins to prevent or treat oral diseases in patients of any age were included. PubMed, Embase and Scopus were searched to November 2019 using an ad hoc search string. All the papers meeting the inclusion criteria were subjected to a quality assessment. The search identified 1597 papers; 741 were selected after removing duplicates. A total of 334 articles were excluded after title and abstract evaluation; 407 were assessed and 73 papers were full-text assessed; a further 14 papers were discarded after full-text evaluation, leaving 58 papers included. In general, there is weak evidence supporting the association between vitamins and both gingival/periodontal disease and hard dental pathological processes.
Project description:Numerous functional magnetic resonance imaging (fMRI) studies have reported sex differences. To empirically evaluate for evidence of excessive significance bias in this literature, we searched Medline and Scopus over 10 years for published fMRI studies of the human brain that evaluated sex differences, regardless of the topic investigated. We analyzed the prevalence of conclusions in favor of sex differences and the correlation between study sample sizes and number of significant foci identified. In the absence of bias, larger studies (better powered) should identify a larger number of significant foci. Across 179 papers, median sample size was n = 32 (interquartile range 23-47.5). A median of 5 foci related to sex differences were reported (interquartile range, 2-9.5). Few articles had titles focused on no differences (n = 2) or on similarities (n = 3) between sexes. Overall, 158 papers (88%) reached "positive" conclusions in their abstract and presented some foci related to sex differences. There was no statistically significant relationship between sample size and the number of foci (-0.048% increase for every 10 participants, p = 0.63). The extremely high prevalence of "positive" results and the lack of the expected relationship between sample size and the number of discovered foci reflect probable reporting bias and excess significance bias in this literature.
Project description:INTRODUCTION:Transparency and completeness of health research is highly variable, with important deficiencies in the reporting of methods and results of studies. Reporting guidelines aim to improve transparency and quality of research reports, and are often developed by consortia of journal editors, peer reviewers, authors, consumers and other key stakeholders. The objective of this study will be to investigate the characteristics of scientific collaboration among developers and the citation metrics of reporting guidelines of health research. METHODS AND ANALYSIS:This is the study protocol for a cross-sectional analysis of completed reporting guidelines indexed in the Enhancing the QUAlity and Transparency Of health Research Network Library. We will search PubMed/MEDLINE and the Web of Science. Screening, selection and data abstraction will be conducted by one researcher and verified by a second researcher. Potential discrepancies will be resolved via discussion. We will include published papers of reporting guidelines written in English. Published papers will have to meet the definition of a reporting guideline related to health research (eg, a checklist, flow diagram or explicit text), with no restrictions by study design, medical specialty, disease or condition. Raw data from each included paper (including title, publication year, journal, subject category, keywords, citations, and the authors' names, author's affiliated institution and country) will be exported from the Web of Science. Descriptive analyses will be conducted (including the number of papers, citations, authors, countries, journals, keywords and main collaboration metrics). We will identify the most prolific authors, institutions, countries, journals and the most cited papers. Network analyses will be carried out to study the structure of collaborations. ETHICS AND DISSEMINATION:No ethical approval will be required. 
Findings from this study will be published in peer-reviewed journals. All data will be deposited in a cross-disciplinary public repository. It is anticipated the study findings could be relevant to a variety of audiences.