Visualizing the Bayesian 2-test case: The effect of tree diagrams on medical decision making.
ABSTRACT: In medicine, diagnoses based on medical test results are probabilistic by nature. Unfortunately, cognitive illusions regarding the statistical meaning of test results are well documented among patients, medical students, and even physicians. There are two effective strategies that can foster insight into what are known as Bayesian reasoning situations: (1) translating the statistical information on the prevalence of a disease and the sensitivity and the false-alarm rate of a specific test for that disease from probabilities into natural frequencies, and (2) illustrating the statistical information with tree diagrams, for instance, or with other pictorial representations. So far, such strategies have only been empirically tested in combination for "1-test cases", where one binary hypothesis ("disease" vs. "no disease") has to be diagnosed based on one binary test result ("positive" vs. "negative"). However, in reality, often more than one medical test is conducted to derive a diagnosis. In two studies, we examined a total of 388 medical students from the University of Regensburg (Germany) with medical "2-test scenarios". Each student had to work on two problems: diagnosing breast cancer with mammography and sonography test results, and diagnosing HIV infection with the ELISA and Western Blot tests. In Study 1 (N = 190 participants), we systematically varied the presentation of statistical information ("only textual information" vs. "only tree diagram" vs. "text and tree diagram in combination"), whereas in Study 2 (N = 198 participants), we varied the kinds of tree diagrams ("complete tree" vs. "highlighted tree" vs. "pruned tree"). All versions were implemented in probability format (including probability trees) and in natural frequency format (including frequency trees).
We found that natural frequency trees, especially when the question-related branches were highlighted, improved performance, but that none of the corresponding probabilistic visualizations did.
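To make the "2-test case" concrete, the following is a minimal sketch of a natural frequency tree for two consecutive positive test results. All numbers (population size, prevalence, sensitivities, false-alarm rates) are hypothetical and are not taken from the studies; the function names are illustrative; and the sketch assumes that the two tests are conditionally independent given the true disease status, which the abstract does not state.

```python
from fractions import Fraction


def two_test_frequency_tree(population, prevalence, sens1, fa1, sens2, fa2):
    """Build the 'both tests positive' branches of a natural frequency tree.

    Assumption (not from the source): the second test's sensitivity and
    false-alarm rate do not depend on the first test's result.
    """
    sick = population * prevalence          # people with the disease
    healthy = population - sick             # people without the disease
    sick_pos1 = sick * sens1                # sick, first test positive
    healthy_pos1 = healthy * fa1            # healthy, first test positive
    sick_pos_both = sick_pos1 * sens2       # sick, both tests positive
    healthy_pos_both = healthy_pos1 * fa2   # healthy, both tests positive
    return {
        "sick": sick,
        "healthy": healthy,
        "sick_pos1": sick_pos1,
        "healthy_pos1": healthy_pos1,
        "sick_pos_both": sick_pos_both,
        "healthy_pos_both": healthy_pos_both,
    }


def ppv_after_two_positives(tree):
    """P(disease | both tests positive), read directly off the tree counts."""
    hits = tree["sick_pos_both"]
    return hits / (hits + tree["healthy_pos_both"])


# Hypothetical scenario: 10,000 people, 1% prevalence,
# test 1: 80% sensitivity, 10% false-alarm rate,
# test 2: 90% sensitivity, 10% false-alarm rate.
example_tree = two_test_frequency_tree(
    population=10_000,
    prevalence=Fraction(1, 100),
    sens1=Fraction(8, 10),
    fa1=Fraction(1, 10),
    sens2=Fraction(9, 10),
    fa2=Fraction(1, 10),
)
example_ppv = ppv_after_two_positives(example_tree)
print(f"{example_tree['sick_pos_both']} of "
      f"{example_tree['sick_pos_both'] + example_tree['healthy_pos_both']} "
      f"people with two positive results are actually sick "
      f"({float(example_ppv):.3f})")
```

In the frequency format the answer is a ratio of two leaf counts of the tree, whereas the probability format requires multiplying and normalizing six separate probabilities.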
Project description:BACKGROUND: Summary of findings tables in systematic reviews are highly informative but require epidemiological training to be interpreted correctly. The usage of fishbone diagrams as graphical displays could offer researchers an effective approach to simplify content for readers with limited epidemiological training. In this paper we demonstrate how fishbone diagrams can be applied to systematic reviews and present the results of initial user testing. METHODS: Findings from two systematic reviews were graphically depicted in the form of the fishbone diagram. To test the utility of fishbone diagrams compared with summary of findings tables, we developed and pilot-tested an online survey using Qualtrics. Respondents were randomized to the fishbone diagram or a summary of findings table presenting the same body of evidence. They answered questions in both open-ended and closed-answer formats; all responses were anonymous. Measures of interest focused on first and second impressions, the ability to find and interpret critical information, as well as user experience with both displays. We asked respondents about the perceived utility of fishbone diagrams compared to summary of findings tables. We analyzed quantitative data by conducting t-tests and comparing descriptive statistics. RESULTS: Based on real world systematic reviews, we provide two different fishbone diagrams to show how they might be used to display complex information in a clear and succinct manner. User testing on 77 students with basic epidemiological training revealed that participants preferred summary of findings tables over fishbone diagrams. Significantly more participants liked the summary of findings table than the fishbone diagram (71.8% vs. 44.8%; p < .01); significantly more participants found the fishbone diagram confusing (63.2% vs. 35.9%, p < .05) or indicated that it was difficult to find information (65.8% vs. 45%; p < .01).
However, more than half of the participants in both groups were unable to find critical information and answer three respective questions correctly (52.6% in the fishbone group; 51.3% in the summary of findings group). CONCLUSIONS: Fishbone diagrams are compact visualizations that, theoretically, may prove useful for summarizing the findings of systematic reviews. Initial user testing, however, did not support the utility of such graphical displays.
Project description:In teaching statistics in secondary schools and at university, two visualizations are primarily used when situations with two dichotomous characteristics are represented: 2 × 2 tables and tree diagrams. Both visualizations can be depicted either with probabilities or with frequencies. Visualizations with frequencies have been shown to help students significantly more in Bayesian reasoning problems than probability visualizations do. Because tree diagrams or double-trees (which are largely unknown in school) are node-branch structures, these two visualizations (in contrast to the 2 × 2 table) can even simultaneously display probabilities on branches and frequencies inside the nodes. This is a teaching advantage as it allows the frequency concept to be used to better understand probabilities. However, 2 × 2 tables and (double-)trees have a decisive disadvantage: While joint probabilities [e.g., P(A∩B)] are represented in 2 × 2 tables but no conditional probabilities [e.g., P(A|B)], it is exactly the other way around with (double-)trees. Therefore, a visualization that is equally suitable for the representation of joint probabilities and conditional probabilities is desirable. In this article, we present a new visualization, the frequency net, in which all absolute frequencies and all types of probabilities can be depicted. In addition to a detailed theoretical analysis of the frequency net, we report the results of a study with 249 university students that shows that "net diagrams" can improve reasoning without previous instruction to a similar extent as 2 × 2 tables and double-trees. Regarding questions about conditional probabilities, frequency visualizations (2 × 2 table, double-tree, or net diagram with absolute frequencies) are consistently superior to probability visualizations, and the frequency net performs as well as the frequency double-tree.
Only the 2 × 2 table with frequencies (the one visualization that participants were already familiar with) led to higher performance rates. If, on the other hand, a question about a joint probability had to be answered, all implemented visualizations clearly supported participants' performance, but no uniform format effect became visible. Here, participants reached the highest performance in the versions with probability 2 × 2 tables and probability net diagrams. Furthermore, after conducting a detailed error analysis, we report interesting error shifts between the two information formats and the different visualizations and give recommendations for teaching probability.
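The distinction between joint and conditional probabilities that motivates the frequency net can be illustrated with a small sketch. It derives both kinds of probability from the four absolute frequencies of a 2 × 2 table. The counts and the function name are hypothetical and chosen only for illustration; they do not come from the study.

```python
from fractions import Fraction


def probabilities_from_2x2(n_a_b, n_a_notb, n_nota_b, n_nota_notb):
    """Derive joint and conditional probabilities from 2 x 2 cell counts.

    The four arguments are the absolute frequencies of the cells
    (A and B), (A and not B), (not A and B), (not A and not B).
    """
    total = n_a_b + n_a_notb + n_nota_b + n_nota_notb
    n_a = n_a_b + n_a_notb        # marginal frequency of A
    n_b = n_a_b + n_nota_b        # marginal frequency of B
    return {
        # Joint probability: cell count divided by the grand total.
        "P(A and B)": Fraction(n_a_b, total),
        # Conditional probabilities: same cell count, but divided by a
        # marginal frequency instead of the grand total.
        "P(A|B)": Fraction(n_a_b, n_b),
        "P(B|A)": Fraction(n_a_b, n_a),
    }


# Hypothetical counts for 1,000 people.
example_probs = probabilities_from_2x2(
    n_a_b=8, n_a_notb=2, n_nota_b=95, n_nota_notb=895
)
for name, value in example_probs.items():
    print(f"{name} = {value} = {float(value):.3f}")
```

With absolute frequencies, every probability is the same numerator over a different denominator, which is why a display that shows all frequencies at once can serve both joint and conditional questions.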
Project description:Systematic reviews and/or meta-analyses generally provide the best evidence for medical research. Authors are recommended to use flow diagrams to present the review process, allowing for better understanding among readers. However, no studies as of yet have assessed the quality of flow diagrams in systematic review/meta-analyses. Our study aims to evaluate the quality of systematic review/meta-analyses over a period of ten years, by assessing the quality of the flow diagrams, and the correlation to the methodological quality. Two hundred articles of "systematic review" and/or "meta-analysis" from January 2004 to August 2015 were randomly retrieved from PubMed to be assessed for the flow diagram and methodological qualities. The flow diagrams were evaluated using a 16-grade scale corresponding to the four stages of the PRISMA flow diagram. The scale comprises four parts: Identification, Screening, Eligibility and Inclusion. Of the 200 articles screened, 154 articles were included and were assessed with the AMSTAR checklist. Among them, 78 articles (50.6%) had a flow diagram. Over ten years, the proportion of papers with a flow diagram increased significantly, with regression coefficient beta = 5.649 (p = 0.002). However, the quality of the flow diagrams increased only slightly and not significantly (regression coefficient beta = 0.177, p = 0.133). Our analysis showed high variation in the proportion of articles that reported flow diagram components. The lowest proportions were 1% for reporting methods of duplicates removal in screening phase, followed by 6% for manual search in identification phase, 22% for number of studies for each specific/subgroup analysis, 27% for number of articles retrieved from each database, and 31% for number of studies included in qualitative analysis. The flow diagram quality was correlated with the methodological quality with the Pearson's coefficient r = 0.32 (p = 0.0039).
Therefore, this review suggests that the reporting quality of flow diagram is less satisfactory, hence not maximizing the potential benefit of the flow diagrams. A guideline with standardized flow diagram is recommended to improve the quality of systematic reviews, and to enable better reader comprehension of the review process.
Project description:BACKGROUND: Visualization of orthogonal (disjoint) or overlapping datasets is a common task in bioinformatics. Few tools exist to automate the generation of extensively-customizable, high-resolution Venn and Euler diagrams in the R statistical environment. To fill this gap we introduce VennDiagram, an R package that enables the automated generation of highly-customizable, high-resolution Venn diagrams with up to four sets and Euler diagrams with up to three sets. RESULTS: The VennDiagram package offers the user the ability to customize essentially all aspects of the generated diagrams, including font sizes, label styles and locations, and the overall rotation of the diagram. We have implemented scaled Venn and Euler diagrams, which increase graphical accuracy and visual appeal. Diagrams are generated as high-definition TIFF files, simplifying the process of creating publication-quality figures and easing integration with established analysis pipelines. CONCLUSIONS: The VennDiagram package allows the creation of high quality Venn and Euler diagrams in the R statistical environment.
Project description:Indicator vector analysis of a nucleotide sequence alignment generates a compact heat map, called a Klee diagram, with potential insight into clustering patterns in evolution. However, so far this approach has examined only mitochondrial cytochrome c oxidase I (COI) DNA barcode sequences. To further explore, we developed TreeParser, a freely-available web-based program that sorts a sequence alignment according to a phylogenetic tree generated from the dataset. We applied TreeParser to nuclear gene and COI barcode alignments from birds and butterflies. Distinct blocks in the resulting Klee diagrams corresponded to species and higher-level taxonomic divisions in both groups, and this enabled graphic comparison of phylogenetic information in nuclear and mitochondrial genes. Our results demonstrate TreeParser-aided Klee diagrams objectively display taxonomic clusters in nucleotide sequence alignments. This approach may help establish taxonomy in poorly studied groups and investigate higher-level clustering which appears widespread but not well understood.
Project description:Gene trees inferred solely from multiple alignments of homologous sequences often contain weakly supported and uncertain branches. Information for their full resolution may lie in the dependency between gene families and their genomic context. Integrative methods, using species tree information in addition to sequence information, often rely on a computationally intensive tree space search which forecloses an application to large genomic databases. We propose a new method, called ProfileNJ, that takes a gene tree with statistical supports on its branches, and corrects its weakly supported parts by using a combination of information from a species tree and a distance matrix. Its low running time enabled us to use it on the whole Ensembl Compara database, for which we propose an alternative, arguably more plausible set of gene trees. This allowed us to perform a genome-wide analysis of duplication and loss patterns on the history of 63 eukaryote species, and predict ancestral gene content and order for all ancestors along the phylogeny. A web interface called RefineTree, including ProfileNJ as well as other gene tree correction methods, which we also test on the Ensembl gene families, is available at: http://www-ens.iro.umontreal.ca/~adbit/polytomysolver.html. The code of ProfileNJ as well as the set of gene trees corrected by ProfileNJ from Ensembl Compara version 73 families are also made available.
Project description:Causal loop diagrams developed by groups capture a shared understanding of complex problems and provide a visual tool to guide interventions. This paper explores the application of network analytic methods as a new way to gain quantitative insight into the structure of an obesity causal loop diagram to inform intervention design. Identification of the structural features of causal loop diagrams is likely to provide new insights into the emergent properties of complex systems and analysing central drivers has the potential to identify leverage points. The results found the structure of the obesity causal loop diagram to resemble commonly observed empirical networks known for efficient spread of information. Known drivers of obesity were found to be the most central variables along with others unique to obesity prevention in the community. While causal loop diagrams are often specific to single communities, the analytic methods provide means to contrast and compare multiple causal loop diagrams for complex problems.
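As a rough illustration of how network measures can be applied to a causal loop diagram, the sketch below treats the diagram as a directed graph and ranks variables by normalized degree centrality. The abstract does not say which centrality measures were used, so degree centrality is an assumption here, and the variable names and links are hypothetical rather than taken from the obesity diagram in the paper.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Normalized degree centrality for a directed graph.

    Each edge is a (cause, effect) pair. A node's score is its
    in-degree plus out-degree, divided by (number of nodes - 1).
    """
    degree = defaultdict(int)
    for cause, effect in edges:
        degree[cause] += 1    # outgoing link
        degree[effect] += 1   # incoming link
    n = len(degree)
    if n < 2:
        return {node: 0.0 for node in degree}
    return {node: count / (n - 1) for node, count in degree.items()}


# Hypothetical causal links, for illustration only.
example_edges = [
    ("screen time", "physical activity"),
    ("physical activity", "body weight"),
    ("fast food access", "energy intake"),
    ("energy intake", "body weight"),
    ("body weight", "physical activity"),  # closes a feedback loop
]

example_centrality = degree_centrality(example_edges)
for node, score in sorted(example_centrality.items(), key=lambda kv: -kv[1]):
    print(f"{node}: {score:.2f}")
```

Variables with the highest scores are the most densely connected ones in the diagram and would be natural first candidates when looking for leverage points; other measures, such as betweenness, may rank variables differently.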
Project description:The accurate inference of gene trees is a necessary step in many evolutionary studies. Although the problem of accurate gene tree inference has received considerable attention, most existing methods are only applicable to gene families unaffected by horizontal gene transfer. As a result, the accurate inference of gene trees affected by horizontal gene transfer remains a largely unaddressed problem. In this study, we introduce a new and highly effective method for gene tree error correction in the presence of horizontal gene transfer. Our method efficiently models horizontal gene transfers, gene duplications and losses, and uses a statistical hypothesis testing framework [Shimodaira-Hasegawa (SH) test] to balance sequence likelihood with topological information from a known species tree. Using a thorough simulation study, we show that existing phylogenetic methods yield inaccurate gene trees when applied to horizontally transferred gene families and that our method dramatically improves gene tree accuracy. We apply our method to a dataset of 11 cyanobacterial species and demonstrate the large impact of gene tree accuracy on downstream evolutionary analyses. An implementation of our method is available at http://compbio.mit.edu/treefix-dtl/. Contact: email@example.com or firstname.lastname@example.org. Supplementary data are available at Bioinformatics online.
Project description:Changing the information format from probabilities into frequencies as well as employing appropriate visualizations such as tree diagrams or 2 × 2 tables are important tools that can facilitate people's statistical reasoning. Previous studies have shown that despite their widespread use in statistical textbooks, both of those visualization types are only of restricted help when they are provided with probabilities, but that they can foster insight when presented with frequencies instead. In the present study, we attempt to replicate this effect and also examine, by the method of eye tracking, why probabilistic 2 × 2 tables and tree diagrams do not facilitate reasoning with regard to Bayesian inferences (i.e., determining what errors occur and whether they can be explained by scan paths), and why the same visualizations are of great help to an individual when they are combined with frequencies. All ten inferences of N = 24 participants were based solely on tree diagrams or 2 × 2 tables that presented either the famous "mammography context" or an "economics context" (without additional textual wording). We first asked participants for marginal, conjoint, and (non-inverted) conditional probabilities (or frequencies), followed by related Bayesian tasks. While solution rates were higher for natural frequency questions as compared to probability versions, eye-tracking analyses indeed yielded noticeable differences regarding eye movements between correct and incorrect solutions. For instance, heat maps (aggregated scan paths) of distinct results differed remarkably, thereby making correct and faulty strategies visible in the line of theoretical classifications. Moreover, the inherent structure of 2 × 2 tables seems to help participants avoid certain Bayesian mistakes (e.g., "Fisherian" error) while tree diagrams seem to help steer them away from others (e.g., "joint occurrence").
We will discuss resulting educational consequences at the end of the paper.
Project description:Despite recent interest in reconstructing neuronal networks, complete wiring diagrams on the level of individual synapses remain scarce and the insights into function they can provide remain unclear. Even for Caenorhabditis elegans, whose neuronal network is relatively small and stereotypical from animal to animal, published wiring diagrams are neither accurate nor complete and self-consistent. Using materials from White et al. and new electron micrographs we assemble whole, self-consistent gap junction and chemical synapse networks of hermaphrodite C. elegans. We propose a method to visualize the wiring diagram, which reflects network signal flow. We calculate statistical and topological properties of the network, such as degree distributions, synaptic multiplicities, and small-world properties, that help in understanding network signal propagation. We identify neurons that may play central roles in information processing, and network motifs that could serve as functional modules of the network. We explore propagation of neuronal activity in response to sensory or artificial stimulation using linear systems theory and find several activity patterns that could serve as substrates of previously described behaviors. Finally, we analyze the interaction between the gap junction and the chemical synapse networks. Since several statistical properties of the C. elegans network, such as multiplicity and motif distributions are similar to those found in mammalian neocortex, they likely point to general principles of neuronal networks. The wiring diagram reported here can help in understanding the mechanistic basis of behavior by generating predictions about future experiments involving genetic perturbations, laser ablations, or monitoring propagation of neuronal activity in response to stimulation.