Project description:Objective: To investigate how the general public trades off explainability versus accuracy of artificial intelligence (AI) systems and whether this differs between healthcare and non-healthcare scenarios. Materials and methods: Citizens' juries are a form of deliberative democracy eliciting informed judgment from a representative sample of the general public around policy questions. We organized two 5-day citizens' juries in the UK with 18 jurors each. Jurors considered 3 AI systems with different levels of accuracy and explainability in 2 healthcare and 2 non-healthcare scenarios. Per scenario, jurors voted for their preferred system; votes were analyzed descriptively. Qualitative data on considerations behind their preferences included transcribed audio-recordings of plenary sessions, observational field notes, outputs from small group work and free-text comments accompanying jurors' votes; qualitative data were analyzed thematically by scenario, per and across AI systems. Results: In healthcare scenarios, jurors favored accuracy over explainability, whereas in non-healthcare contexts they either valued explainability equally to, or more than, accuracy. Jurors' considerations in favor of accuracy regarded the impact of decisions on individuals and society, and the potential to increase efficiency of services. Reasons for emphasizing explainability included increased opportunities for individuals and society to learn and improve future prospects and enhanced ability for humans to identify and resolve system biases. Conclusion: Citizens may value explainability of AI systems in healthcare less than in non-healthcare domains and less than often assumed by professionals, especially when weighed against system accuracy. The public should therefore be actively consulted when developing policy on AI explainability.
Project description:The basal ganglia (BG) play a key role in decision-making, preventing impulsive actions in some contexts while facilitating fast adaptations in others. The specific contributions of different BG structures to this nuanced behavior remain unclear, particularly under varying situations of noisy and conflicting information that necessitate ongoing adjustments in the balance between speed and accuracy. Theoretical accounts suggest that dynamic regulation of the amount of evidence required to commit to a decision (a dynamic "decision boundary") may be necessary to meet these competing demands. Through the application of novel computational modeling tools in tandem with direct neural recordings from human BG areas, we find that neural dynamics in the theta band manifest as variations in a collapsing decision boundary as a function of conflict and uncertainty. We collected intracranial recordings from patients diagnosed with either Parkinson's disease (PD) (n = 14) or dystonia (n = 3) in the subthalamic nucleus (STN), globus pallidus internus (GPi), and globus pallidus externus (GPe) during their performance of a novel perceptual discrimination task in which we independently manipulated uncertainty and conflict. To formally characterize whether these task and neural components influenced decision dynamics, we leveraged modified diffusion decision models (DDMs). Behavioral choices and response time distributions were best characterized by a modified DDM in which the decision boundary collapsed over time, but where the onset and shape of this collapse varied with conflict. Moreover, theta dynamics in BG structures modulated the onset and shape of this collapse but differentially across task conditions. In STN, theta activity was related to a prolonged decision boundary (indexed by slower collapse and therefore more deliberate choices) during high conflict situations. 
Conversely, rapid declines in GPe theta during low conflict conditions were related to rapidly collapsing boundaries and expedited choices, with additional complementary decision bound adjustments during high uncertainty situations. Finally, GPi theta effects were uniform across conditions, with increases in theta associated with a prolongation of decision bound collapses. Together, these findings provide a nuanced understanding of how our brain thwarts impulsive actions while nonetheless enabling behavioral adaptation amidst noisy and conflicting information.
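To make the idea of a collapsing decision boundary concrete, the following is a minimal simulation sketch of a diffusion decision model whose boundary stays fixed until an onset time and then shrinks. The exponential collapse shape, the function names, and all parameter values are illustrative assumptions; the abstract does not specify the functional form or the fitting procedure the authors used.

```python
import math
import random


def collapsing_bound(t, b0, onset, rate):
    """Boundary height at time t (seconds).

    Assumed shape: constant at b0 until `onset`, then exponential decay
    with the given `rate`. This is an illustrative choice only.
    """
    if t <= onset:
        return b0
    return b0 * math.exp(-rate * (t - onset))


def simulate_trial(drift, b0=1.0, onset=0.3, rate=1.5,
                   dt=0.001, noise=1.0, max_t=5.0, rng=None):
    """Simulate one trial; return (choice, response_time).

    Evidence x performs a random walk with mean step drift*dt and
    standard deviation noise*sqrt(dt). The trial ends when x reaches
    +bound (choice 1) or -bound (choice 0).
    """
    rng = rng or random.Random()
    x = 0.0
    t = 0.0
    while t < max_t:
        x += drift * dt + noise * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        t += dt
        bound = collapsing_bound(t, b0, onset, rate)
        if x >= bound:
            return 1, t
        if x <= -bound:
            return 0, t
    # No boundary reached within max_t: report the sign of the evidence.
    return (1 if x >= 0 else 0), max_t


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical contrast: late, slow collapse vs. early, fast collapse.
    for label, onset, rate in [("slow collapse", 0.8, 0.5),
                               ("fast collapse", 0.1, 3.0)]:
        rts = [simulate_trial(0.5, onset=onset, rate=rate, rng=rng)[1]
               for _ in range(200)]
        print(label, "mean RT:", round(sum(rts) / len(rts), 3))
```

In this sketch, a later onset or slower rate keeps the boundary high for longer, which corresponds to the more deliberate choices the abstract associates with a prolonged decision boundary.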
Project description:Purpose: Clinical genome sequencing (cGS) followed by orthogonal confirmatory testing is standard practice. While orthogonal testing significantly improves specificity, it also results in increased turnaround time and cost of testing. The purpose of this study is to evaluate machine learning models trained to identify false positive variants in cGS data to reduce the need for orthogonal testing. Methods: We sequenced five reference human genome samples characterized by the Genome in a Bottle Consortium (GIAB) and compared the results with an established set of variants for each genome referred to as a truth set. We then trained machine learning models to identify variants that were labeled as false positives. Results: After training, the models identified 99.5% of the false positive heterozygous single-nucleotide variants (SNVs) and heterozygous insertions/deletions variants (indels) while reducing confirmatory testing of nonactionable, nonprimary SNVs by 85% and indels by 75%. Employing the algorithm in clinical practice reduced overall orthogonal testing using dideoxynucleotide (Sanger) sequencing by 71%. Conclusion: Our results indicate that a low false positive call rate can be maintained while significantly reducing the need for confirmatory testing. The framework that generated our models and results is publicly available at https://github.com/HudsonAlpha/STEVE.
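The trade-off described here, catching nearly all false positives while sparing most true calls from confirmation, can be sketched as a threshold choice on a model score. The code below is not the STEVE implementation; `pick_threshold`, `confirmation_summary`, and the toy scores are hypothetical, and it assumes each variant call already has a model score (higher meaning more likely false positive) and a truth-set label.

```python
import math


def pick_threshold(scores, is_false_positive, target_capture=0.995):
    """Largest threshold that still flags >= target_capture of labeled FPs.

    Calls with score >= threshold are sent to confirmatory testing.
    """
    fp_scores = sorted(
        (s for s, fp in zip(scores, is_false_positive) if fp), reverse=True
    )
    if not fp_scores:
        raise ValueError("need at least one labeled false positive")
    needed = max(1, math.ceil(target_capture * len(fp_scores)))
    return fp_scores[needed - 1]


def confirmation_summary(scores, is_false_positive, threshold):
    """Fraction of FPs flagged and fraction of true calls spared testing."""
    flagged = [s >= threshold for s in scores]
    n_fp = sum(is_false_positive)
    n_tp = len(scores) - n_fp
    fp_caught = sum(f and fp for f, fp in zip(flagged, is_false_positive))
    tp_spared = sum(
        (not f) and (not fp) for f, fp in zip(flagged, is_false_positive)
    )
    return {
        "fp_capture": fp_caught / n_fp if n_fp else 0.0,
        "tp_spared": tp_spared / n_tp if n_tp else 0.0,
    }


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the study.
    scores = [0.9, 0.8, 0.85, 0.1]
    labels = [True, True, False, False]
    thr = pick_threshold(scores, labels, target_capture=1.0)
    print(thr, confirmation_summary(scores, labels, thr))
```

In practice the threshold would be chosen on held-out data rather than on the calls used to train the model, so that the reported capture rate is not optimistic.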
Project description:This study addressed the cognitive impacts of providing correct and incorrect machine learning (ML) outputs in support of an object detection task. The study consisted of five experiments that manipulated the accuracy and importance of mock ML outputs. In each of the experiments, participants were given the T and L task with T-shaped targets and L-shaped distractors. They were tasked with categorizing each image as target present or target absent. In Experiment 1, they performed this task without the aid of ML outputs. In Experiments 2-5, they were shown images with bounding boxes, representing the output of an ML model. The outputs could be correct (hits and correct rejections), or they could be erroneous (false alarms and misses). Experiment 2 manipulated the overall accuracy of these mock ML outputs. Experiment 3 manipulated the proportion of different types of errors. Experiments 4 and 5 manipulated the importance of specific types of stimuli or model errors, as well as the framing of the task in terms of human or model performance. These experiments showed that model misses were consistently harder for participants to detect than model false alarms. In general, as the model's performance increased, human performance increased as well, but in many cases the participants were more likely to overlook model errors when the model had high accuracy overall. Warning participants to be on the lookout for specific types of model errors had very little impact on their performance. Overall, our results emphasize the importance of considering human cognition when determining what level of model performance and types of model errors are acceptable for a given task.
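The four output categories named above follow the usual signal-detection layout, which can be written out as a small sketch. The function names are hypothetical, and the sketch assumes each trial reduces to two booleans (target present, model drew a box), ignoring whether the box is in the right location.

```python
from collections import Counter


def classify_output(target_present, model_flagged):
    """Map one trial to hit, miss, false_alarm, or correct_rejection."""
    if target_present and model_flagged:
        return "hit"
    if target_present:
        return "miss"
    if model_flagged:
        return "false_alarm"
    return "correct_rejection"


def summarize(trials):
    """Count outcome categories and compute overall model accuracy.

    `trials` is an iterable of (target_present, model_flagged) pairs.
    """
    counts = Counter(classify_output(t, m) for t, m in trials)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no trials given")
    accuracy = (counts["hit"] + counts["correct_rejection"]) / total
    return counts, accuracy


if __name__ == "__main__":
    # One trial of each kind, for illustration only.
    demo = [(True, True), (True, False), (False, True), (False, False)]
    print(summarize(demo))
```

Two mock models with the same overall accuracy can differ in their mix of misses and false alarms, which is the kind of manipulation the abstract attributes to Experiment 3.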
Project description:The literature has been relatively silent about post-conflict processes. However, understanding the way humans deal with post-conflict situations is a challenge in our societies. With this in mind, we focus the present study on the rationality of cooperative decision making after an intergroup conflict, i.e., the extent to which groups take advantage of post-conflict situations to obtain benefits from collaborating with the other group involved in the conflict. Based on dual-process theories of thinking and the affect heuristic, we propose that intergroup conflict hinders the rationality of cooperative decision making. We also hypothesize that this rationality improves when groups are involved in an in-group deliberative discussion. Results of a laboratory experiment support the idea that intergroup conflict, which was associated with indicators of the activation of negative feelings (negative affect state and heart rate), has a negative effect on the aforementioned rationality over time and on both group and individual decision making. Although intergroup conflict leads to sub-optimal decision making, rationality improves when groups and individuals subjected to intergroup conflict make decisions after an in-group deliberative discussion. Additionally, the increased rationality of group decision making after the deliberative discussion is transferred to subsequent individual decision making.
Project description:Drug-related errors are a leading cause of preventable patient harm in the clinical setting. We present the first wearable camera system to automatically detect potential errors, prior to medication delivery. We demonstrate that using deep learning algorithms, our system can detect and classify drug labels on syringes and vials in drug preparation events recorded in real-world operating rooms. We created a first-of-its-kind large-scale video dataset from head-mounted cameras comprising 4K footage across 13 anesthesiology providers, 2 hospitals and 17 operating rooms over 55 days. The system was evaluated on 418 drug draw events in routine patient care and a controlled environment and achieved 99.6% sensitivity and 98.8% specificity at detecting vial swap errors. These results suggest that our wearable camera system has the potential to provide a secondary check when a medication is selected for a patient, and a chance to intervene before a potential medical error.
Project description:Background: False duplications in genome assemblies lead to false biological conclusions. We quantified false duplications in widely used previous genome assemblies for platypus, zebra finch, and Anna's Hummingbird, and in their new counterparts of the same species generated by the Vertebrate Genomes Project, whose pipeline attempted to eliminate false duplications through haplotype phasing and purging. These assemblies are among the first generated by the Vertebrate Genomes Project for which there was a prior chromosomal level reference assembly to compare with. Results: Whole genome alignments revealed that 4 to 16% of the sequences are falsely duplicated in the previous assemblies, impacting hundreds to thousands of genes. These lead to overestimated gene family expansions. The main source of the false duplications is heterotype duplications, where the haplotype sequences were relatively more divergent than other parts of the genome, leading the assembly algorithms to classify them as separate genes or genomic regions. A minor source is sequencing errors. Ancient ATP nucleotide binding gene families have a higher prevalence of false duplications compared to other gene families. Although present in a smaller proportion, we observe false duplications remaining in the Vertebrate Genomes Project assemblies that can be identified and purged. Conclusions: This study highlights the need for more advanced assembly methods that better separate haplotypes and sequence errors, and the need for cautious analyses on gene gains.
Project description:In recent years, next-generation sequencing (NGS) has become a cornerstone of clinical genetics and diagnostics. Many clinical applications require high precision, especially if rare events such as somatic mutations in cancer or genetic variants causing rare diseases need to be identified. Although random sequencing errors can be modeled statistically and deep sequencing minimizes their impact, systematic errors remain a problem even at high depth of coverage. Understanding their source is crucial to increase precision of clinical NGS applications. In this work, we studied the relation between recurrent biases in allele balance (AB), systematic errors, and false positive variant calls across a large cohort of human samples analyzed by whole exome sequencing (WES). We have modeled the AB distribution for biallelic genotypes in 987 WES samples in order to identify positions recurrently deviating significantly from the expectation, a phenomenon we termed allele balance bias (ABB). Furthermore, we have developed a genotype callability score based on ABB for all positions of the human exome, which detects false positive variant calls that passed state-of-the-art filters. Finally, we demonstrate the use of ABB for detection of false associations proposed by rare variant association studies. Availability: https://github.com/Francesc-Muyas/ABB.
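As a rough illustration of what a deviation in allele balance means, the sketch below computes AB for a heterozygous call and an exact two-sided binomial p-value against the expected 0.5. This is a deliberate simplification and not the ABB model from the linked repository; the function names are hypothetical, and the abstract does not describe how the authors model the AB distribution across samples.

```python
import math


def allele_balance(alt_reads, ref_reads):
    """Fraction of reads supporting the alternate allele."""
    total = alt_reads + ref_reads
    if total == 0:
        raise ValueError("position has no coverage")
    return alt_reads / total


def binom_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value for k successes out of n.

    Sums the probabilities of all outcomes that are no more likely than
    the observed one. Intended for modest read depths (the intermediate
    binomial coefficients overflow a float for n above roughly 1000).
    """
    pmf = [math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-9)  # tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= cutoff))


if __name__ == "__main__":
    # Hypothetical read counts for one heterozygous call.
    alt, ref = 6, 34
    print("AB:", allele_balance(alt, ref))
    print("p-value:", binom_two_sided_p(alt, alt + ref))
```

A single skewed call is weak evidence on its own; the abstract's notion of bias concerns positions that deviate recurrently across many samples, so a per-call test like this would only be a first step toward a per-position summary.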