Can counterfactual explanations of AI systems’ predictions skew lay users’ causal intuitions about the world? If so, can we correct for that?
ABSTRACT:

Summary: Counterfactual (CF) explanations have been employed as one of the modes of explainability in explainable artificial intelligence (AI), both to increase the transparency of AI systems and to provide recourse. Cognitive science and psychology have pointed out that people regularly use CFs to express causal relationships. Most AI systems, however, are only able to capture associations or correlations in data, so interpreting them as causal would not be justified. In this perspective, we present two experiments (total n = 364) exploring the effects of CF explanations of AI systems' predictions on lay people's causal beliefs about the real world. In Experiment 1, we found that providing CF explanations of an AI system's predictions does indeed (unjustifiably) affect people's causal beliefs regarding the factors/features the AI uses, making people more likely to view those factors as causal in the real world. Inspired by the literature on misinformation and health warning messaging, Experiment 2 tested whether we can correct for this unjustified change in causal beliefs. We found that pointing out that AI systems capture correlations, and not necessarily causal relationships, can attenuate the effects of CF explanations on people's causal beliefs.

The bigger picture: Explainable artificial intelligence provides methods for bringing transparency to black-box artificial intelligence (AI) systems. These methods produce explanations of AI systems' predictions that aim to increase understanding of the AI systems' behavior and to help us appropriately calibrate our trust in these systems. In this perspective, we explore some of the potential undesirable effects of providing explanations of AI systems to human users, and ways to mitigate such effects. We start from the observation that most AI systems capture correlations and associations in data, not causal relationships. Explanations of the AI systems' predictions make these correlations more transparent; they do not, however, make the explained relationships causal. In two experiments, we show how providing counterfactual explanations of AI systems' predictions unjustifiably changes people's beliefs about causal relationships in the real world. We also show how we may go about preventing such a change in beliefs, and we hope to open doors for further exploration into the psychological effects of AI explanations on human recipients.

Highlights: Most AI systems capture correlations and associations in data. Explaining AI systems' predictions amounts to making these correlational relationships more transparent, not causal. Some AI explainability methods produce the kinds of explanations that people naturally use to communicate causal relationships in the world. We provide evidence that these explanations of AI systems, known as counterfactual explanations, can lead to an unjustified change in people's causal beliefs about the world. We also propose a way to prevent this unjustified change.
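For readers unfamiliar with the form such explanations take, the following Python sketch illustrates a counterfactual explanation of a model prediction. It is not the authors' experimental material or any specific XAI library; the toy "loan" model, the synthetic data, the feature names, and the naive one-feature search are all illustrative assumptions. The point it demonstrates is the one the abstract makes: the model encodes only a correlation, yet the CF statement it yields ("the loan would have been approved if income had been higher") reads like a causal claim.

# Minimal sketch (assumptions only): a toy logistic-regression "loan" model
# trained on synthetic, purely correlational data, and a naive search for the
# smallest change to one feature that flips its prediction, i.e., a
# counterfactual (CF) explanation of that prediction.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic applicants: columns are [income (k$), existing_debt (k$)].
X = rng.normal(loc=[50, 20], scale=[15, 10], size=(500, 2))
# Labels come from a noisy linear rule, so the model can only learn a
# correlation in the data, not a verified causal relationship.
y = ((0.08 * X[:, 0] - 0.05 * X[:, 1] + rng.normal(0, 1, 500)) > 2.0).astype(int)

model = LogisticRegression().fit(X, y)

def counterfactual(x, feature, step, max_steps=200):
    """Nudge one feature until the predicted class flips; return the CF point."""
    original = model.predict([x])[0]
    cf = x.copy()
    for _ in range(max_steps):
        cf[feature] += step
        if model.predict([cf])[0] != original:
            return cf
    return None  # no flip found within the search budget

applicant = np.array([30.0, 40.0])                    # predicted "rejected"
cf = counterfactual(applicant, feature=0, step=1.0)   # vary income upward
if cf is not None:
    print(f"Prediction flips if income were {cf[0]:.0f}k instead of {applicant[0]:.0f}k.")
# A lay reader may hear this as "higher income causes approval", even though the
# model only reflects a correlation in its training data - the kind of shift in
# causal beliefs the paper's experiments probe.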
SUBMITTER: Tesic M  
PROVIDER: S-EPMC9768678 | biostudies-literature | 2022 Dec 
REPOSITORIES:  biostudies-literature