Chain Versus Common Cause: Biased Causal Strength Judgments in Humans and Large Language Models

Anita Keshmirian

Premium content

Access to this content requires a subscription. You must be a premium user to view this content.

Monthly subscription - $9.99 Pay per view - $4.99 Access through your institution Login with Underline account

Need help?

Contact us

CogSci 2024

•

July 25, 2024

•

Rotterdam, Netherlands

Would you like to see your presentation here, made available to a global audience of researchers?
Add your own presentation or have us affordably record your next conference.

Causal reasoning is a critical aspect of both human cognition and artificial intelligence (AI), playing a prominent role in understanding the relationships between events. Causal Bayesian Networks (CBNs) have been instrumental in modeling such relationships, using directed, acyclic links between nodes in a network to depict probabilistic associations between variables. Deviations from these graphical models’ edicts would result in biased judgments. This study explores one such bias in the causal judgments of humans and Large Language Models (LLMs) by examining two structures in CBNs: Canonical Chain (A→B→C) and Common Cause (A←B→C) networks. In these structures, once the intermediate variable (B) is known, the probability of the outcome (C) is normatively independent of the initial cause (A). However, studies have shown that humans often ignore this independence. We tested the mutually exclusive predictions of three theories that could account for this bias (N=300). Using hierarchical mixed-effect models, we found that humans tend to perceive causes in Chain structures as significantly stronger, providing support for only one of the hypotheses. This increase in perceived causal power might reflect a view of intermediate causes as more reflective of reliable mechanisms, which could in turn stem from our interactions with the world or the way we communicate causality to others. LLMs are primarily trained on language data. Therefore, examining whether they exhibit similar biases in causal reasoning can help us understand the origins of canonical Chain structures’ perceived causal power, while also shedding light on whether LLMs can abstract causal principles. To investigate this, we subjected three LLMs, GPT3.5-Turbo, GPT4, and Luminous Supreme Control to the same queries as our human subjects, adjusting a key ‘temperature’ hyperparameter. Our findings show that, particularly with higher temperatures (i.e., greater randomness), LLMs exhibit a similar boost in the perceived causal power of Chains, suggesting the bias is at least partly reflected in language use. Similar results across items suggest a degree of causal principle abstraction in the studied models. Implications for causal representation in humans and LLMs are discussed.

Authors:

Anita Keshmirian: Forward College; Moritz Willig: Technical University of Darmstadt; Babak Hemmatian: University of Illinois Urbana-Champaign; Kristian Kersting Kersting: TU Darmstadt; Ulrike Hahn: Birkbeck, University of London; Tobias Gerstenberg: Stanford University