AAAI 2026 Main Conference

January 24, 2026

Singapore, Singapore

Although large language models (LLMs) are increasingly implicated in interpersonal and societal decision-making, their ability to navigate explicit conflicts between legitimately different cultural value systems remains largely unexamined. Existing benchmarks predominantly target cultural knowledge (CulturalBench), value prediction (WorldValuesBench), or single-axis bias diagnostics (CDEval); none, however, evaluates how LLMs adjudicate when multiple culturally grounded values directly clash. We address this gap with CCD-Bench, a benchmark that assesses LLM decision-making under overt cross-cultural value conflict. CCD-Bench comprises 2,182 open-ended dilemmas spanning seven domains, each paired with exactly ten anonymized response options corresponding to the ten GLOBE cultural clusters, which represent the organizational behavior of 62 societies. The response options are presented using a Stratified Latin Square to mitigate ordering effects. We evaluate 17 leading non-reasoning LLMs. The LLMs disproportionately prefer the options for Nordic Europe (mean 20.2%) and Germanic Europe (12.4%), while the options for Eastern Europe and the Middle East & North Africa are underrepresented (5.6–5.8%). Although 87.9% of rationales reference two or more GLOBE dimensions, this apparent pluralism is largely superficial: LLMs repeatedly recombine a narrow subset of Future Orientation and Performance Orientation, and rarely ground choices in Assertiveness or Gender Egalitarianism (both <3%). Ordering effects are negligible (Cramér’s V < 0.10), and symmetrized KL divergence indicates that LLMs cluster by developer lineage rather than geography. Taken together, these patterns suggest that contemporary alignment pipelines encourage a consensus-oriented, progress-centric worldview that underserves scenarios demanding explicit power negotiation, rights-based reasoning, or gender-aware analysis.
CCD-Bench thus shifts evaluation from isolated bias detection to pluralistic decision making, revealing that current LLMs maintain Western-centric, consensus-oriented preferences even when confronted with ten equally valid, culturally diverse alternatives, and underscoring the need for alignment strategies that substantively engage with diverse worldviews.
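
The two statistics named in the abstract, Cramér's V for ordering effects and symmetrized KL divergence for comparing models, can be illustrated with a minimal sketch. It is not the authors' code: the function names, the smoothing constant `eps`, the choice to define the symmetrized divergence as the sum KL(p||q) + KL(q||p) (rather than, say, the average), and the toy numbers are all assumptions made for illustration.

```python
import math


def symmetrized_kl(p, q, eps=1e-12):
    """Symmetrized KL divergence, here taken as KL(p||q) + KL(q||p).

    p and q are discrete distributions over the same options (for example,
    one model's selection rates over ten response options). The sum form
    and the eps smoothing are assumptions, not details from the abstract.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # Smooth and renormalize so a zero-probability option never yields log(0).
    p = [x + eps for x in p]
    q = [x + eps for x in q]
    sp, sq = sum(p), sum(q)
    p = [x / sp for x in p]
    q = [x / sq for x in q]
    kl_pq = sum(a * math.log(a / b) for a, b in zip(p, q))
    kl_qp = sum(b * math.log(b / a) for a, b in zip(p, q))
    return kl_pq + kl_qp


def cramers_v(table):
    """Cramér's V for an r x c contingency table of counts (list of rows).

    For an ordering-effect check, rows could be presentation positions and
    columns the chosen options; that layout is hypothetical.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))


# Toy, made-up selection distributions for two hypothetical models.
model_a = [0.20, 0.12, 0.10, 0.10, 0.10, 0.10, 0.10, 0.06, 0.06, 0.06]
model_b = [0.10] * 10
divergence = symmetrized_kl(model_a, model_b)

# Toy 2 x 2 table: position (rows) versus chosen option (columns).
v_independent = cramers_v([[10, 10], [10, 10]])
v_dependent = cramers_v([[10, 0], [0, 10]])
```

Under these assumptions, a pairwise matrix of `symmetrized_kl` values over all models could feed a clustering step, and a `cramers_v` value near zero would correspond to the negligible ordering effect the abstract reports.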
