AAAI 2026

January 23, 2026

Singapore, Singapore


Humans display significant uncertainty when faced with moral dilemmas, yet the extent of such uncertainty in large language models (LLMs) remains underexplored. In contrast, studies have confirmed that LLMs tend to be overly confident in their judgments, even as they are embedded in ethical decision-making frameworks, necessitating a deeper understanding of their moral reasoning and inherent uncertainties for building reliable AI systems. This work examines how uncertainty affects moral decisions in trolley problems across 32 open-source LLMs, spanning 9 distinct moral dimensions. Our analysis reveals that LLM confidence varies more across models than within moral dimensions, indicating that moral uncertainty is shaped predominantly by model architecture and training methodology. Next, we measure uncertainty via binary entropy and decompose it into total entropy, conditional entropy, and mutual information. To explore the effect of uncertainty on models, we deliberately introduce stochasticity via "dropout" at inference time. Our findings indicate that this intervention raises total entropy, primarily through an increase in mutual information, while conditional entropy remains largely unchanged. The intervention also yields significant improvements in human-LLM moral alignment, with shifts in mutual information correlating with shifts in alignment scores. Our results highlight the potential to better align model-generated decisions with human preferences by deliberately modulating uncertainty and reducing LLMs' confidence in morally complex scenarios.
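The entropy decomposition described in the abstract (total entropy split into conditional entropy plus mutual information, estimated over stochastic forward passes such as inference-time dropout) can be sketched as below. This is a minimal illustration, not the paper's implementation; the function names and the example probabilities are hypothetical.

```python
import math

def binary_entropy(p):
    """Shannon entropy (in bits) of a binary decision with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def decompose_uncertainty(sample_probs):
    """Decompose predictive uncertainty over stochastic forward passes.

    sample_probs: per-pass probabilities of choosing one option in a
    binary moral dilemma (e.g. from repeated inference with dropout on).
    Returns (total, conditional, mutual_information), where
    total = conditional + mutual_information.
    """
    n = len(sample_probs)
    mean_p = sum(sample_probs) / n
    # Total entropy: entropy of the prediction averaged over passes.
    total = binary_entropy(mean_p)
    # Conditional entropy: average per-pass entropy (aleatoric part).
    conditional = sum(binary_entropy(p) for p in sample_probs) / n
    # Mutual information: disagreement between passes (epistemic part).
    return total, conditional, total - conditional

# Hypothetical per-pass probabilities from four dropout-enabled passes.
total, cond, mi = decompose_uncertainty([0.6, 0.9, 0.7, 0.95])
```

Under this decomposition, the abstract's finding corresponds to dropout leaving the conditional (per-pass) term roughly constant while the mutual-information (between-pass disagreement) term grows, raising total entropy.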

