Humans display significant uncertainty when faced with moral dilemmas, yet the extent of such uncertainty in large language models (LLMs) remains underexplored. Studies have instead confirmed that LLMs tend to be overconfident in their judgments, even as they are increasingly embedded in ethical decision-making frameworks, which makes a deeper understanding of their moral reasoning and inherent uncertainties essential for building reliable AI systems. This work examines how uncertainty affects moral decisions in trolley problems across 32 open-source LLMs, spanning 9 distinct moral dimensions. Our analysis reveals that the variance in LLM confidence is greater across models than across moral dimensions, indicating that moral uncertainty is shaped predominantly by model architecture and training methodology. We then quantify uncertainty via binary entropy, decomposing the total entropy of a model’s decision into conditional entropy and mutual information. To probe the effect of uncertainty, we deliberately inject stochasticity into the models by applying “dropout” at inference time. This intervention raises total entropy, primarily through an increase in mutual information, while conditional entropy remains largely unchanged. It also yields significant improvements in human-LLM moral alignment, with shifts in mutual information correlating with shifts in alignment scores. Our results highlight the potential to better align model-generated decisions with human preferences by deliberately modulating uncertainty and reducing LLMs’ confidence in morally complex scenarios.
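The entropy decomposition described in the abstract follows the standard predictive-uncertainty factorization: the total entropy of the averaged prediction equals the expected per-pass (conditional) entropy plus the mutual information between the prediction and the stochastic perturbation. Below is a minimal sketch of how this could be computed from repeated inference-time dropout passes over a binary moral choice; the function names, the number of passes, and the example probabilities are illustrative assumptions, not the authors' code.

```python
import numpy as np

def binary_entropy(p):
    """Binary entropy in bits, numerically safe near p = 0 or 1."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def decompose_uncertainty(p_samples):
    """
    Decompose predictive uncertainty over T stochastic forward passes.

    p_samples: array of shape (T,), where each entry is the model's
    probability for one branch of the binary moral choice under a
    different dropout mask.

    total entropy       H[ mean_t p_t ]      (predictive uncertainty)
    conditional entropy mean_t H[ p_t ]      (average per-pass uncertainty)
    mutual information  total - conditional  (disagreement across passes)
    """
    total = binary_entropy(p_samples.mean())
    conditional = binary_entropy(p_samples).mean()
    return {
        "total_entropy": total,
        "conditional_entropy": conditional,
        "mutual_information": total - conditional,
    }

# Hypothetical usage: per-pass probabilities of "pull the lever"
# from T = 8 dropout-perturbed inference passes of the same model.
p = np.array([0.91, 0.88, 0.55, 0.62, 0.97, 0.71, 0.83, 0.49])
print(decompose_uncertainty(p))
```

Under this decomposition, the reported pattern (higher total entropy driven mainly by mutual information, with conditional entropy roughly flat) would correspond to dropout increasing disagreement across passes rather than making any individual pass less certain.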
