
workshop paper
NormAd: A Benchmark for Measuring the Cultural Adaptability of Large Language Models
keywords:
contexts
adaptability
culture
Integrating Large Language Models (LLMs) into diverse global cultures presents a challenge: they must respect social norms and avoid transgressing cultural boundaries. This requires LLMs to adapt their outputs to diverse cultural norms. However, the extent of this cultural adaptability remains unclear. To this end, we introduce NormAd, a dataset of 2.6k stories representing social and cultural norms from 75 countries, designed to assess the ability of LLMs to adapt to contexts of varying granularity. Evaluation on NormAd suggests that LLMs struggle with cultural reasoning across all contextual granularities -- Mistral-7b-Instruct, one of the top-performing models, achieves only 81.8% accuracy versus the 95.6% achieved by humans. Additionally, LLMs adapt more readily to English-centric cultures than to those from the Global South, and they find it considerably easier to assess the social acceptability of stories that adhere to cultural norms than of those that deviate from them. Our benchmark highlights the work that remains to make models more equitable toward global audiences.
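
To make the evaluation setup concrete, here is a minimal sketch of how one might score a model on NormAd-style data. The field names (story, context, gold_label), the JSONL format, and the judge() stub are illustrative assumptions, not the dataset's actual schema or the paper's evaluation harness; per-label accuracy is reported to mirror the adherence-vs-deviation comparison described in the abstract.

```python
"""Hypothetical accuracy evaluation for NormAd-style acceptability judgments."""
import json
from collections import defaultdict


def judge(story: str, context: str) -> str:
    """Placeholder for an LLM call that returns an acceptability label
    (e.g., 'yes', 'no', 'neutral') for `story` given the cultural `context`.
    Plug in your own model client here."""
    raise NotImplementedError


def evaluate(path: str) -> None:
    """Compute overall and per-gold-label accuracy over a JSONL file whose
    records are assumed to contain 'story', 'context', and 'gold_label'."""
    correct = defaultdict(int)
    total = defaultdict(int)
    with open(path, encoding="utf-8") as f:
        for line in f:
            ex = json.loads(line)
            pred = judge(ex["story"], ex["context"])
            total[ex["gold_label"]] += 1
            if pred == ex["gold_label"]:
                correct[ex["gold_label"]] += 1
    # Per-label accuracy separates norm-adhering from norm-deviating stories.
    for label, n in total.items():
        print(f"{label}: {correct[label] / n:.1%} ({n} examples)")
    print(f"overall: {sum(correct.values()) / sum(total.values()):.1%}")
```

Reporting accuracy per gold label, rather than only in aggregate, is what would surface the asymmetry the abstract notes: higher accuracy on norm-adhering stories than on norm-deviating ones.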