The obligatory use of third-person honorifics is a distinctive feature of several South Asian languages, encoding nuanced socio-pragmatic cues such as power, age, gender, fame, and social distance. In this work, (i) we present the first large-scale study of third-person honorific pronoun and verb usage across 10,000 Hindi and Bengali Wikipedia articles, with annotations linked to key socio-demographic attributes of the subjects, including gender, age group, fame, and cultural origin. (ii) We uncover systematic intra-language patterns but cross-linguistic inconsistencies: honorifics are more common in Bengali than in Hindi, while non-honorifics dominate for infamous, juvenile, and exotic entities. Notably, in both languages, and more prominently in Hindi, men are more frequently referred to with honorifics than women. (iii) To examine whether LLMs internalize similar norms, we probe six LLMs using controlled generation and translation tasks over 1,000 culturally balanced entities. We find that LLMs diverge from Wikipedia trends and exhibit different patterns of their own, which raises questions about the real-world validity of what they have learned, a direction that we leave for future exploration. Our code and data are publicly available at https://anonymous.4open.science/r/honorific-wiki-llm-480F/.