Machine translation (MT) research addressing gender inclusivity has gained attention for promoting non-exclusionary language representing all genders. However, existing resources are limited to short sources, most often single sentences, or single gender-fair formulation types, leaving questions about MT models' ability to use context and diverse inclusive forms. We introduce Glitter, a new English-German benchmark featuring extended passages with professional translations implementing three gender-fair alternatives: neutral rephrasing, typographical solutions (gender star), and neologistic forms (-ens endings). Our experiments reveal significant limitations in state-of-the-art language models, which default to masculine generics, struggle to interpret explicit gender cues in context, and rarely produce gender-fair translations. Through systematic prompting analysis designed to elicit fair language, we demonstrate that current models lack a fundamental understanding of source gender phenomena, failing to implement inclusive forms even when explicitly instructed. Glitter establishes a challenging benchmark, advancing research in gender-fair English-German MT. It highlights substantial room for improvement even among leading models and can serve to guide development of future MT models capable of accurately representing gender diversity.