Existing stereotype auditing methods for large language models (LLMs) typically rely on isolated rating schemes or task-specific probes. They lack theoretical grounding and do not reveal how stereotypes are organized internally, beyond surface-level output patterns. In this paper, we introduce SCoUT (Stereotype Content oriented Utility structure via Thurstonian modeling), a closed-loop framework that structurally models, explicitly probes, and causally intervenes on the stereotype dimensions of warmth and competence in LLMs. SCoUT first reconstructs a global stereotype utility structure, aligned with Stereotype Content Model theory, from Thurstonian comparative judgments. Across multiple open-source LLMs, this modeling achieves high pairwise-preference prediction accuracy ($\ge 0.90$ on larger-scale models) and exhibits strong cross-model consistency. Probing internal attention mechanisms localizes this structure to specific heads (Spearman’s $\rho$ up to 0.83 for warmth and 0.90 for competence) and surfaces a salient asymmetry between warmth and competence. Further, targeted inference-time activation modifications on these dimension-sensitive heads consistently steer model outputs along the intended axes. By bridging behavioral measurement with internal representation and controllable steering, SCoUT offers an end-to-end framework that uncovers and interprets the latent structure of stereotypes, advancing stereotype auditing from surface detection to structural analysis.
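To make the Thurstonian step concrete, the following is a minimal sketch of a Thurstone Case V fit on pairwise preferences. It is not the SCoUT implementation. It assumes each item has a scalar utility with unit-variance Gaussian noise, so that $P(i \succ j) = \Phi\big((\mu_i - \mu_j)/\sqrt{2}\big)$. The function names, the toy comparison data, and the plain gradient-ascent fit are all hypothetical choices made for illustration.

```python
import math

SQRT2 = math.sqrt(2.0)

def norm_cdf(x):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / SQRT2))

def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def prefer_prob(mu, i, j):
    """Case V probability that item i is preferred over item j."""
    return norm_cdf((mu[i] - mu[j]) / SQRT2)

def fit_thurstone(n_items, comparisons, steps=2000, lr=0.5):
    """Fit utilities by gradient ascent on the probit log-likelihood.

    comparisons: list of (winner, loser) index pairs (hypothetical format).
    Utilities are re-centered each step, since only differences are identified.
    """
    mu = [0.0] * n_items
    n = len(comparisons)
    for _ in range(steps):
        grad = [0.0] * n_items
        for w, l in comparisons:
            z = (mu[w] - mu[l]) / SQRT2
            p = max(norm_cdf(z), 1e-12)  # guard against division by zero
            g = norm_pdf(z) / (p * SQRT2)
            grad[w] += g
            grad[l] -= g
        mu = [m + lr * g / n for m, g in zip(mu, grad)]
        mean = sum(mu) / n_items
        mu = [m - mean for m in mu]
    return mu

def pairwise_accuracy(mu, comparisons):
    """Fraction of observed preferences whose winner has the higher utility."""
    correct = sum(1 for w, l in comparisons if mu[w] > mu[l])
    return correct / len(comparisons)

# Toy data (invented for illustration): three items, noisy preferences.
toy = (
    [(0, 1)] * 8 + [(1, 0)] * 2
    + [(1, 2)] * 8 + [(2, 1)] * 2
    + [(0, 2)] * 9 + [(2, 0)] * 1
)
mu = fit_thurstone(3, toy)
print("utilities:", [round(m, 3) for m in mu])
print("pairwise accuracy:", pairwise_accuracy(mu, toy))
```

In a setting like the one described above, the items would presumably be social groups compared along a single dimension such as warmth or competence, with one fit per dimension. That mapping is an assumption here rather than something the abstract specifies.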