
Hate speech detection is a socially sensitive yet inherently subjective task in which individual judgments can vary widely with personal traits. While recent work has explored how socio-demographic factors shape annotation behavior, the role of personality in Large Language Models (LLMs) remains underexplored. In this paper, we present the first comprehensive study of persona prompting in hate speech classification, focusing on MBTI-based personas. We begin with a human annotation survey demonstrating that MBTI traits significantly affect labeling behavior. Extending this to LLMs, we prompt four open-source LLMs with MBTI personas and evaluate their responses across three hate speech datasets. Our analysis reveals substantial persona-induced shifts, including inconsistencies with ground truth, disagreement across personas, and logit-level biases. These findings highlight the importance of defining persona prompts in LLM-based annotation tasks, with implications for model fairness and alignment with human values.
