Would you like to see your presentation here, made available to a global audience of researchers?
Add your own presentation or have us affordably record your next conference.
Auditing large language models (LLMs) for biases is an ongoing and dynamic process, resembling a proverbial cat-and-mouse game. As researchers identify new vulnerabilities in LLMs, guardrails are updated to address them, prompting the need for innovative approaches to audit the increasingly fortified LLMs for biases. This paper makes three contributions. First, it introduces a scalable, explainable framework to measure biases against various identity groups across multiple open large language models. Second, it conducts a bias audit considering five well-known open LLMs and demonstrates their bias inclinations towards several historically disadvantaged groups. Our audit reveals disturbing antisemitic, Islamophobic, and xenophobic biases present in several well-known LLMs. Finally, we release a dataset of 1,000 probes curated under the supervision of an expert social scientist that can facilitate similar audits.
