
10 presentations
31 views
SHORT BIO
Hannah Rose Kirk is a PhD student at the University of Oxford, UK, and a visiting academic at NYU’s Center for Data Science. Hannah's research centres on the role of granular and diverse human feedback in aligning large language models. Her published work spans computational linguistics, computer vision, ethics and sociology, addressing issues such as AI alignment, bias, fairness, and hate speech from a multidisciplinary perspective. Hannah holds degrees from the University of Oxford, the University of Cambridge and Peking University. Alongside her academic work, she often collaborates on industry projects with Google, OpenAI and MetaAI.
Presentations

Adversarial Nibbler - A novel crowdsourcing procedure for detecting harmful content in t2i models
Jessica Quaye and 14 other authors

XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models
Paul Röttger and 5 other authors

The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values
Hannah Rose Kirk and 4 other authors

SemEval-2023 Task 10: Explainable Detection of Online Sexism
Hannah Rose Kirk and 3 other authors

Handling and Presenting Harmful Text in NLP Research
Hannah Rose Kirk and 3 other authors

A Prompt Array Keeps the Bias Away: Debiasing Vision-Language Models with Adversarial Learning
Hugo Berg and 5 other authors

Is More Data Better? Re-thinking the Importance of Efficiency in Abusive Language Detection with Transformers-Based Active Learning
Hannah Rose Kirk

Hatemoji: A Test Suite and Adversarially-Generated Dataset for Benchmarking and Detecting Emoji-Based Hate
Hannah Rose Kirk and 4 other authors

Memes in the Wild: Assessing the Generalizability of the Hateful Memes Challenge Dataset
Hannah Rose Kirk and 9 other authors

Handling and Presenting Harmful Text in NLP Research
Hannah Rose Kirk and 3 other authors