Hannah Rose Kirk

guidelines · ethical research · large language models · data harms · multimodal · survey · data efficiency · safety · social bias · abusive language · vision-language · adversarial learning · alignment · sexism detection · debiasing

10 presentations · 31 views

SHORT BIO

Hannah Rose Kirk is a PhD student at the University of Oxford, UK, and a visiting academic at NYU’s Center for Data Science. Hannah's research centres on the role of granular and diverse human feedback in aligning large language models. Her published work spans computational linguistics, computer vision, ethics and sociology, and addresses a broad range of issues, such as AI alignment, bias, fairness and hate speech, from a multidisciplinary perspective. Hannah holds degrees from the University of Oxford, the University of Cambridge and Peking University. Alongside academia, she frequently collaborates on industry projects with Google, OpenAI and MetaAI.

Presentations

Adversarial Nibbler - A novel crowdsourcing procedure for detecting harmful content in t2i models

Jessica Quaye and 14 other authors

XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models

Paul Röttger and 5 other authors

The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values

Hannah Rose Kirk and 4 other authors

SemEval-2023 Task 10: Explainable Detection of Online Sexism

Hannah Rose Kirk and 3 other authors

Handling and Presenting Harmful Text in NLP Research

Hannah Rose Kirk and 3 other authors

A Prompt Array Keeps the Bias Away: Debiasing Vision-Language Models with Adversarial Learning

Hugo Berg and 5 other authors

Is More Data Better? Re-thinking the Importance of Efficiency in Abusive Language Detection with Transformers-Based Active Learning

Hannah Rose Kirk

Hatemoji: A Test Suite and Adversarially-Generated Dataset for Benchmarking and Detecting Emoji-Based Hate

Hannah Rose Kirk and 4 other authors

Memes in the Wild: Assessing the Generalizability of the Hateful Memes Challenge Dataset

Hannah Rose Kirk and 9 other authors

Handling and Presenting Harmful Text in NLP Research

Hannah Rose Kirk and 3 other authors

Underline Science, Inc.
1216 Broadway, 2nd Floor, New York, NY 10001, USA

© 2025 Underline - All rights reserved