
Anca Dragan

Associate Professor @ University of California, Berkeley

RESEARCH TOPICS

  • pragmatics
  • instruction following
  • interactive, human-in-the-loop machine learning
  • learning human values and preferences
  • imitation learning & inverse reinforcement learning
  • reinforcement learning


SHORT BIO

I am an Associate Professor in the EECS Department at UC Berkeley, currently on leave to head AI Safety and Alignment at Google DeepMind.

The goal of my research at UC Berkeley has been to enable AI agents (from robots to cars to LLMs to recommender systems) to work with, around, and in support of people. I run the InterACT Lab, where we focus on algorithms for human-AI and human-robot interaction. One of the core problems we have worked on since the lab's inception is AI alignment: getting AI agents to do what people actually want. This has meant learning reward functions interactively, from diverse forms of human feedback, across different modalities, while maintaining uncertainty. We have also contributed algorithms for human-AI collaboration and coordination, such as agents fluently working together with human-driven avatars in games, assistance and adaptation in brain-machine interfaces, and autonomous cars sharing the road with human drivers.

At Google DeepMind, I currently lead a collection of teams responsible both for the safety of the current Gemini models and for preparing for Gemini capabilities to keep advancing while ensuring that safety advances hand in hand. This means ensuring Gemini models are and will remain aligned with human goals and values, including avoiding present-day harms and catastrophic risks, enabling models to better and more robustly understand human preferences, enabling informed oversight, increasing robustness to adversarial attacks, and accounting for the plurality of human values and viewpoints.

I helped found the Berkeley AI Research (BAIR) Lab and have served on its steering committee. I have been (and still am) a co-PI of the Center for Human-Compatible AI. I have consulted for Waymo for the past six years, helping shape the roadmap for deploying an increasingly learning-based safety-critical system. I have been honored with the Sloan Fellowship, the MIT TR35, the Okawa Award, an NSF CAREER Award, and the PECASE Award. I take most pride in my former students, who have gone on to faculty positions at MIT, Stanford, CMU, and Princeton, and to industry positions at DeepMind, Waymo, and Meta.

Presentations

  • My Journey in AI Safety and Alignment (Anca Dragan)
  • Learning Optimal Advantage from Preferences and Mistaking It for Reward (Brad Knox and 6 other authors)
  • The Effect of Modeling Human Rationality Level on Learning Rewards from Multiple Feedback Types (Gaurav Ghosal and 3 other authors)
  • Inferring Rewards from Language in Context (Jessy Lin and 3 other authors)
  • Evaluating the Robustness of Collaborative Agents (Micah Carroll and 6 other authors)
  • Evaluating the Robustness of Collaborative Agents (Paul Knott and 5 other authors)
