
Anca Dragan
Associate Professor @ University of California, Berkeley
pragmatics
instruction following
interactive human-in-the-loop machine learning
learning human values and preferences
imitation learning & inverse reinforcement learning
reinforcement learning
6 presentations · 18 views
SHORT BIO
I am an Associate Professor in the EECS Department at UC Berkeley, currently on leave to head AI Safety and Alignment at Google DeepMind.
The goal of my research at UC Berkeley has been to enable AI agents (from robots to cars to LLMs to recommender systems) to work with, around, and in support of people. I run the InterACT Lab, where we focus on algorithms for human-AI and human-robot interaction. One of the core problems we have worked on since the lab's inception is AI alignment: getting AI agents to do what people actually want. This has meant learning reward functions interactively, from diverse forms of human feedback, across different modalities, while maintaining uncertainty. We have also contributed algorithms for human-AI collaboration and coordination, such as agents fluently working together with human-driven avatars in games, assistance and adaptation in brain-machine interfaces, and autonomous cars sharing the road with human drivers.
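To make the reward-learning idea concrete, here is a minimal toy sketch (my own illustration, not code from the InterACT Lab or any specific paper): Bayesian reward inference from pairwise preference feedback under a Bradley-Terry choice model, where the agent maintains a posterior over a reward weight rather than a point estimate. All names and numbers below are hypothetical.

```python
import numpy as np

def p_prefers_a(theta, feat_a, feat_b):
    """Bradley-Terry model: P(human prefers trajectory A over B | reward weight theta)."""
    return 1.0 / (1.0 + np.exp(-theta * (feat_a - feat_b)))

# Grid of candidate reward weights with a uniform prior: the agent starts
# out uncertain about what the person actually wants.
thetas = np.linspace(-2.0, 2.0, 201)
posterior = np.full_like(thetas, 1.0 / len(thetas))

# Hypothetical interactive queries: (feature value of A, feature value of B,
# whether the human chose A). In practice these would be trajectory features.
feedback = [(1.0, 0.2, True), (0.5, 0.9, False), (1.2, 0.1, True)]

for feat_a, feat_b, chose_a in feedback:
    p_a = p_prefers_a(thetas, feat_a, feat_b)
    posterior *= p_a if chose_a else (1.0 - p_a)  # Bayes update from one answer
    posterior /= posterior.sum()                  # renormalize

# Report the posterior mean and the remaining uncertainty.
mean = float((thetas * posterior).sum())
std = float(np.sqrt(((thetas - mean) ** 2 * posterior).sum()))
print(f"inferred reward weight: {mean:.2f} (posterior std {std:.2f})")
```

Keeping the full posterior, rather than a single estimate, is what lets an agent express how uncertain it still is about the person's preferences, for example to decide which query to ask next.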
At Google DeepMind, I currently lead a collection of teams responsible both for the safety of the current Gemini models and for preparing for Gemini capabilities to keep advancing, ensuring that safety advances hand in hand. This means ensuring Gemini models are, and will remain, aligned with human goals and values: avoiding present-day harms and catastrophic risks, enabling models to better and more robustly understand human preferences, enabling informed oversight, increasing robustness to adversarial attacks, and accounting for the plurality of human values and viewpoints.
Previously, I helped found the Berkeley AI Research (BAIR) Lab and served on its steering committee. I have been (and still am) a co-PI of the Center for Human-Compatible AI. I have consulted for Waymo for the past six years, helping with the roadmap for deploying an increasingly learning-based safety-critical system. I have been honored with a Sloan Fellowship, the MIT TR35, the Okawa Award, an NSF CAREER Award, and the PECASE Award. I take most pride in my former students, who have gone on to faculty positions at MIT, Stanford, CMU, and Princeton, and to industry positions at DeepMind, Waymo, and Meta.
Presentations

My Journey in AI Safety and Alignment
Anca Dragan

Learning Optimal Advantage from Preferences and Mistaking It for Reward
Brad Knox and 6 other authors

The Effect of Modeling Human Rationality Level on Learning Rewards from Multiple Feedback Types
Gaurav Ghosal and 3 other authors

Inferring Rewards from Language in Context
Jessy Lin and 3 other authors

Evaluating the Robustness of Collaborative Agents
Micah Carroll and 6 other authors

Evaluating the Robustness of Collaborative Agents
Paul Knott and 5 other authors