VIDEO DOI: https://doi.org/10.48448/wfvb-sa55

keynote

AAAI 2025

March 02, 2025

Philadelphia, United States

Propositional Interpretability in Humans and AI Systems

Mechanistic interpretability is one of the most exciting and important research programs in current AI. My aim is to build some philosophical foundations for the program, set out some concrete challenges, and assess progress to date. I will argue for the importance of propositional interpretability, which involves interpreting a system’s mechanisms and behavior in terms of propositional attitudes: attitudes (such as belief, desire, or subjective probability) to propositions (e.g. the proposition that it is hot outside). Propositional attitudes are the central way that we interpret and explain human beings, and they are likely to be central in AI too. A central challenge is what I call thought logging: creating systems that log all of the relevant propositional attitudes in an AI system over time. I will examine currently popular methods of interpretability (such as probing, sparse auto-encoders, and chain-of-thought methods) as well as philosophical methods of interpretation (including psychosemantics and representation theorems) to assess their strengths and weaknesses as methods of propositional interpretability.
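As a rough illustration of the probing approach named in the abstract, the sketch below trains a linear probe to decode a single binary proposition (say, "it is hot outside") from model activations. Everything here is an assumption for illustration: the activations and labels are synthetic stand-ins generated with numpy rather than read from a real model, and the probe is an off-the-shelf scikit-learn logistic regression; the talk itself does not prescribe this implementation.

# Minimal, hypothetical sketch of a linear probe for one propositional attitude.
# Activations and labels are fabricated for illustration; in practice they would
# come from a real model's hidden states and an annotated dataset.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples, hidden_dim = 2000, 64

# Stand-in for hidden-layer activations of a language model.
activations = rng.normal(size=(n_samples, hidden_dim))

# Assume the proposition is (noisily) encoded along one linear direction.
proposition_direction = rng.normal(size=hidden_dim)
labels = (activations @ proposition_direction
          + rng.normal(scale=2.0, size=n_samples) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    activations, labels, test_size=0.25, random_state=0
)

# The probe: a linear classifier trained to recover the proposition's truth value
# from activations. High held-out accuracy is defeasible evidence that the model
# represents that propositional content.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"Probe accuracy on held-out activations: {probe.score(X_test, y_test):.3f}")

A probe of this kind only addresses one proposition at a time; the thought-logging challenge described above would require tracking many such attitudes, over many propositions, as they change over time.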

