keynote
Propositional Interpretability in Humans and AI Systems
Mechanistic interpretability is one of the most exciting and important research programs in current AI. My aim is to build some philosophical foundations for the program, along with setting out some concrete challenges and assessing progress to date. I will argue for the importance of propositional interpretability, which involves interpreting a system’s mechanisms and behavior in terms of propositional attitudes: attitudes (such as belief, desire, or subjective probability) to propositions (e.g. the proposition that it is hot outside). Propositional attitudes are the central way that we interpret and explain human beings, and they are likely to be central in AI too. A central challenge is what I call thought logging: creating systems that log all of the relevant propositional attitudes in an AI system over time. I will examine currently popular methods of interpretability (such as probing, sparse autoencoders, and chain-of-thought methods) as well as philosophical methods of interpretation (including psychosemantics and representation theorems) to assess their strengths and weaknesses as methods of propositional interpretability.