Large language models are increasingly tailored to individual users, responding to differences in preferences, values, goals, and communication styles. Assessing whether a model can meet the needs of diverse individuals requires, as a foundational step, evaluating whether it can accurately infer and internally represent user-specific traits. We introduce MTPA, a Multitask Personalization Assessment benchmark designed to test whether models can accurately infer user traits from partial profiles. Compiled from large-scale survey data, MTPA defines a set of trait prediction tasks spanning beliefs, values, and demographic attributes, and quantifies performance across diverse user subgroups. We then evaluate the contribution of trait inference to personalized text generation and empirically validate its relevance for downstream alignment. Alongside the benchmark, we release a dataset, toolkit, and baseline evaluations. MTPA is designed with extensibility and sustainability in mind: as the underlying survey datasets are regularly updated, MTPA supports ongoing integration of new populations and user traits.
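The evaluation setup described above can be sketched in miniature: hold out a target trait from each user profile, ask a predictor to infer it from the remaining observed traits, and report accuracy both overall and per subgroup. The code below is a hypothetical illustration, not the released MTPA toolkit; the names (`predict_trait`, `profiles`, the trait and subgroup labels) and the stand-in heuristic predictor are all assumptions made for the example.

```python
from collections import defaultdict

# Toy user profiles: each has observed traits, one held-out target trait,
# and a subgroup label used for per-subgroup reporting. (Illustrative only.)
profiles = [
    {"observed": {"age_group": "18-29", "region": "urban"},
     "target": ("values_environment", "high"), "subgroup": "18-29"},
    {"observed": {"age_group": "65+", "region": "rural"},
     "target": ("values_environment", "low"), "subgroup": "65+"},
    {"observed": {"age_group": "18-29", "region": "rural"},
     "target": ("values_environment", "high"), "subgroup": "18-29"},
]

def predict_trait(observed, trait_name):
    """Stand-in for an LLM call that infers a trait from a partial profile."""
    # Toy heuristic in place of a real model: young or urban -> "high".
    if observed.get("age_group") == "18-29" or observed.get("region") == "urban":
        return "high"
    return "low"

def evaluate(profiles):
    """Return overall accuracy and per-subgroup accuracy."""
    correct, total = 0, 0
    by_group = defaultdict(lambda: [0, 0])  # subgroup -> [hits, count]
    for p in profiles:
        trait, gold = p["target"]
        hit = int(predict_trait(p["observed"], trait) == gold)
        correct += hit
        total += 1
        by_group[p["subgroup"]][0] += hit
        by_group[p["subgroup"]][1] += 1
    overall = correct / total
    per_group = {g: hits / n for g, (hits, n) in by_group.items()}
    return overall, per_group

overall, per_group = evaluate(profiles)
```

Reporting per-subgroup accuracy alongside the aggregate is what lets a benchmark like this surface cases where a model infers traits well for some populations but poorly for others.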