We propose a method for extracting monosemantic neurons, defined as latent dimensions aligned with coherent and interpretable concepts, from user and item embeddings in recommender systems. Our approach uses a Sparse Autoencoder (SAE) to disentangle semantic structure from pretrained representations. Unlike prior work on language models, monosemanticity in recommendation requires preserving interactions between distinct user and item embeddings. To address this, we introduce a prediction-aware training objective that backpropagates through a frozen recommender, aligning latent structure with affinity behavior. The resulting neurons capture actionable properties, such as genre, popularity, and recency, and enable post hoc control operations like targeted filtering or promotion without modifying the base model. Our method generalizes across recommendation models and datasets, offering a practical tool for interpretable and controllable personalization.
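The prediction-aware objective can be illustrated with a minimal sketch. Everything below is hypothetical: the SAE architecture, the loss weights, and the dot-product stand-in for the frozen recommender are illustrative assumptions, not details taken from the paper. The key idea shown is that, besides reconstruction and sparsity terms, the loss backpropagates through a frozen affinity function so that reconstructed user and item embeddings preserve the recommender's predicted scores.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative dimensions only; the paper does not specify these.
d_embed, d_latent = 16, 64

class SparseAutoencoder(nn.Module):
    """Overcomplete SAE: each latent dimension is a candidate monosemantic neuron."""
    def __init__(self, d_embed, d_latent):
        super().__init__()
        self.encoder = nn.Linear(d_embed, d_latent)
        self.decoder = nn.Linear(d_latent, d_embed)

    def forward(self, x):
        z = torch.relu(self.encoder(x))  # non-negative sparse code
        return self.decoder(z), z

def affinity(u, v):
    # Stand-in frozen recommender: a plain dot product. The actual method
    # backpropagates through whatever pretrained recommender is being explained.
    return (u * v).sum(-1)

sae = SparseAutoencoder(d_embed, d_latent)
users = torch.randn(128, d_embed)   # synthetic pretrained user embeddings
items = torch.randn(128, d_embed)   # synthetic pretrained item embeddings
opt = torch.optim.Adam(sae.parameters(), lr=1e-2)

init_loss = None
for step in range(300):
    u_hat, zu = sae(users)
    v_hat, zv = sae(items)
    recon = ((u_hat - users) ** 2).mean() + ((v_hat - items) ** 2).mean()
    sparsity = zu.abs().mean() + zv.abs().mean()  # L1 term encourages sparse codes
    # Prediction-aware term: reconstructions must preserve the frozen
    # recommender's user-item affinity scores (gradients flow through affinity,
    # whose parameters -- none here -- would stay frozen).
    pred = ((affinity(u_hat, v_hat) - affinity(users, items)) ** 2).mean()
    loss = recon + 1e-3 * sparsity + 0.1 * pred  # weights chosen arbitrarily
    if init_loss is None:
        init_loss = loss.item()
    opt.zero_grad()
    loss.backward()
    opt.step()

final_loss = loss.item()
```

Once trained, post hoc control amounts to editing the sparse code `z` (e.g., zeroing a neuron tied to a genre before decoding) and scoring with the untouched base recommender.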