Generative recommendation is an emerging paradigm that is reshaping the development of recommender systems. It assigns items identifiers that capture rich semantic and collaborative information, and then predicts item identifiers via autoregressive generation with Large Language Models (LLMs). Existing approaches primarily tokenize item text into semantic IDs through RQ-VAE codebooks, or tokenize the features of each modality separately. However, existing tokenization methods face two major challenges: $\textbf{(1)}$ learning decoupled multi-modal features limits the quality of the semantic representation, and $\textbf{(2)}$ ignoring collaborative signals from interaction history limits the comprehensiveness of the identifiers. To address these limitations, we propose a $\underline{\textbf{mu}}$lti-modal $\underline{\textbf{s}}$emantic-enhanced $\underline{\textbf{i}}$dentifier with $\underline{\textbf{c}}$ollaborative signals for generative $\underline{\textbf{rec}}$ommendation, named MusicRec. MusicRec introduces a tokenization approach based on shared-specific modal fusion, enabling the generated identifiers to preserve semantic information from all modalities more comprehensively. In addition, it incorporates collaborative signals from user interactions to guide identifier generation, preserving collaborative patterns in the semantic representation space. Extensive experiments on three public datasets demonstrate that MusicRec achieves state-of-the-art performance compared with existing baselines.
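The RQ-VAE tokenization that the abstract contrasts against can be illustrated with residual quantization: each codebook level picks the nearest codeword to the current residual and passes the remainder to the next level, so an item embedding becomes a short tuple of discrete codes (its semantic ID). The following is a minimal sketch, not the paper's implementation; the codebooks here are random placeholders, whereas in RQ-VAE they are learned jointly with an encoder-decoder.

```python
import numpy as np

def residual_quantize(embedding, codebooks):
    """Map an item embedding to a tuple of codebook indices (a semantic ID).

    At each level, select the nearest codeword to the current residual,
    then quantize what remains, so later levels refine earlier ones.
    """
    residual = embedding.astype(np.float64)
    semantic_id = []
    for codebook in codebooks:
        # distance from the residual to every codeword at this level
        dists = np.linalg.norm(codebook - residual, axis=1)
        idx = int(np.argmin(dists))
        semantic_id.append(idx)
        residual = residual - codebook[idx]
    return tuple(semantic_id)

rng = np.random.default_rng(0)
# 3 levels x 256 codewords of dimension 32 (illustrative sizes only)
codebooks = [rng.normal(size=(256, 32)) for _ in range(3)]
item_embedding = rng.normal(size=32)  # e.g. output of a content encoder
sid = residual_quantize(item_embedding, codebooks)
print(sid)  # a 3-token identifier for autoregressive generation
```

With trained codebooks, such tuples serve as the item vocabulary that an LLM decodes token by token; MusicRec's contribution is to build the quantized representation from fused multi-modal features and collaborative signals rather than from text alone.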