With the booming growth of multimodal data (e.g., images and text) on internet platforms, multimodal sequential recommendation methods continue to emerge. Most existing methods incorporate item modal features as auxiliary information, typically concatenating them to learn unified user representations. However, these methods use modal features directly for representation learning, neglecting the impact of inherent modality noise. We argue that both internal-modality noise and cross-modality noise hinder the learning of accurate user representations. To address this problem, we propose SGP4SR (Separated-modality Guided user Preference learning for multimodal Sequential Recommendation). Globally, user preference modeling is carried out from a separated-modality perspective to alleviate cross-modality noise. Locally, within each individual modality, we replace direct modal features with item relationship graphs and user interest centers aggregated with ID embeddings, thereby mitigating internal-modality noise. Finally, user representations from both the separated-modality and multimodal perspectives participate in prediction independently. In experiments on four real-world datasets, our method outperforms state-of-the-art approaches, achieving an average performance improvement of up to 8.84% over the best baseline. Comprehensive experiments further validate the superior noise tolerance and robustness of our method. The source code will be available in the supplementary materials.
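The local, per-modality idea described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the top-k cosine-similarity graph construction, the mean-pooled interest center, and all function names are assumptions introduced here to show how modality features can guide the aggregation of ID embeddings instead of being used directly.

```python
import numpy as np

def build_item_graph(modal_feats, k=2):
    # Build a modality-specific item relationship graph from cosine
    # similarity of modal features, keeping the top-k neighbors per item.
    norm = modal_feats / np.linalg.norm(modal_feats, axis=1, keepdims=True)
    sim = norm @ norm.T
    np.fill_diagonal(sim, -np.inf)  # exclude self-loops
    graph = np.zeros_like(sim)
    for i in range(sim.shape[0]):
        nbrs = np.argsort(sim[i])[-k:]
        graph[i, nbrs] = 1.0
    return graph / graph.sum(axis=1, keepdims=True)  # row-normalize

def modality_guided_items(id_emb, graph):
    # Aggregate ID embeddings over the modality-specific graph: the modal
    # features steer the aggregation but never enter the representation.
    return graph @ id_emb

def user_interest_center(item_reprs, user_seq):
    # A user's interest center under one modality view: mean of the
    # (graph-smoothed) representations of the items they interacted with.
    return item_reprs[user_seq].mean(axis=0)

rng = np.random.default_rng(0)
n_items, d_id, d_modal = 6, 4, 8
id_emb = rng.normal(size=(n_items, d_id))       # learnable ID embeddings
image_feats = rng.normal(size=(n_items, d_modal))  # e.g., image modality

graph = build_item_graph(image_feats, k=2)
items_img = modality_guided_items(id_emb, graph)
center = user_interest_center(items_img, user_seq=[0, 2, 5])
print(center.shape)  # (4,)
```

The design point illustrated here is that raw modal features, which may be noisy, only define the graph topology; the representations that reach the user model are built from ID embeddings, which is one way to mitigate internal-modality noise.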