
Modeling user interest from lifelong post-click behavior sequences is crucial for improving Click-Through Rate (CTR) prediction. However, long post-click behavior sequences pose severe performance issues: the sheer volume of data leads to high computational cost and inefficient model training and inference. Traditional methods address this with two-stage approaches (e.g., first searching for relevant subsequences and then applying attention), but this compromises model effectiveness because only part of the full sequence context is used. More importantly, integrating multimodal embeddings into existing Large Recommendation Models (LRMs) presents significant challenges: these embeddings often increase the computational burden, fit poorly with LRM architectures, and fail to capture the nuances of lifelong post-click behaviors. Thus, how to efficiently apply multimodal large language models (LLMs) within LRMs to exploit lifelong user post-click behaviors has become an urgent problem. To address this problem and improve both efficiency and accuracy, we introduce the Deep Multimodal Group Interest Network (DMGIN). Based on the observation that user post-click behavior sequences contain a large number of repeated items with varying behaviors and timestamps, DMGIN uses multimodal LLMs to group items, which organizes complete lifelong post-click behavior sequences more effectively and adds almost no computational overhead compared with introducing multimodal embeddings directly. This method organizes a user's lifelong post-click behavior into groups based on specific interest categories, reducing the sequence length from tens of thousands to hundreds. To mitigate the potential information loss from grouping, we adopt two key strategies.
First, we analyze behaviors within each group using both interest statistics and intra-group transformers to capture group traits, then apply inter-group transformers to the temporally ordered groups to capture how the user's group interests evolve. Second, we refine the modeling of the user's decision-making process by using an attention mechanism to identify candidate-specific interests, focusing on the behavior subsequences that share the same group as the candidate shop. Extensive experiments on both industrial and public datasets confirm the effectiveness and efficiency of DMGIN. An A/B test in our LBS advertising system shows that DMGIN improves CTR by 4.5% and Revenue per Mille (RPM) by 2.0%.
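
The grouping idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Behavior` record, the hard-coded `ITEM_TO_GROUP` lookup (a stand-in for the group assignment that the abstract attributes to a multimodal LLM), the chosen statistics, and the rule of ordering groups by their most recent behavior are all assumptions made for illustration. The intra-group and inter-group transformers and the attention mechanism are omitted; the sketch only shows how a long sequence collapses into a few groups and how the candidate-specific subsequence could be selected.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Behavior(NamedTuple):
    item_id: str
    action: str      # e.g. "click", "order" (hypothetical action names)
    timestamp: int


# Hypothetical item -> interest-group lookup. In the abstract this assignment
# comes from a multimodal LLM; here it is a hard-coded stand-in.
ITEM_TO_GROUP = {
    "latte_shop": "coffee",
    "espresso_bar": "coffee",
    "ramen_house": "noodles",
}


def group_behaviors(seq):
    """Bucket a behavior sequence by interest group, keeping time order."""
    groups = defaultdict(list)
    for b in sorted(seq, key=lambda b: b.timestamp):
        groups[ITEM_TO_GROUP[b.item_id]].append(b)
    return dict(groups)


def interest_statistics(behaviors):
    """Simple per-group statistics (assumed features, not the paper's list)."""
    return {
        "count": len(behaviors),
        "actions": dict(Counter(b.action for b in behaviors)),
        "last_ts": behaviors[-1].timestamp,
    }


def temporally_ordered_groups(groups):
    """Order groups by their most recent behavior (an assumed ordering rule)."""
    return sorted(groups, key=lambda g: groups[g][-1].timestamp)


def candidate_subsequence(groups, candidate_item):
    """Return the behaviors that share the candidate item's group."""
    return groups.get(ITEM_TO_GROUP[candidate_item], [])


history = [
    Behavior("latte_shop", "click", 1),
    Behavior("ramen_house", "click", 2),
    Behavior("latte_shop", "order", 3),
    Behavior("espresso_bar", "click", 4),
    Behavior("ramen_house", "order", 5),
]

groups = group_behaviors(history)
stats = {g: interest_statistics(bs) for g, bs in groups.items()}
order = temporally_ordered_groups(groups)
subseq = candidate_subsequence(groups, "espresso_bar")
```

In this toy history, five behaviors collapse into two groups, and a candidate from the hypothetical "coffee" group selects only the three coffee-related behaviors. In a full model, each group's behaviors and statistics would feed the intra-group encoder, and the ordered groups would feed the inter-group encoder.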