Content not yet available

This lecture has no active video or poster.

AAAI 2026

January 22, 2026

Singapore, Singapore

Would you like to see your presentation here, made available to a global audience of researchers?
Add your own presentation or have us affordably record your next conference.

Video Large Language Models (Video LLMs) have achieved significant success by adopting the paradigm of large-scale pre-training followed by supervised fine-tuning (SFT). However, existing approaches struggle with temporal reasoning due to weak temporal correspondence in the data and over-reliance on the next-token prediction paradigm, which collectively result in the absence temporal supervision. To address these limitations, we propose TEMPLE (TEMporal Preference Learning), a systematic framework that enhances temporal reasoning capabilities through Direct Preference Optimization (DPO). To address temporal information scarcity in data, we introduce an automated pipeline for systematically constructing temporality-intensive preference pairs comprising three steps: selecting temporally rich videos, designing video-specific perturbation strategies, and evaluating model responses on clean and perturbed inputs. Complementing this data pipeline, we provide additional supervision signals via preference learning and propose a novel Progressive Pre-SFT Alignment strategy featuring two key innovations: a curriculum learning strategy which progressively increases perturbation difficulty to maximize data efficiency; and applying preference optimization before instruction tuning to incentivize fundamental temporal alignment. Extensive experiments demonstrate that our approach consistently improves Video LLM performance across multiple benchmarks with a relatively small set of self-generated DPO data. Our findings highlight TEMPLE as a scalable and efficient complement to SFT-based methods, paving the way for developing reliable Video LLMs.

Downloads

Paper

Next from AAAI 2026

PEOD: A Pixel-Aligned Event-RGB Benchmark for Object Detection Under Challenging Conditions
poster

PEOD: A Pixel-Aligned Event-RGB Benchmark for Object Detection Under Challenging Conditions

AAAI 2026

+4
Luoping Cui and 6 other authors

22 January 2026

Stay up to date with the latest Underline news!

Select topic of interest (you can select more than one)

PRESENTATIONS

  • All Presentations
  • For Librarians
  • Resource Center
  • Free Trial
Underline Science, Inc.
1216 Broadway, 2nd Floor, New York, NY 10001, USA

© 2025 Underline - All rights reserved