EMNLP 2025

November 06, 2025

Suzhou, China

Vision-Language Models (VLMs) face significant computational challenges on long-form videos because of the quadratic complexity of attention mechanisms. We propose Language-Guided Temporal Token Pruning (LGTTP), which extracts temporal cues from the query and adaptively assigns pruning rates across video frames, preserving contextual continuity while reducing computational overhead. Unlike existing methods that apply uniform token pruning or disruptive keyframe selection, LGTTP maintains higher token density in temporally relevant segments. Our model-agnostic approach integrates with the TimeChat and LLaVA-Video architectures, demonstrating substantial efficiency gains across diverse benchmarks. Experiments show LGTTP reduces computation by 65% while preserving 97-99% of original performance. On highlight detection, LGTTP achieves +9.5% HIT@1 on QVHighlights compared to baseline methods, and on temporal grounding it maintains 99.6% of original R@1 on Charades-STA. LGTTP shows particular strength on queries with explicit temporal markers while remaining effective for general video understanding tasks.
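The mechanism the abstract describes, deriving per-frame pruning rates from temporal cues in the query, can be illustrated with a short sketch. The Python below is a hypothetical illustration, not the authors' implementation: the keyword table, the relevance profile, the L2-norm saliency proxy, and all function names are assumptions standing in for LGTTP's learned components.

```python
# Hypothetical sketch of language-guided temporal token pruning (LGTTP-style).
# Everything here is an illustrative assumption, not the paper's code: the
# temporal-cue keywords, relevance profile, and saliency proxy are simple
# stand-ins for the learned components the abstract describes.

import numpy as np

# Coarse position priors over the video for a few explicit temporal markers.
TEMPORAL_CUES = {
    "beginning": (0.0, 0.3), "start": (0.0, 0.3), "first": (0.0, 0.3),
    "middle": (0.35, 0.65),
    "end": (0.7, 1.0), "finally": (0.7, 1.0), "last": (0.7, 1.0),
}

def temporal_relevance(query: str, num_frames: int) -> np.ndarray:
    """Map temporal cues in the query to a per-frame relevance profile in [0, 1]."""
    relevance = np.full(num_frames, 0.5)  # uniform prior when no cue is found
    positions = np.linspace(0.0, 1.0, num_frames)
    for word in query.lower().split():
        if word in TEMPORAL_CUES:
            lo, hi = TEMPORAL_CUES[word]
            inside = (positions >= lo) & (positions <= hi)
            relevance = np.where(inside, 1.0, 0.2)  # boost cued span, damp the rest
    return relevance

def prune_tokens(frame_tokens, query, min_keep=0.2, max_keep=0.9):
    """Adaptively prune tokens per frame: keep more tokens where the query points.

    frame_tokens: list of [N, D] arrays, one per frame.
    Returns a list of pruned [k_t, D] arrays.
    """
    T = len(frame_tokens)
    keep_ratios = min_keep + (max_keep - min_keep) * temporal_relevance(query, T)
    pruned = []
    for tokens, ratio in zip(frame_tokens, keep_ratios):
        k = max(1, int(round(ratio * tokens.shape[0])))
        saliency = np.linalg.norm(tokens, axis=1)   # proxy importance score
        keep_idx = np.argsort(-saliency)[:k]        # top-k most salient tokens
        pruned.append(tokens[np.sort(keep_idx)])    # preserve original token order
    return pruned

# Example: 16 frames of 196 tokens each, query pointing at the end of the video.
frames = [np.random.randn(196, 256) for _ in range(16)]
kept = prune_tokens(frames, "What happens at the end of the race?")
print([f.shape[0] for f in kept])  # higher token counts in the late frames
```

The sketch mirrors the design choice the abstract draws against keyframe selection: every frame retains at least a small token budget, so contextual continuity is preserved, rather than dropping non-selected frames entirely.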
