Content not yet available

This lecture has no active video or poster.

AAAI 2026

January 23, 2026

Singapore, Singapore

Would you like to see your presentation here, made available to a global audience of researchers?
Add your own presentation or have us affordably record your next conference.

Long-Form Video Question Answering (LVQA) poses challenges beyond traditional visual question answering (VQA), which is often limited to static images or short video clips. While current vision-language models (VLMs) perform well in those settings, they struggle with answering complex queries in LVQA over long videos involving multi-step temporal reasoning and causality. Vanilla approaches, which simply sample frames uniformly and feed them to a VLM along with the question, incur significant token overhead, forcing severe downsampling of long videos. As a result, the model often misses fine-grained visual structure, subtle event transitions, or key temporal cues—ultimately leading to incorrect answers. To address these limitations, recent works have explored query-adaptive frame sampling, hierarchical keyframe selection, and agent-based iterative querying. However, these methods remain fundamentally heuristic: they lack explicit temporal representations and cannot enforce or verify logical event relationships (e.g., "before X," "after Y"). As a result, there are no formal guarantees that the sampled context actually encodes the compositional or causal logic demanded by the question. To address these foundational gaps, we introduce NeuS-QA, a training-free, plug-and-play neuro-symbolic pipeline for LVQA. NeuS-QA translates a natural language question into a formal temporal logic expression, constructs a video automaton from frame-level semantic propositions, and applies model checking to rigorously identify video segments that satisfy the question's logical requirements. Only these logic-verified segments are submitted to the VLM, thus improving interpretability, reducing hallucinations, and enabling compositional reasoning without modifying or fine-tuning the model. Experiments on the LongVideoBench and CinePile long-form VQA benchmarks show that NeuS-QA significantly improves performance by over 10%, particularly on questions involving event ordering, causality, and multi-step compositional reasoning.

Downloads

Paper

Next from AAAI 2026

RAG-Enhanced Collaborative LLM Agents for Drug Discovery
poster

RAG-Enhanced Collaborative LLM Agents for Drug Discovery

AAAI 2026

+3
Tommaso Biancalani and 5 other authors

23 January 2026

Stay up to date with the latest Underline news!

Select topic of interest (you can select more than one)

PRESENTATIONS

  • All Presentations
  • For Librarians
  • Resource Center
  • Free Trial
Underline Science, Inc.
1216 Broadway, 2nd Floor, New York, NY 10001, USA

© 2025 Underline - All rights reserved