IJCNLP-AACL 2025

December 20, 2025

Mumbai, India


keywords:

vision question answering

video processing

multimodality

The proliferation of educational videos on the Internet has changed the educational landscape by enabling students to learn complex concepts at their own pace. Our work outlines the vision of an automated tutor: a multimodal question answering (QA) system that answers questions from students watching a video. Such a system can make doubt resolution faster and further improve the learning experience. In this work, we take first steps towards building such a QA system. We curate and release a dataset named EduVidQA, with 3,158 videos and 18,474 QA pairs. However, building and evaluating an educational QA system is challenging because (1) existing evaluation metrics do not correlate with human judgments, and (2) a student question can be answered in many different ways, so training on a single gold answer could confuse the model and degrade its performance. We conclude with important research questions to develop this research area further.


