
Xudong Lin
TOPICS
pre-training, continual learning, event extraction, script, events, multimodal, prompting, synthetic data, model evaluation, video understanding, multimedia, hierarchy, large language model, CV: language and vision, CV: multi-modal vision
SHORT BIO
Xudong Lin is a fifth-year Ph.D. candidate in the Department of Computer Science at Columbia University. His research interests broadly lie in building AI assistants for "video+x" tasks and in representation learning. His research on video-based text generation has been featured in media outlets such as VentureBeat. He has more than 20 publications on multimedia content understanding and representation learning in leading CV and NLP venues.
PRESENTATIONS

VIEWS: Entity-Aware News Video Captioning
Hammad Abdullah Ayyubi and 11 other authors

Training-free Deep Concept Injection Enables Language Models for Video Question Answering
Xudong Lin and 4 other authors

Personalized Video Comment Generation
Xudong Lin and 5 other authors

Unveiling Narrative Reasoning Limits of Large Language Models with Trope in Movie Synopses
Hung-Ting Su and 7 other authors

Non-Sequential Graph Script Induction via Multimedia Grounding
Yu Zhou and 6 other authors

Video Event Extraction via Tracking Visual States of Arguments
Guang Yang and 5 other authors

Video-Text Pre-training with Learned Regions for Retrieval
Rui Yan and 6 other authors

MuMuQA: Multimedia Multi-Hop News Question Answering via Cross-Media Knowledge Extraction and Grounding
Revant Gangi Reddy and 11 other authors

Joint Multimedia Event Extraction from Video and Article
Brian Chen and 7 other authors

Beyond Grounding: Extracting Fine-Grained Event Hierarchies across Modalities
Hammad Abdullah Ayyubi and 10 other authors