Underline digital video library
356 results

BeautifulPrompt: Towards Automatic Prompt Engineering for Text-to-Image Synthesis
Chengyu Wang

DUBLIN: Visual Document Understanding By Language-Image Network
Kumar Tanmay and 2 other authors

Large Language Models and Multimodal Retrieval for Visual Word Sense Disambiguation
Anastasia Kritharoula and 2 other authors

Sound of Story: Multi-modal Storytelling with Audio
Jaeyeon Bae and 6 other authors

Granularity Matters: Pathological Graph-driven Cross-modal Alignment for Brain CT Report Generation
Yanzhao Shi and 4 other authors

Visual Storytelling with Question-Answer Plans
Danyang Liu and 2 other authors

CLAIR: Evaluating Image Captions with Large Language Models
David Chan and 4 other authors

A Suite of Generative Tasks for Multi-Level Multimodal Webpage Understanding
Andrea Burns and 7 other authors

Descriptive Prompt Paraphrasing for Target-Oriented Multimodal Sentiment Classification
Dan Liu and 4 other authors

STAIR: Learning Sparse Text and Image Representation in Grounded Tokens
Chen Chen and 10 other authors

Multimodal Automated Fact-Checking: A Survey
Mubashara Akhtar and 5 other authors

A Multi-Modal Multilingual Benchmark for Document Image Classification
Yoshinari Fujinuma and 5 other authors