AAAI 2026

January 25, 2026

Singapore, Singapore


Cross-lingual, cross-task transfer is challenged by task-specific data scarcity, which becomes more severe as language support grows. Both challenges are amplified within vision-language models (VLMs). We investigate multilingual generalization in encoder-decoder transformer VLMs to enable zero-shot image captioning in a language that was only paired with machine translations during training. In this setting, the encoder must learn to generate generalizable, latent task-aware vision representations that instruct the decoder via inserted cross-attention layers. We study scaling laws by training models based on Florence-2 and Gemma-2 that range from 0.4B to 11.2B parameters. Training is performed on a synthetic dataset using varying compute budgets. While all languages in the dataset have image-aligned translations, only a subset of them include image captions. Notably, we show that captioning can emerge in a language after training on only translation data. We find that this indirect learning of unseen task-language pairs adheres to scaling laws governed by the model's multilinguality, its size, and the number of training samples seen. Finally, we demonstrate that our observed scaling laws extend to a variety of downstream tasks, achieving competitive performance through finetuning in multimodal machine translation (Multi30K, CoMMuTE), lexical disambiguation (CoMMuTE), and image captioning (Multi30K, XM3600, COCO Karpathy).
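The abstract reports scaling laws governed by model size and training samples seen. A common way to estimate such laws is a power-law fit in log space. The sketch below is a minimal, hypothetical illustration (the functional form, coefficients, and data points are assumptions, not taken from the paper): it fits E(N, D) = a · N^(−α) · D^(−β), where N is parameter count and D is the number of training samples, by linear least squares on logarithms.

```python
import numpy as np

def fit_power_law(N, D, E):
    """Fit log E = log a - alpha*log N - beta*log D by least squares.

    N: model parameter counts, D: training samples seen, E: error metric.
    Returns (a, alpha, beta). Assumes a multiplicative power-law form,
    which is a modeling assumption, not the paper's exact formulation.
    """
    X = np.column_stack([
        np.ones_like(N, dtype=float),  # intercept -> log a
        -np.log(N),                    # coefficient -> alpha
        -np.log(D),                    # coefficient -> beta
    ])
    coeffs, *_ = np.linalg.lstsq(X, np.log(E), rcond=None)
    log_a, alpha, beta = coeffs
    return np.exp(log_a), alpha, beta

# Synthetic demonstration data generated from known exponents
# (model sizes loosely echo the 0.4B-11.2B range; values are illustrative).
N = np.array([0.4e9, 0.8e9, 2.0e9, 11.2e9])
D = np.array([1e8, 3e8, 1e9, 3e9])
E = 5.0 * N**-0.1 * D**-0.05  # noiseless synthetic errors

a, alpha, beta = fit_power_law(N, D, E)
print(round(alpha, 3), round(beta, 3))  # recovers the generating exponents
```

On noiseless synthetic data the fit recovers the generating exponents exactly; with real evaluation numbers one would fit per language or per multilinguality setting and compare the estimated exponents.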

