AAAI 2026

January 23, 2026

Singapore


Jigsaw puzzle solving remains difficult because models must reconcile local fragment cues with global structure. Most prior work leans solely on visual signals (edge or texture coherence) and rarely exploits natural-language descriptions, which are especially helpful for puzzles with eroded gaps. We introduce a vision–language framework that uses textual context to guide assembly. At its core, the Vision–Language Hierarchical Semantic Alignment (VLHSA) module aligns image patches with text via multi-level matching—from local tokens to global summaries—within a multimodal design that couples dual visual encoders with language features for cross-modal reasoning. Across multiple datasets, the method surpasses the state of the art, including a 14.2 percentage point gain in piece accuracy; ablations identify VLHSA as the principal source of improvement. These results suggest a practical shift for jigsaw solving: augmenting vision with language to resolve ambiguous placements.
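
The abstract describes VLHSA only at a high level. The sketch below is one minimal, illustrative way to implement hierarchical patch-to-token alignment; it is not the authors' code. The class name VLHSASketch, the two-level loss (local max-matching plus a global contrastive term), and all dimensions are assumptions made for this example.

```python
# Illustrative sketch of hierarchical vision-language alignment.
# NOT the paper's released implementation; names and the loss design
# (local max-matching + global contrastive) are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class VLHSASketch(nn.Module):
    """Aligns visual patch features with text token features at two levels:
    local (patch <-> token) and global (pooled image <-> pooled sentence)."""

    def __init__(self, vis_dim: int = 768, txt_dim: int = 512, emb_dim: int = 256):
        super().__init__()
        self.vis_proj = nn.Linear(vis_dim, emb_dim)  # project patch features
        self.txt_proj = nn.Linear(txt_dim, emb_dim)  # project token features
        self.temperature = nn.Parameter(torch.tensor(0.07))

    def forward(self, patches: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        # patches: (B, P, vis_dim) fragment patches from a visual encoder
        # tokens:  (B, T, txt_dim) tokens from a language encoder
        v = F.normalize(self.vis_proj(patches), dim=-1)  # (B, P, D)
        t = F.normalize(self.txt_proj(tokens), dim=-1)   # (B, T, D)

        # Local level: score each patch by its best-matching text token.
        sim = torch.einsum("bpd,btd->bpt", v, t)           # (B, P, T)
        local_score = sim.max(dim=-1).values.mean(dim=-1)  # (B,)

        # Global level: pooled image summary vs. pooled sentence summary.
        v_global = F.normalize(v.mean(dim=1), dim=-1)      # (B, D)
        t_global = F.normalize(t.mean(dim=1), dim=-1)      # (B, D)
        logits = v_global @ t_global.t() / self.temperature  # (B, B)

        # Symmetric contrastive loss over the batch (image->text, text->image).
        labels = torch.arange(logits.size(0), device=logits.device)
        global_loss = 0.5 * (F.cross_entropy(logits, labels)
                             + F.cross_entropy(logits.t(), labels))
        # Minimizing this raises local patch-token agreement while
        # pulling matched global pairs together.
        return global_loss - local_score.mean()


# Usage with random stand-in features (batch 4, 49 patches, 16 tokens):
module = VLHSASketch()
loss = module(torch.randn(4, 49, 768), torch.randn(4, 16, 512))
```

Here max-over-tokens stands in for local token-level matching and mean pooling stands in for the global summaries; the actual module may use attention-based matching and learned pooling instead.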

Downloads

  • Slides
  • Paper
  • Transcript (English, automatic)

