S³-MSD: Large Vision-Language Model for Explainable and Generalizable Multi-modal Sarcasm Detection


AAAI 2026

January 23, 2026

Singapore, Singapore


Multimodal sarcasm detection (MSD) aims to identify sarcasm polarity from diverse modalities (i.e., image-text pairs) and has attracted increasing attention. Despite significant advances, existing approaches still face two major issues: a lack of explainability and weak generalizability. In this paper, we introduce a new large vision-language model (LVLM), dubbed S³-MSD, for explainable and generalizable MSD through three key components. For explainability, we develop (1) a self-training paradigm that automatically bootstraps answers with explanations, and (2) a self-calibrating mechanism that rectifies flawed explanations. For generalizability, we design (3) a self-focusing module that amplifies visual semantic entities through preference optimization, mitigating over-reliance on text. Experimental results on both in-distribution and out-of-distribution (OOD) benchmarks demonstrate that S³-MSD consistently outperforms state-of-the-art methods in detection performance. Furthermore, S³-MSD provides persuasive explanations, as validated by quantitative and human evaluations.
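The self-training paradigm above can be sketched as a self-consistency filter: the model generates an (explanation, answer) pair for each labeled sample, and only pairs whose answer agrees with the gold label are kept for fine-tuning. This is a minimal illustrative sketch, not the paper's implementation; `generate_rationale`, `bootstrap_training_set`, and the sample/label field names are hypothetical stand-ins for the LVLM call and data format.

```python
from typing import Callable

def bootstrap_training_set(
    samples: list[dict],
    generate_rationale: Callable[[dict], tuple[str, str]],
) -> list[dict]:
    """Keep only samples whose self-generated answer matches the gold label,
    attaching the generated explanation for later fine-tuning."""
    kept = []
    for sample in samples:
        explanation, answer = generate_rationale(sample)
        if answer == sample["label"]:  # self-consistency filter
            kept.append({**sample, "explanation": explanation})
    return kept

# Toy stand-in for an LVLM: always predicts "sarcastic" with a fixed rationale.
def toy_model(sample: dict) -> tuple[str, str]:
    return ("the caption contradicts the image content", "sarcastic")

data = [
    {"text": "great weather, love standing in the rain", "label": "sarcastic"},
    {"text": "what a lovely sunset", "label": "not_sarcastic"},
]
filtered = bootstrap_training_set(data, toy_model)
# Only the first sample survives the filter, now carrying its explanation.
```

In the paper's full pipeline, the retained explanations would additionally pass through the self-calibrating mechanism to rectify flawed rationales before fine-tuning; that step is omitted here for brevity.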
