AAAI 2026

January 24, 2026

Singapore, Singapore

Would you like to see your presentation here, made available to a global audience of researchers?
Add your own presentation or have us affordably record your next conference.

Visual Autoregressive (VAR) models adopt a next-scale prediction paradigm, offering high-quality content generation with substantially fewer decoding steps. However, existing VAR models suffer from significant attention complexity and severe memory overhead due to the accumulation of key-value (KV) caches across scales. In this paper, we tackle this challenge by introducing KV cache compression into the next-scale generation paradigm. We begin with a crucial observation: attention heads in VAR models can be divided into two functionally distinct categories: Contextual Heads focus on maintaining semantic consistency, while Structural Heads are responsible for preserving spatial coherence. This structural divergence causes existing one-size-fits-all compression methods to perform poorly on VAR models. To address this, we propose HACK, a training-free Head-Aware KV cache Compression frameworK. HACK utilizes an offline classification scheme to separate head types, enabling it to apply pattern-specific compression strategies with asymmetric cache budgets for each category. By doing so, HACK effectively constrains the average KV cache length within a fixed budget $B$, reducing the theoretical attention complexity from $\mathcal{O}(n^4)$ to $\mathcal{O}(Bn^2)$. Extensive experiments on six VAR models across text-to-image, class-conditional, and 3D generation tasks validate the effectiveness and generalizability of HACK. It achieves up to 70\% KV cache compression without degrading output quality, resulting in both memory savings and faster inference. For example, HACK delivers a $1.75\times$ memory reduction and a $1.57\times$ speedup on Infinity-8B.

Downloads

Paper

Next from AAAI 2026

Other Vehicle Trajectories Are Also Needed: A Driving World Model Unifies Ego-Other Vehicle Trajectories in Video Latent Space
poster

Other Vehicle Trajectories Are Also Needed: A Driving World Model Unifies Ego-Other Vehicle Trajectories in Video Latent Space

AAAI 2026

+6
Jiaxin Deng and 8 other authors

24 January 2026

Stay up to date with the latest Underline news!

Select topic of interest (you can select more than one)

PRESENTATIONS

  • All Presentations
  • For Librarians
  • Resource Center
  • Free Trial
Underline Science, Inc.
1216 Broadway, 2nd Floor, New York, NY 10001, USA

© 2025 Underline - All rights reserved