Despite the remarkable progress of Auto-Regressive (AR) image generation, its inference latency remains high, even when employing Speculative Decoding (SD), because of the sequential AR nature of decoding and the ambiguity of image tokens. Recent works have addressed this issue empirically using relaxed SD, but without theoretical grounding. In this paper, we establish the theoretical foundations of relaxed SD and propose Annealed Relaxation of Speculative Decoding (AnnealRSD), grounded in two key insights. First, by analyzing the total variation (TV) distance between the target model and relaxed SD, we derive the optimal resampling distribution that minimizes an upper bound on the TV distance. Second, perturbation analysis reveals an inherent annealing property of relaxed SD, motivating our annealed design. Together, these components enable AnnealRSD to achieve faster generation with comparable quality, or improved quality at the same latency, compared to existing methods. Extensive experiments on image generation validate the effectiveness of AnnealRSD, showing consistent improvements over prior approaches in the speed-quality trade-off.
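To make the role of the TV distance concrete, the following is a minimal sketch of a single speculative decoding step over a toy three-token vocabulary. It is not the AnnealRSD algorithm. The standard accept/resample rule (accept a draft token with probability min(1, p/q), otherwise resample from the normalized positive part of p - q) is the well-known lossless SD rule. The `delta` parameter, which loosens the acceptance test, is a purely hypothetical relaxation chosen for illustration, and the resampling distribution is left at its standard form rather than the optimal one derived in the paper. All function names and the example distributions are assumptions.

```python
def tv_distance(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def residual(p, q):
    """Standard SD resampling distribution: normalized max(p - q, 0)."""
    raw = [max(a - b, 0.0) for a, b in zip(p, q)]
    z = sum(raw)
    # If p == q there is no residual mass; fall back to p itself.
    return [r / z for r in raw] if z > 0 else list(p)


def sd_output(p, q, delta=1.0):
    """Exact output distribution of one SD step.

    p: target distribution, q: draft distribution.
    delta=1.0 gives the standard (lossless) acceptance rule.
    delta<1.0 is a HYPOTHETICAL relaxation that accepts more often;
    it is an illustrative assumption, not the paper's rule.
    """
    accept = [min(1.0, a / (delta * b)) if b > 0 else 1.0
              for a, b in zip(p, q)]
    accept_rate = sum(b * acc for b, acc in zip(q, accept))
    resample = residual(p, q)
    # A token is emitted either by acceptance or by resampling on rejection.
    out = [b * acc + (1.0 - accept_rate) * r
           for b, acc, r in zip(q, accept, resample)]
    return out, accept_rate


# Toy distributions (assumed for illustration only).
p = [0.5, 0.3, 0.2]   # target model
q = [0.2, 0.5, 0.3]   # draft model

exact_out, exact_rate = sd_output(p, q, delta=1.0)
relaxed_out, relaxed_rate = sd_output(p, q, delta=0.8)

print("standard SD: accept rate", exact_rate, "TV to target", tv_distance(exact_out, p))
print("relaxed SD:  accept rate", relaxed_rate, "TV to target", tv_distance(relaxed_out, p))
```

In this toy setting the standard rule reproduces the target distribution, so its TV distance is zero up to floating-point error, while the relaxed rule accepts more draft tokens at the cost of a nonzero TV distance. That trade-off between acceptance rate and distributional error is the quantity the abstract's TV analysis is concerned with.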