As an essential component of fine-tuning, warm-up plays a crucial role in promoting stability and generalization. Many studies have examined its underlying mechanisms from different perspectives. However, most of these studies focus on incorporating the resulting insights into optimizers to reduce the reliance on warm-up; little attention has been paid to the inherent limitations of warm-up itself, which restrict its effectiveness. In this work, we revisit warm-up from a loss-landscape perspective and identify several limitations of existing warm-up schemes, including: (1) susceptibility to nearby suboptimal traps, (2) sensitivity to hyperparameters and random seeds, and (3) inefficiency during the early stages of training. To overcome these limitations, we propose Sensitivity-Aware Warm-Up (SAWU), a lightweight and adaptive strategy that dynamically leverages learning sensitivity during warm-up to guide updates toward better and more stable basins. In addition, SAWU introduces an adaptive scheduling mechanism and a phase-transition strategy across the warm-up, stable, and decay phases to further enhance robustness and efficiency. Extensive experiments on various downstream tasks show that SAWU significantly outperforms the vanilla method (e.g., an average 3.43% improvement on RoBERTa). Moreover, SAWU can be easily combined with various optimizers and remains effective even when warm-up-based methods fail (e.g., it lifts RAdam from 49.46% to 91.78% on QNLI). Thanks to its lightweight nature, SAWU introduces minimal overhead and even reduces training time by over 5% compared to other methods.
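The abstract does not spell out SAWU's algorithm, so as a hedged illustration of the general scheduling idea it describes, the sketch below implements a generic warm-up / stable / decay learning-rate schedule whose warm-up growth is damped by a user-supplied sensitivity proxy (e.g., a clipped gradient-norm ratio). The class name, the `sensitivity` argument, and the linear-decay choice are all assumptions for illustration, not the paper's method.

```python
# Hypothetical sketch of a sensitivity-aware warm-up/stable/decay schedule.
# All names and heuristics here are illustrative assumptions, not SAWU itself.

class SensitivityAwareSchedule:
    def __init__(self, peak_lr, warmup_steps, stable_steps, decay_steps,
                 min_lr=0.0):
        self.peak_lr = peak_lr
        self.warmup_steps = warmup_steps
        self.stable_steps = stable_steps
        self.decay_steps = decay_steps
        self.min_lr = min_lr
        self._lr = 0.0
        self._step = 0

    def step(self, sensitivity=1.0):
        """Advance one step; `sensitivity` in (0, 1] damps warm-up growth
        when updates look unstable (a hypothetical proxy supplied by the
        caller, e.g. a clipped gradient-norm ratio)."""
        self._step += 1
        t = self._step
        if t <= self.warmup_steps:
            # Warm-up: grow toward peak_lr, with the per-step increment
            # scaled by the sensitivity proxy (never overshooting the
            # vanilla linear warm-up target).
            target = self.peak_lr * t / self.warmup_steps
            increment = sensitivity * self.peak_lr / self.warmup_steps
            self._lr = min(target, self._lr + increment)
        elif t <= self.warmup_steps + self.stable_steps:
            # Stable phase: hold the peak learning rate.
            self._lr = self.peak_lr
        else:
            # Decay phase: linear decay from peak_lr to min_lr.
            d = min(t - self.warmup_steps - self.stable_steps,
                    self.decay_steps)
            frac = 1.0 - d / self.decay_steps
            self._lr = self.min_lr + (self.peak_lr - self.min_lr) * frac
        return self._lr


if __name__ == "__main__":
    sched = SensitivityAwareSchedule(peak_lr=2e-5, warmup_steps=100,
                                     stable_steps=800, decay_steps=100)
    for _ in range(1000):
        lr = sched.step(sensitivity=0.8)  # plug in a real proxy in practice
```

In practice the returned `lr` would be written into the optimizer's parameter groups each step; the sensitivity signal is the piece the paper's method would define, and here it is left entirely to the caller.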