While adapting pretrained vision models to downstream dense prediction tasks is widely practiced, current methods often overlook adaptation efficiency, especially in the context of multi-task learning (MTL). Although parameter-efficient fine-tuning (PEFT) methods improve parameter efficiency, broader aspects such as GPU memory and training time remain underexplored. In this paper, we propose a new paradigm that simultaneously achieves efficiency in parameters, GPU memory, and training time for multi-task dense vision adaptation. Specifically, we propose a dual-branch framework in which a frozen pretrained backbone serves as the generic main branch, and the proposed Bi-Directional Task Adaptation (BDTA) modules are integrated in parallel to form a task bypass branch that extracts the adaptation features required by multiple specific tasks. This adaptation module is lightweight and efficient, and it does not require backpropagation through the large pretrained backbone, thus avoiding resource-intensive gradient computations. Moreover, a Mixture-of-Task-Experts (MoTE) mechanism is further proposed to integrate adaptation features across tasks and scales, thereby obtaining more robust representations tailored to dense prediction tasks. On the PASCAL-Context benchmark, our method achieves over 2× relative performance improvement over the best prior multi-task PEFT method, while using only $\sim$30% of the parameters, $\sim$50% of the memory, and $\sim$60% of the training time, demonstrating superior overall adaptation efficiency.
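The abstract does not give the MoTE formulation, but the description ("integrate adaptation features across tasks and scales") suggests a gated weighted combination of per-expert features. The sketch below is a minimal, framework-free illustration of that idea under the assumption that MoTE reduces to a softmax-gated mixture; all function names (`mixture_of_task_experts`, `softmax`) and the gating form are hypothetical, not taken from the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of gate logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mixture_of_task_experts(expert_feats, gate_logits):
    """Fuse per-expert (e.g. per-scale) feature vectors for one task.

    Assumed form of MoTE (not specified in the abstract): each expert
    produces a feature vector, a learned gate scores the experts, and
    the fused representation is the softmax-weighted sum.

    expert_feats: list of equal-length feature vectors, one per expert
    gate_logits:  unnormalized gate scores, one per expert
    """
    weights = softmax(gate_logits)
    dim = len(expert_feats[0])
    fused = [0.0] * dim
    for w, feat in zip(weights, expert_feats):
        for i, v in enumerate(feat):
            fused[i] += w * v
    return fused, weights

# Usage: fuse two 2-D expert features with gate logits favoring expert 1.
fused, weights = mixture_of_task_experts([[1.0, 0.0], [0.0, 1.0]], [0.0, 1.0])
```

In practice the gate logits would be predicted per task (and possibly per spatial location) by a small learned network, keeping the mechanism lightweight relative to the frozen backbone.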