Large language models (LLMs) excel across diverse domains, yet require substantial computational resources during inference. Model compression has proved a successful way to reduce these costs, as shown by preserved performance on general-purpose benchmarks. However, model compression methods can substantially degrade performance in specialized domains, such as law and healthcare. We therefore present JointCal, a novel approach that simultaneously models the importance of weights to both specialized and general capabilities. It relies on a layer-wise reconstruction loss formulation that captures the activations from different domains and the interactions between them. Through experiments across tasks and models, we empirically show that JointCal offers consistent improvements on specialized benchmarks while preserving overall performance. In contrast to prior work on domain-adapted compression, our approach does not require any resource-intensive model training procedures, reducing the duration of compression from hours to minutes.
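To make the idea of a layer-wise reconstruction loss over two calibration sets more concrete, here is a minimal sketch. It is not the JointCal formulation, which the abstract does not spell out. It assumes a single linear layer, uses magnitude pruning as a stand-in compressor, and combines the general and domain losses with a hypothetical weight `alpha`. A plain weighted sum like this does not model the interactions between domains that the abstract mentions. All function and variable names are illustrative.

```python
import numpy as np


def reconstruction_loss(W, W_hat, X):
    """Squared Frobenius error between original and compressed layer outputs."""
    return float(np.linalg.norm(W @ X - W_hat @ X, ord="fro") ** 2)


def joint_loss(W, W_hat, X_general, X_domain, alpha=0.5):
    """Hypothetical weighted sum of the general and domain reconstruction losses.

    alpha is an assumed mixing weight, not a parameter taken from the paper.
    """
    general = reconstruction_loss(W, W_hat, X_general)
    domain = reconstruction_loss(W, W_hat, X_domain)
    return (1.0 - alpha) * general + alpha * domain


def magnitude_prune(W, sparsity):
    """Zero out the smallest-magnitude fraction of weights (stand-in compressor)."""
    k = int(W.size * sparsity)
    if k == 0:
        return W.copy()
    # k-th smallest absolute value; everything at or below it is removed.
    threshold = np.partition(np.abs(W).ravel(), k - 1)[k - 1]
    return np.where(np.abs(W) > threshold, W, 0.0)


# Toy data: one 16 -> 8 linear layer and two sets of calibration activations.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))
X_general = rng.normal(size=(16, 32))       # stands in for general-purpose text
X_domain = 2.0 * rng.normal(size=(16, 32))  # stands in for a specialized domain

W_hat = magnitude_prune(W, sparsity=0.5)
print("joint loss:", joint_loss(W, W_hat, X_general, X_domain))
```

Because only forward activations on calibration data are needed, a loss of this kind can be evaluated without any gradient-based training, which is consistent with the training-free claim in the abstract.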
