
poster
$\rm SP^3$: Enhancing Structured Pruning via PCA Projection
keywords:
structured pruning
language model
model compression
Structured pruning is a widely used technique for reducing the size of pre-trained language models (PLMs), but current methods often overlook the potential of compressing the hidden dimension $d$ in PLMs, a dimension critical to model size and efficiency. This paper introduces a novel structured pruning approach, Structured Pruning with PCA Projection ($\rm SP^3$), targeting the effective reduction of $d$ by projecting features into a space defined by principal components before masking. Extensive experiments on benchmarks (GLUE and SQuAD) show that $\rm SP^3$ can reduce $d$ by 70\%, compress 94\% of the $\rm BERT_{base}$ model, and maintain over 96\% accuracy, outperforming other methods that compress $d$ by 6\% in accuracy at the same compression ratio. $\rm SP^3$ has also proven effective with other models, including OPT and Llama. Our data and code are available at https://github.com/hyx1999/SP3.
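
As a rough illustration of the idea sketched in the abstract, the snippet below projects calibration features onto their top principal components and folds that projection into a downstream linear layer, shrinking the layer's input dimension from $d$ to $k$. This is a minimal NumPy sketch of PCA projection followed by masking of low-variance components, not the paper's actual algorithm (which is in the linked repository); the function names, the `keep_ratio` parameter, and the synthetic calibration data are all illustrative assumptions.

```python
import numpy as np

def pca_projection(hidden_states: np.ndarray, keep_ratio: float = 0.3):
    """Compute a rank-k PCA projection of calibration features.

    hidden_states: (num_tokens, d) features collected on a calibration set.
    Returns the feature mean (1, d) and the projection matrix Q_k (d, k),
    whose columns are the top-k principal directions.
    """
    mean = hidden_states.mean(axis=0, keepdims=True)
    centered = hidden_states - mean
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    k = max(1, int(keep_ratio * hidden_states.shape[1]))
    return mean, vt[:k].T  # keep only the top-k components ("masking")

def fold_projection(weight: np.ndarray, bias: np.ndarray,
                    mean: np.ndarray, q_k: np.ndarray):
    """Absorb the projection into a linear layer y = x @ weight.T + bias,
    so the layer consumes k-dim projected inputs instead of d-dim ones.

    Since x ~ mean + x_proj @ q_k.T with x_proj = (x - mean) @ q_k:
      y ~ x_proj @ (weight @ q_k).T + (bias + mean @ weight.T)
    """
    new_weight = weight @ q_k                       # (out, k)
    new_bias = bias + (mean @ weight.T).squeeze(0)  # (out,)
    return new_weight, new_bias

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, out, rank = 768, 64, 128
    # Synthetic low-rank calibration features: PCA truncation is near-lossless.
    feats = rng.normal(size=(4096, rank)) @ rng.normal(size=(rank, d))
    w, b = rng.normal(size=(out, d)), rng.normal(size=(out,))

    mean, q_k = pca_projection(feats, keep_ratio=0.3)  # d = 768 -> k = 230
    new_w, new_b = fold_projection(w, b, mean, q_k)

    x = feats[:8]
    y_full = x @ w.T + b                           # original d-dim layer
    y_small = (x - mean) @ q_k @ new_w.T + new_b   # pruned k-dim layer
    print("max abs error:", np.abs(y_full - y_small).max())  # tiny
```

Because the synthetic features here have rank 128 and $k = 230$ components are kept, the folded layer reproduces the original outputs almost exactly; on real activations the retained variance, and hence the accuracy trade-off, depends on `keep_ratio`.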