STEP: Staged Parameter-Efficient Pre-training for Large Language Models
Keywords: efficiency, pre-training, language models
Pre-training large language models faces significant memory challenges due to the large size of model weights. We propose STaged parameter-Efficient Pre-training (STEP), which combines ideas from parameter-efficient tuning and staged training. We conduct experiments on pre-training models of various sizes and demonstrate that STEP can achieve up to a 40.4% reduction in maximum memory requirement compared to vanilla pre-training while maintaining comparable performance.
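
The abstract does not spell out the training procedure, but the combination it names can be illustrated with a minimal, hypothetical sketch: train a shallow model, then grow it by appending new layers while freezing the earlier layers behind low-rank (LoRA-style) adapters, so optimizer state is only kept for the newly added and adapter parameters. The class and function names below (`StagedLM`, `LoRALinear`, `grow`) are illustrative assumptions, not the authors' implementation of STEP.

```python
# Hypothetical sketch (not the authors' code): staged pre-training where each new
# stage appends layers and trains only the new layers plus low-rank adapters on
# the frozen earlier layers, reducing optimizer-state memory.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer augmented with a small trainable low-rank update."""
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False              # freeze the full-rank weight
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)       # adapter starts as a zero update

    def forward(self, x):
        return self.base(x) + self.lora_b(self.lora_a(x))

def make_block(d_model=256, n_heads=4):
    return nn.TransformerEncoderLayer(d_model, n_heads,
                                      dim_feedforward=4 * d_model,
                                      batch_first=True)

class StagedLM(nn.Module):
    def __init__(self, vocab=32000, d_model=256, n_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        self.blocks = nn.ModuleList(make_block(d_model) for _ in range(n_layers))
        self.head = nn.Linear(d_model, vocab)

    def grow(self, extra_layers=4, rank=8):
        """Stage transition: freeze existing blocks behind adapters, append new blocks."""
        for blk in self.blocks:
            blk.linear1 = LoRALinear(blk.linear1, rank)
            blk.linear2 = LoRALinear(blk.linear2, rank)
            for name, p in blk.named_parameters():
                if "lora_" not in name:
                    p.requires_grad = False      # only adapters stay trainable here
        for _ in range(extra_layers):
            self.blocks.append(make_block(self.embed.embedding_dim))

    def forward(self, ids):
        h = self.embed(ids)
        for blk in self.blocks:
            h = blk(h)
        return self.head(h)

# Stage 1 would train all 4 initial layers; here we jump to stage 2: grow to 8
# layers and build the optimizer only over parameters that still require gradients.
model = StagedLM()
model.grow(extra_layers=4)
trainable = [p for p in model.parameters() if p.requires_grad]
opt = torch.optim.AdamW(trainable, lr=3e-4)

ids = torch.randint(0, 32000, (2, 16))           # dummy batch; real targets would be shifted tokens
loss = nn.functional.cross_entropy(model(ids).flatten(0, 1), ids.flatten())
loss.backward()
opt.step()
```

Because AdamW keeps two extra tensors per trainable parameter, restricting the optimizer to new layers and adapters is where a staged, parameter-efficient scheme would plausibly save peak memory relative to training all weights from the start.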