VIDEO DOI: https://doi.org/10.48448/r5am-a372

poster

ACL 2024

August 14, 2024

Bangkok, Thailand

${\mathcal X}$FT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts

Keywords: model merging, instruction tuning, mixture-of-experts, code generation, large language models

We introduce ${\mathcal X}$FT, a simple yet powerful training scheme that unlocks the performance limit of instruction-tuned code Large Language Models (LLMs) by merging upcycled Mixture-of-Experts (MoE) models. While vanilla sparse upcycling fails to improve instruction tuning, ${\mathcal X}$FT introduces a shared expert mechanism with a novel routing weight normalization strategy into sparse upcycling, which significantly boosts instruction tuning. After fine-tuning the upcycled MoE model, ${\mathcal X}$FT uses a learnable model merging mechanism to compile the upcycled MoE model back into a dense model, achieving upcycled MoE-level performance with only dense-model compute. Applying ${\mathcal X}$FT to a 1.3B model yields a new state-of-the-art tiny code LLM (<3B) with 67.1 and 64.6 pass@1 on HumanEval and HumanEval+, respectively. With the same data and model architecture, ${\mathcal X}$FT improves supervised fine-tuning (SFT) by 13% on HumanEval+, along with consistent improvements of 2% to 13% on MBPP+, MultiPL-E, and DS-1000, demonstrating its generalizability. ${\mathcal X}$FT is fully orthogonal to existing techniques such as Evol-Instruct and OSS-Instruct, opening a new dimension for improving code instruction tuning. Code is available at https://github.com/ise-uiuc/xft.
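As a rough illustration of the "merge the upcycled MoE back into a dense model" step described above, the sketch below collapses the experts of one MoE feed-forward block into a single dense FFN through softmax-normalized, learnable mixing coefficients. This is a minimal, hypothetical sketch assuming a convex-combination merge over expert weights: the class name MergeableMoEFFN, the tensor shapes, and the exact merging parameterization are illustrative assumptions, not the authors' released implementation (see the linked repository for the actual code, including the shared-expert and routing-weight-normalization details).

import torch
import torch.nn as nn

class MergeableMoEFFN(nn.Module):
    """One upcycled MoE feed-forward block whose experts can be merged back
    into a single dense FFN via learnable mixing coefficients.
    Hypothetical sketch: names, shapes, and the merging scheme are assumptions."""

    def __init__(self, hidden: int = 256, inner: int = 1024, num_experts: int = 4):
        super().__init__()
        self.hidden, self.inner, self.num_experts = hidden, inner, num_experts
        # Sparse upcycling: each expert starts as a copy of the dense FFN weights
        # (random init here only to keep the sketch self-contained).
        self.up = nn.Parameter(torch.randn(num_experts, hidden, inner) * 0.02)
        self.down = nn.Parameter(torch.randn(num_experts, inner, hidden) * 0.02)
        # Learnable merging logits, softmax-normalized so the merged weights
        # form a convex combination of the experts.
        self.merge_logits = nn.Parameter(torch.zeros(num_experts))

    def merged_dense_ffn(self) -> nn.Sequential:
        """Collapse all experts into one dense FFN for dense-model-cost inference."""
        coeff = torch.softmax(self.merge_logits, dim=0)        # (E,)
        up = torch.einsum("e,ehi->hi", coeff, self.up)         # (hidden, inner)
        down = torch.einsum("e,eih->ih", coeff, self.down)     # (inner, hidden)
        dense = nn.Sequential(
            nn.Linear(self.hidden, self.inner, bias=False),
            nn.GELU(),
            nn.Linear(self.inner, self.hidden, bias=False),
        )
        with torch.no_grad():
            dense[0].weight.copy_(up.T)    # nn.Linear stores weights as (out, in)
            dense[2].weight.copy_(down.T)
        return dense

# Usage: after fine-tuning the upcycled MoE, extract the dense model layer by layer.
block = MergeableMoEFFN()
dense_ffn = block.merged_dense_ffn()
y = dense_ffn(torch.randn(1, block.hidden))   # inference at dense-model compute

In this reading, only the per-layer mixing logits are learned for the merge itself, so the compiled model keeps the dense architecture and inference cost while retaining what the MoE experts learned during instruction tuning.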

