Lecture image placeholder

Premium content

Access to this content requires a subscription. You must be a premium user to view this content.

Monthly subscription - $9.99Pay per view - $4.99Access through your institutionLogin with Underline account
Need help?
Contact us
Lecture placeholder background
VIDEO DOI: https://doi.org/10.48448/vhga-3c38

poster

ACL 2024

August 14, 2024

Bangkok, Thailand

Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning

keywords:

supervised finetuning

instruction tuning

data selection

large language models

data filtering

Instruction tuning is critical to improve LLMs but usually suffers from low-quality and redundant data. Data filtering for instruction tuning has proved important in improving both the efficiency and performance of the tuning process. But it also leads to extra cost and computation due to the involvement of LLMs in this process. To reduce the filtering cost, we study Superfiltering: Can we use a smaller and weaker model to select data for finetuning a larger and stronger model? Despite the performance gap between weak and strong language models, we find their highly consistent capability to perceive instruction difficulty and data selection results. This enables us to use a much smaller and more efficient model to filter the instruction data used to train a larger language model. Not only does it largely speed up the data filtering, but the filtered-data-finetuned LLM achieves even better performance on standard benchmarks. Extensive experiments validate the efficacy and efficiency of our approach.

Downloads

SlidesTranscript English (automatic)

Next from ACL 2024

DP-MLM: Differentially Private Text Rewriting Using Masked Language Models
poster

DP-MLM: Differentially Private Text Rewriting Using Masked Language Models

ACL 2024

+1Florian MatthesStephen Meisenbacher
Stephen Meisenbacher and 3 other authors

14 August 2024

Stay up to date with the latest Underline news!

Select topic of interest (you can select more than one)

PRESENTATIONS

  • All Lectures
  • For Librarians
  • Resource Center
  • Free Trial
Underline Science, Inc.
1216 Broadway, 2nd Floor, New York, NY 10001, USA

© 2023 Underline - All rights reserved