EMNLP 2025

November 07, 2025

Suzhou, China

Recent breakthroughs in natural language processing (NLP) have come with escalating model sizes and computational costs, posing significant challenges for deployment in real-time and resource-constrained environments. We introduce EmByte, a novel byte-level NLP model that achieves substantial compression while preserving accuracy and enhancing privacy. At the core of EmByte is a new Decompose-and-Compress (DeComp) learning strategy that decomposes subwords into fine-grained byte embeddings and then compresses them via neural projection. This allows EmByte to be shrunk to any vocabulary size (e.g., 128 or 256), reducing the parameter count by up to 94% compared to subword-based models without increasing sequence length or degrading performance. Unlike conventional tokenization-based and model-resizing approaches, EmByte is resilient to privacy threats such as gradient inversion attacks, owing to its byte-level many-to-one mapping structure. Empirical results on seven NLP tasks demonstrate that EmByte matches or exceeds the accuracy of much larger models while improving efficiency, leading to lightweight and generalized NLP models suitable for deployment in privacy-sensitive and low-resource settings.
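The decompose-then-project idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the architecture, so the padding id, the cap on bytes per subword, the embedding widths, and the single linear projection below are all assumptions chosen for illustration. The sketch only shows how one vector per subword can be built from byte embeddings, so that sequence length stays the same while the embedding table shrinks.

```python
import random

# All constants below are illustrative assumptions, not values from the paper.
PAD_ID = 256      # hypothetical padding id, one past the 256 possible byte values
MAX_BYTES = 8     # assumed cap on bytes kept per subword
D_BYTE = 16       # assumed byte-embedding width
D_MODEL = 64      # assumed model hidden width

rng = random.Random(0)


def rand_matrix(rows, cols):
    """Small random matrix standing in for learned parameters."""
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


byte_table = rand_matrix(PAD_ID + 1, D_BYTE)           # 257 x D_BYTE
projection = rand_matrix(MAX_BYTES * D_BYTE, D_MODEL)  # (MAX_BYTES * D_BYTE) x D_MODEL


def subword_to_byte_ids(subword):
    """Decompose: UTF-8 bytes of the subword, truncated or padded to MAX_BYTES."""
    ids = list(subword.encode("utf-8"))[:MAX_BYTES]
    return ids + [PAD_ID] * (MAX_BYTES - len(ids))


def embed_subword(subword):
    """Compress: concatenate the byte embeddings, then project to one D_MODEL vector."""
    flat = [x for b in subword_to_byte_ids(subword) for x in byte_table[b]]
    return [
        sum(flat[i] * projection[i][j] for i in range(len(flat)))
        for j in range(D_MODEL)
    ]


def embed_sequence(subwords):
    """One output vector per subword, so the sequence length is unchanged."""
    return [embed_subword(s) for s in subwords]


def embedding_param_counts(subword_vocab_size):
    """Parameters in this sketch's byte-level embedder vs. a plain subword table."""
    byte_level = (PAD_ID + 1) * D_BYTE + MAX_BYTES * D_BYTE * D_MODEL
    subword_level = subword_vocab_size * D_MODEL
    return byte_level, subword_level
```

Calling `embed_sequence(["token", "ization"])` returns two vectors of width `D_MODEL`, one per input subword. `embedding_param_counts` compares only the embedding parameters of this toy setup; it says nothing about the 94% figure reported in the abstract, which refers to the authors' own models. Truncating to `MAX_BYTES` is a simplification made here for brevity.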
