Recent breakthroughs in natural language processing (NLP) have come with escalating model sizes and computational costs, posing significant challenges for deployment in real-time and resource-constrained environments. We introduce EmByte, a novel byte-level NLP model that achieves substantial compression while preserving accuracy and enhancing privacy. At the core of EmByte is a new Decompose-and-Compress (DeComp) learning strategy that decomposes subwords into fine-grained byte embeddings and then compresses them via neural projection. This enables EmByte to be shrunk to arbitrarily small vocabulary sizes (e.g., 128 or 256), reducing the parameter count by up to 94% compared to subword-based models without increasing sequence length or degrading performance. Unlike conventional tokenization-based and model-resizing approaches, EmByte is resilient to privacy threats such as gradient inversion attacks, owing to its byte-level many-to-one mapping structure. Empirical results on seven NLP tasks demonstrate that EmByte matches or exceeds the accuracy of much larger models while improving efficiency, yielding lightweight, generalizable NLP models suitable for deployment in privacy-sensitive and low-resource settings.
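To make the decompose-then-compress idea concrete, the minimal PyTorch sketch below embeds each token's UTF-8 bytes with a tiny 256-entry table and compresses them into one token vector via a learned linear projection. The class name, dimensions, padding scheme, and pooling-by-concatenation choice are all illustrative assumptions; the abstract does not specify EmByte's exact architecture.

```python
import torch
import torch.nn as nn

class ByteCompressEmbedding(nn.Module):
    """Hypothetical sketch of a DeComp-style embedding layer.

    Each token is decomposed into its UTF-8 bytes, every byte is looked up
    in a 256-entry table, and the byte embeddings are compressed into a
    single token vector by a neural projection. Details are assumptions,
    not the authors' implementation.
    """

    def __init__(self, byte_dim: int = 64, model_dim: int = 512, max_bytes: int = 16):
        super().__init__()
        self.max_bytes = max_bytes
        # 256 byte values plus one padding index, replacing a ~30k subword table
        self.byte_embed = nn.Embedding(256 + 1, byte_dim, padding_idx=256)
        # learned projection that compresses the concatenated byte embeddings
        self.compress = nn.Linear(max_bytes * byte_dim, model_dim)

    def forward(self, token: str) -> torch.Tensor:
        ids = list(token.encode("utf-8"))[: self.max_bytes]
        ids += [256] * (self.max_bytes - len(ids))       # pad to fixed width
        byte_vecs = self.byte_embed(torch.tensor(ids))   # (max_bytes, byte_dim)
        return self.compress(byte_vecs.flatten())        # (model_dim,)

emb = ByteCompressEmbedding()
print(emb("privacy").shape)  # torch.Size([512])
```

Under these illustrative numbers, the embedding parameters drop from roughly 30,000 × 512 ≈ 15.4M for a subword table to 257 × 64 + 1024 × 512 ≈ 0.54M, on the order of the up-to-94% reduction the abstract reports. The byte table's many-to-one mapping (many tokens share the same byte embeddings) is also what the abstract credits for resilience to gradient inversion attacks.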