EMNLP 2025

November 07, 2025

Suzhou, China


The specialized vocabulary and nuanced concepts of the telecommunications industry pose persistent challenges for standard Natural Language Processing (NLP) models. Generic embedding models often struggle to represent telecom-specific semantics, limiting their utility in retrieval and downstream tasks. We present T-VEC (Telecom Vectorization Model), a domain-adapted embedding model fine-tuned from the gte-Qwen2-1.5B-instruct backbone using a triplet loss objective over 100K curated telecom triplets. T-VEC sets a new benchmark in telecom retrieval, achieving CosineSim@1 of 0.8814, Recall@5 of 0.9249, and Top-1 Exact Match of 0.9310, significantly outperforming leading general-purpose models such as MPNet, BGE, and E5 by a 20-30% relative margin. These gains confirm T-VEC's superior domain grounding and retrieval precision, with embedding visualizations further showing tight clustering of telecom-relevant concepts. We release T-VEC and its tokenizer to support more robust and semantically faithful NLP applications within the telecom domain.
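The triplet objective and the retrieval metrics reported above can be sketched in plain Python. This is an illustrative sketch only: the function names, toy vectors, and margin value below are assumptions for exposition, not part of the released T-VEC code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss over cosine distance:
    pull the positive closer to the anchor than the negative by `margin`.
    (margin=0.2 is a placeholder, not the value used in the paper.)"""
    d_pos = 1.0 - cosine(anchor, positive)
    d_neg = 1.0 - cosine(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

def retrieval_metrics(query_embs, doc_embs, gold, k=5):
    """Return (mean CosineSim@1, Recall@k, Top-1 Exact Match) for a
    retrieval task where gold[i] is the index of query i's correct doc."""
    sim_at_1, hits_at_k, exact = 0.0, 0, 0
    for qi, q in enumerate(query_embs):
        ranked = sorted(range(len(doc_embs)),
                        key=lambda d: cosine(q, doc_embs[d]),
                        reverse=True)
        sim_at_1 += cosine(q, doc_embs[ranked[0]])  # similarity of top hit
        hits_at_k += int(gold[qi] in ranked[:k])    # Recall@k numerator
        exact += int(ranked[0] == gold[qi])         # Top-1 Exact Match
    n = len(query_embs)
    return sim_at_1 / n, hits_at_k / n, exact / n
```

On a toy corpus where each query's embedding coincides with its gold document, all three metrics come out at 1.0; on real data they separate, which is what the CosineSim@1 / Recall@5 / Top-1 Exact Match triple in the abstract measures.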
