EMNLP 2025

November 06, 2025

Suzhou, China


Given the inherent subjectivity of similarity in text, fully unsupervised text clustering is unlikely to produce groupings that work across a variety of use cases. Traditional techniques for guiding clustering rely on costly, time-consuming human feedback and/or pre-existing labels. Leveraging recent advances in LLMs and decoder-only embedding models, we present two techniques for effectively controlling text embeddings with minimal human input: prefix instructions and LLM preprocessing. We evaluate clustering performance on datasets with multiple independent ground-truth labels, or perspectives, and find that these techniques can improve clustering for one perspective or use case at the cost of reduced performance for another.
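The abstract does not name an embedding model, a prefix format, or an evaluation metric. As a minimal sketch under those gaps, assuming the adjusted Rand index (ARI) as the metric and a hypothetical "Instruct: ... Query: ..." prefix template, the code below shows how an instruction might be prepended to each input and how a single clustering can be scored against several independent ground-truth perspectives. The names `with_instruction`, `adjusted_rand_index`, and `score_perspectives` are illustrative, and the labels and cluster assignments in the usage example are made up, not produced by a model.

```python
from collections import Counter
from math import comb


def with_instruction(texts, instruction):
    """Prepend a task instruction to each text before embedding.

    ASSUMPTION: the "Instruct: ...\nQuery: ..." template is hypothetical;
    the right prefix format depends on the embedding model used.
    """
    return [f"Instruct: {instruction}\nQuery: {t}" for t in texts]


def adjusted_rand_index(pred, truth):
    """Adjusted Rand index between two flat labelings of the same items."""
    if len(pred) != len(truth):
        raise ValueError("labelings must have the same length")
    n = len(pred)
    # Count item pairs that share a contingency cell, a predicted cluster,
    # and a ground-truth class, respectively.
    index = sum(comb(c, 2) for c in Counter(zip(pred, truth)).values())
    sum_a = sum(comb(c, 2) for c in Counter(pred).values())
    sum_b = sum(comb(c, 2) for c in Counter(truth).values())
    total = comb(n, 2)
    expected = sum_a * sum_b / total if total else 0.0
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case (e.g. both labelings put everything in one group);
        # treated here as full agreement.
        return 1.0
    return (index - expected) / (max_index - expected)


def score_perspectives(pred, perspectives):
    """Score one clustering against each independent ground-truth labeling."""
    return {name: adjusted_rand_index(pred, labels)
            for name, labels in perspectives.items()}


if __name__ == "__main__":
    texts = ["great phone", "awful phone", "great hotel", "awful hotel"]
    inputs = with_instruction(texts, "Group these reviews by product category")
    # Hypothetical labels for two perspectives on the same four texts.
    perspectives = {"topic": [0, 0, 1, 1], "sentiment": [0, 1, 0, 1]}
    # Hypothetical cluster assignments, e.g. from k-means on embeddings of `inputs`.
    pred = [0, 0, 1, 1]
    print(score_perspectives(pred, perspectives))
```

In this toy case the clustering matches the topic labels exactly and scores 1.0 on that perspective, while it scores about -0.5 on sentiment, a small illustration of how steering clusters toward one perspective can cost agreement with another.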


