Neural transducers (NT) provide an effective framework for speech streaming, demonstrating strong performance in automatic speech recognition (ASR). However, the application of NT to speech translation (ST) remains challenging, as existing approaches struggle with word reordering and performance degradation when jointly modeling ASR and ST, resulting in a gap with attention-based encoder-decoder (AED) models. Existing NT-based ST approaches also suffer from high computational training costs. To address these issues, we propose HENT-SRT (Hierarchical Efficient Neural Transducer for Speech Recognition and Translation), a novel framework that factorizes ASR and translation tasks to better handle reordering. To ensure robust ST while preserving ASR performance, we use self-distillation with CTC consistency regularization. Moreover, we improve computational efficiency by incorporating best practices from ASR transducers, including a down-sampled hierarchical encoder, a stateless predictor, and a pruned transducer loss to reduce training complexity. Finally, we introduce a blank penalty during decoding, reducing deletions and improving translation quality. Our approach is evaluated on three conversational datasets (Arabic, Spanish, and Mandarin), achieving new state-of-the-art performance among NT models and substantially narrowing the gap with AED-based systems.
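
The abstract does not say how the blank penalty is applied. As a rough illustration only, the sketch below assumes the simplest variant: a constant is subtracted from the blank symbol's log-probability before the decoder picks its next output, so the decoder emits a real token more often and deletes less. The function names, the `blank_id` index, and the penalty value are all hypothetical and are not taken from the paper.

```python
import math


def apply_blank_penalty(log_probs, blank_id=0, penalty=0.5):
    """Return a copy of log_probs with a constant subtracted from the blank score.

    Assumption: the penalty is a fixed value applied in the log domain.
    """
    adjusted = list(log_probs)
    adjusted[blank_id] -= penalty
    return adjusted


def greedy_choice(log_probs, blank_id=0, penalty=0.0):
    """Pick the highest-scoring symbol index after penalizing blank."""
    adjusted = apply_blank_penalty(log_probs, blank_id, penalty)
    return max(range(len(adjusted)), key=adjusted.__getitem__)


# Toy distribution over [blank, token_1, token_2] for one decoding step.
scores = [math.log(0.5), math.log(0.4), math.log(0.1)]

without_penalty = greedy_choice(scores, penalty=0.0)  # blank is the top score
with_penalty = greedy_choice(scores, penalty=0.5)     # token_1 overtakes blank
print(without_penalty, with_penalty)
```

With no penalty the step emits blank. With the penalty, the blank score falls below that of `token_1`, so a token is emitted instead.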

ACL 2025

HENT-SRT: Hierarchical Efficient Neural Transducer with Self-Distillation for Joint Speech Recognition and Translation

workshop paper

### Welcome to The 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025)

Message from the General Chair: 
*It is my great pleasure and honor to welcome you to the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025), held in beautiful Vienna, Austria, from July 27 to August 1, 2025. ACL 2025 continues our field’s tradition of excellence in scholarship, innovation, and inclusivity, and I am deeply grateful to the many volunteers who have worked tirelessly to bring this event to life.* 
[Read more](https://drive.google.com/file/d/1GI_hvOpjswAuYdUTromfeDiPpCcqidwg/view?usp=sharing)

With the rapid advancement of global digitalization, users from different countries increasingly rely on social media for information exchange. In this context, multilingual multi-label emotion detection has emerged as a critical research area. This study addresses SemEval-2025 Task 11: Bridging the Gap in Text-Based Emotion Detection. Our paper focuses on two sub-tracks of this task: (1) Track A: Multi-label emotion detection, and (2) Track B: Emotion intensity. To tackle multilingual challenges, we leverage pre-trained multilingual models and focus on two architectures: (1) a fine-tuned BERT-based classification model and (2) an instruction-tuned generative LLM. Additionally, we propose two methods for handling multi-label classification: the base method, which maps an input directly to all its corresponding emotion labels, and the pairwise method, which models the relationship between the input text and each emotion category individually. Experimental results demonstrate the strong generalization ability of our approach in multilingual emotion recognition. In Track A, our method achieved Top 4 performance across 10 languages, ranking 1st in Hindi. In Track B, our approach also secured Top 5 performance in 7 languages, highlighting its simplicity and effectiveness.

JNLP at SemEval-2025 Task 11: Cross-Lingual Multi-Label Emotion Detection Using Generative Models

This paper presents a multi-step zero-shot system for SemEval-2025 Task 1 on Advancing Multimodal Idiomaticity Representation (AdMIRe). The system employs two state-of-the-art multimodal language models, Claude Sonnet 3.5 and OpenAI GPT-4o, to determine idiomaticity and rank images for relevance in both subtasks. A hybrid approach combining o1-preview for idiomaticity classification and GPT-4o for visual ranking produced the best overall results. The system demonstrates competitive performance on the English extended dataset for Subtask A, but faces challenges in cross-lingual transfer to Portuguese. Comparing Image+Text and Text-Only approaches reveals interesting trends and raises questions about the role of visual information in multimodal idiomaticity detection.

daalft at SemEval-2025 Task 1: Multi-step Zero-shot Multimodal Idiomaticity Ranking

We developed an ensemble learning system that integrates multiple transformer-based models, including RoBERTa-large and DeBERTa-v3-large. These models were trained using diverse data augmentation strategies, both lightweight and intensive, to increase generalization and robustness. The ensemble approach allowed us to combine the strengths of different models through weighted soft voting.
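
For readers unfamiliar with weighted soft voting, here is a minimal sketch of the general technique: each model's class probabilities are averaged with per-model weights, and the class with the highest combined probability is chosen. The probabilities, weights, and function name are made up for illustration and do not reflect the system's actual models or settings.

```python
def weighted_soft_vote(prob_lists, weights):
    """Combine per-model class probabilities with a weighted average.

    prob_lists: one probability vector per model, all of the same length.
    weights: one non-negative weight per model (illustrative values only).
    """
    total = sum(weights)
    n_classes = len(prob_lists[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, prob_lists)) / total
        for c in range(n_classes)
    ]


# Hypothetical outputs of two classifiers over three classes.
model_a = [0.6, 0.3, 0.1]
model_b = [0.2, 0.7, 0.1]

# The second model is trusted three times as much as the first.
combined = weighted_soft_vote([model_a, model_b], weights=[1.0, 3.0])
predicted = max(range(len(combined)), key=combined.__getitem__)
print(combined, predicted)
```

Because the weights are normalized by their sum, the combined vector remains a valid probability distribution whenever each input vector is one.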

Anastasia at SemEval-2025 Task 9: Subtask 1, Ensemble Learning with Data Augmentation and Focal Loss for Food Risk Classification

In this paper, we present our participation in Subtask 2 of SemEval-2025 Task 10, which focuses on identifying and classifying narratives in news articles in multiple languages on climate change and the Ukraine-Russia war. To address this task, we employed a zero-shot approach using a generative Large Language Model (LLM) without prior training on the dataset. Our classification strategy is based on two steps: first, the system classifies the topic of each news item; subsequently, it identifies the sub-narratives directly at the finer granularity. We present a detailed analysis of the performance of our system compared to the best-ranked systems on the leaderboard, highlighting the strengths and limitations of our approach.

UNEDTeam at SemEval-2025 Task 10: Zero-Shot Narrative Classification

This paper presents our system designed for Subtask 1 of SemEval-2025 Task 10, which focuses on multilingual entity framing in news articles. Given the complexity of the task—multi-label, multi-class classification across five languages—we propose an approach based on large language models (LLMs). Our method combines multilingual text translation, data augmentation, multi-model fine-tuning, and ensemble classification. First, we translate all texts into English to unify the datasets and apply synonym-based augmentation to address class imbalances. We then fine-tune multiple LLMs on the augmented data. Finally, a state-of-the-art LLM aggregates the individual model predictions for ensemble classification, yielding robust and accurate results. Our system achieved top positions in three languages (English, Portuguese, and Russian) and second place in Bulgarian.

DUTIR at SemEval-2025 Task 10: A Large Language Model-based Approach for Entity Framing in Online News

We present the Mu-SHROOM shared task, which focuses on detecting hallucinations and other overgeneration mistakes in the output of instruction-tuned large language models (LLMs). Mu-SHROOM addresses general-purpose LLMs in 14 languages and frames the hallucination detection problem as a span-labeling task.

iai_MSU at SemEval-2025 Task-3: Mu-SHROOM, the Multilingual Shared-task on Hallucinations and Related Observable Overgeneration Mistakes in English

This paper presents a zero-shot system for fact-checked claim retrieval. We employed several state-of-the-art large language models to obtain text embeddings. The models were then combined to obtain the best possible result. Our approach achieved 7th place in monolingual and 9th in cross-lingual subtasks. We used only English translations as an input to the text embedding models since multilingual models did not achieve satisfactory results. We identified the most relevant claims for each post by leveraging the embeddings and measuring cosine similarity. Overall, the best results were obtained by the NVIDIA NV-Embed-v2 model. For some languages, we benefited from model combinations (NV-Embed & GPT or Mistral).
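
The retrieval step described above, ranking claims by the cosine similarity between post and claim embeddings, can be sketched roughly as follows. The three-dimensional vectors stand in for real model embeddings, and the claim identifiers and function names are hypothetical; the system's actual pipeline and model combination are not shown here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_claims(post_vec, claim_vecs, k=2):
    """Return the ids of the k claims most similar to the post embedding."""
    scored = [(cosine_similarity(post_vec, vec), cid) for cid, vec in claim_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [cid for _, cid in scored[:k]]


# Toy embeddings; a real system would obtain these from a text embedding model.
claims = {
    "claim_a": [1.0, 0.0, 0.0],
    "claim_b": [0.9, 0.1, 0.0],
    "claim_c": [0.0, 0.0, 1.0],
}
post = [0.8, 0.1, 0.1]

ranked = top_k_claims(post, claims, k=2)
print(ranked)
```

In this toy data `claim_b` points in nearly the same direction as the post, so it ranks ahead of `claim_a`, while `claim_c` is nearly orthogonal and is dropped.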

UWBa at SemEval-2025 Task 7: Multilingual and Crosslingual Fact-Checked Claim Retrieval

We present a similarity-based method for explainable classification in the context of SemEval-2025 Task 9: The Food Hazard Detection Challenge. Our proposed system is essentially unsupervised, leveraging the semantic properties of the labels. This approach brings some key advantages over typical classification systems. First, similarity metrics offer a more intuitive interpretation. Next, this technique allows for inference on novel labels. Finally, there is a non-negligible number of ambiguous labels, so learning a direct mapping does not lead to meaningful representations. Our method is generic and can be applied to any classification task.

UniBuc at SemEval-2025 Task 9: Similarity Approaches to Classification

This paper presents a framework for perceived emotion intensity prediction, focusing on SemEval-2025 Task 11 Track B. The task involves predicting the intensity of five perceived emotions (anger, fear, joy, sadness, and surprise) on an ordinal scale from 0 (no emotion) to 3 (high emotion). Our approach builds upon our method introduced in the WASSA workshop and enhances it by integrating ModernBERT in place of the traditional BERT model within a boosting-based ensemble framework. To address the difficulty in capturing fine-grained emotional distinctions, we incorporate class-preserving mixup data augmentation, a custom Pearson CombinLoss function, and fine-tuned transformer models, including ModernBERT, RoBERTa, and DeBERTa. Compared to individual fine-tuned transformer models (BERT, RoBERTa, DeBERTa, and ModernBERT) without augmentation or ensemble learning, our approach demonstrates significant improvements. The proposed system achieves an average Pearson correlation coefficient of 0.768 on the test set, outperforming the best individual baseline model. In particular, the model performs best for sadness (r = 0.808) and surprise (r = 0.770), highlighting its ability to capture subtle intensity variations in the text. Despite these improvements, challenges such as data imbalance, performance on low-resource emotions (e.g., anger and fear), and the need for refined data augmentation techniques remain open for future research.

tinaal at SemEval-2025 Task 11: Enhancing Perceived Emotion Intensity Prediction with Boosting Fine-Tuned Transformers

The SemEval-2025 Task 11, Bridging the Gap in Text-Based Emotion Detection, introduces an emotion recognition challenge spanning over 28 languages. This competition encourages researchers to explore more advanced approaches to address the challenges posed by the diversity of emotional expressions and background variations. It features two tracks: multi-label classification (Track A) and emotion intensity prediction (Track B), covering six emotion categories: anger, fear, joy, sadness, surprise, and disgust. In our work, we systematically explore the benefits of two contrastive learning approaches: sample-based (Contrastive Reasoning Calibration) and generation-based (DPO, SimPO) contrastive learning. The sample-based contrastive approach trains the model by comparing two samples to generate more reliable predictions. The generation-based contrastive approach trains the model to differentiate between correct and incorrect generations, refining its prediction. All models are fine-tuned from LLaMa3-Instruct-8B. Our system achieves 9th place in Track A and 6th place in Track B for English, while ranking among the top-tier performing systems for other languages.
