Morocco

Machine translation (MT) systems universally degrade when faced with code-mixed text. This problem is more acute for low-resource languages that lack dedicated parallel corpora. This work directly addresses this gap for Vietnamese-English, a language context characterized by challenges including orthographic ambiguity and the frequent omission of diacritics in informal text. We introduce VietMix, the first expert-translated, naturally occurring parallel corpus of Vietnamese-English code-mixed text. We establish VietMix&#39;s utility by developing a data augmentation pipeline that leverages iterative fine-tuning and targeted filtering. Experiments show that models augmented with our data outperform strong back-translation baselines by up to +3.5 xCOMET points and improve zero-shot models by up to +11.9 points. Our work delivers a foundational resource for a challenging language pair and provides a validated, transferable framework for building and augmenting corpora in other low-resource settings.

EACL 2026 Main Conference

VietMix: A Naturally-Occurring Parallel Corpus and Augmentation Framework for Vietnamese-English Code-Mixed Machine Translation

Machine translation (MT) systems universally degrade when faced with code-mixed text. This problem is more acute for low-resource languages that lack dedicated parallel corpora. This work directly addresses this gap for Vietnamese-English, a language context characterized by challenges including orthographic ambiguity and the frequent omission of diacritics in informal text. We introduce VietMix, the first expert-translated, naturally occurring parallel corpus of Vietnamese-English code-mixed text. We establish VietMix's utility by developing a data augmentation pipeline that leverages iterative fine-tuning and targeted filtering. Experiments show that models augmented with our data outperform strong back-translation baselines by up to +3.5 xCOMET points and improve zero-shot models by up to +11.9 points. Our work delivers a foundational resource for a challenging language pair and provides a validated, transferable framework for building and augmenting corpora in other low-resource settings.

workshop paper

#### *Message from the General Chair, Aline Villavicencio*
I’m delighted and honoured to welcome you to the 19th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2026), taking place in the beautiful city of Rabat, in Morocco, in March 24-29, 2026. EACL is the flagship European conference of the Association and EACL 2026 proudly continues our field’s tradition of excellence in scholarship, innovation, and inclusivity. I am deeply grateful to the many volunteers whose dedication, generosity, and tireless efforts have made this conference possible.
For the first time EACL is being hosted in the African continent. This is an important milestone for our community, and we are grateful to our Moroccan hosts for enabling this historic moment by bringing this edition of EACL to Rabat. We are also delighted that the Second Arabic NLP School is co-located with EACL. We hope attendees enjoy this wonderful opportunity to strengthen ties with the Computational Linguistics communities across the African continent. *[Read full message](https://drive.google.com/file/d/14NlmHvwM6fPJuMmOvVh7K0vtQbEyv3SZ/view?usp=sharing)*<br><br>

<html><button style="display: inline-flex; align-items: center; justify-content: center; white-space: nowrap; border-radius: 9999px; font-weight: bold; background: #7c3aed; color: white; font-family: 'Space Grotesk', sans-serif; height: 40px; font-size: 16px; padding: 0 20px; border: none; cursor: pointer" onclick="window.open('https://underline.io/events/522/reception','_blank')">Go to Workshops and Tutorials Program</button></html>
<br><br><br>

To access this event page, you need to log in with the **email address you registered with**. <br>Access credentials will be sent to your email from Underline -  subject line "Welcome to EACL 2026". Please be sure to check your spam email folder if you do not see an email confirmation right away.

Please log in

To access this event page, you are required to register.
Please complete your registration to continue.

We recommend reading [**the registration information**](https://2026.eacl.org/registration/) first.

**Online Registration Form**: https://acl.swoogo.com/eacl2026

Registration Required

Welcome to the 19th Conference of the European Chapter of the Association for Computational Linguistics (EACL). EACL 2026 will be held in Rabat, Morocco, from March 24–29, 2026. 

In this paper, we study the capabilities of large language models (LLMs) to adapt a concert moderation to diverse expertise levels of listeners. Our proof-of-concept concert moderator is based on retrieval-augmented generation (RAG) and uses few-shot audience modelling to infer listener's expertise. We study the capabilities of the system to adapt to three different listener's expertise levels. Two open domain LLMs are compared: gpt-oss:20b and llama3. The recognised differences among the models suggest that they vary in how directly they reproduce versus paraphrase retrieved information while maintaining semantic alignment.

From Novice to Expert: Generating Audience-Dependent Concert Moderations with RAG-LLMs

The advancement of Machine learning (ML), Large Audio Language Models (LALMs), and autonomous AI agents in Music Information Retrieval (MIR) necessitates a shift from static tagging to rich, human-aligned representation learning. However, the scarcity of open-source infrastructure capable of capturing the subjective nuances of audio annotation remains a critical bottleneck. This paper introduces LabelBuddy, an open-source collaborative auto-tagging audio annotation tool designed to bridge the gap between human intent and machine understanding. Unlike static tools, it decouples the interface from inference via containerized backends, allowing users to plug in custom models for AI-assisted pre-annotation. We describe the system architecture, which supports multi-user consensus, containerized model isolation, and a roadmap for extending agents and LALMs. Code available at https://github.com/GiannisProkopiou/gsoc2022-Label-buddy.

LabelBuddy: An Open Source Music and Audio Language Annotation Tagging Tool Using AI Assistance

Audio-video question answering (AVQA) systems for music show signs of multimodal "understanding", but it is unclear which inputs they rely on or whether their behavior reflects genuine audio-video reasoning. Existing evaluations focus on overall accuracy and rarely examine modality dependence. We address this gap by suggesting a method of using counterfactual evaluations to analyse the audio-video understanding of the models, illustrated with a case study on the audio-video spatial-temporal (AVST) architecture. This includes interventions that zero out or swap audio, video, or both, where results are benchmarked against a baseline based on linguistic patterns alone. Results show stronger reliance on audio than video, yet performance persists when either modality is removed, indicating learned cross-modal representations. The AVQA system studied thus exhibits non-trivial multimodal integration, though its "understanding" remains uneven.

Stochastic Parrots or True Virtuosos? Digging Deeper Into the Audio-Video Understanding of AVQA Models

Although annotated music descriptor datasets for user queries are increasingly common, few consider the user’s intent behind these descriptors, which is essential for effectively meeting their needs. We introduce MusicRecoIntent, a manually annotated corpus of 2,291 Reddit music requests, labeling musical descriptors across seven categories with positive, negative, or referential preference-bearing roles. We then investigate how reliably large language models (LLMs) can extract these music descriptors, finding that they do capture explicit descriptors but struggle with context-dependent ones. This work can further serve as a benchmark for fine-grained modeling of user intent and for gaining insights into improving LLM-based music understanding systems.

Beyond Musical Descriptors: Extracting Preference-Bearing Intent in Music Queries

Music often shares notable parallels with language, motivating the use of pretrained large language models (LLMs) for symbolic music understanding and generation. Despite growing interest, the practical effectiveness of adapting instruction-tuned LLMs to symbolic music remains insufficiently characterized. We present a controlled comparative study of finetuning strategies for ABC-based generation and understanding, comparing an off-the-shelf instruction-tuned backbone to domain-adapted variants and a music-specialized LLM baseline. Across multiple symbolic music corpora and evaluation signals, we provide some insights into adaptation choices for symbolic music applications. We highlight the domain adaptation vs.~preserving prior information tradeoff as well as the distinct behaviour of metrics used to measure the domain adaptation for symbolic music.

How Far Can Pretrained LLMs Go in Symbolic Music? Controlled Comparisons of Supervised and Preference-based Adaptation

Text-only training is a promising new method for training multimodal machine learning models without data from every modality. However, few studies have explored its use as an approximation of missing data for supervised learning in data-scarce environments. In this work, we examine techniques to acquire text-based training data, address the modality gap, and present a case study on classifying subjective audio timbre descriptions based on three kinds of text-only training data and six augmentation methods on eight audio-timbre datasets. We find text-only training successfully trains supervised audio classifiers without audio that are able to compete with a zero-shot baseline and training on real audio.

Audio Timbre Classification With Text-only Training

A central limitation of current music understanding frameworks is the reliance on audio embeddings, which frequently yields interpretations lacking traceable ties to explicit musical elements such as notes, dynamics, and instrumentation. In this work, we address this gap with MIDI-PHOR, a midi-first framework that converts symbolic data into structured, queryable representations for reasoning. MIDI-PHOR distills each piece into three complementary views: a symbolic view capturing pitch, meter, and key; a time-series view that tracks rhythmic salience, texture, and role activity via symbolic proxies rather than unstable audio rendering; and an instrument-role graph encoding ensemble interactions. We distill these views into evidence-linked captions and evaluate them using a novel claim-verification protocol. Experiments demonstrate that MIDI-PHOR significantly reduces hallucinations compared to raw-midi baselines and offers a robust, auditable bridge between symbolic data and semantic music understanding.

MIDI-PHOR: Multi-View Distillation for Music Understanding and Captioning

This paper evaluates the effectiveness of large language models (LLMs) on the task of context-aware music recommendation, specifically focusing on the alignment of music tracks with a listening intent, in addition to user preferences. We present a preliminary investigation in which five LLMs (variants of LLama, Qwen, and Mistral) are tasked with ranking a candidate set of tracks containing both ground-truth items (associated with specific user-intent pairs) and distractor items (containing user-relevant, intent-relevant, or non-user and non-intent relevant items). Our results show that LLMs rank intent-user-relevant items higher than the distract items, with "Llama-3.1-8B-Instruct" having the best performance (NDCG of 0.32 vs. 0.20). We further investigate whether performance differs when mentioning the listening intent explicitly in the prompt vs. implicitly given solely music preferences. Surprisingly, the LLMs achieved the best performance through an implicit indication of intent, versus explicitly adding it to the prompt, with "Mistral-7B-Instruct-v0.3" performing the best (NDCG of 0.37 vs. 0.29).

Read Between the Tracks: Exploring LLM-driven Intent-based Music Recommendations

Playlist generation based on textual queries using large language models (LLMs) is becoming an important interaction paradigm for music streaming platforms. User queries span a wide spectrum from highly personalized intent to essentially catalog-style requests. Existing systems typically rely on non-personalized retrieval/ranking or apply a fixed level of preference conditioning to every query, which can overfit catalog queries to a single user or under-personalize explicitly listener-dependent requests. We present an industrial-scale LLM-based playlist generation system with dynamic personalization that adapts the personalization strength to the query type. We define a query taxonomy, train a query-type classifier on 5,000 manually labeled queries, and use its predicted probability to modulate the mixture of LLM-based semantic scoring and personalized evaluation. In a blind user study with pairwise comparisons and ELO aggregation, this approach consistently outperforms both non-personalized and fixed-personalization baselines.

Learning When to Personalize: LLM Based Playlist Generation via Query Taxonomy and Classification

The evaluation of music understanding in Large Audio-Language Models (LALMs) requires a rigorously defined benchmark that truly tests whether models can perceive and interpret music, a standard that current data methodologies frequently fail to meet. This paper introduces a meticulously structured approach to music evaluation, proposing a new dataset of 320 hand-written questions curated and validated by experts with musical training, arguing that such focused, manual curation is superior for probing complex audio comprehension. To demonstrate the use of the dataset, we benchmark six state-of-the-art LALMs and additionally test their robustness to uni-modal shortcuts.

Premium content

Next from EACL 2026 Main Conference

From Novice to Expert: Generating Audience-Dependent Concert Moderations with RAG-LLMs

Stay up to date with the latest Underline news!

PRESENTATIONS

CONFERENCES

COMPANY

RESOURCES