
Subword tokenization critically affects Natural Language Processing (NLP) performance, yet its behavior in morphologically rich and low-resource language families remains under-explored. This study systematically compares three subword paradigms—Byte Pair Encoding (BPE), Overlap BPE (OBPE), and Unigram Language Model—across six Uralic languages with varying resource availability and typological diversity. Using part-of-speech (POS) tagging as a controlled downstream task, we show that OBPE consistently achieves stronger morphological alignment and higher tagging accuracy than conventional methods, particularly within the Latin-script group. These gains arise from reduced fragmentation in open-class categories and a better balance across the frequency spectrum. Transfer efficacy further depends on the downstream tagging architecture, interacting with both training volume and genealogical proximity. Taken together, these findings highlight that morphology-sensitive tokenization is not merely a preprocessing choice but a decisive factor in enabling effective cross-lingual transfer for agglutinative, low-resource languages.

EACL 2026 Main Conference

Tokenization and Morphological Fidelity in Uralic NLP: A Cross-Lingual Evaluation

workshop paper

#### *Message from the General Chair, Aline Villavicencio*
I’m delighted and honoured to welcome you to the 19th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2026), taking place in the beautiful city of Rabat, Morocco, from March 24 to 29, 2026. EACL is the flagship European conference of the Association, and EACL 2026 proudly continues our field’s tradition of excellence in scholarship, innovation, and inclusivity. I am deeply grateful to the many volunteers whose dedication, generosity, and tireless efforts have made this conference possible.
For the first time, EACL is being hosted on the African continent. This is an important milestone for our community, and we are grateful to our Moroccan hosts for enabling this historic moment by bringing this edition of EACL to Rabat. We are also delighted that the Second Arabic NLP School is co-located with EACL. We hope attendees enjoy this wonderful opportunity to strengthen ties with the Computational Linguistics communities across the African continent. *[Read full message](https://drive.google.com/file/d/14NlmHvwM6fPJuMmOvVh7K0vtQbEyv3SZ/view?usp=sharing)*



Welcome to the 19th Conference of the European Chapter of the Association for Computational Linguistics (EACL). EACL 2026 will be held in Rabat, Morocco, from March 24–29, 2026. 

Emoji reactions are a frequently used feature of messaging platforms, yet their communicative role remains understudied. Prior work on emojis has focused predominantly on in-text usage, showing that emojis embedded in messages tend to amplify and mirror the author's affective tone. This evidence has often been extended to emoji reactions, treating them as indicators of emotional resonance or user sentiment. However, emoji reactions may instead reflect broader social dynamics. Here, we investigate the communicative function of emoji reactions on Telegram. We analyze over 650k crypto-related messages that received at least one reaction, annotating each with sentiment, emotion, persuasion strategy, and speech act labels, and inferring the sentiment and emotion of emoji reactions using both lexicons and LLMs. We uncover a systematic mismatch between message and reaction sentiment, with positive reactions dominating even for neutral or negative content. This pattern persists across rhetorical strategies and emotional tones, indicating that emojis used as reactions do not reliably signal emotional mirroring or resonance with the content, in contrast to findings reported for in-text emojis. Finally, we identify the features that best predict emoji engagement. Overall, our findings caution against treating emoji reactions as sentiment labels and highlight the need for more nuanced approaches to sentiment and engagement analysis.

Emoji Reactions on Telegram: Unreliable Indicators of Emotional Resonance

Large language models (LLMs) are now widely used in applications that depend on closed-ended decisions, including automated surveys, policy screening, and decision-support tools. In such contexts, these models are typically expected to produce consistent binary or ternary responses (for example, Yes, No, or Neither) when presented with semantically equivalent questions. However, recent studies show that LLM outputs can be influenced by relatively minor changes in prompt wording, raising concerns about the reliability of their decisions under paraphrasing. In this paper, we conduct a systematic analysis of paraphrase robustness across five widely used LLMs. To support this evaluation, we develop a controlled dataset of 200 opinion-based questions drawn from multiple domains, each accompanied by five human-validated paraphrases. All models are evaluated under deterministic inference settings and constrained to a fixed Yes/No/Neither response format. We assess model behavior using a set of complementary metrics that capture the stability of each model. DeepSeek Reasoner and Gemini 2.0 Flash show the highest stability when responding to paraphrased inputs, whereas Claude 3.7 Sonnet exhibits strong internal consistency but produces judgments that differ more frequently from those of the other models. By contrast, GPT-3.5 Turbo and LLaMA 3 70B display greater sensitivity to surface-level variations in prompt phrasing. Overall, these findings suggest that robustness to paraphrasing is driven more by alignment strategies and reasoning design choices than by model size alone.

Measuring LLMs’ Sensitivity to Paraphrased Opinion Prompts

Emotional tone plays a central role in persuasion, yet its impact on computational assessments of political argument quality in real-world election campaign speeches remains understudied. In this work, we investigate whether positive emotional framing correlates with higher perceived convincingness in political arguments. We fine-tune language models on argument quality datasets and test their ability to transfer convincingness predictions to real-world campaign speeches. Using a corpus of U.S. presidential campaign speeches, we analyze emotional polarity in relation to predicted persuasive strength to test whether positively framed arguments are judged more convincing than neutral or negative ones. Our empirical analysis shows that political parties rely heavily on argumentation during their election campaigns. We also find evidence that politicians strategically employ emotional cues within their arguments in these speeches, with positive emotions more strongly associated with persuasive strength, for example, in topics such as USMCA’s Effect on American Jobs and Agriculture, Border Control Policies, and Progressive Tax Reforms. At the same time, we find that negative emotions have a weaker yet still non-negligible influence on voter persuasion in topics such as City Crime and Civil Unrest and White Supremacist Violence (Charlottesville Incident).

Predicting Convincingness in Political Speech: How Emotional Tone Shapes Persuasive Strength

This study examines the capability of LLMs to predict emotional ratings of Russian words by comparing their assessments with both native speakers' ratings and expert evaluations. The research utilises two datasets: the ENRuN database, containing associative emotional ratings of Russian nouns by native speakers, and RusEmoLex, an expert-compiled lexicon. Various open-source LLMs were evaluated, including international models (Llama-3, Qwen 2.5), Russian-developed models, and Russian-adapted variants, representing three parameter scales. The findings reveal distinct patterns in model performance: Russian-adapted models demonstrated superior alignment with native speakers' ratings, whilst model size was not a decisive factor. Conversely, larger models performed better at matching expert assessments, with language adaptation having minimal impact. Emotional or sensitive lexis with strong connotations produces a more substantial human-model gap.

Emotional Lexicons: How Large Language Models Predict Emotional Ratings of Russian Words

Data annotation is essential for supervised natural language processing tasks but remains labor-intensive and expensive. Large language models (LLMs) have emerged as promising alternatives, capable of generating high-quality annotations either autonomously or in collaboration with human annotators. However, their use for autonomous annotation is often questioned because of how they handle subjective and ethically sensitive matters. This study investigates the effectiveness of LLMs in autonomous and hybrid annotation setups for propaganda detection. We evaluate GPT and open-source models on two datasets from different domains: the Propaganda Techniques Corpus (PTC) for news articles and the Journalist Media Bias on X (JMBX) dataset for social media. Our results show that LLMs generally exhibit high recall but lower precision in detecting propaganda, often over-predicting persuasive content. Multi-annotator setups did not outperform the best models in the single-annotator setting, although they helped reasoning models boost their performance. Hybrid annotation, combining LLM and human input, achieved higher overall accuracy than LLM-only settings. We further analyze misclassifications and find that LLMs are more sensitive to certain propaganda techniques, such as loaded language, name calling, and doubt. Finally, using an error typology analysis, we examine the reasoning the LLMs provide for their misclassifications. Our results show that, although some studies report LLMs outperforming manual annotation and LLMs can prove useful in hybrid annotation, their incorporation into the human annotation pipeline must be implemented with caution.

Council of LLMs: Evaluating Capability of Large Language Models to Annotate Propaganda

This paper presents a domain-specific transformer pipeline for quantifying social atmosphere in hostel reviews, an experiential dimension that travelers consistently prioritize but that existing NLP methods and booking platforms fail to capture. We train a cross-encoder on 4,994 manually annotated reviews and use it to pseudo-label 162,840 additional reviews; these labels are then distilled into a sentence-transformer bi-encoder, producing embeddings where proximity reflects social interaction level rather than generic sentiment. On held-out human-labeled data, the domain-adapted embeddings achieve F1 = 0.826, outperforming generic sentence embeddings (0.671) and zero-shot GPT-4o (0.774), with a 40-fold improvement in intra-class versus inter-class similarity. Aggregating predictions to the property level reveals that hostel socialness follows an approximate exponential distribution, confirming that highly social hostels are rare. This work formalizes socialness as a measurable semantic construct and provides a general template for extracting implicit experiential attributes from text at scale.

Quantifying Social Sentiment in Hostels Using A Domain-Specific Transformer Pipeline

Understanding emotional responses relies on reconstructing how individuals appraise events. While prior work has studied emotion trajectories and their inherent correlations with appraisals, it has considered appraisals only in a snapshot analysis. However, because appraisal is a complex, sequential process, we argue that it should be analyzed as it unfolds throughout a narrative. In this study, we investigate whether trajectories of appraisals are distinctive for different emotions in five-event stories, that is, narratives in which each of five sentences describes an event. We employ zero-shot prompting with a large language model to predict appraisals on sub-sequences of a narrative. We find that this approach is effective in identifying relevant appraisals in narratives without prior knowledge of the evoked emotion, enabling a comprehensive analysis of appraisal trajectories. Furthermore, we are the first to quantitatively identify typical patterns of appraisal trajectories that distinguish emotions. For example, a rising trajectory for self-responsibility indicates trust, while a falling trajectory suggests anger.

Appraisal Trajectories in Narratives Reveal Distinct Patterns of Emotion Evocation

Given Farsi's speaker base of over 127 million people and the growing availability of digital text, including more than 1.3 million articles on Wikipedia, it is considered a middle-resource language. However, this label quickly crumbles when the situation is examined more closely. We focus on three subjective tasks (Sentiment Analysis, Emotion Analysis, and Toxicity Detection) and identify significant challenges in data availability and quality, despite an overall increase in the amount of digital text. We review 110 publications on subjective tasks in Farsi and observe a lack of publicly available datasets. Furthermore, existing datasets often lack essential demographic factors, such as age and gender, that are crucial for accurately modeling subjectivity in language. When we evaluate prediction models on the few available datasets, the results are highly unstable across both datasets and models. Our findings show that the volume of data alone is insufficient to improve a language's standing in NLP.

Exploring Subjective Tasks in Farsi: A Survey Analysis and Evaluation of Language Model

Digital inclusion initiatives increasingly support adults with intellectual disabilities (ID) in participating online, yet social media posts can be difficult to understand, particularly when they contain strong emotions, slang, or non-standard writing. This paper investigates whether large language models (LLMs) can simplify social media texts to improve cognitive accessibility while preserving emotional meaning. Using an accessibility-oriented prompt based on existing guidance, posts are simplified and emotion preservation is assessed. The results suggest that many simplified posts retain the same emotions, though changes occur, especially when emotions are weakly expressed or ambiguous. Qualitative analysis shows that simplification improves fluency and structure but can also shift perceived emotion through changes to tone, formatting, and other affective cues common in social media text. The results also show that different LLMs produce markedly different outputs.

Emotion-aware text simplification of user generated content using LLMs

Machine translation (MT) systems perform well on standard benchmarks, yet their ability to preserve emotional meaning in informal user-generated content, particularly for low-resource languages, remains underexplored. We investigate the preservation of emotion intensity in Spanish–Basque tweet translation, focusing on Basque, an under-represented language in MT research. We compile a small, controlled corpus of Spanish reaction tweets and evaluate Basque translations from three publicly available systems through a crowd-based study. While all systems achieve comparable, above-mid-range accuracy and fluency, emotion intensity is systematically attenuated in the translations, with greater loss for more emotionally intense inputs. A follow-up study on highly emotional tweets shows that LLM prompting reduces emotion loss, yet substantial attenuation remains, highlighting emotion preservation as a persistent challenge in Spanish–Basque MT.
