China

Large vision-language models (LVLMs) have achieved remarkable performance on multimodal tasks such as visual question answering (VQA) and image captioning. However, they still suffer from hallucinations, generating text inconsistent with visual input, posing significant risks in real-world applications. Existing approaches to address this issue focus on incorporating external knowledge bases, alignment training, or decoding strategies, all of which require substantial computational cost and time. Recent works try to explore more efficient alternatives by adjusting LVLMs&#39; internal representations. Although promising, these methods may cause hallucinations to be insufficiently suppressed or lead to excessive interventions that negatively affect normal semantics. In this work, we leverage sparse autoencoders (SAEs) to identify semantic directions closely associated with either hallucinations or actuality, realizing more precise and direct hallucination-related representations. Our analysis demonstrates that interventions along the faithful direction we identified can mitigate hallucinations, while those along the hallucinatory direction can exacerbate them. Building on these insights, we propose **S**teering LVLMs via **S**AE **L**atent Directions (SSL), a training-free method based on SAE-derived latent directions to mitigate hallucinations in LVLMs. Extensive experiments demonstrate that SSL significantly outperforms existing decoding approaches in mitigating hallucinations, while maintaining transferability across different model architectures with negligible additional time overhead.

EMNLP 2025

Steering LVLMs via Sparse Autoencoder for Hallucination Mitigation

representation steering

semantic directions

sparse autoencoders

hallucination mitigation

large vision-language models

Large vision-language models (LVLMs) have achieved remarkable performance on multimodal tasks such as visual question answering (VQA) and image captioning. However, they still suffer from hallucinations, generating text inconsistent with visual input, posing significant risks in real-world applications. Existing approaches to address this issue focus on incorporating external knowledge bases, alignment training, or decoding strategies, all of which require substantial computational cost and time. Recent works try to explore more efficient alternatives by adjusting LVLMs' internal representations. Although promising, these methods may cause hallucinations to be insufficiently suppressed or lead to excessive interventions that negatively affect normal semantics. In this work, we leverage sparse autoencoders (SAEs) to identify semantic directions closely associated with either hallucinations or actuality, realizing more precise and direct hallucination-related representations. Our analysis demonstrates that interventions along the faithful direction we identified can mitigate hallucinations, while those along the hallucinatory direction can exacerbate them. Building on these insights, we propose **S**teering LVLMs via **S**AE **L**atent Directions (SSL), a training-free method based on SAE-derived latent directions to mitigate hallucinations in LVLMs. Extensive experiments demonstrate that SSL significantly outperforms existing decoding approaches in mitigating hallucinations, while maintaining transferability across different model architectures with negligible additional time overhead.

poster

## Welcome!
"I am excited to welcome you to this year’s edition of the Conference on Empirical Methods in Natural Language Processing! Importantly, it marks the 30th edition of EMNLP. With over 8,000 submissions, more than 3,000 accepted papers, and thousands of attendees, we have come a long way from that first
workshop, which had 14 accepted papers. As the field looks ahead, Suzhou is the fitting location for celebrating this milestone: rooted in a long literary tradition, yet modern and forward-looking, and home to a large share of the NLP community."<br>

*Message from the General Chair, Dirk Hovy*

[**Link to Conference Handbook**](https://drive.google.com/file/d/1johU5QqVVYO4RfH7QcIORr7qrVBdzdwC/view?usp=sharing)





<br>

Celebrate 30 Years of EMNLP! 
EMNLP 2025 will be held in Suzhou, China from November 5th to November 9th, 2025.

Large language models (LLMs) based Multilingual Knowledge Graph Completion (MKGC) aim to predict missing facts by leveraging LLMs' multilingual understanding capabilities, improving the completeness of multilingual knowledge graphs (KGs). However, existing MKGC research underutilizes the multilingual capabilities of LLMs and ignores the shareability of cross-lingual knowledge. In this paper, we propose a novel MKGC framework that leverages multilingual shared knowledge to significantly enhance performance through two components: Knowledge-level Grouped Mixture of Experts (KL-GMoE) and Iterative Entity Reranking (IER). KL-GMoE efficiently models shared knowledge, while IER significantly enhances its utilization. To evaluate our framework, we constructed a mKG dataset containing 5 languages and conducted comprehensive comparative experiments with existing state-of-the-art (SOTA) MKGC method. The experimental results demonstrate that our framework achieves improvements of 5.47\%, 3.27\%, and 1.01\% in the Hits@1, Hits@3, and Hits@10 metrics, respectively, compared with SOTA MKGC method. Further experimental analysis revealed the properties of knowledge sharing in settings of unseen and unbalanced languages. We have released the dataset and code for our work on https://anonymous.4open.science/r/KL-GMoE-A8E4.

Multilingual Knowledge Graph Completion via Efficient Multilingual Knowledge Sharing

Transformer-based self-attention mechanism serves as the core of modern language models, yet it often suffers from *localization*, where attentions collapse onto a limited subset of tokens and fail to capture long-range dependencies. To address this issue, we propose **Self-Attention One-step Belief Propagation (SAOBP)**, a refinement framework that injects *multi-hop* relationships through a belief propagation process. To interpret and quantify these interactions, we introduce **Global Token Dependency (GTD)** that captures the relative contribution of multi-hop connections within the attention graph. Empirical results indicate that SAOBP helps prevent entropy collapse in deeper layers and adaptively maintains GTD at task-appropriate levels, thereby supporting improvements in model performance. Importantly, we observe competitive gains in small-scale models, highlighting its potential for improving inference quality in resource-constrained scenarios.

Mitigating Attention Localization in Small Scale: Self-Attention Refinement via One-step Belief Propagation

The generation of incorrect images—such as depictions of people of color in Nazi-era uniforms by Gemini—frustrated users and harmed Google’s reputation, motivating us to investigate the relationship between accurately reflecting factuality and promoting diversity and equity. In this study, we focus on 19 real-world statistics collected from authoritative sources. Using these statistics, we develop a checklist comprising objective and subjective queries to analyze behavior of large language models (LLMs) and text-to-image (T2I) models. Objective queries assess the models’ ability to provide accurate world knowledge. In contrast, the design of subjective queries follows a key principle: statistical or experiential priors should not be overgeneralized to individuals, ensuring that models uphold diversity. These subjective queries are based on three common human cognitive errors that often result in social biases. We propose metrics to assess factuality and fairness, and formally prove the inherent trade-off between these two aspects. Results show that GPT-4o and DALL-E 3 perform notably well among six LLMs and four T2I models. Our code is in the supplementary materials and will be open-source upon publication.

Where Fact Ends and Fairness Begins: Redefining AI Bias Evaluation through Cognitive Biases

Due to the widespread dissemination of rumors on social media platforms, detecting rumors has been a long-standing concern for various communities. However, existing rumor detection methods rarely consider the fairness issues inherent in the model, which can lead to biased predictions across different stakeholder groups (e.g., domains and originating platforms of the detected content), also undermining their detection effectiveness. In this work, we propose a two-step framework to address this issue. First, we perform unsupervised partitioning to dynamically identify potential unfair data patterns without requiring sensitive attribute annotations. Then, we apply invariant learning to these partitions to extract fair and informative feature representations that enhance rumor detection. Extensive experiments show that our method outperforms strong baselines regarding detection and fairness performance, and also demonstrate robust performance on out-of-distribution samples. Further empirical results indicate that our learned features remain informative and fair across stakeholder groups and can correct errors when applied to existing baselines.

Equal Truth: Rumor Detection with Invariant Group Fairness

Large Language Models store extensive factual knowledge acquired during large-scale pre-training. However, this knowledge is inherently static, reflecting only the state of the world at the time of training. Knowledge editing has emerged as a promising solution for updating outdated or incorrect facts without full retraining. However, most existing locate-and-edit methods primarily focus on token-level likelihood optimization without addressing semantic coherence. Our analysis reveals that such edited knowledge is often encoded as isolated residual streams in the model's latent space, distinct from pre-existing knowledge and bypassing natural reasoning process. To address this, we propose STEAM, a semantic-level knowledge editing framework that enhances integration of updated knowledge into the model's knowledge structure. STEAM first identifies target representations as semantic anchors for the updated factual association, then guides the internal representation of the edited fact towards these anchors through an alignment loss during optimization. Experimental results demonstrate that STEAM improves model’s ability to reason with edited knowledge and enhances semantic coherence, underscoring the importance of latent-space alignment for reliable and coherent knowledge editing.

STEAM: A Semantic-Level Knowledge Editing Framework for Large Language Models

Personalized Large Language Models (LLMs) are increasingly used in diverse applications, where they are assigned a specific persona—such as a happy high school teacher—to guide their responses. While prior research has examined how well LLMs adhere to predefined personas in writing style, a comprehensive analysis of consistency across different personas and task types is lacking. In this paper, we introduce a new standardized framework to analyze consistency in persona-assigned LLMs. We define consistency as the extent to which a model maintains coherent responses when assigned the same persona across different tasks and runs. Our framework evaluates personas across four different categories (happiness, occupation, personality, and political stance) spanning multiple task dimensions (survey writing, essay generation, social media post generation, single turn, and multi-turn conversations). Our findings reveal that consistency is influenced by multiple factors, including the assigned persona, stereotypes, and model design choices. Consistency also varies across tasks, increasing with more structured tasks and additional context. All code is available on GitHub.

Are Economists Always More Introverted? Analyzing Consistency in Persona-Assigned LLMs

Retrieving entity knowledge that aligns with user intent is essential for task-oriented dialogue (TOD) systems to support personalization and localization, especially under large-scale knowledge bases. However, generative models tend to suffer from implicit association preference, while retrieval-generation approaches face knowledge transfer discrepancies. To address these challenges, we propose CaTER, a Context-aware Topology Entity Retrieval Contrastive Learning Framework. CaTER introduces a cycle context-aware distilling attention mechanism, which employs context-independent sparse pooling to suppress noise from weakly relevant attributes. We further construct topologically hard negative samples by decoupling entity information from generated responses and design a topology entity retrieval contrastive loss to train the retriever by reverse distillation. Extensive experiments on three standard TOD benchmarks with both small and large-scale knowledge bases show that CaTER consistently outperforms strong baselines such as MAKER and MK-TOD, achieving state-of-the-art performance in TOD system.

CaTER: A Framework for Context-aware Topology Entity Retrieval Contrastive Learning in End-to-End Task-Oriented Dialogue Systems

Multimodal Large Language Models (MLLMs) have demonstrated exceptional performance across various tasks. However, the internal mechanisms by which they interpret and integrate cross-modal information remain insufficiently understood. In this paper, to address the limitations of prior studies that could only identify neurons corresponding to single-token and rely on the vocabulary of LLMs, we propose a novel method to identify multimodal neurons in Transformer-based MLLMs. Then we introduce fuzzy set theory to model the complex relationship between neurons and semantic concepts and to characterize how multiple neurons collaboratively contribute to semantic concepts. Through both theoretical analysis and empirical validation, we demonstrate the effectiveness of our method and present some meaningful findings. Furthermore, by modulating neuron activation values based on the constructed fuzzy sets, we enhance performance on the Visual Question Answering (VQA) task, showing the practical value of our approach in downstream applications in MLLMs.

Attribution and Application of Multiple Neurons in Multimodal Large Language Models

Current LLMs are trained to refuse potentially harmful input queries regardless of whether users actually had harmful intents, causing a tradeoff between safety and user experience. Through a study of 480 participants evaluating 3,840 query-response pairs, we examine how different refusal strategies affect user perceptions across varying motivations. Our findings reveal that response strategy largely shapes user experience, while actual user motivation has negligible impact. Partial compliance---providing general information without actionable details---emerges as the optimal strategy, reducing negative user perceptions by over 50% to flat-out refusals. Complementing this, we analyze response patterns of 9 state-of-the-art LLMs and evaluate how 6 reward models score different refusal strategies, demonstrating that models rarely deploy partial compliance naturally and reward models currently undervalue it. This work demonstrates that effective guardrails require focusing on crafting thoughtful refusals rather than detecting intent, offering a path toward AI safety mechanisms that ensure both safety and sustained user engagement.

Let Them Down Easy! Contextual Effects of LLM Guardrails on User Perceptions and Preferences

Large language models are increasingly tailored to individual users, responding to differences in preferences, values, goals, and communication styles. In order to assess whether a model can meet the needs of diverse individuals, a foundational requirement is the ability to evaluate whether it can accurately infer and internally represent user-specific traits. We introduce MTPA, a Multitask Personalization Assessment benchmark, designed to test whether models can accurately infer user traits from partial profiles. Compiled from large-scale survey data, MTPA defines a set of trait prediction tasks across beliefs, values, and demographic attributes. MTPA evaluates whether models can accurately infer user traits from partial profiles and quantifies performance across diverse user subgroups. We then evaluate the contribution of trait inference to personalized text generation and empirically validate the relevance of trait inference for downstream alignment. Alongside the benchmark, we release a dataset, toolkit, and baseline evaluations. MTPA is designed with extensibility and sustainability in mind: as the underlying survey datasets are regularly updated, MTPA supports regular integration of new populations and user traits.

Downloads

Next from EMNLP 2025

Multilingual Knowledge Graph Completion via Efficient Multilingual Knowledge Sharing

Stay up to date with the latest Underline news!

PRESENTATIONS

CONFERENCES

COMPANY

RESOURCES