Recent advancements in multimodal large language models (MLLMs) have shown remarkable progress in video understanding. However, video MLLMs (VideoMLLMs) still suffer from hallucinations, generating nonsensical or irrelevant content. This issue partly stems from over-reliance on pre-trained knowledge at the expense of the rich visual information present in the video. In addition, many existing methods rely on uniform frame sampling, which can overlook critical visual cues. To address these challenges, we present EchoBat, a novel approach that leverages audio information, together with the temporal and logical consistency of video, to improve preference data construction and keyframe extraction. Our method applies Direct Preference Optimization (DPO) to mitigate hallucinations using high-quality, contextually rich preference feedback. Specifically, we use GPT-4o to generate high-quality video descriptions and integrate visually relevant segments from Whisper-derived transcripts to construct preferred responses. Correspondingly, we have the reference model itself describe the temporally reversed video, then use GPT-4o to restore the description's chronological order and inject hallucinated content, producing non-preferred responses. This strategy strengthens the model's understanding of visual content and of the temporal and logical relationships within videos. Furthermore, we propose an echo-layered sampling strategy for keyframe extraction, which provides more precise visual supervision than uniform sampling. Experimental results on three recent video hallucination benchmarks demonstrate the effectiveness of our approach.
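The DPO objective mentioned in the abstract is the standard preference loss of Rafailov et al.: it compares the policy's log-probabilities of the preferred and non-preferred responses against a frozen reference model. A minimal sketch for a single preference pair (the function name, argument names, and the β value are illustrative, not taken from the paper):

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss on one preference pair, given sequence log-probs.

    logp_*     : log-probabilities under the policy being trained
    ref_logp_* : log-probabilities under the frozen reference model
    beta       : strength of the implicit KL constraint (hypothetical value)
    """
    # Implicit reward of each response: log-prob ratio vs. the reference.
    chosen_ratio = logp_chosen - ref_logp_chosen
    rejected_ratio = logp_rejected - ref_logp_rejected
    margin = beta * (chosen_ratio - rejected_ratio)
    # Loss is -log sigmoid(margin); written stably as log(1 + e^{-margin}).
    return math.log(1.0 + math.exp(-margin))
```

When policy and reference agree exactly, the margin is zero and the loss equals log 2; raising the preferred response's log-prob relative to the non-preferred one drives the loss below that baseline.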
