Singapore

The emergent capabilities of large language models (LLMs) have prompted interest in using them as surrogates for human subjects in opinion surveys. However, prior evaluations of LLM-based opinion simulation have relied heavily on costly, domain-specific survey data, and mixed empirical results leave their reliability in question. To enable cost-effective, early-stage evaluation, we introduce a quality control assessment designed to test the viability of LLM-simulated opinions on Likert-scale tasks without requiring large-scale human validation. This assessment comprises two key tests: logical consistency and alignment with stakeholder expectations, offering a low-cost, domain-adaptable validation tool. We apply our quality control assessment to an opinion simulation task relevant to AI-assisted content moderation and fact-checking workflows---a socially impactful use case---and evaluate seven LLMs using a baseline prompt engineering method (backstory prompting), as well as fine-tuning and in-context learning variants. None of the models or methods pass the full assessment, revealing several failure modes. We conclude with a discussion of the risk management implications and release TopicMisinfo, a benchmark dataset with paired human and LLM annotations, to support future research.

AAAI 2026 Main Conference

Should You Use LLMs to Simulate Opinions? Quality Checks for Early-Stage Deliberation

technical paper

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held at Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, exhibit programs, and a range of other activities to be announced.


Palm vein recognition has emerged as a promising biometric technology, yet its development remains constrained by the scarcity of large-scale publicly available datasets. Several methods of palm vein image generation have been proposed to address this issue. These methods usually focus on the anatomical realism of palm vein patterns, but overlook the biophysical correlation between identities and vein patterns, particularly in simulating identity-specific vein contrast. To tackle this limitation, we propose a novel biophysics-driven synthesis method. 
Our method constructs a 3D palm vascular tree via an established modeling method. Then, a projection model is proposed to map the 3D tree into 2D space to derive palm vein patterns. The projection model is based on skin spectral absorption and simulates the natural attenuation of light passing through the skin using a layer integration method. For different identities, we sample different skin parameters, resulting in varying degrees of attenuation. This method effectively simulates the variation in vein contrast across different identities. Furthermore, we introduce a conditional diffusion model that uses the projected patterns as identity conditions to generate palm vein images. To the best of our knowledge, this is the first palm vein generation method based on the diffusion model.
Experimental results demonstrate that our method not only outperforms existing methods, but also enables a recognition model trained on our synthetic data to achieve superior performance compared to a model trained on real-world data at a scale of 2,000 IDs under an open-set protocol with a TAR@FAR=1:1 of 1e-4.
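
To make the idea of identity-specific attenuation concrete, the following is a minimal sketch that assumes a plain Beer-Lambert model summed over discrete skin layers. The abstract does not give the actual layer integration formula, so this is a swapped-in textbook model rather than the authors' projection model, and the function names, coefficients, and thicknesses are hypothetical.

```python
import math


def transmitted_fraction(layers):
    """Fraction of light left after passing through a stack of layers.

    layers: list of (absorption_coefficient_per_mm, thickness_mm) pairs.
    Assumption: simple Beer-Lambert attenuation, exp(-sum(mu_i * d_i)).
    """
    optical_depth = sum(mu * thickness for mu, thickness in layers)
    return math.exp(-optical_depth)


def vein_intensity_gap(skin_layers, vein_layer):
    """Intensity difference between a skin-only pixel and a pixel over a vein."""
    background = transmitted_fraction(skin_layers)
    over_vein = transmitted_fraction(skin_layers + [vein_layer])
    return background - over_vein


# Hypothetical parameters: same vein, two identities with different skin.
vein = (2.0, 0.5)
identity_a = [(0.2, 1.0), (0.1, 2.0)]
identity_b = [(0.6, 1.0), (0.3, 2.0)]

gap_a = vein_intensity_gap(identity_a, vein)
gap_b = vein_intensity_gap(identity_b, vein)
```

Under these assumed numbers the more absorbing skin of `identity_b` yields a smaller intensity gap for the same vein, which is the kind of per-identity contrast variation the abstract describes.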

LSAP-PV: High-Fidelity Palm Vein Image Synthesis via Layered Spectral Absorption Projection-Guided Diffusion Model

Multi-source knowledge graph completion (MKGC) seeks to predict missing triples in a target KG by leveraging triples from multi-source KGs (e.g., different languages or domains). 
Existing studies typically learn and fuse multi-source KG representations solely with alignments or fusion modules, which can be affected by redundant information within KGs.
This issue can conceal task-relevant information in representations, impeding further improvements when scaling to numerous KGs.
To this end, we propose IMKGC, an information-theoretic MKGC framework to learn minimal sufficient representations. 
In particular, IMKGC learns entity representations explicitly preserving endogenous contextual information within each KG, exogenous complementary information from other KGs, and consistent information of equivalent entities, while suppressing redundant information through variational constraints. 
Furthermore, we achieve compressed relation representations with a devised relation reasoning decoder that captures relatedness among relations, also improving triple prediction.
Extensive experiments on 14 KGs across three multilingual and multi-domain benchmarks demonstrate that IMKGC significantly outperforms previous state-of-the-art methods, especially in redundant scenarios.
Our code will be released at \url{https://xxx} for the research community and is currently available in the supplementary material.

Information-Theoretic Minimal Sufficient Representation for Multi-Domain Knowledge Graph Completion

World-model-based reinforcement learning achieves high sample efficiency by learning from imagined rollouts. However, its success critically depends on the accuracy of the learned world model, which is prone to producing unrealistic or hallucinated rollouts when queried beyond its domain of competence. These flawed predictions can trap the agent in a vicious cycle: by misleading exploration toward implausible or uninformative regions, they degrade the quality of collected data, which in turn corrupts policy learning with inaccurate rollouts.
To break this cycle, we introduce the notion of a knowledge boundary—the region within which the world model provides reliable predictions—and propose a unified framework that both identifies and leverages this boundary. Concretely, we approximate the boundary using model uncertainty, quantified via disagreement across an ensemble of lightweight predictors, which serves as a practical proxy. This uncertainty signal is used in two complementary ways: as an intrinsic reward to guide exploration toward under-explored yet learnable regions, and as a dynamic filter to exclude unreliable imagined rollouts from policy optimization.
Extensive experiments across diverse benchmarks—including CARLA, DeepMind Control Suite, Atari, and MemoryMaze—demonstrate that our approach consistently outperforms prior state-of-the-art methods.
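
The two uses of ensemble disagreement described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: disagreement is taken to be the variance across ensemble members averaged over feature dimensions, the intrinsic reward is simply proportional to it, and the filter is a fixed threshold. All function names and numbers are hypothetical, and the abstract's emphasis on "learnable" regions suggests the actual reward is more refined than this.

```python
from statistics import pvariance


def ensemble_disagreement(predictions):
    """Per-step disagreement across ensemble members.

    predictions[k][t] is the feature vector that (hypothetical) ensemble
    member k predicts for imagined step t. Returns one score per step:
    the population variance across members, averaged over features.
    """
    n_steps = len(predictions[0])
    n_dims = len(predictions[0][0])
    scores = []
    for t in range(n_steps):
        per_dim = [
            pvariance([member[t][d] for member in predictions])
            for d in range(n_dims)
        ]
        scores.append(sum(per_dim) / n_dims)
    return scores


def intrinsic_rewards(scores, scale=1.0):
    """Assumed exploration bonus: proportional to disagreement."""
    return [scale * s for s in scores]


def reliable_mask(scores, threshold):
    """Assumed filter: keep imagined steps whose disagreement is low."""
    return [s <= threshold for s in scores]


# Two ensemble members, two imagined steps, two features each.
predictions = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [3.0, 3.0]],
]
scores = ensemble_disagreement(predictions)
keep = reliable_mask(scores, threshold=0.5)
```

Here the members agree on the first step and disagree on the second, so only the first step would be kept for policy optimization, while the second would receive the larger exploration bonus.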

Perceiving the Knowledge Boundary: Uncertainty-Guided Exploration and Imagination for World Models

Person re-identification (ReID) aims to retrieve target pedestrian images based on either visual queries (I2I) or textual descriptions (T2I). Although both tasks share the same retrieval objective, they face distinct challenges: I2I focuses on learning discriminative identity representations, while T2I emphasizes cross-modal semantic alignment. Existing approaches typically handle these tasks separately or naively combine them, which often leads to task interference and performance degradation.
To address this, we propose a unified framework that leverages task-aware prompt learning to jointly optimize both tasks. Specifically, we design a Task-Routed Transformer that introduces dual classification tokens within a shared visual encoder to decouple task-specific representations. On top of this, we develop a Task-Conditioned Prompt Alignment module that constructs hierarchical prompts by integrating identity-level learnable tokens with sample-level pseudo-text tokens. These pseudo-tokens are converted from image or text features via modality-specific decoders, injecting fine-grained instance-level semantics into the prompts. Furthermore, we introduce a Cross-Modal Prompt Regularization strategy to enforce semantic alignment in the prompt token space, encouraging pseudo-prompts to preserve source-modality semantics while enhancing cross-modal transferability.
Extensive experiments on multiple benchmark datasets demonstrate that our approach effectively mitigates task interference and achieves state-of-the-art performance on both I2I and T2I person ReID tasks.

Hierarchical Prompt Learning for Image- and Text-Based Person Re-Identification

Asynchronous Federated Learning (AFL) is acclaimed for accelerating collaborative training on heterogeneous systems by eliminating the wait for stragglers. While current solutions focus on improving convergence amidst update delays, they neglect how delayed aggregation fosters free-riding attacks, allowing malicious clients to easily extract the global model without contributing. This behavior results in significant fairness issues and performance degradation. To address this challenge, we propose OPTION, the first online pricing strategy tailored to mitigate free-riding in AFL. OPTION establishes an economic model in which access to model updates is purchased using credits earned from verified contributions. Specifically, OPTION values each model update according to its marginal performance gain and training cost, and then charges each client a download fee based on the Hotelling model to prevent zero-cost acquisition. Moreover, OPTION rewards clients for successful updates under non-arbitrage constraints, effectively balancing individual utility and task budget. To maximize the average model performance while satisfying these conditions, OPTION leverages the Lyapunov drift framework and a probabilistic sampling-based algorithm to optimize the pricing parameters. Extensive experimental results on three real-world datasets demonstrate that OPTION effectively mitigates free-riding attacks in AFL, increases the number of valid updates by at least 23.97%, and achieves a model accuracy improvement of at least 3.01% compared to state-of-the-art baselines.

OPTION: An Online Pricing Strategy for Asynchronous Federated Learning Against Free-Riding Attacks

Visual abductive reasoning (VAR) is a challenging task that requires AI systems to infer the most likely explanation for incomplete visual observations. While recent MLLMs have developed strong general-purpose multimodal reasoning capabilities, they still fall short of humans in abductive inference. To bridge this gap, we draw inspiration from the interplay between verbal and pictorial abduction in human cognition, and propose to strengthen the abduction of MLLMs by mimicking such dual-mode behavior. Concretely, we introduce **AbductiveMLLM**, comprising two synergistic components: REASONER and IMAGINER. The REASONER operates in the verbal domain. It first explores a broad space of possible explanations using a blind LLM and then prunes visually incongruent hypotheses based on cross-modal causal alignment. The remaining hypotheses are introduced into the MLLM as targeted priors, steering its reasoning toward causally coherent explanations. The IMAGINER, on the other hand, further guides MLLMs by emulating human-like pictorial thinking. It conditions a text-to-image diffusion model on both the input video and the REASONER’s output embeddings to “imagine” plausible visual scenes that correspond to the verbal explanation, thereby enriching MLLMs' contextual grounding. The two components are trained jointly in an end-to-end manner. Experiments on standard VAR benchmarks show that **AbductiveMLLM** achieves state-of-the-art performance, consistently outperforming traditional solutions and advanced MLLMs. Our code will be released.

AbductiveMLLM: Boosting Visual Abductive Reasoning Within MLLMs

Solving the problem of cooperation is fundamentally important for the creation and maintenance of functional societies. Problems of cooperation are omnipresent within human society, with examples ranging from navigating busy road junctions to negotiating treaties. As the use of AI becomes more pervasive throughout society, the need for socially intelligent agents capable of navigating these complex cooperative dilemmas is becoming increasingly evident. Direct punishment is a ubiquitous social mechanism that has been shown to foster the emergence of cooperation in both humans and non-humans. In the natural world, direct punishment is often strongly coupled with partner selection and reputation and used in conjunction with third-party punishment. The interactions between these mechanisms could potentially enhance the emergence of cooperation within populations. However, no previous work has evaluated the learning dynamics and outcomes emerging from multi-agent reinforcement learning populations that combine these mechanisms. This paper addresses this gap. It presents a comprehensive analysis and evaluation of the behaviors and learning dynamics associated with direct punishment, third-party punishment, partner selection, and reputation. Finally, we discuss the implications of using these mechanisms on the design of cooperative AI systems.

Investigating the Impact of Direct Punishment on the Emergence of Cooperation in Multi-agent Reinforcement Learning Systems

Dataset distillation (DD) aims to generate a compact synthetic dataset that enables efficient training of neural networks while maintaining performance comparable to that achieved with the original dataset. However, existing methods often suffer from two main limitations. They either rely on computationally intensive iterative optimization procedures or depend heavily on architecture-specific designs. These issues limit their practicality for large-scale datasets and hinder generalization across different model architectures. 
To overcome these challenges, recent research has explored the use of diffusion models as an architecture-agnostic approach to dataset distillation, offering improved scalability and generalization for large-scale datasets across diverse model architectures.
While diffusion-based dataset distillation methods have shown considerable potential, several challenges remain. Notably, certain approaches exhibit a distributional mismatch between the pre-trained diffusion model and the target dataset, which can adversely affect the fidelity and representativeness of the generated samples.
Others require substantial fine-tuning to achieve high fidelity, which negates the benefits of architectural flexibility. In this work, we propose a new diffusion-based dataset distillation framework that effectively preserves the characteristics of the original dataset without requiring any fine-tuning. Our method employs adaptive sampling and repulsion regularization to enhance both the fidelity and diversity of generated samples. As a result, the proposed approach outperforms state-of-the-art distillation methods across a wide range of datasets and model architectures.

An Adaptive Sampling Framework for Diffusion-based Dataset Distillation with High Fidelity and Diversity

Cued Speech (CS) enhances lipreading via hand coding, offering visual phonemic cues that support precise speech perception for the hearing-impaired. The task of CS Video-to-Speech generation (CSV2S) aims to convert CS videos into intelligible speech signals. Most existing research focuses on CS Recognition (CSR), which transcribes video content into text. Consequently, a common solution for CSV2S is to integrate CSR with a text-to-speech (TTS) system. However, this pipeline relies on text as an intermediate medium, which may lead to error propagation and temporal misalignment between speech and CS video dynamics. In contrast, directly generating audio speech from CS video (direct CSV2S) often suffers from the inherent multimodal complexity and the limited availability of CS data. To address these challenges, we propose UniCUE, the first unified framework for CSV2S that directly generates speech from CS videos without relying on intermediate text. The core innovation of UniCUE lies in integrating an understanding task (CSR) that provides fine-grained CS visual-semantic cues to guide the speech generation. Specifically, UniCUE incorporates a pose-aware visual processor, a semantic alignment pool that enables precise visual–semantic mapping, and a VisioPhonetic adapter to bridge the understanding and generation tasks within a unified architecture. To support this framework, we construct UniCUE-HI, a large-scale Mandarin CS dataset containing 11,282 videos from 14 cuers, including both hearing-impaired and normal-hearing individuals. Extensive experiments conducted on this dataset demonstrate that UniCUE achieves state-of-the-art (SOTA) performance across multiple evaluation metrics.

UniCUE: Unified Recognition and Generation Framework for Chinese Cued Speech Video-to-Speech Generation

We study a matrix completion problem where both the ground truth matrix $R$ and the unknown sampling distribution $P$ over observed entries are low-rank matrices, and *share a common subspace*. We assume that a large amount $M$ of unlabeled data drawn from the sampling distribution $P$ is available, together with a small amount $N$ of "labeled" data drawn from the same distribution and noisy estimates of the corresponding ground truth entries. This setting is inspired by recommender system scenarios where the unlabeled data corresponds to "implicit feedback" (interactions such as purchases, clicks, etc.) and the labeled data corresponds to "explicit feedback", consisting of interactions where the user has given an explicit rating to the item. Leveraging powerful results from the theory of low-rank subspace recovery, together with classic generalization bounds for matrix completion models, we show error bounds consisting of a sum of two error terms scaling as $O\left(\sqrt{\frac{nd}{M}}\right)$ and $O\left(\sqrt{\frac{dr}{N}}\right)$ respectively, where $d$ is the rank of $P$ and $r$ is the rank of $R$. In synthetic experiments, we confirm that the true generalization error naturally splits into independent error terms corresponding to the estimations of $P$ and the ground truth matrix $R$ respectively. In real-life experiments on Douban and MovieLens with most explicit ratings removed, we demonstrate that the method can outperform baselines relying only on the explicit ratings, suggesting that our assumptions provide a valid toy theoretical setting to study the interaction between explicit and implicit feedback in recommender systems.
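
Written out as a single display, the bound stated above has the following shape. This is only a restatement of the two rates given in the abstract: the symbol $\mathcal{E}$ for the generalization error is introduced here for illustration, constants and any logarithmic factors are omitted, and $n$ is presumably the matrix dimension, which the abstract does not define.

$$
\mathcal{E} \;\lesssim\; \underbrace{\sqrt{\frac{nd}{M}}}_{\text{unlabeled data}} \;+\; \underbrace{\sqrt{\frac{dr}{N}}}_{\text{labeled data}}
$$

The first term depends only on the number $M$ of unlabeled samples and the second only on the number $N$ of labeled samples, which matches the split into two independent error terms reported for the synthetic experiments.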
