With the rapidly growing capabilities of Large Language Models (LLMs), AI applications that address problems in people's daily lives have proliferated, making it paramount to measure their performance and reliability accurately. However, existing benchmarks predominantly rely on closed-ended, multiple-choice or short-answer question formats. While useful for assessment, these formats exhibit a significant gap from the diverse, open-ended questions posed by real-world users. To bridge this gap, we introduce OmniBench, a comprehensive open-domain benchmark. OmniBench is uniquely composed of authentic, user-generated questions harvested from real-world interactions on various websites and applications, covering 16 rigorously defined knowledge domains and 5 crucial user intents derived from a large-scale analysis of the collected corpus. Crucially, we propose three automated data construction pipelines that enable continuous, periodic updating of the benchmark dataset. This approach not only keeps the questions current with recent events but also effectively mitigates the data contamination prevalent in static benchmarks. Moreover, we propose OmniEval, a multi-dimensional hybrid evaluation framework that combines diverse metrics and evaluation methods to capture nuanced aspects of answer quality. Extensive validation demonstrates that this framework aligns strongly with human judgments, ensuring the reliability of the benchmark results.
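As a rough illustration of what a multi-dimensional hybrid score can look like, the sketch below combines a rule-based metric with judge-based metric dimensions into a single weighted score. The abstract does not specify OmniEval's actual metrics, weights, or aggregation method, so every name and value here is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class MetricScore:
    name: str     # dimension label, e.g. "factuality" (hypothetical)
    score: float  # per-dimension score, normalized to [0, 1]
    weight: float # relative importance of this dimension

def hybrid_score(scores: list[MetricScore]) -> float:
    """Aggregate per-dimension scores into one weighted mean."""
    total_weight = sum(m.weight for m in scores)
    if total_weight == 0:
        raise ValueError("at least one metric must have nonzero weight")
    return sum(m.score * m.weight for m in scores) / total_weight

# Hypothetical evaluation of a single model response, mixing a
# rule-based metric with two LLM-judge dimensions.
response_eval = [
    MetricScore("rule_based_accuracy", 0.90, weight=1.0),
    MetricScore("llm_judge_factuality", 0.80, weight=2.0),
    MetricScore("llm_judge_helpfulness", 0.75, weight=1.0),
]
print(f"hybrid score: {hybrid_score(response_eval):.4f}")  # 0.8125
```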
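Likewise, alignment between an automatic evaluator and human judgments is commonly quantified with a rank correlation. The abstract does not say which agreement statistic the authors use; the sketch below assumes Spearman's rho via scipy, with invented data purely for illustration.

```python
from scipy.stats import spearmanr

# Invented example data: per-response scores from the automatic
# evaluator and human ratings of the same responses.
framework_scores = [0.81, 0.42, 0.95, 0.60, 0.33, 0.77]
human_ratings    = [4,    2,    5,    3,    2,    4]

# Spearman's rho measures how well the evaluator preserves the human
# ranking of responses; values near 1.0 indicate strong alignment.
rho, p_value = spearmanr(framework_scores, human_ratings)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```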