Singapore

Large Language Models (LLMs) demonstrate impressive capabilities across diverse tasks, yet their safety mechanisms remain susceptible to adversarial exploitation of cognitive biases---systematic deviations from rational judgment. Unlike prior studies focusing on isolated biases, this work highlights the overlooked power of multi-bias interactions in undermining LLM safeguards. Specifically, we propose CognitiveAttack, a novel red-teaming framework that adaptively selects optimal ensembles from 154 human behavioral economics-defined cognitive biases, engineering them into adversarial prompts to effectively compromise LLM safety mechanisms. Experimental results reveal systemic vulnerabilities across 30 mainstream LLMs, particularly open-source variants. CognitiveAttack achieves a substantially higher attack success rate than the SOTA black-box method PAP (60.1% vs. 31.6%), exposing critical limitations in current defenses. Through quantitative analysis of successful jailbreaks, we further identify vulnerability patterns in safety-aligned LLMs under synergistic cognitive biases, validating multi-bias interactions as a potent yet underexplored attack vector. This work introduces a novel interdisciplinary perspective by bridging cognitive science and LLM safety, paving the way for more robust and human-aligned AI systems.

AAAI 2026

Exploiting Synergistic Cognitive Biases to Bypass Safety in LLMs

nlp: ethics — bias

nlp: safety and robustness

cms: social cognition and interaction

transparency & privacy

fairness

technical paper

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held in Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, and exhibit programs, and a range of other activities to be announced.<br><br>

To access this event page, you need to log in with the **email address you registered with**. <br>Access credentials will be sent to your email from Underline -  subject line "Welcome to AAAI 2026". Please be sure to check your spam email folder if you do not see an email confirmation right away.

Please log in

To access this event page, you are required to register.
Please complete your registration to continue.

We recommend reading [**the registration information**](https://aaai.org/conference/aaai/aaai-26/registration/) first.

**Online Registration Form**: https://aaai.getregistered.net/conference-2026 

Registration Required

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held in Singapore EXPO from January 20 to January 27, 2026.

Many real-world decision-making problems involve optimizing multiple objectives simultaneously, rendering the selection of the most preferred solution a non-trivial problem: All Pareto optimal solutions are viable candidates, and it is typically up to a decision maker to select one for implementation based on their subjective preferences. To reduce the cognitive load on the decision maker, previous work has introduced the Pareto pruning problem, where the goal is to compute a fixed-size subset of Pareto optimal solutions that best represent the full set, as evaluated by a given quality measure. Reframing Pareto pruning as a multiwinner voting problem, we conduct an axiomatic analysis of existing quality measures, uncovering several unintuitive behaviors. Motivated by these findings, we introduce a new measure, directed coverage. We also analyze the computational complexity of optimizing various quality measures, identifying previously unknown boundaries between tractable and intractable cases depending on the number and structure of the objectives. Finally, we present an experimental evaluation, demonstrating that the choice of quality measure has a decisive impact on the characteristics of the selected set of solutions and that our proposed measure performs competitively or even favorably across a range of settings.

Picking a Representative Set of Solutions in Multiobjective Optimization: Axioms, Algorithms, and Experiments

With the daily influx of 3D data on the internet, text-3D retrieval has gained increasing attention. However, current methods face two major challenges: Hierarchy Representation Collapse (HRC) and Redundancy-Induced Saliency Dilution (RISD). HRC compresses abstract-to-specific and whole-to-part hierarchies in Euclidean embeddings, while RISD averages noisy fragments, obscuring critical semantic cues and diminishing the model’s ability to distinguish hard negatives. To address these challenges, we introduce the Hyperbolic Hierarchical Alignment Reasoning Network (H$^{2}$ARN) for text-3D retrieval. H$^{2}$ARN embeds both text and 3D data in a Lorentz-model hyperbolic space, where exponential volume growth inherently preserves hierarchical distances. A hierarchical ordering loss constructs a shrinking entailment cone around each text vector, ensuring that the matched 3D instance falls within the cone, while an instance-level contrastive loss jointly enforces separation from non-matching samples. To tackle RISD, we propose a contribution-aware hyperbolic aggregation module that leverages Lorentzian distance to assess the relevance of each local feature and applies contribution-weighted aggregation guided by hyperbolic geometry, enhancing discriminative regions while suppressing redundancy without additional supervision. We also release the expanded T3DR-HIT v2 benchmark, which contains 8,935 text-to-3D pairs, 2.6 times the original size, covering both fine-grained cultural artefacts and complex indoor scenes. **Our dataset and codes will be available after acceptance.**

Hyperbolic Hierarchical Alignment Reasoning Network for Text-3D Retrieval

Uncertainty Quantification (UQ) is critical for detecting hallucinations in black-box Large Vision-Language Models (LVLMs). However, prevailing methods like Discrete Semantic Entropy (DSE) are unreliable, as their scores are primarily dominated by the number of semantic clusters. This renders them incapable of distinguishing between benign semantic ambiguity (varied but coherent responses) and severe belief conflict (contradictory responses). We address this limitation by proposing a novel, black-box framework rooted in Dempster-Shafer evidence theory, built on the premise that not all inconsistency is equal. Our method decomposes uncertainty into two complementary metrics: Belief Divergence, which quantifies ambiguity by measuring the separation between viewpoints, and Belief Conflict, which captures direct logical contradictions. Extensive experiments demonstrate that our framework provides a more reliable measure of uncertainty.

Not All Inconsistency Is Equal: Decomposing LVLM Uncertainty into Belief Divergence and Belief Conflict

Multivariate motif discovery aims to identify frequently occurring subsequences within multi-dimensional time series, which is a critical machine learning task with wide applications. However, previous motif discovery algorithms often miss complex multivariate motifs and struggle with high computational costs as data scale and dimensionality grow. We propose a novel \underline{L}earnable \underline{M}ultivari\underline{A}te matrix \underline{P}rofile method (L-MAP) that captures inter-dimensional dependencies for comprehensive analysis of multivariate time series. The time series is partitioned into subsequences using the Fourier transform in the frequency domain, with locality-sensitive hashing (LSH) assigning them to buckets based on distinct patterns. Each subsequence is modeled as a graph for multivariate fusion, where triplet learning is used to capture cross-dimensional relationships and form graph embeddings. Unlike prior methods relying on Euclidean distance modeling, our graph-based approach computes all-pairs similarity in a latent space, which constructs the multivariate matrix profile from distributions formed by embedding clusters. Extensive experiments on multivariate datasets from diverse domains demonstrate that L-MAP outperforms state-of-the-art methods in motif discovery, offering superior quality, diversity, and scalability efficiency.

Learnable Matrix Profile for Motif Discovery on Multivariate Time Series

Multi-task imitation learning (MTIL) has shown significant potential in robotic manipulation by enabling agents to perform various tasks using a single policy. It simplifies the policy deployment and enhances the agent's adaptability across different scenarios. However, key challenges remain, such as maintaining action reliability (e.g., avoiding abnormal action sequences that deviate from nominal task trajectories) and generalizing to unseen tasks with a few expert demonstrations. To address these challenges, we introduce the Foresight-Augmented Manipulation (FoAM) policy, a novel MTIL policy that pioneers the use of multi-modal goal conditions as input and introduces a foresight augmentation in addition to the general action reconstruction. FoAM enables the agent to reason about its actions' visual consequences (foresight) and to be guided by these more expressive representations during task execution. Extensive experiments on over 100 tasks in simulation and real-world settings demonstrate that FoAM significantly enhances MTIL policy performance, outperforming state-of-the-art baselines by up to 41% in success rate. Meanwhile, we released our simulation suites, including 10 scenarios and over 80 challenging tasks designed for manipulation policy training and evaluation. Please see the supplementary materials for details.

FoAM: Foresight-Augmented Multi-Task Imitation Policy for Robotic Manipulation

Episodic tasks in Reinforcement Learning (RL) often pose challenges due to sparse reward signals and high-dimensional state spaces, which hinder efficient learning. Additionally, these tasks often feature hidden “trap states”—irreversible failures that prevent task completion but do not provide explicit negative rewards to guide agents away from repeated errors. To address these issues, we propose Time-Weighted Contrastive Reward Learning (TW-CRL), an Inverse Reinforcement Learning (IRL) framework that leverages both successful and failed demonstrations. By incorporating temporal information, TW-CRL learns a dense reward function that identifies critical states associated with success or failure. This approach not only enables agents to avoid trap states but also encourages meaningful exploration beyond simple imitation of expert trajectories. Empirical evaluations on navigation tasks and robotic manipulation benchmarks demonstrate that TW-CRL surpasses state-of-the-art methods, achieving improved efficiency and robustness.

TW-CRL: Time-Weighted Contrastive Reward Learning for Efficient Inverse Reinforcement Learning

With the rapid development of generative AI, image steganography has garnered widespread attention due to its unique concealment. Recent studies have demonstrated the practical advantages of Fixed Neural Network Steganography (FNNS), notably its ability to achieve stable information embedding and extraction without any additional network training. However, the stego images generated by FNNS still exhibit noticeable distortion and limited robustness. These drawbacks compromise the security of the embedded information and restrict the practical applicability of the method. To address these limitations, we propose Robust Fixed Neural Network Steganography (RFNNS). Specifically, a texture-aware localization technique selectively embeds perturbations carrying secret information into regions of complex textures, effectively preserving visual quality. Additionally, a robust steganographic perturbation generation (RSPG) strategy is designed to enhance the decoding accuracy, even under common and unknown attacks. These robust perturbations are combined with AI-generated cover images to produce stego images. Experimental results demonstrate that RFNNS significantly improves robustness compared to state-of-the-art FNNS methods, achieving an average increase in SSIM of 23\% for recovered secret images under common attacks. Furthermore, the LPIPS value of recovered secrets images against previously unknown attacks achieved by RFNNS was reduced to 39\% of the SOTA method, underscoring its practical value for covert communication.

RFNNS: Robust Fixed Neural Network Steganography with Universal Text-to-Image Models

Existing Grammatical Error Correction (GEC) systems suffer from limited reference diversity, leading to underestimated evaluation and restricted model generalization. 
To address this issue, we introduce the **Judge of Edit-Level Validity (JELV)**, an automated framework to validate correction edits from grammaticality, faithfulness, and fluency. 
Using our proposed human-annotated Pair-wise Edit-level Validity Dataset (PEVData) as benchmark, JELV offers two implementations: a multi-turn LLM-as-Judges pipeline achieving 90\% agreement with human annotators, and a distilled DeBERTa classifier with 85\% precision on valid edits. 
We then apply JELV to reclassify misjudged false positives in evaluation and derive a comprehensive evaluation metric by integrating false positive decoupling and fluency scoring, resulting in state-of-the-art correlation with human judgments.
We also apply JELV to filter LLM-generated correction candidates, expanding the BEA19's single-reference dataset containing 38,692 source sentences. Retraining top GEC systems on this expanded dataset yields measurable performance gains. JELV provides a scalable solution for enhancing reference diversity and strengthening both evaluation and model generalization.

JELV: A Judge of Edit-Level Validity for Evaluation and Automated Reference Expansion in Grammatical Error Correction

We present a method for the unattended gray-box
identification of sensor models commonly used by
localization algorithms in the field of robotics. The
objective is to determine the most likely sensor model for
a time series of unknown measurement data, given an
extendable catalog of predefined sensor models. Sensor
model definitions may require states for rigid-body
calibrations and dedicated reference frames to replicate a
measurement based on the robot’s localization state. A
health metric is introduced, which verifies the outcome of
the selection process in order to detect false positives
and facilitate reliable decision-making. In the second
stage, an initial guess for identified calibration states
is generated, and the necessity of sensor world reference
frames is evaluated. The identified sensor model with its
parameter information is then used to parameterize and
initialize a state estimation application, thus ensuring a
more accurate and robust integration of new sensor
elements. This method is helpful for inexperienced users
who want to identify the source and type of a measurement,
sensor calibrations, or sensor reference frames. It will
also be important in the field of modular multiagent
scenarios and modularized robotic platforms that are
augmented by sensor modalities during runtime. Overall,
this work aims to provide a simplified integration of
sensor modalities to downstream applications and circumvent
common pitfalls in the usage and development of
localization approaches.

Sensor Model Identification via Simultaneous Model
Selection and State Variable Determination

Estimating the geometric and volumetric properties of transparent deformable liquids is challenging due to optical complexities and dynamic surface deformations induced by container movements. Autonomous robots performing precise liquid manipulation tasks—such as dispensing, aspiration, and mixing—must handle containers in ways that inevitably induce these deformations, complicating accurate liquid state assessment. Current datasets lack comprehensive physics-informed simulation data representing realistic liquid behaviors under diverse dynamic scenarios. To bridge this gap, we introduce Phys-Liquid, a physics-informed dataset comprising 97,200 simulation images and corresponding 3D meshes, capturing liquid dynamics across multiple laboratory scenes, lighting conditions, liquid colors, and container rotations. To validate the realism and effectiveness of Phys-Liquid, we propose a four-stage reconstruction and estimation pipeline involving liquid segmentation, multi-view mask generation, 3D mesh reconstruction, and real-world scaling. Experimental results demonstrate improved accuracy and consistency in reconstructing liquid geometry and volume, outperforming existing benchmarks. The dataset and associated validation methods facilitate future advancements in transparent liquid perception tasks. The dataset and code will be released.

Downloads

Next from AAAI 2026

Picking a Representative Set of Solutions in Multiobjective Optimization: Axioms, Algorithms, and Experiments

Stay up to date with the latest Underline news!

PRESENTATIONS

CONFERENCES

COMPANY

RESOURCES