Large language models (LLMs) often perform better when prompted to explain their reasoning, but it remains unclear how well such gains persist as reasoning depth increases. In this work, we propose a depth-aware evaluation framework, where depth is the number of inference steps a problem requires, and report results on two structured datasets: CLUTRR (kinship reasoning) and ProofWriter (logical entailment), comparing direct and reasoning prompts across five models. Reasoning prompts gave small gains at shallow depths but quickly weakened, and often reversed, as tasks grew more complex. On ProofWriter, GPT-5 reached 90% accuracy at depth four with direct prompting, yet its reasoning accuracy fell below the direct baseline beyond depth two. Smaller open-source models showed only unstable or negligible gains, underscoring that multi-step reasoning in LLMs remains brittle as depth increases.
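To make the setup concrete, a minimal sketch of such a depth-aware comparison is shown below. The prompt templates, the `query_model` hook, and the lenient answer-matching are illustrative assumptions, not the authors' released code; the core idea is simply bucketing accuracy by the number of inference steps each item requires, as in CLUTRR and ProofWriter.

```python
from collections import defaultdict

# Hypothetical prompt styles: "direct" asks only for the label,
# "reasoning" asks the model to explain its steps first.
DIRECT_TEMPLATE = "{question}\nAnswer with the label only."
REASONING_TEMPLATE = "{question}\nThink step by step, then state the final label."


def evaluate_by_depth(items, query_model):
    """Bucket accuracy by reasoning depth for direct vs. reasoning prompts.

    `items` is an iterable of dicts with keys 'question', 'label', and
    'depth' (number of inference steps needed, provided by the dataset);
    `query_model(prompt) -> str` is any model-calling function (assumed
    here for illustration).
    """
    correct = defaultdict(lambda: {"direct": 0, "reasoning": 0})
    total = defaultdict(int)
    for item in items:
        total[item["depth"]] += 1
        for mode, template in (("direct", DIRECT_TEMPLATE),
                               ("reasoning", REASONING_TEMPLATE)):
            answer = query_model(template.format(question=item["question"]))
            # Lenient match: count the item correct if the gold label
            # appears anywhere in the model's answer.
            if item["label"].lower() in answer.lower():
                correct[item["depth"]][mode] += 1
    # Per-depth accuracies let the direct-vs-reasoning gap be read off
    # at each depth rather than averaged away.
    return {d: {m: correct[d][m] / total[d] for m in ("direct", "reasoning")}
            for d in sorted(total)}
```

Plotting the two accuracy curves returned per depth is what surfaces the crossover the abstract describes: reasoning ahead at shallow depths, then falling below the direct baseline as depth grows.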