Singapore

Large language models (LLMs) continually evolve through pre-training on ever-expanding web data, but this adaptive process also exposes them to subtle forms of misinformation. While prior work has explored data poisoning during static pre-training, the effects of such manipulations under continual pre-training remain largely unexplored. Drawing inspiration from the illusory truth effect in human cognition, where repeated exposure to falsehoods increases belief in their accuracy, we ask whether LLMs exhibit a similar vulnerability. We investigate whether repeated exposure to false but confidently stated facts can shift a model’s internal representation away from the truth. We introduce Layer of Truth, a framework and dataset for probing belief dynamics in continually trained LLMs. By injecting controlled amounts of poisoned data and probing intermediate representations across checkpoints, model scales, and question types, we quantify when and how factual beliefs shift. Our findings reveal that even minimal exposure can induce persistent representational drift in well-established facts, with susceptibility varying across layers and model sizes. These results highlight an overlooked vulnerability of continually updated LLMs: their capacity to internalize misinformation analogously to humans, underscoring the need for robust monitoring of factual integrity during model updates.

AAAI 2026

Layer of Truth: Probing Belief Shifts under Continual Pre-Training Poisoning

workshop paper

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held at Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, exhibit programs, and a range of other activities to be announced.


Foundation models deployed in dynamic domains like robotics and autonomous systems suffer from critical reliability failures, including temporal inconsistencies and vulnerability to sensor noise, stemming from their training on static, disconnected images. To bridge this reliability gap, we propose a lightweight, reliability-aware training paradigm that distills temporal knowledge from video into a standard single-image encoder. By training a predictor to estimate the feature representation of a future frame, our method implicitly forces the backbone model to learn real-world dynamics, enhancing robustness to transient visual artifacts and promoting temporally stable representations. This self-supervised objective instills geometric and physical priors without relying on brittle external modules like optical flow estimators. Remarkably, when pre-trained on only a single, 2-hour uncurated video, our method achieves state-of-the-art among DINO-style approaches on downstream tasks like detection and segmentation, which we use as quantifiable proxies for robust scene understanding. Our work presents a practical and efficient approach for improving the trustworthiness and dependable performance of vision encoders for safe deployment in operational settings.

Next-Frame Prediction as a Reliability-Aware Training Paradigm for Robust Vision Encoders
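The next-frame objective in the abstract above can be illustrated with a small sketch. The abstract does not specify the loss, so the cosine distance used here is an assumption, and `encode`, `predict`, and `next_frame_loss` are hypothetical names standing in for the backbone encoder, the predictor head, and the training loss.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Return 1 - cosine similarity between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def next_frame_loss(
    encode: Callable[[Vector], Vector],   # hypothetical single-image encoder
    predict: Callable[[Vector], Vector],  # hypothetical predictor head
    frame_t: Vector,
    frame_future: Vector,
) -> float:
    """Compare the predicted future feature with the encoded future frame."""
    predicted = predict(encode(frame_t))
    # Assumption: the future-frame feature serves as the regression target.
    target = encode(frame_future)
    return cosine_distance(predicted, target)
```

In a real training loop the encoder and predictor would be neural networks and the loss would be minimized by gradient descent; the sketch only shows how the two frames enter the objective.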

Large Language Models are increasingly deployed as judges (LaaJ) in code generation pipelines. While attractive for scalability, LaaJs tend to overlook domain-specific issues, raising concerns about their reliability in critical evaluation tasks. To better understand these limitations in practice, we examine LaaJ behavior in a concrete industrial use case: legacy code modernization via COBOL code generation. In this setting, we find that even production-deployed LaaJs can miss domain-critical errors, revealing consistent blind spots in their evaluation capabilities. To better understand these blind spots, we analyze generated COBOL programs and the associated LaaJ judgments, drawing on expert knowledge to construct a preliminary taxonomy. Based on this taxonomy, we develop a lightweight analytic checker tool that flags over 30 domain-specific issues observed in practice. We use its outputs as *analytic hints*, dynamically injecting them into the judge’s prompt to encourage the LaaJ to revisit aspects it may have overlooked. Experiments on a test set of 100 programs using four production-level LaaJs show that the LaaJ alone detects only about 45% of the errors present in the code (across all judges we tested), while the analytic checker alone lacks explanatory depth. When combined, the LaaJ+Hints configuration achieves up to 94% coverage (for the best-performing judge and injection prompt) and produces qualitatively richer, more accurate explanations, demonstrating that analytic–LLM hybrids can substantially enhance evaluation reliability in deployed pipelines.

Beyond Blind Spots: Analytic Hints for Mitigating LLM-Based Evaluation Pitfalls
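As a rough illustration of the hint-injection step described in the abstract above, the sketch below appends checker findings to a judge prompt. The function name `build_judge_prompt`, the prompt wording, and the section layout are all assumptions made for illustration; the abstract does not describe the actual prompt format.

```python
from typing import List


def build_judge_prompt(base_prompt: str, code: str, hints: List[str]) -> str:
    """Assemble a judge prompt, adding a hints section only when hints exist."""
    sections = [
        base_prompt.strip(),
        "Program under review:\n" + code.strip(),
    ]
    if hints:
        # Hypothetical wording: ask the judge to re-check each flagged issue.
        bullets = "\n".join(f"- {hint}" for hint in hints)
        sections.append(
            "Analytic hints from the checker (re-examine each one):\n" + bullets
        )
    return "\n\n".join(sections)
```

With an empty hint list the function returns the plain judge prompt, which gives a convenient baseline for comparing a judge with and without hints.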

Large Language Models (LLMs) produce strong results but are costly to serve. Static post-training quantization reduces memory and compute, yet uses a single bit width for all prompts, wasting resources on easy inputs and degrading accuracy on harder ones. We introduce Prompt-Adaptive Quantization (PAQ), a per-prompt precision framework that requires no retraining of the underlying model. PAQ trains a lightweight BERT-based router with perplexity-guided supervision to select the smallest adequate quantization level (2, 4, 8, or 16 bits) per input. At inference, prompts are automatically routed to the appropriate pre-quantized LLM variant. Overall, PAQ serves as a novel framework for adaptive per-prompt quantization, reducing latency while maintaining strong accuracy across tasks.

Prompt-Adaptive Quantization: Adaptive Per-Prompt Routing for Efficient LLM Inference
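The "smallest adequate quantization level" idea from the PAQ abstract above can be sketched as follows. The relative-perplexity rule, the `tolerance` parameter, and the names `smallest_adequate_bits` and `route` are assumptions for illustration only; the abstract states that supervision is perplexity-guided but not how labels are derived.

```python
from typing import Callable, Dict, Tuple

BIT_WIDTHS: Tuple[int, ...] = (2, 4, 8, 16)


def smallest_adequate_bits(
    perplexity_by_bits: Dict[int, float], tolerance: float = 0.05
) -> int:
    """Pick the lowest bit width whose perplexity stays within `tolerance`
    (relative) of the 16-bit reference. Assumed labeling rule."""
    reference = perplexity_by_bits[16]
    for bits in BIT_WIDTHS:
        if perplexity_by_bits[bits] <= reference * (1.0 + tolerance):
            return bits
    return 16


def route(
    prompt: str,
    predict_bits: Callable[[str], int],       # stands in for the trained router
    variants: Dict[int, Callable[[str], str]],  # pre-quantized model variants
) -> str:
    """Send the prompt to the variant chosen by the router."""
    return variants[predict_bits(prompt)](prompt)
```

Labels produced by `smallest_adequate_bits` could serve as training targets for a router classifier, while `route` shows the inference-time dispatch.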

Large language models (LLMs) often generate fluent but factually incorrect statements despite having access to relevant evidence, a failure mode rooted in how they allocate attention between contextual and parametric knowledge. Understanding and steering this internal behavior is key both for trustworthy deployment and for scientific interpretability of model mechanisms. We introduce COMPASS (Context-Modulated PID Attention Steering System), a lightweight, interpretable control framework that embeds a model-based feedback loop directly within decoding. COMPASS quantifies context reliance via a transparent metric, the Context Reliance Score (CRS), which serves as an online probe of how attention heads ground generation in evidence. Using this interpretable signal, a PID controller dynamically modulates attention heads to maintain factual consistency without retraining or multi-pass decoding. Across benchmarks (HotpotQA, XSum, HaluEval, RAGTruth), COMPASS consistently reduces contextual hallucination rates (2.8–5.8% absolute) while revealing how distinct attention heads contribute to evidence alignment. These results highlight feedback-driven interpretability as a pathway toward scientific understanding of LLM behavior.

COMPASS: Context-Modulated PID Attention Steering System for Hallucination Mitigation
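For readers unfamiliar with PID control, the feedback loop mentioned in the COMPASS abstract above might look roughly like the sketch below. It is a textbook discrete PID controller, not the authors' implementation: the CRS value is assumed to arrive as a plain number, and `head_scale` with its clamping bounds is a hypothetical way to turn the control signal into an attention-head scaling factor.

```python
from dataclasses import dataclass


@dataclass
class PIDController:
    kp: float          # proportional gain
    ki: float          # integral gain
    kd: float          # derivative gain
    setpoint: float    # target context reliance (assumed scalar)
    integral: float = 0.0
    prev_error: float = 0.0

    def update(self, measurement: float, dt: float = 1.0) -> float:
        """Return the control signal for one decoding step."""
        error = self.setpoint - measurement
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def head_scale(control: float, base: float = 1.0,
               low: float = 0.5, high: float = 2.0) -> float:
    """Map the control signal to a clamped scaling factor (hypothetical)."""
    return min(high, max(low, base + control))
```

At each decoding step, a measured score below the setpoint yields a positive control signal, which would raise the scale applied to the heads being steered; clamping keeps the intervention bounded.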

BSLM: A Bi-Level Speech-Language Model for the Joint Modeling of Discrete and Continuous Tokens

Granular Control of Nonverbal Expressions for Achieving Natural Emotional Text-to-Speech

AudioBERTScore: Objective Evaluation of Environmental Sound Synthesis Based on Similarity of Audio Embedding Sequences

Can You Hear Naples? Building and Benchmarking a Neapolitan Speech Corpus

Optimization is a foundational pillar of artificial intelligence (AI), underpinning core techniques in planning, scheduling, decision-making, and machine learning. Yet despite decades of algorithmic advances, widespread adoption of state-of-the-art optimization solvers remains limited by the substantial expertise required for effective modeling and solving. This expertise barrier means that powerful optimization tools remain largely inaccessible to non-experts, with most users of leading solvers holding advanced degrees.

Recent advances in generative AI, particularly large language models (LLMs), offer a promising new path for democratizing optimization. By automating key steps in the optimization pipeline, from model formulation through solver configuration to model validation, LLMs promise to broaden access to powerful optimization tools. However, these models rarely work out of the box for complex reasoning tasks like optimization.

This tutorial surveys emerging research at the intersection of LLMs and mathematical optimization, highlighting both practical systems and open research questions. We will provide a comprehensive overview of how LLMs can support each stage of the optimization pipeline, including model formulation, solver configuration, and validation. The tutorial is designed to be accessible to attendees without prior experience in either field, offering both conceptual frameworks and practical insights for this rapidly evolving area of research.

LLMs for Optimization: Modeling, Solving, and Validating with Generative AI

There is a growing debate about the implications of multiplicity (conflicting behavior among a set of “good” models) for algorithmic decision-making. On one hand, there are concerns over unfair treatment due to conflicting predictions and explanations, further exacerbated in the generative AI ecosystem. On the other hand, multiplicity also offers the potential to find less discriminatory and more interpretable models. In this tutorial, we aim to increase awareness of different perspectives on multiplicity and position them among broader discussions in the community. Specifically, the main goals of our tutorial are:

* Highlighting the phenomenon of multiplicity in machine learning and calling attention to its growing literature.
* Linking multiplicity with uncertainty and churn, presenting a unified view of various measures of instability.
* Discussing the implications of multiplicity for fairness and explainability in algorithmic decision-making.
* Recognizing exacerbated concerns and new forms of multiplicity in the age of generative AI.
* Engaging the community on when and how to address multiplicity in various practical scenarios.
* Identifying open questions and motivating future research directions on multiplicity.
