Given society's increasing reliance on data, the collection and processing of data into useful information is a technical problem of growing focus and, perhaps paradoxically, a critical bottleneck in many data science and machine learning applications. Yet even for the most basic statistical problems, such as mean estimation, there is a theory-practice divide. Conventional methods like the sample mean, while supported by theoretical results under strong assumptions, are often brittle in the presence of extreme data. Practitioners thus often resort to ad-hoc and unprincipled "outlier removal" heuristics, which can lead to wrong conclusions (e.g., Millikan's underestimation of the electron charge (Holton 1978)).
In this talk, I will describe my work that essentially resolves the fundamental one-dimensional mean estimation problem. I will show the construction of a statistically optimal and computationally efficient one-dimensional mean estimator, whose estimation error is optimal even in the leading multiplicative constant, under bare-minimum distributional assumptions (FOCS 2021). Furthermore, we will discuss its various robustness properties (ICML 2025 Oral), in particular highlighting robustness to adversarial sample corruption. Depending on the allocated time, I will also show a rather different but optimal mean estimator for the "very high-dimensional" regime (ITCS 2022).
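The brittleness of the sample mean mentioned above can be illustrated with a minimal sketch. Note this uses median-of-means, a standard textbook robust estimator, purely for illustration; it is not the estimator constructed in the talk, and the data, group count `k`, and helper name are invented for the example.

```python
import random
import statistics

def median_of_means(xs, k=10):
    """Illustrative robust estimator: split the samples into k groups,
    average each group, and return the median of the group averages."""
    xs = list(xs)
    random.shuffle(xs)  # randomize group assignment
    groups = [xs[i::k] for i in range(k)]
    return statistics.median(statistics.fmean(g) for g in groups)

random.seed(0)
# Mostly well-behaved samples (true mean 0) plus a few extreme values,
# mimicking heavy-tailed or corrupted data.
data = [random.gauss(0, 1) for _ in range(997)] + [1000.0] * 3

print(statistics.fmean(data))        # sample mean, dragged toward the outliers
print(median_of_means(data))         # stays close to the true mean 0
```

With only three extreme points among a thousand samples, the sample mean shifts by roughly 3, while the median of group means ignores the few contaminated groups.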
