Singapore

AAAI 2026

On the Robustness of Bandit Multiple Testing

online learning & bandits

sequential decision making

robustness & trustworthiness

safety

Bandit multiple hypothesis testing has broad applications in biological sciences, clinical testing for drug discovery, and online A/B/n testing. The framework utilizes an adaptive sampling strategy for multiple testing which aims to maximize statistical power while ensuring anytime false discovery rate control. This paper proposes a robust approach for bandit multiple testing, allowing for (at most) $\varepsilon$ fraction of arbitrary distribution corruption, as in Huber's contamination model. Specifically, we introduce two adaptive sampling strategies designed to minimize the number of samples required to exceed a target true positive rate, while providing anytime control over the false discovery rate. We analyze the sample complexity of our proposed methods and perform numerical simulations to demonstrate their efficiency and robustness. Furthermore, we extend our methods to address scenarios where distributions have infinite variance and situations involving multiple agents collaborating on the same bandit task.
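To make the contamination setting in the abstract concrete, here is a minimal sketch of sampling under Huber's model, where each observation is replaced by an arbitrary value with probability $\varepsilon$. It compares the plain sample mean with a trimmed mean. The trimmed mean is a standard robust estimator used here only as a stand-in; it is an assumption and not the paper's adaptive sampling strategy. The names `sample_huber` and `trimmed_mean` are hypothetical.

```python
import random
import statistics


def sample_huber(n, eps, clean, corrupt, rng):
    """Draw n samples; each one comes from `corrupt` with probability eps,
    otherwise from `clean` (Huber's contamination model)."""
    return [corrupt(rng) if rng.random() < eps else clean(rng) for _ in range(n)]


def trimmed_mean(xs, eps):
    """Average after dropping the floor(eps * n) smallest and largest values.

    Illustrative stand-in only (an assumption), not the paper's estimator.
    """
    if not 0 <= eps < 0.5:
        raise ValueError("eps must lie in [0, 0.5)")
    ordered = sorted(xs)
    k = int(eps * len(ordered))  # number of points trimmed from each tail
    kept = ordered[k:len(ordered) - k]
    return statistics.fmean(kept)


if __name__ == "__main__":
    rng = random.Random(0)
    data = sample_huber(
        n=1000,
        eps=0.05,
        clean=lambda r: r.gauss(0.0, 1.0),  # assumed clean distribution
        corrupt=lambda r: 50.0,             # assumed adversarial outlier value
        rng=rng,
    )
    print("plain mean:  ", statistics.fmean(data))
    print("trimmed mean:", trimmed_mean(data, eps=0.05))
```

A single large outlier shifts the plain mean by an amount proportional to its size, while the trimmed mean discards it as long as the contaminated fraction does not exceed the trimming level.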

poster

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held at Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, exhibit programs, and a range of other activities to be announced.

Hyperparameter Optimization (HPO) is crucial in machine learning, aiming to optimize hyperparameters to enhance model performance. Although existing methods that leverage prior knowledge, drawn from either previous experiments or expert insights, can accelerate optimization, acquiring a correct prior for a specific HPO task is non-trivial. In this work, we propose to relieve the reliance on external knowledge by learning a reliable prior directly from low-fidelity (LF) problems. We introduce Lamda, an algorithm-agnostic framework designed to boost any baseline HPO algorithm. Specifically, Lamda operates in two phases: (1) it learns a reliable prior by exploring the LF landscape under limited computational budgets, and (2) it leverages this learned prior to guide the HPO process. We showcase how the Lamda framework can be integrated with various HPO algorithms to boost their performance, and further provide a theoretical analysis of its integration with Bayesian optimization and bandit-based Hyperband. We conduct experiments on 56 HPO problems spanning diverse domains and model scales. Results show that Lamda consistently enhances its baseline algorithms. Compared to nine state-of-the-art HPO algorithms, our Lamda variant achieves the best performance in 51 out of 56 HPO tasks and ranks second in the remaining 5.

LAMDA: Two-Phase HPO via Learning Prior from Low-Fidelity Data

Fine-tuning large language models (LLMs) in federated settings enables privacy-preserving adaptation but suffers from cross-client interference due to model aggregation. Existing federated LoRA fine-tuning methods, primarily based on FedAvg, struggle with data heterogeneity, leading to harmful cross-client interference and suboptimal personalization. In this work, we propose FedALT, a novel personalized federated LoRA fine-tuning algorithm that fundamentally departs from FedAvg. Instead of using an aggregated model to initialize local training, each client continues training its individual LoRA while incorporating shared knowledge through a separate Rest-of-World (RoW) LoRA component. To effectively balance local adaptation and global information, FedALT introduces an adaptive mixer that dynamically learns input-specific weightings between the individual and RoW LoRA components, drawing conceptual foundations from the Mixture-of-Experts (MoE) paradigm. Through extensive experiments on NLP benchmarks, we demonstrate that FedALT significantly outperforms state-of-the-art personalized federated LoRA fine-tuning methods, achieving superior local adaptation without sacrificing computational efficiency.

FedALT: Federated Fine-Tuning Through Adaptive Local Training with Rest-of-World LoRA

Recent studies have shown that deep learning models are very vulnerable to poisoning attacks. Many defense methods have been proposed to address this issue. However, traditional poisoning attacks are not as threatening as commonly believed. This is because they often cause differences in how the model performs on the training set compared to the validation set. Such inconsistency can alert defenders that their data has been poisoned, allowing them to take the necessary defensive actions. In this paper, we introduce a more threatening type of poisoning attack called the Deferred Poisoning Attack. This new attack allows the model to function normally during the training and validation phases but makes it very sensitive to evasion attacks or even natural noise. We achieve this by ensuring that the poisoned model's loss at each input sample is close to that of a normally trained model, but with a large local curvature. A similar model loss ensures that there is no obvious inconsistency between the training and validation accuracy, which makes the attack highly stealthy. On the other hand, the large curvature implies that a small perturbation may cause a significant increase in model loss, leading to substantial performance degradation and thus weaker robustness. We fulfill this purpose by making the model have singular Hessian information at the optimal point via our proposed Singularization Regularization term. We have conducted both theoretical and empirical analyses of the proposed method and validated its effectiveness through experiments on image classification tasks. Furthermore, we have confirmed the hazards of this form of poisoning attack under more general scenarios using natural noise, offering a new perspective for research in the field of security.

Deferred Poisoning: Making the Model More Vulnerable via Hessian Singularization

Time series forecasting relies on predicting future values from historical data, yet most state-of-the-art approaches—including transformer and multilayer perceptron-based models—optimize using Mean Squared Error (MSE), which has two fundamental weaknesses: its point-wise error computation fails to capture temporal relationships, and it does not account for inherent noise in the data. To overcome these limitations, we introduce the Residual-Informed Loss (RI-Loss), a novel objective function based on the Hilbert-Schmidt Independence Criterion (HSIC). RI-Loss explicitly models noise structure by enforcing dependence between the residual sequence and a random time series, enabling more robust, noise-aware representations. Theoretically, we derive the first non-asymptotic HSIC bound with explicit double-sample complexity terms, achieving optimal convergence rates through Bernstein-type concentration inequalities and Rademacher complexity analysis. This provides rigorous guarantees for RI-Loss optimization while precisely quantifying kernel space interactions. Empirically, experiments across eight real-world benchmarks and five leading forecasting models demonstrate improvements in predictive performance, validating the effectiveness of our approach. Code will be made publicly available to ensure reproducibility.

RI-Loss: A Learnable Residual-Informed Loss for Time Series Forecasting
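The RI-Loss abstract above builds on the Hilbert-Schmidt Independence Criterion. As a rough illustration of that ingredient only, the sketch below computes one common biased empirical HSIC estimate, $\mathrm{tr}(KHLH)/n^2$, for two scalar sequences. The Gaussian kernel, the bandwidth `sigma`, and the function names are assumptions chosen for the example; this is not the authors' loss.

```python
import math


def gaussian_gram(xs, sigma=1.0):
    """Gram matrix K[i][j] = exp(-(x_i - x_j)^2 / (2 sigma^2)) for scalars."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for b in xs] for a in xs]


def center(m):
    """Double-center a square matrix: H m H with H = I - (1/n) 1 1^T."""
    n = len(m)
    row = [sum(r) / n for r in m]
    col = [sum(m[i][j] for i in range(n)) / n for j in range(n)]
    total = sum(row) / n
    return [[m[i][j] - row[i] - col[j] + total for j in range(n)] for i in range(n)]


def hsic_biased(xs, ys, sigma=1.0):
    """Biased empirical HSIC, trace(K H L H) / n^2 (assumed Gaussian kernels)."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    k = gaussian_gram(xs, sigma)
    lc = center(gaussian_gram(ys, sigma))  # H L H
    return sum(k[i][j] * lc[j][i] for i in range(n) for j in range(n)) / n ** 2


if __name__ == "__main__":
    x = [0.0, 1.0, 2.0, 3.0]
    print("HSIC(x, x):       ", hsic_biased(x, x))
    print("HSIC(x, constant):", hsic_biased(x, [5.0] * 4))
```

A constant second sequence gives a centered Gram matrix of zeros, so the estimate vanishes, while a sequence paired with itself gives a strictly positive value.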

The safety alignment of large language models (LLMs) often relies on reinforcement learning from human feedback (RLHF), which requires human annotations to construct preference datasets. Given the challenge of assigning overall quality scores to data, recent works increasingly adopt fine-grained ratings based on multiple safety rules. In this paper, we discover a robust phenomenon: rules with higher rating entropy tend to have lower accuracy in distinguishing human-preferred responses. Exploiting this insight, we propose ENCORE, a simple entropy-guided method to compose multi-head rewards by penalizing rules with high rating entropy. Theoretically, we show that such rules yield negligible weights under the Bradley–Terry loss during weight optimization, naturally justifying their penalization. Empirically, ENCORE consistently outperforms strong baselines, including random and uniform weighting, single-head Bradley–Terry, and LLM-as-a-judge, on RewardBench safety tasks. Our method is completely training-free, generally applicable across datasets, and retains interpretability, making it a practical and effective approach for multi-attribute reward modeling.

ENCORE: Entropy-guided Reward Composition for Multi-head Safety Reward Models
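The ENCORE abstract above describes down-weighting rules whose ratings have high entropy, without giving the exact weighting. The sketch below shows one way such a penalty could look: it computes the Shannon entropy of each rule's empirical rating distribution and takes a softmax over negative entropies. The softmax form, the `temperature` parameter, and the function names are assumptions for illustration, not the formula from the paper.

```python
import math
from collections import Counter


def rating_entropy(ratings):
    """Shannon entropy (in nats) of the empirical distribution of ratings."""
    if not ratings:
        raise ValueError("ratings must be non-empty")
    n = len(ratings)
    counts = Counter(ratings)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def entropy_penalized_weights(ratings_by_rule, temperature=1.0):
    """Softmax over negative entropies: higher entropy gives a smaller weight.

    Hypothetical weighting scheme, assumed for this example.
    """
    scores = {
        rule: math.exp(-rating_entropy(r) / temperature)
        for rule, r in ratings_by_rule.items()
    }
    z = sum(scores.values())
    return {rule: s / z for rule, s in scores.items()}


if __name__ == "__main__":
    ratings = {
        "rule_a": [5, 5, 5, 5, 5],  # annotators always agree: zero entropy
        "rule_b": [1, 2, 3, 4, 5],  # ratings spread uniformly: high entropy
    }
    print(entropy_penalized_weights(ratings))
```

The resulting weights sum to one and could then be used to combine per-rule reward heads into a single score.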

Implicit Neural Representations (INRs) have revolutionized signal processing and computer vision by modeling signals as continuous, differentiable functions parameterized by neural networks. However, INRs are prone to the spectral bias problem, limiting their ability to retain high-frequency information, and often struggle with noise robustness. Motivated by recent trends in iterative refinement processes, we propose Iterative Implicit Neural Representations (I-INRs), a novel plug-and-play framework that iteratively refines signal reconstructions to restore high-frequency details, improve noise robustness, and enhance generalization, ultimately delivering superior reconstruction quality. I-INRs integrate seamlessly into existing INR architectures with only a 0.5–2% increase in parameters. During reconstruction, the iterative refinement adds just 0.8–1.6% additional FLOPs over the baseline while delivering a substantial performance boost of up to +2.0 PSNR. Extensive experiments demonstrate that I-INRs consistently outperform WIRE, SIREN, and Gauss across diverse computer-vision tasks, including image fitting, image denoising, and object-occupancy prediction.

I-INR: Iterative Implicit Neural Representations

Stochastic dynamical systems have emerged as fundamental models across numerous application domains, providing powerful mathematical representations for capturing uncertain system behavior. In this paper, we address the problem of runtime safety and reach-avoid probability prediction for discrete-time stochastic systems with online observations, i.e., estimating the probability that the system satisfies a given safety or reach-avoid specification. Unlike traditional approaches that rely solely on offline models, we propose a framework that incorporates real-time observations to dynamically refine probability estimates for safety and reach-avoid events. By introducing observation-aware barrier functions, our method adaptively updates probability bounds as new observations are collected, combining efficient offline computation with online backward iteration. This approach enables rigorous and responsive prediction of safety and reach-avoid probabilities under uncertainty. In addition to the theoretical guarantees, experimental results on benchmark systems demonstrate the practical effectiveness of the proposed method.

Runtime Safety and Reach-avoid Prediction of Stochastic Systems via Observation-aware Barrier Functions

Large Language Models (LLMs) have achieved impressive performance in complex reasoning problems. Their effectiveness depends heavily on the specific nature of the task, especially the required domain knowledge. Existing approaches, such as mixture-of-experts, typically operate at the task level; they are too coarse to effectively solve heterogeneous problems involving multiple subjects. This work proposes a novel framework that performs fine-grained analysis at the subject level, equipped with a designated multi-agent collaboration strategy for heterogeneous problem reasoning. Specifically, given an input query, we first employ a Graph Neural Network to identify the relevant subjects and infer their interdependencies to generate a Subject-based Directed Acyclic Graph (S-DAG), where nodes represent subjects and edges encode information flow. Then we profile the LLMs by assigning each model a subject-specific expertise score, and select the top-performing model for each corresponding subject of the S-DAG. Such subject-model matching enables graph-structured multi-agent collaboration where information flows from the starting model to the ending model over the S-DAG. We curate and release multi-subject subsets of standard benchmarks (MMLU-Pro, GPQA, MedMCQA) to better reflect complex, real-world reasoning tasks. Extensive experiments show that our approach significantly outperforms existing task-level model selection and multi-agent collaboration baselines in accuracy and efficiency. These results highlight the effectiveness of subject-aware reasoning and structured collaboration in addressing complex, multi-subject problems.

S-DAG: A Subject-Based Directed Acyclic Graph for Multi-Agent Heterogeneous Reasoning

Dynamic LiDAR point cloud compression (LPCC) is crucial for the efficient transmission and storage of large-scale three-dimensional data in applications such as autonomous driving. However, many existing methods, which primarily focus on compressing geometric or motion information, face a fundamental limitation: they treat all points as equally important. This approach neglects the semantic priorities of a scene, resulting in inefficient bit allocation and particularly compromising the reconstruction quality of safety-critical regions, such as pedestrians and vehicles, which are vital to downstream perception tasks. To address these limitations, we propose R²D-LPCC, a relevance-ranking framework for region-adaptive LPCC that prioritizes fidelity in semantically important regions. Central to our approach is the Adaptive Relevance Learning (ARL) module, which integrates semantic context with uncertainty to evaluate regional significance and guide compression. We also introduce a Multi-scale Region-Adaptive Transform (MRAT) module to enhance semantic feature modeling and preserve fine-grained details in key areas. Additionally, we develop an adaptive multimodal motion estimation module to improve motion prediction in complex three-dimensional environments. Extensive experiments conducted on the SemanticKITTI benchmark demonstrate that R²D-LPCC significantly surpasses ten recent state-of-the-art methods, achieving a 45.48% BD-rate gain over the previous leading method, Unicorn, and a 98.58% gain over the GPCC standard, while ensuring superior reconstruction quality in semantically important regions.

R²D-LPCC: Relevance-Ranking Guided Region-Adaptive Dynamic LiDAR Point Cloud Compression

Solving the energy-saving distributed heterogeneous flexible job shop scheduling problem (ES-DHFJSP) aims to enhance industrial production efficiency while minimizing energy consumption. State-of-the-art co-evolutionary algorithms have emerged as effective approaches for addressing ES-DHFJSP. However, existing methods converge slowly and incur excessive computational overhead when confronted with vast search spaces. In this work, we propose a novel solution space transformation-guided co-evolution algorithm (SSTCE) to overcome this limitation. In SSTCE, we first establish an inter-job similarity metric and incorporate constrained hierarchical clustering with optimal leaf ordering (CHC-OLO) to generate clustered job sets, which are subsequently used for population initialization that achieves a favorable balance between convergence and diversity. To enhance search capability in expansive solution spaces, we devise a dynamic solution space transformation mechanism that effectively reduces inefficient searches within the algorithm. Furthermore, we develop tailored local search strategies leveraging domain-specific knowledge of DHFJSP properties. Extensive experimental evaluations across 20 benchmark instances demonstrate that SSTCE significantly outperforms existing evolutionary algorithms in solving ES-DHFJSP.
