Singapore

Fair clustering has attracted increasing attention in recent years. In this work, we study the individually fair $k$-means problem in Euclidean space. While single-swap local search methods have achieved near-linear running time and constant approximation guarantees, their performance often depends on the aspect ratio $\Delta$ of the dataset (the ratio between the diameter and the minimum interpoint distance of the dataset). Applying multi-swap local search in linear time while obtaining a better approximation ratio remains challenging. To address this, we introduce a collaborative initialization framework for individually fair $k$-means that integrates greedy and sampling techniques. This framework eliminates the dependence on the aspect ratio $\Delta$ and yields an $(O(1), 4)$-bicriteria approximation in linear time. The current state-of-the-art near-linear time algorithm achieves a $(2000, 6)$-bicriteria approximation in $O(ndk^2 \log(n\Delta))$ time under the assumption that optimal centers are identical to their corresponding centroids, but this assumption is generally not satisfied under individual fairness constraints. In contrast, we propose a multi-swap local search algorithm that improves the approximation guarantee to $(62, 7)$. Our method runs in linear time $O(nd \cdot \mathrm{poly}(k))$ with constant probability and does not need this restrictive assumption. We validate our theoretical results through extensive experiments on both real-world and synthetic datasets, including large-scale benchmarks with up to 100 million points. Our empirical evaluation demonstrates superior clustering quality and computational efficiency, along with scalability under varying parameter settings.
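To make the aspect ratio $\Delta$ concrete, the following is a minimal sketch that computes it straight from the definition given in the abstract (diameter divided by minimum interpoint distance). The function name `aspect_ratio` is hypothetical, and the brute-force pairwise scan is for illustration only; it is quadratic in the number of points and is not part of the paper's algorithm.

```python
from itertools import combinations
from math import dist


def aspect_ratio(points):
    """Return diameter / minimum interpoint distance of a point set.

    Illustrative brute force: looks at every pair, so it is O(n^2).
    """
    distances = [dist(p, q) for p, q in combinations(points, 2)]
    if not distances:
        raise ValueError("need at least two points")
    smallest = min(distances)
    if smallest == 0:
        raise ValueError("duplicate points make the aspect ratio undefined")
    return max(distances) / smallest


# Three collinear points: diameter 4, closest pair at distance 1.
print(aspect_ratio([(0, 0), (1, 0), (4, 0)]))
```

A dataset with two nearly coincident points and one far outlier has a very large $\Delta$, which is why a $\log(n\Delta)$ factor in the running time can matter in practice.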

AAAI 2026

Linear Time Algorithms for Individually Fair k-means via Multi-Swap Local Search

individually fair

local search

clustering

technical paper

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held in Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, and exhibit programs, and a range of other activities to be announced.


Modern Agentic AI systems plan, reason, and act across multiple steps, creating execution patterns that are difficult to interpret. Existing observability platforms track prompt I/O and operational metrics but require manual inspection of traces to reconstruct structure and reasoning. We present AgentGraph, which converts execution logs into interactive knowledge graphs and actionable insights. Nodes represent agents, tasks, tools, data inputs/outputs, and humans, while typed edges capture relations such as inputs consumed, tasks delegated or sequenced, tools required or used, outputs produced and delivered, and interventions from agents or humans. Each graph element links to its exact trace span, ensuring verifiability. Building on this representation, AgentGraph enables two analyses: qualitative trace-grounded failure detection and optimisation recommendations, and quantitative robustness evaluation via perturbation testing and causal attribution.

AgentGraph: Trace-to-Graph Platform for Interactive Analysis and Robustness Testing in Agentic AI Systems

Large Language Models (LLMs) have shown improved generation performance through retrieval-augmented generation (RAG) following the retriever-reader paradigm, which supplements model inputs with externally retrieved knowledge. However, prior work often evaluates RAG holistically, assessing the retriever and reader jointly, making it difficult to isolate the true contribution of retrieval, particularly given the prompt sensitivity of LLMs used as readers. We introduce Spectrum Projection Score (SPS), a lightweight, supervision‑free metric that lets the reader gauge the semantic alignment of a retrieved summary with its hidden representation. SPS projects the passage summary embeddings onto the reader’s principal subspace and uses the residual as an immediate quality signal: the smaller the residual, the more the reader “expects” the passage summary. Building on SPS we present xCompress, an inference‑time controller framework that dynamically samples, ranks, and compresses retrieval summary candidates. Extensive experiments on five QA benchmarks with four open source LLMs show that SPS not only enhances performance across a range of tasks but also provides a principled perspective on the interaction between retrieval and generation.
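The core geometric idea behind SPS, projecting an embedding onto a subspace and reading off the residual, can be sketched as follows. This is only an illustration under stated assumptions: the orthonormal `basis` is taken as given (the abstract does not say how the reader's principal subspace is extracted), and the names `projection_residual` and `rank_by_residual` are hypothetical rather than taken from the paper.

```python
from math import sqrt


def projection_residual(embedding, basis):
    """Norm of the component of `embedding` outside span(basis).

    Assumption: `basis` is a list of orthonormal vectors.
    """
    residual = list(embedding)
    for direction in basis:
        coefficient = sum(e * d for e, d in zip(embedding, direction))
        residual = [r - coefficient * d for r, d in zip(residual, direction)]
    return sqrt(sum(r * r for r in residual))


def rank_by_residual(candidates, basis):
    """Order candidate embeddings from smallest to largest residual."""
    return sorted(candidates, key=lambda c: projection_residual(c, basis))


# Subspace spanned by the first two coordinate axes of a 3-d space.
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(projection_residual([3.0, 4.0, 5.0], basis))  # only the third coordinate remains
```

Under this reading, a candidate summary whose embedding lies almost entirely inside the subspace gets a residual near zero and would be ranked first.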

Beyond Perplexity: Let the Reader Select Retrieval Summaries via Spectrum Projection Score

Aligning text-to-image (T2I) diffusion models with human preferences has emerged as a critical research challenge. While Direct Preference Optimization (DPO) has established a foundation for preference learning in large language models (LLMs), its extension to diffusion models remains limited in alignment performance. In this work, we propose an enhanced version of Diffusion-DPO by introducing a stable reference model update strategy. This strategy facilitates the exploration of better alignment solutions while maintaining training stability. Moreover, we design a timestep-aware optimization strategy that further boosts performance by addressing preference learning imbalance across timesteps. Through the synergistic combination of our exploration and timestep-aware optimization, our method significantly improves the alignment performance of Diffusion-DPO on human preference evaluation benchmarks, achieving state-of-the-art results.

Rethinking Direct Preference Optimization in Diffusion Models

We study the fair allocation of indivisible goods across groups of agents, where each agent fully enjoys all goods allocated to their group.
We focus on groups of two (*couples*) and other groups of small size.
For two couples, an EF1 allocation — one in which all agents find their group's bundle no worse than the other group's, up to one good — always exists and can be found efficiently.
For three or more couples, EF1 allocations need not exist.
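The group version of EF1 described above can be checked mechanically. The sketch below assumes additive valuations, which the abstract does not state, and uses hypothetical names (`is_group_ef1`, `groups`, `bundles`, `valuations`). It only verifies a given allocation; it does not implement the paper's algorithm for finding one.

```python
def is_group_ef1(groups, bundles, valuations):
    """Check EF1 when every agent enjoys the whole bundle of their group.

    groups:     list of lists of agent names; groups[g] shares bundles[g]
    bundles:    list of lists of goods
    valuations: dict agent -> dict good -> value (assumed additive)
    """
    for g, members in enumerate(groups):
        for agent in members:
            value = valuations[agent]
            own = sum(value[x] for x in bundles[g])
            for h, other in enumerate(bundles):
                if h == g or not other:
                    continue  # an empty bundle cannot be envied
                other_total = sum(value[x] for x in other)
                best_single = max(value[x] for x in other)
                # Envy must vanish after removing one good from the other bundle.
                if own < other_total - best_single:
                    return False
    return True


goods = ["a", "b", "c", "d"]
agents = ["p1", "p2", "q1", "q2"]
unit = {agent: {good: 1 for good in goods} for agent in agents}
couples = [["p1", "p2"], ["q1", "q2"]]
print(is_group_ef1(couples, [["a", "b"], ["c", "d"]], unit))
print(is_group_ef1(couples, [["a"], ["b", "c", "d"]], unit))
```

In the second call the first couple holds one good and the other holds three, so removing a single good still leaves envy and the check fails.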

Turning to proportionality, we show that, whenever groups have size at most $k$, a PROP$k$ allocation exists and can be found efficiently.
In fact, our algorithm additionally guarantees (fractional) Pareto optimality, and PROP1 to the first agent in each group, PROP2 to the second, etc., for an arbitrary agent ordering.
In special cases, we show that there are PROP1 allocations for any number of couples.

Fair Division Among Couples and Small Groups

Subset selection under budget constraints is critical in applications like multi-robot patrolling, crime deterrence, and targeted marketing, where multiple agents must jointly select targets and plan feasible routes. We formalize this challenge as Multi-Subset Selection with Budget-Constrained Routing (MSS-BCR), involving complex, non-additive cost structures that defy traditional methods. We propose GRIP, a graph-based framework integrating spatial reward fields and policy learning to enable coordinated, budget-aware target selection and routing. GRIP uses attention-based embeddings and constraint-triggered pruning with utility recovery to produce high-quality, feasible solutions. Experiments based on multiple synthetic and real-world datasets show GRIP outperforms baselines in reward efficiency and scalability across varied scenarios.

GRIP: Latent Field-Guided Graph Policy for Budget-Constrained Multi-Agent Routing

Dependency Quantified Boolean Formulas (DQBF) generalize QBF by explicitly specifying which universal variables each existential variable depends on, instead of relying on a linear quantifier order. The satisfiability problem of DQBF is NEXP-complete, and many hard problems can be succinctly encoded as DQBF. Recent work has revealed a strong analogy between DQBF and SAT: $k$-DQBF (with $k$ existential variables) is a succinct form of $k$-SAT, and satisfiability is NEXP-complete for $3$-DQBF but PSPACE-complete for $2$-DQBF, mirroring the complexity gap between $3$-SAT (NP-complete) and $2$-SAT (NL-complete).

Motivated by this analogy, we study the model counting problem for DQBF, denoted $\#$DQBF. Our main theoretical result is that $\#$2-DQBF is $\#$EXP-complete, where $\#$EXP is the exponential-time analogue of $\#$P. This parallels Valiant's classical theorem stating that $\#$2-SAT is $\#$P-complete. As a direct application, we show that first-order model counting (FOMC) remains $\#$EXP-complete even when restricted to a PSPACE-decidable fragment of first-order logic and domain size two.

Building on recent successes in reducing 2-DQBF satisfiability to symbolic model checking, we develop a dedicated 2-DQBF model counter. Using a diverse set of crafted instances, we experimentally evaluated it against a baseline that expands 2-DQBF formulas into propositional formulas and applies propositional model counting. While the baseline worked well when each existential variable depends on few variables, our implementation scaled significantly better to larger dependency sets. 

Missing details, code and data can be found in the supplementary material.

Model Counting for Dependency Quantified Boolean Formulas

Recent advances in parameter-efficient transfer learning have demonstrated the utility of composing LoRA adapters from libraries of pretrained modules. However, most existing approaches rely on simple retrieval heuristics or uniform averaging, which overlook the latent structure of task relationships in representation space. We propose a new framework for adapter reuse that moves beyond retrieval, formulating adapter composition as a geometry-aware sparse reconstruction problem. Specifically, we represent each task by a latent prototype vector derived from the base model’s encoder and aim to approximate the target task prototype as a sparse linear combination of retrieved reference prototypes, under an $\ell_1$-regularized optimization objective. The resulting combination weights are then used to blend the corresponding LoRA adapters, yielding a composite adapter tailored to the target task. This formulation not only preserves the local geometric structure of the task representation manifold, but also promotes interpretability and efficient reuse by selecting a minimal set of relevant adapters. We demonstrate the effectiveness of our approach across multiple domains—including medical image segmentation, medical report generation and image synthesis. Our results highlight the benefit of coupling retrieval with latent geometry-aware optimization for improved zero-shot generalization.
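As a rough illustration of the sparse reconstruction step, the sketch below solves an $\ell_1$-regularized least-squares problem with ISTA (iterative soft-thresholding) and then blends adapters with the resulting weights. The solver choice, the flat-list adapter representation, and the names `sparse_weights` and `blend_adapters` are all assumptions made for this example; the abstract does not specify how the objective is optimized or how adapters are stored.

```python
def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold` (proximal map of the l1 norm)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sparse_weights(target, references, penalty, iterations=500):
    """Minimise 0.5*||target - sum_i w_i*references[i]||^2 + penalty*||w||_1 via ISTA."""
    frobenius_sq = sum(x * x for ref in references for x in ref)
    if frobenius_sq == 0:
        raise ValueError("reference prototypes must not all be zero")
    # The squared Frobenius norm upper-bounds the gradient's Lipschitz constant.
    step = 1.0 / frobenius_sq
    weights = [0.0] * len(references)
    for _ in range(iterations):
        approx = [sum(w * ref[j] for w, ref in zip(weights, references))
                  for j in range(len(target))]
        error = [a - t for a, t in zip(approx, target)]
        gradient = [sum(e * x for e, x in zip(error, ref)) for ref in references]
        weights = [soft_threshold(w - step * g, step * penalty)
                   for w, g in zip(weights, gradient)]
    return weights


def blend_adapters(weights, adapters):
    """Weighted sum of adapters, each given as a flat list of parameters."""
    return [sum(w * a[j] for w, a in zip(weights, adapters))
            for j in range(len(adapters[0]))]


references = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical reference prototypes
target = [1.0, 0.05]                    # hypothetical target prototype
weights = sparse_weights(target, references, penalty=0.1)
print(weights)
print(blend_adapters(weights, [[1.0, 2.0], [10.0, 10.0]]))
```

With these toy prototypes the second reference is only weakly aligned with the target, so the penalty drives its weight to exactly zero and the blended adapter depends on the first reference alone.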

Beyond Adapter Retrieval: Latent Geometry-Preserving Composition via Sparse Task Projection

Model discovery aims to uncover governing differential equations of dynamical systems directly from experimental data. Benchmarking such methods is essential for tracking progress and understanding trade-offs in the field. While prior efforts have focused mostly on identifying single equations, typically framed as symbolic regression, there remains a lack of comprehensive benchmarks for discovering dynamical models. To address this, we introduce MDBench, an open-source benchmarking framework for evaluating model discovery methods on dynamical systems. MDBench assesses 12 algorithms on 14 partial differential equations (PDEs) and 63 ordinary differential equations (ODEs) under varying levels of noise. Evaluation metrics include derivative prediction accuracy, model complexity, and equation fidelity. We also introduce seven challenging PDE systems from fluid dynamics and thermodynamics, revealing key limitations in current methods. Our findings illustrate that linear methods and genetic programming methods achieve the lowest prediction error for PDEs and ODEs, respectively. Moreover, linear models are in general more robust against noise. MDBench accelerates the advancement of model discovery methods by offering a rigorous, extensible benchmarking framework and a rich, diverse collection of dynamical system datasets, enabling systematic evaluation, comparison, and improvement of equation accuracy and robustness.

MDBench: Benchmarking Data-Driven Methods for Model Discovery

Prompt optimization methods have demonstrated significant effectiveness in aligning black-box large language models (LLMs). In parallel, inference scaling strategies such as Best-of-N Sampling and Majority Voting have also proven to enhance alignment and performance by trading off computation. However, existing prompt optimization approaches are inference strategy agnostic; that is, they optimize prompts without regard to the inference strategy employed during deployment. This constitutes a significant methodological gap, as our empirical and theoretical analysis reveals a strong interdependence between these two paradigms. Moreover, we find that user preferences regarding trade-offs among multiple objectives and inference budgets substantially influence the choice of prompt and inference configuration. To address this gap, we introduce a unified novel framework named IAPO (Inference-Aware Prompt Optimization) that jointly optimizes the prompt and inference scale, while being aware of the inference budget and different task objectives. We then develop a fixed-budget training algorithm for IAPO, which we call PSST (Prompt Scaling via Sequential Trimming), and analyze finite-budget guarantees on error probability. Finally, we evaluate the effectiveness of PSST on six different tasks, including multi-objective text generation and reasoning, and demonstrate the critical role of incorporating inference-awareness when aligning black-box LLMs through prompt optimization.
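For readers unfamiliar with the two inference scaling strategies named above, here is a minimal sketch of each. The `score` callable stands in for whatever reward or preference model is used and is an assumption of this example; neither function reproduces IAPO or PSST, which additionally choose the prompt and the inference scale jointly.

```python
from collections import Counter


def best_of_n(candidates, score):
    """Return the candidate with the highest score among N sampled responses."""
    return max(candidates, key=score)


def majority_vote(answers):
    """Return the most frequent final answer among N sampled responses."""
    return Counter(answers).most_common(1)[0][0]


# Toy stand-ins: response length plays the role of a reward model.
print(best_of_n(["a", "abc", "ab"], score=len))
print(majority_vote(["4", "5", "4"]))
```

Both strategies spend more computation as N grows, which is why the best prompt can differ depending on which strategy and which budget are used at deployment.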

Inference-Aware Prompt Optimization for Aligning Black-Box Large Language Models

Large language models (LLMs) have garnered significant interest in the AI community. Despite their impressive generation capabilities, they have been found to produce misleading or fabricated information, a phenomenon known as hallucination. Consequently, hallucination detection has become critical to ensure the reliability of LLM-generated content. One primary challenge in hallucination detection is the scarcity of well-labeled datasets containing both truthful and hallucinated outputs. To address this issue, we introduce **P**rompt-guided data **A**ugmented ha**L**lucination d**E**tection (PALE), a novel framework that leverages prompt-guided responses from LLMs as data augmentation for hallucination detection. This strategy can generate both truthful and hallucinated data under prompt guidance at a relatively low cost. To more effectively evaluate the truthfulness of the sparse intermediate embeddings produced by LLMs, we introduce an estimation metric called the Contrastive Mahalanobis Score (CM Score). This score is based on modeling the distributions of truthful and hallucinated data in the activation space. CM Score employs a matrix decomposition approach to more accurately capture the underlying structure of these distributions. Importantly, our framework does not require additional human annotations, offering strong generalizability and practicality for real-world applications. Extensive experiments demonstrate that PALE achieves superior hallucination detection performance, outperforming the competitive baseline by a significant margin of 6.55%.
