Singapore

We present $\textit{Infinite-Story}$, a training-free framework for consistent text-to-image (T2I) generation tailored for multi-prompt storytelling scenarios. Built upon a scale-wise autoregressive model, our method addresses two key challenges in consistent T2I generation: identity inconsistency and style inconsistency. To overcome these issues, we introduce three complementary techniques: $\textit{Identity Prompt Replacement}$, which mitigates context bias in text encoders to align identity attributes across prompts; and a unified attention guidance mechanism comprising $\textit{Adaptive Style Injection}$ and $\textit{Synchronized Guidance Adaptation}$, which jointly enforce global style and identity appearance consistency while preserving prompt fidelity. Unlike prior diffusion-based approaches that require fine-tuning or suffer from slow inference, Infinite-Story operates entirely at test time, delivering high identity and style consistency across diverse prompts. Extensive experiments demonstrate that our method achieves state-of-the-art generation performance while offering over 6$\times$ faster inference (1.72 seconds per image) than the fastest existing consistent T2I models, highlighting its effectiveness and practicality for real-world visual storytelling.

AAAI 2026

Infinite-Story: A Training-Free Consistent Text-to-Image Generation

consistent text-to-image generation

training-free

autoregressive model

technical paper
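As a quick consistency check on the speed claim, reading "over 6× faster" as a ratio of per-image latencies gives a lower bound on the latency of the fastest prior consistent T2I model. This is derived arithmetic, not a figure stated in the abstract:

```latex
\frac{t_{\text{prior}}}{t_{\text{Infinite-Story}}} > 6,
\qquad t_{\text{Infinite-Story}} = 1.72\,\text{s}
\;\Longrightarrow\;
t_{\text{prior}} > 6 \times 1.72\,\text{s} = 10.32\,\text{s per image}.
```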

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held at the Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange among researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, exhibit programs, and a range of other activities to be announced.



My research focuses on making LLM training and serving widely accessible to academia and smaller organizations by reducing dependence on proprietary data and heavy compute. In this talk, I will present a coherent framework that unifies data curation without labels, zero-data self-evolution, reward calibration for reliable confidence estimation, and computation-efficient language model inference.

Breaking the Resource Monopoly: LLM Post-Training and Serving with Modest Data and Compute
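The talk lists reward calibration for reliable confidence estimation without naming a metric. Calibration of confidence scores is commonly summarized by the binned expected calibration error (ECE); the sketch below computes it, assuming equal-width confidence bins and one boolean correctness label per prediction. It is a generic illustration of the metric, not the speaker's method.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Binned expected calibration error (ECE).

    Each prediction goes into one of n_bins equal-width confidence bins; ECE is
    the sample-weighted average of |accuracy - mean confidence| over the bins.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # conf == 1.0 would map to index n_bins, so clamp it into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        acc = sum(ok for _, ok in bucket) / len(bucket)
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        ece += (len(bucket) / n) * abs(acc - avg_conf)
    return ece
```

Well-calibrated confidences give an ECE near zero: four predictions at confidence 0.75, three of them correct, yield exactly 0.0, whereas two wrong predictions at confidence 0.95 yield 0.95.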

Differentially Private Stochastic Gradient Descent (DPSGD) is widely used to train deep neural networks with formal privacy guarantees. However, the per-sample gradient clipping and noise injection required for differential privacy (DP) often degrade model accuracy by introducing bias and noise, respectively. Existing techniques typically address only one of these issues, as reducing DP noise can exacerbate clipping bias and vice versa. In this paper, we propose a novel method, DP-PMLF, which integrates per-sample momentum with a low-pass filtering strategy to simultaneously mitigate DP noise and clipping bias. Our approach uses per-sample momentum to smooth gradient estimates prior to clipping, thereby reducing sampling variance, and employs a post-processing low-pass filter to attenuate high-frequency DP noise without consuming additional privacy budget. We provide a theoretical analysis demonstrating an improved convergence rate under rigorous DP guarantees, and our empirical evaluations reveal that DP-PMLF significantly enhances the privacy-utility trade-off compared to several state-of-the-art DPSGD variants.

Enhancing DPSGD via Per-Sample Momentum and Low-Pass Filtering
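The DP-PMLF pipeline described above (per-sample momentum before clipping, Gaussian noise, then a low-pass filter as post-processing) can be sketched as follows. This is a simplified NumPy illustration under stated assumptions: per-example gradients are supplied by the caller, per-sample momentum is kept in a dictionary keyed by a hypothetical example id, and the low-pass filter is a first-order exponential moving average. The paper's actual filter design and privacy accounting are not reproduced here.

```python
import numpy as np


def clip_rows(v: np.ndarray, c: float) -> np.ndarray:
    """Scale each row of v so that its L2 norm is at most c."""
    norms = np.linalg.norm(v, axis=1, keepdims=True)
    return v * np.minimum(1.0, c / np.maximum(norms, 1e-12))


class MomentumLowPassDPSGD:
    """Sketch: per-sample momentum -> clipping -> Gaussian noise -> low-pass filter."""

    def __init__(self, dim, lr=0.1, clip=1.0, noise_mult=1.0,
                 beta=0.9, alpha=0.9, seed=0):
        self.lr, self.clip, self.noise_mult = lr, clip, noise_mult
        self.beta, self.alpha = beta, alpha   # momentum and filter coefficients
        self.rng = np.random.default_rng(seed)
        self.per_sample_m = {}                # hypothetical example id -> momentum buffer
        self.filtered = np.zeros(dim)         # state of the assumed EMA low-pass filter

    def step(self, params, ids, per_sample_grads):
        # 1) Per-sample momentum smooths each example's gradient *before* clipping.
        #    On an example's first visit, the buffer starts at its current gradient.
        m = np.stack([
            self.beta * self.per_sample_m.get(i, g) + (1 - self.beta) * g
            for i, g in zip(ids, per_sample_grads)
        ])
        for i, mi in zip(ids, m):
            self.per_sample_m[i] = mi
        # 2) Clip each smoothed per-sample vector to L2 norm <= clip.
        clipped = clip_rows(m, self.clip)
        # 3) Gaussian mechanism: noise std = noise_mult * clip on the summed vector.
        noise = self.rng.normal(0.0, self.noise_mult * self.clip, size=params.shape)
        noisy_mean = (clipped.sum(axis=0) + noise) / len(ids)
        # 4) EMA filter on the already-privatized gradient. This is post-processing,
        #    so it consumes no extra privacy budget; it damps step-to-step noise.
        self.filtered = self.alpha * self.filtered + (1 - self.alpha) * noisy_mean
        return params - self.lr * self.filtered
```

With `noise_mult=0.0` the step is deterministic, which makes the clipping and filtering arithmetic easy to check by hand.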

Standard fair division models assume that all agents are selfish. However, in many scenarios, the division of resources has a direct impact on the whole group or even on society. Therefore, we study fair allocations of indivisible items that, at the same time, maximize social impact. In this model, each agent is associated with two additive functions that define their value and social impact for each item. The goal is to allocate items so that the social impact is maximized while maintaining some fairness criterion. We reveal that the complexity of the problem heavily depends on whether the agents are socially aware, i.e., whether they take the social impact functions into account.
For socially unaware agents, we prove that the problem is NP-hard for a variety of fairness notions, and that it is tractable only in very restricted cases, e.g., when, for every agent, the valuation equals the social impact and is binary. On the other hand, social awareness allows for fair allocations that maximize social impact, and such allocations can be computed in polynomial time. Interestingly, the problem becomes intractable again as soon as the definition of social awareness is relaxed.

Dividing Indivisible Items for the Benefit of All: It Is Hard to Be Fair Without Social Awareness
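To make the model concrete: each agent i has an additive value v_i and an additive social impact s_i over items, and the goal is an allocation that maximizes total social impact subject to a fairness constraint. The sketch below assumes envy-freeness up to one item (EF1) as the fairness notion (the abstract only says "a variety of fairness notions") and scores an allocation by summing s_i(j) over the items j that each agent i receives. It enumerates all n^m allocations, so it is only feasible for tiny instances; the hardness results above suggest no efficient general algorithm should be expected for socially unaware agents.

```python
from itertools import product
from typing import Optional, Sequence, Tuple

Matrix = Sequence[Sequence[float]]  # rows = agents, columns = items


def is_ef1(owner: Tuple[int, ...], values: Matrix) -> bool:
    """Assumed fairness notion, EF1: for every pair (i, k), removing some single
    item from k's bundle eliminates i's envy toward k."""
    n = len(values)
    bundles = [[j for j, o in enumerate(owner) if o == a] for a in range(n)]
    for i in range(n):
        own = sum(values[i][j] for j in bundles[i])
        for k in range(n):
            if k == i or not bundles[k]:
                continue
            other = sum(values[i][j] for j in bundles[k])
            # Best case for i: drop the item in k's bundle that i values most.
            if own < other - max(values[i][j] for j in bundles[k]):
                return False
    return True


def max_impact_ef1(values: Matrix, impact: Matrix) -> Tuple[Optional[Tuple[int, ...]], Optional[float]]:
    """Brute force over all n**m allocations; owner[j] is the agent receiving item j."""
    n, m = len(values), len(values[0])
    best_owner, best = None, None
    for owner in product(range(n), repeat=m):
        if not is_ef1(owner, values):
            continue
        # The social impact of item j depends on who receives it: impact[owner[j]][j].
        total = sum(impact[owner[j]][j] for j in range(m))
        if best is None or total > best:
            best_owner, best = owner, total
    return best_owner, best
```

For example, with two agents who value both items equally and `impact = [[5, 5], [0, 0]]`, giving both items to agent 0 would double the social impact but violates EF1, so the best fair allocation splits the items for a total impact of 5.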

With the rapid deployment of Chinese large language models (LLMs), culturally-grounded bias evaluation remains understudied due to the dominance of English benchmarks and simplistic Chinese scenarios. To address this, we propose GeWu, a comprehensive benchmark featuring a culturally-aware dataset of 60,192 questions spanning 14 social groups with fine-grained Chinese contexts—significantly exceeding existing resources in breadth and depth. Our two-stage evaluation first quantifies bias via multiple-choice questions using a novel probability-based scoring mechanism to sensitively capture bias tendencies, distilling high-bias scenarios into GeWu-1K. This refined subset then enables multi-turn dialogue evaluations for in-depth analysis under realistic conditions. Experiments reveal that GeWu effectively exposes social biases in state-of-the-art Chinese LLMs, with 13.93% of scenarios eliciting universal bias across all models. This highlights persistent challenges and provides actionable insights for bias mitigation in Chinese contexts.

GeWu: A Culturally-Grounded Chinese Benchmark for Multi-Stage Social Bias Evaluation in Large Language Models
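The abstract does not spell out the probability-based scoring mechanism. One plausible reading, sketched here purely as an assumption, is to renormalize the model's probabilities over the answer options of each multiple-choice question and average the probability mass assigned to the stereotype-consistent option. A model that leans toward a biased answer is then detected even when that answer is not its top choice. The function names and the input format (a log-probability per option label) are hypothetical.

```python
import math
from typing import Mapping, Sequence, Tuple


def option_probs(logprobs: Mapping[str, float]) -> dict:
    """Softmax over the answer options only (renormalizes away all other tokens)."""
    top = max(logprobs.values())
    exps = {k: math.exp(v - top) for k, v in logprobs.items()}  # numerically stable
    z = sum(exps.values())
    return {k: v / z for k, v in exps.items()}


def bias_score(items: Sequence[Tuple[Mapping[str, float], str]]) -> float:
    """Hypothetical scoring rule: mean probability on the stereotype-consistent option.

    items: (log-probability per option label, label of the biased option) pairs.
    Higher means a stronger bias tendency; 1 / number_of_options is chance level.
    """
    return sum(option_probs(lp)[biased] for lp, biased in items) / len(items)
```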

In recent years, RF fingerprinting (RFF) has emerged as a promising technology for wireless device authentication. 
However, temporal variations in device load and temperature, along with channel effects, lead to inconsistencies in RFF distributions between training and testing phases. 
As a result, deep learning (DL)-based recognition models often suffer from degraded performance. 
To address this problem, we propose the first test-time-adaptation (TTA) approach to improve the domain generalization ability of RFF recognition models. 
We first analyze the causes of time-varying RFF distribution shifts, such as carrier frequency offset (CFO), and develop a physical impairment-based data augmentation strategy.
Based on this, we further propose a physical information-aware prototype to guide the model during TTA.
Our method requires no model retraining or labeled test samples, and is a lightweight, nonparametric solution.
Finally, we extensively evaluate our approach on mobile phones using an IEEE 802.11 orthogonal frequency division multiplexing (OFDM) system; the results show that our scheme effectively improves average RFF recognition performance by about 7.8%.

RFF-TTA: Physical Information-Aware Prototype for Temporally Varying RF Fingerprinting Online Test-Time-Adaptation
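Two ingredients from the abstract lend themselves to a short sketch: a physical-impairment augmentation that rotates complex baseband I/Q samples by a random carrier frequency offset, x[n]·exp(j(2π·Δf·n/f_s + φ)), followed by additive white Gaussian noise, and nearest-prototype classification of test features with an online prototype update. The feature extractor is omitted, and the parameter names, the SNR noise model, and the moving-average update rule are assumptions rather than the paper's exact design.

```python
import numpy as np


def apply_cfo(iq: np.ndarray, cfo_hz: float, fs_hz: float, phase: float = 0.0) -> np.ndarray:
    """Rotate complex baseband samples by a carrier frequency offset (CFO)."""
    n = np.arange(iq.shape[-1])
    return iq * np.exp(1j * (2 * np.pi * cfo_hz * n / fs_hz + phase))


def augment(iq, rng, max_cfo_hz, fs_hz, snr_db):
    """Random CFO and initial phase, then complex AWGN at the requested SNR (assumed model)."""
    cfo = rng.uniform(-max_cfo_hz, max_cfo_hz)
    out = apply_cfo(iq, cfo, fs_hz, rng.uniform(0.0, 2 * np.pi))
    noise_var = np.mean(np.abs(out) ** 2) / 10 ** (snr_db / 10)
    noise = np.sqrt(noise_var / 2) * (rng.standard_normal(out.shape)
                                      + 1j * rng.standard_normal(out.shape))
    return out + noise


def build_prototypes(features, labels):
    """Class prototypes = mean feature vector per device label."""
    classes = np.unique(labels)
    protos = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, protos


def classify_and_adapt(feat, classes, protos, momentum=0.9):
    """Assign feat to the nearest prototype, then move that prototype toward it.

    protos is updated in place (an assumed exponential-moving-average rule);
    no labels or gradient steps are needed at test time.
    """
    k = int(np.argmin(np.linalg.norm(protos - feat, axis=1)))
    protos[k] = momentum * protos[k] + (1 - momentum) * feat
    return classes[k]
```

A CFO of f_s/4 advances the phase by a quarter turn per sample, so an all-ones signal becomes 1, j, -1, -j; the magnitude of every sample is unchanged, which is why CFO shifts the RF fingerprint without changing signal power.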

Structured Electronic Health Record (EHR) data stores patient information in relational tables and plays a central role in clinical decision-making. 
Recent advances have explored the use of large language models (LLMs) to process such data, showing promise across various clinical tasks.
However, the absence of standardized evaluation frameworks and clearly defined tasks makes it difficult to systematically assess and compare LLM performance on structured EHR data.
To address these evaluation challenges, we introduce EHRStruct, a benchmark specifically designed to evaluate LLMs on structured EHR tasks.
EHRStruct defines 11 representative tasks spanning diverse clinical needs and includes 2,200 task-specific evaluation samples derived from two widely used EHR datasets.
We use EHRStruct to evaluate 20 advanced and representative LLMs, covering both general and medical models.
We further analyze key factors influencing model performance, including input formats, few-shot generalisation, and finetuning strategies, and compare results with 11 state-of-the-art LLM-based enhancement methods for structured data reasoning.
Our results indicate that many structured EHR tasks place high demands on the understanding and reasoning capabilities of LLMs.
In response, we propose SEMaster, a code-augmented method that achieves state-of-the-art performance and offers practical insights to guide future research.

EHRStruct: A Comprehensive Benchmark Framework for Evaluating Large Language Models on Structured Electronic Health Record Tasks

The Open Digital Rights Language (ODRL) is a pivotal standard for automating data rights management. However, the inherent logical complexity of authorization policies, combined with the scarcity of high-quality "Natural Language-to-ODRL" training datasets, impedes the ability of current methods to efficiently and accurately translate complex rules from natural language into the ODRL format. To address this challenge, this research leverages the potent comprehension and generation capabilities of Large Language Models (LLMs) to achieve both automation and high fidelity in this translation process. We introduce AgentODRL, a multi-agent system based on an Orchestrator-Workers architecture. The architecture consists of specialized Workers, including a Generator for ODRL policy creation, a Decomposer for breaking down complex use cases, and a Rewriter for simplifying nested logical relationships. The Orchestrator agent dynamically coordinates these Workers, assembling an optimal pathway based on the complexity of the input use case. Specifically, we enhance the ODRL Generator by incorporating a validator-based syntax strategy and a semantic reflection mechanism powered by a LoRA-finetuned model, significantly elevating the quality of the generated policies. Extensive experiments were conducted on a newly constructed dataset comprising 770 use cases of varying complexity, all situated within the context of data spaces. The results, evaluated using ODRL syntax and semantic scores, demonstrate that our proposed Orchestrator-Workers system, enhanced with these strategies, achieves superior performance on the ODRL generation task.

AgentODRL: A Large Language Model-based Multi-agent System for ODRL Generation
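The abstract mentions a validator-based syntax strategy without detailing the validator. As an illustration only, a minimal structural check over a JSON-LD policy might cover a simplified subset of the ODRL 2.2 Information Model: every policy has a `uid` and at least one rule, every rule has an `action`, an Offer names an `assigner`, and an Agreement names both an `assigner` and an `assignee`. A real validator would check far more (constraints, refinements, vocabulary terms); the function names and example IRIs are placeholders.

```python
RULE_KEYS = ("permission", "prohibition", "obligation")
POLICY_TYPES = {"Set", "Offer", "Agreement"}


def _rules(policy: dict) -> list:
    return [r for key in RULE_KEYS for r in policy.get(key, [])]


def _has_party(policy: dict, party: str) -> bool:
    """Party given on the policy itself, or on every one of its rules."""
    rules = _rules(policy)
    return party in policy or (bool(rules) and all(party in r for r in rules))


def check_odrl_policy(policy: dict) -> list:
    """Return readable errors for a simplified subset of ODRL 2.2 constraints."""
    errors = []
    if "uid" not in policy:
        errors.append("policy is missing 'uid'")
    ptype = policy.get("@type")
    if ptype not in POLICY_TYPES:
        errors.append(f"unsupported policy type: {ptype!r}")
    rules = _rules(policy)
    if not rules:
        errors.append("policy has no permission, prohibition or obligation")
    for i, rule in enumerate(rules):
        if "action" not in rule:
            errors.append(f"rule {i} is missing 'action'")
    if ptype in ("Offer", "Agreement") and not _has_party(policy, "assigner"):
        errors.append(f"{ptype} requires an 'assigner'")
    if ptype == "Agreement" and not _has_party(policy, "assignee"):
        errors.append("Agreement requires an 'assignee'")
    return errors
```

An error list of this kind can be fed back to the generator as a repair prompt, which is one way a syntax validator and an LLM generator could be combined.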

Stochastic multi-objective optimization (SMOOP) requires ranking multivariate distributions; yet, most empirical studies perform scalarization, which loses information and is unreliable. Based on the optimal transport theory, we introduce the center-outward $q$-dominance relation and prove it implies strong first-order stochastic dominance (FSD). Also, we develop an empirical test procedure based on $q$-dominance, and derive an explicit sample size threshold, $n^*(\delta)$, to control the Type I error.

We verify the usefulness of our approach in two scenarios: (1) as a ranking method in hyperparameter tuning; (2) as a selection method in multi-objective optimization algorithms. For the former, we analyze the final stochastic Pareto sets of seven multi-objective hyperparameter tuners on the YAHPO-MO benchmark tasks with $q$-dominance, which allows us to compare these tuners when the expected hypervolume indicator (HVI, the most common performance metric) of the Pareto sets becomes indistinguishable. For the latter, we replace the mean value-based selection in the NSGA-II algorithm with $q$-dominance, which shows a superior convergence rate on noise-augmented ZDT benchmark problems.

These results establish center-outward $q$-dominance as a principled, tractable foundation for seeking truly stochastically dominant solutions for SMOOPs.

Center-Outward q-Dominance: A Sample-Computable Proxy for Strong Stochastic Dominance in Stochastic Multi-Objective Optimisation
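The q-dominance relation is built on center-outward quantiles from optimal transport, but its exact definition is not given in the abstract. Under that caveat, the sketch below shows only the underlying building block: the empirical center-outward distribution function of Hallin et al., which matches n bivariate sample points one-to-one to a regular grid in the unit disk (n_R radii r/(n_R+1) times n_S equally spaced directions) by minimizing the total squared distance. The radius of each point's matched grid point is its center-outward rank, small near the center and larger in the tails. The q-dominance test and the n*(δ) threshold are not implemented.

```python
import math
from itertools import permutations
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def center_outward_grid(n_r: int, n_s: int) -> List[Point]:
    """Regular grid in the unit disk: radii r/(n_r+1), r = 1..n_r, times n_s directions."""
    return [
        (r / (n_r + 1) * math.cos(2 * math.pi * s / n_s),
         r / (n_r + 1) * math.sin(2 * math.pi * s / n_s))
        for r in range(1, n_r + 1)
        for s in range(1, n_s + 1)
    ]


def center_outward_ranks(points: Sequence[Point], n_r: int, n_s: int) -> List[float]:
    """Empirical center-outward ranks via brute-force optimal assignment.

    Each sample point is matched to a distinct grid point so that the total squared
    distance is minimal (discrete optimal transport); the radius of the matched grid
    point is the sample point's center-outward rank. O(n!) time: tiny n only.
    """
    grid = center_outward_grid(n_r, n_s)
    if len(points) != len(grid):
        raise ValueError("need exactly n_r * n_s sample points")

    def cost(perm: Tuple[int, ...]) -> float:
        return sum((p[0] - grid[j][0]) ** 2 + (p[1] - grid[j][1]) ** 2
                   for p, j in zip(points, perm))

    best = min(permutations(range(len(grid))), key=cost)
    return [math.hypot(*grid[j]) for j in best]
```

The brute-force matching grows factorially; for realistic sample sizes the same assignment can be computed with the Hungarian algorithm, for example `scipy.optimize.linear_sum_assignment` applied to the squared-distance matrix.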

Interpretability methods have recently gained significant attention, particularly in the context of large language models, enabling insights into linguistic representations, error detection, and model behaviors such as hallucinations and repetitions. However, these techniques remain underexplored in automatic speech recognition (ASR), despite their potential to advance both the performance and interpretability of ASR systems. In this work, we adapt and systematically apply established interpretability methods, such as logit lens, linear probing, and activation patching, to examine how acoustic and semantic information evolves across layers in ASR systems. Our experiments reveal previously unknown internal dynamics, including specific encoder-decoder interactions responsible for repetition hallucinations and semantic biases encoded deep within acoustic representations. These insights demonstrate the benefits of extending and applying interpretability techniques to speech recognition, opening promising directions for future research on improving model transparency and robustness.

Beyond Transcription: Mechanistic Interpretability in ASR
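Of the methods listed, the logit lens is the simplest to illustrate: an intermediate hidden state is passed through the model's final normalization and unembedding (output projection) to read off which token that layer would already predict. The NumPy sketch below assumes a parameter-free layer norm and a given unembedding matrix; a real ASR decoder would use its own learned final layer norm (gain and bias) and output projection, which are omitted here.

```python
import numpy as np


def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Assumption: parameter-free layer norm over the last axis (no learned gain/bias)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def logit_lens(hidden_states, w_unembed, vocab):
    """Decode each layer's hidden state with the final unembedding matrix.

    hidden_states: sequence of (d_model,) vectors, one per layer, for one position.
    w_unembed:     (d_model, vocab_size) output projection.
    Returns the top-1 token each layer would predict.
    """
    preds = []
    for h in hidden_states:
        logits = layer_norm(np.asarray(h, dtype=float)) @ w_unembed
        preds.append(vocab[int(np.argmax(logits))])
    return preds
```

Tracking the layer at which the top-1 prediction stabilizes is one way to see where in the network a token, including a repeated or hallucinated one, first becomes dominant.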

This article proposes a Learning-from-Demonstration (LfD) method using probability densities on the workspaces of robot manipulators. The method, named PRobabilistically-Informed Motion Primitives (PRIMP), learns the probability distribution of end-effector trajectories in the 6-D workspace, which includes both positions and orientations. It can adapt to new situations, such as novel via points with uncertainty and a change of viewing frame. The method itself is robot-agnostic, in that the learned distribution can be transferred to another robot by adapting it to that robot's workspace density. Workspace-STOMP, a new version of the existing STOMP motion planner, is also introduced; it can be used as a post-processing step to improve the performance of PRIMP and any other reachability-based LfD method. The combination of PRIMP and Workspace-STOMP can further help the robot avoid novel obstacles that were not present during the demonstrations. The proposed methods are evaluated on several sets of benchmark experiments. PRIMP runs more than five times faster than existing state-of-the-art methods while generalizing trajectories more than twice as close to both the demonstrations and novel desired poses. The methods are then combined with our lab's robot imagination method, which learns object affordances, illustrating their applicability to learning tool use through physical experiments.
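PRIMP learns a distribution over end-effector trajectories and adapts it to new via points. A much-simplified Euclidean analogue, in the spirit of probabilistic movement primitives rather than PRIMP's own formulation (which handles full 6-D poses including orientation), fits a Gaussian to time-aligned demonstration positions and conditions it on a noisy via point y observed at step t with the standard formulas K = ΣHᵀ(HΣHᵀ + R)⁻¹, μ' = μ + K(y − Hμ), Σ' = Σ − KHΣ, where H selects step t. The function names are hypothetical.

```python
import numpy as np


def fit_trajectory_gaussian(demos: np.ndarray, reg: float = 1e-6):
    """demos: (n_demos, T, D) time-aligned positions (simplification: no orientation).

    Returns mean and covariance of the flattened (T*D) trajectory vector;
    reg keeps the covariance invertible when demonstrations are few.
    """
    n, T, D = demos.shape
    X = demos.reshape(n, T * D)
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False) + reg * np.eye(T * D)
    return mu, cov


def condition_on_via_point(mu, cov, t, via, via_cov, dim):
    """Gaussian conditioning on passing near `via` (shape (dim,)) at time step t."""
    idx = np.arange(t * dim, (t + 1) * dim)     # entries selected by H
    S = cov[np.ix_(idx, idx)] + via_cov         # H Sigma H^T + R
    K = np.linalg.solve(S, cov[idx, :]).T       # Sigma H^T S^{-1}, shape (T*D, dim)
    mu_new = mu + K @ (via - mu[idx])
    cov_new = cov - K @ cov[idx, :]
    return mu_new, cov_new
```

Because the whole trajectory is modeled jointly, pinning one step also shifts the correlated steps around it: in demonstrations where the position at step 2 is always twice that at step 1, conditioning step 1 on 5 moves the mean at step 2 to about 10.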
