Recent advances in controllable text-to-image (T2I) generation have shown promising results for natural images. However, controllable remote sensing (RS) T2I generation remains challenging due to the unique characteristics and requirements of geospatial data. Existing methods struggle to effectively integrate diverse spatial control conditions (e.g., edge maps, segmentation masks) into a coherent generation process. They often fail to model the complex spatial relationships among different geographic elements and to maintain semantic consistency with textual descriptions, which are typically vague or incomplete in RS applications. Additionally, constrained by the small scale, low description quality, and limited scene variety of existing datasets, these models tend to produce outputs with structurally inconsistent layouts and visually unrealistic content. To address these issues, we propose Any2RSI, a flexible framework for controllable RS T2I generation that supports arbitrary combinations of control conditions. At its core, Any2RSI introduces a Cross-Modal Multi-Control Adapter capable of extracting modality-agnostic embeddings from heterogeneous inputs, enabling precise spatial guidance. Furthermore, to overcome the limitations of the sparse and ambiguous textual prompts commonly found in RS tasks, we design a Vision Language Model (VLM)-Empowered Enriched Description Generation module. This module enhances input descriptions by integrating cross-modal semantic information, generating richer and more accurate textual representations that guide the generation of semantically coherent images. Finally, to mitigate data scarcity in the RS T2I generation task, we construct RST2I-110K, a new large-scale, multi-scene dataset containing over 115,000 high-quality RS images paired with detailed textual descriptions.
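The core idea of the adapter can be illustrated in miniature: each control modality gets its own encoder that projects its input into a shared embedding space, and the resulting embeddings are fused so that downstream guidance is agnostic to which subset of conditions was supplied. The sketch below is a minimal NumPy illustration of that pattern, not the paper's implementation; the class name, the linear projections, and the mean-fusion step are all simplifying assumptions.

```python
import numpy as np

def make_encoder(in_ch, dim, rng):
    # Hypothetical per-modality encoder: a random linear projection
    # from the modality's channel count into the shared embedding dim.
    W = rng.standard_normal((in_ch, dim)) * 0.02
    return lambda x: x.reshape(-1, in_ch) @ W  # flatten spatial grid, project

class MultiControlAdapter:
    """Sketch of a cross-modal multi-control adapter: heterogeneous
    control maps (edge, segmentation, ...) are each projected into one
    shared space, then fused into a modality-agnostic embedding."""

    def __init__(self, modality_channels, dim=64, seed=0):
        rng = np.random.default_rng(seed)
        self.encoders = {m: make_encoder(c, dim, rng)
                         for m, c in modality_channels.items()}

    def __call__(self, controls):
        # controls: dict of modality -> (H, W, C) array; any subset works,
        # which is what allows flexible combinations of conditions.
        embs = [self.encoders[m](x) for m, x in controls.items()]
        return np.mean(embs, axis=0)  # (H*W, dim) fused embedding

adapter = MultiControlAdapter({"edge": 1, "segmentation": 3}, dim=64)
edge = np.zeros((16, 16, 1))
seg = np.zeros((16, 16, 3))
z = adapter({"edge": edge, "segmentation": seg})
print(z.shape)  # → (256, 64)
```

Because every modality lands in the same space, the fused output has a fixed shape regardless of how many conditions are provided, so the same guidance pathway serves one control map or several.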
Extensive experiments on both existing and newly proposed datasets demonstrate that Any2RSI achieves state-of-the-art performance, significantly improving both the realism and structural accuracy of generated RS imagery.
