Singapore

Referring Image Segmentation (RIS), which aims to segment specific objects based on natural language descriptions, plays an essential role in vision-language understanding. Despite progress in remote sensing applications, RIS under Low-Altitude Drone (LAD) scenarios remains underexplored, as existing datasets and methods are typically designed for high-altitude and static-view imagery. They struggle to handle the unique characteristics of LAD views, such as diverse viewpoints and high object density. In this paper, we propose RIS-LAD, the first fine-grained RIS benchmark tailored for LAD scenarios, featuring 13,871 meticulously annotated image-text-mask triplets collected from real-world drone footage, with emphasis on small, densely cluttered objects and multi-view perspectives. Additionally, we propose the Semantic-Aware Adaptive Reasoning Network, which decomposes and adaptively routes semantic information to different network stages rather than uniformly injecting all linguistic features. Specifically, the Category-Dominated Linguistic Enhancement aligns visual features with object categories during early encoding, while the Adaptive Reasoning Fusion Module dynamically selects semantic cues across scales to enhance reasoning in complex scenes. Extensive experiments reveal that RIS-LAD presents substantial challenges to state-of-the-art RIS algorithms and demonstrate the effectiveness of our proposed model in addressing these challenges. RIS-LAD is publicly available at: https://github.com/AHideoKuzeA/RIS-LAD-A-Benchmark-and-Model-for-Referring-Low-Altitude-Drone-Image-Segmentation.

AAAI 2026

RIS-LAD: A Benchmark and Model for Referring Image Segmentation in Low-Altitude Drone Imagery

remote sensing image processing

drone

referring image segmentation

poster

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held in Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, exhibit programs, and a range of other activities to be announced.



Underwater image enhancement (UIE) aims to address image degradation caused by water absorption and scattering effects. Despite significant progress in deep learning-based UIE methods, existing approaches still face key challenges because they neglect physical imaging principles. Moreover, while current Mamba models achieve global modeling via multi-directional scanning, their local sequential strategy lacks sufficient global context. To this end, we propose a novel Physical Model-Guided Global Mamba (PGMamba) that combines the efficient sequential modeling capability of Mamba with the underwater imaging physical model. Specifically, we first design a Spatial-Aware Global Mamba (SAGMamba) that achieves efficient long-range dependency modeling through a spatial-aware ranking strategy with global context information. Second, we develop a Physical Model-Guided Feed-Forward Network (PMGFFN) that explicitly incorporates underwater optical imaging principles into the network architecture. Extensive experimental results and comprehensive ablation studies demonstrate the outstanding performance of our proposed method and the importance of its components.

PGMamba: A Physical Model-Guided Global Mamba for Underwater Image Enhancement
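
The PGMamba abstract above refers to an underwater imaging physical model without stating it. As a rough illustration only, the sketch below uses the simplified formation model that is common in the UIE literature, I = J·t + B·(1 − t) with transmission t = exp(−β·d). Whether PGMamba uses this exact form is an assumption, and every function name, parameter name, and numeric value here is hypothetical.

```python
import numpy as np

def degrade(clean, depth, beta, background):
    """Simplified underwater formation model: I = J * t + B * (1 - t)."""
    # Per-channel transmission t = exp(-beta * d); result has shape (H, W, 3).
    t = np.exp(-depth[..., None] * beta)
    return clean * t + background * (1.0 - t)

def restore(observed, depth, beta, background, t_min=0.05):
    """Invert the model for J, clamping t to avoid dividing by values near 0."""
    t = np.maximum(np.exp(-depth[..., None] * beta), t_min)
    return (observed - background * (1.0 - t)) / t

rng = np.random.default_rng(0)
clean = rng.uniform(0.0, 1.0, size=(4, 4, 3))
depth = rng.uniform(0.5, 3.0, size=(4, 4))
beta = np.array([0.6, 0.2, 0.1])           # illustrative: red attenuates fastest
background = np.array([0.05, 0.35, 0.45])  # illustrative bluish-green backlight
observed = degrade(clean, depth, beta, background)
recovered = restore(observed, depth, beta, background)
```

In practice depth, attenuation, and background light are unknown and must be estimated, which is where a learned network would come in; the sketch only shows the forward model and its algebraic inverse.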

Dual-lens video inpainting aims to simultaneously restore missing or corrupted contents in videos captured by each lens of binocular systems. Although preliminary explorations have been conducted, existing methods still face two key challenges: limited exploitation of long-range reference information and inadequate modeling of inter-lens consistency in non-standard binocular systems. In this paper, we propose a novel dual-lens video inpainting framework named DLVINet, which addresses these challenges with two core components. Firstly, we develop a sparse spatial-temporal transformer (SSTT) that effectively utilizes the information from distant frames to complete the video contents of each lens individually. By employing sparse spatial-temporal attention with a channel selection mechanism, SSTT not only restores missing regions, but also avoids introducing redundant or irrelevant information. Furthermore, SSTT introduces a multi-scale feed-forward network to enrich the multi-scale representation of completed features. Secondly, we design a cross-lens texture transformer (CLTT) to model inter-lens consistency. By interacting with corresponding features between lenses under the guidance of cross-attention, CLTT captures global inter-lens correspondences. Such a design enables effective cross-view information modeling without being constrained by horizontal parallax, which is particularly critical for non-standard binocular systems. Extensive experiments demonstrate the effectiveness of our DLVINet.

DLVINet: Advancing Dual-Lens Video Inpainting Beyond Parallax Constraints

Strategic machine learning investigates scenarios where agents manipulate their features to receive favorable decisions from predictive models. To address fairness concerns intrinsic to strategic classification, recent work has introduced group-specific fairness constraints. However, current fairness-aware approaches face a fundamental dilemma in the issue of fairness exposure: making these constraints public enables strategic manipulation and can lead to fairness reversal, while keeping them hidden may reduce social welfare and discourage genuine improvement.
To fill this gap, we propose the problem of Partial Fairness Awareness (PFA): our theoretical analysis shows that this dilemma can be mitigated by releasing the candidate set of fairness constraints while concealing the grounding constraint.
Specifically, we introduce a belief-guided strategic mechanism wherein agents iteratively interact with the decision system and maintain a belief distribution over the candidate set of fairness constraints. Through this interaction and feedback, agents update their belief distribution, gradually aligning it with the grounding fairness constraint employed by the system.
Extensive experiments on real-world and synthetic datasets demonstrate that PFA achieves lower group fairness gaps, higher acceptance of truly qualified individuals, and more stable outcomes compared to fully public or private fairness regimes.

Partial Fairness Awareness: Belief-Guided Strategic Mechanism for Strategic Agents
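
The belief-guided mechanism in the abstract above can be pictured as repeated updating of a distribution over candidate constraints. The abstract does not state the update rule, so the sketch below assumes a plain Bayes update; the candidate names and likelihood values are hypothetical.

```python
import numpy as np

def update_belief(belief, likelihoods):
    """One Bayes step: posterior proportional to prior * P(feedback | candidate)."""
    posterior = belief * likelihoods
    return posterior / posterior.sum()

# Three hypothetical candidate constraints; index 1 plays the grounding constraint.
candidates = ["constraint_a", "constraint_b", "constraint_c"]
belief = np.full(len(candidates), 1.0 / len(candidates))

# Hypothetical probability that each candidate would produce the observed decision.
likelihood_of_feedback = np.array([0.3, 0.8, 0.4])

# Repeated interactions concentrate the belief on the best-explaining candidate.
for _ in range(10):
    belief = update_belief(belief, likelihood_of_feedback)
```

Starting from a uniform prior, the belief mass moves toward the candidate that best explains the feedback, which mirrors the gradual alignment the abstract describes.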

Point cloud data augmentation is critical to improving the generalization of 3D deep learning models. However, existing methods often fail to preserve the underlying manifold structure, leading to semantic distortion or topology violation. This causes models to learn untrustworthy features, thereby limiting their representational ability. To overcome these limitations, we propose ManiPoint, a novel point cloud augmentation framework based on diffeomorphism that explicitly preserves manifold structure during deformation. ManiPoint constructs diffeomorphic transformations via continuous differentiable mappings, ensuring topological consistency and geometric continuity between original and augmented data. To prevent excessive distortion and ensure semantic consistency, we introduce a controllable deformation mechanism that quantitatively constrains the augmentation magnitude and enables fine-grained control over the deformation space. We further provide theoretical analysis indicating that, compared with topologically inconsistent methods, ManiPoint reduces empirical and vicinal risks by generating diverse and structurally reliable samples. Extensive experiments and visualizations on object-level datasets demonstrate that ManiPoint produces high-quality augmentations and consistently improves model robustness over existing baselines. The scalability of our method is further verified on scene-level datasets.

Shaping Without Tearing: Controllable Diffeomorphic Deformations for Topology-Preserving 3D Point Cloud Augmentation
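
To make the idea of a topology-preserving deformation concrete, here is a minimal sketch of one smooth, invertible map on a point cloud. It is not ManiPoint's construction, which the abstract does not detail. It relies on a standard fact: x + g(x) is a diffeomorphism of R^3 when g is smooth with Lipschitz constant below 1. The function and its parameters are hypothetical.

```python
import numpy as np

def smooth_deform(points, amplitude=0.1, frequency=2.0, seed=0):
    """Apply x -> x + g(x) with a smooth sinusoidal displacement g.

    Each output coordinate is displaced by a sine of a different input
    coordinate, so the Jacobian of g has one entry per row and column,
    each bounded by amplitude * frequency. Keeping that product below 1
    makes g a contraction, so x + g(x) is smooth and invertible.
    """
    if amplitude * frequency >= 1.0:
        raise ValueError("need amplitude * frequency < 1 for invertibility")
    rng = np.random.default_rng(seed)
    phase = rng.uniform(0.0, 2.0 * np.pi, size=3)
    shifted = points[:, [1, 2, 0]]  # coordinate i reads coordinate (i + 1) % 3
    return points + amplitude * np.sin(frequency * shifted + phase)

cloud = np.random.default_rng(1).normal(size=(1024, 3))
augmented = smooth_deform(cloud, amplitude=0.1, frequency=2.0)
```

Here `amplitude` bounds how far any point can move and the product `amplitude * frequency` bounds how much local distances can shrink or stretch, which is one simple way to get the kind of magnitude control the abstract mentions.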

Precise modeling of lane topology is essential for autonomous driving, as it directly impacts navigation and control decisions. Existing methods typically represent each lane with a single query and infer topological connectivity based on the similarity between lane queries. However, this design struggles to accurately model complex lane structures, leading to unreliable topology prediction. To this end, we propose a Fine-Grained lane topology reasoning framework (TopoFG). It divides the procedure from bird’s-eye-view (BEV) features to topology prediction via fine-grained queries into three phases, i.e., Hierarchical Prior Extractor (HPE), Region-Focused Decoder (RFD), and Robust Boundary-Point Topology Reasoning (RBTR). Specifically, HPE extracts global spatial priors from the BEV mask and local sequential priors from in-lane keypoint sequences to guide subsequent fine-grained query modeling. RFD constructs fine-grained queries by integrating the spatial and sequential priors. It then samples reference points in regions of interest (RoIs) of the mask and applies cross-attention with BEV features to refine the query representations of each lane. RBTR models lane connectivity based on boundary-point query features and further employs a topological denoising strategy to reduce matching ambiguity. By integrating spatial and sequential priors into fine-grained queries and applying a denoising strategy to boundary-point topology reasoning, our method precisely models complex lane structures and delivers trustworthy topology predictions. Extensive experiments on the OpenLane-V2 benchmark demonstrate that TopoFG achieves new state-of-the-art performance, with an OLS of 48.0% on subset_A and 45.4% on subset_B.

Fine-Grained Representation for Lane Topology Reasoning

Prevailing quantization techniques in Learned Image Compression (LIC) typically employ a static, uniform bit-width across all layers, failing to adapt to the highly diverse data distributions and sensitivity characteristics inherent in LIC models. This leads to a suboptimal trade-off between performance and efficiency. In this paper, we introduce DynaQuant, a novel framework for dynamic mixed-precision quantization that operates on two complementary levels. First, we propose content-aware quantization, where learnable scaling and offset parameters dynamically adapt to the statistical variations of latent features. This fine-grained adaptation is trained end-to-end using a novel Distance-aware Gradient Modulator (DGM), which provides a more informative learning signal than the standard Straight-Through Estimator. Second, we introduce a data-driven, dynamic bit-width selector that learns to assign an optimal bit precision to each layer, dynamically reconfiguring the network's precision profile based on the input data. Our fully dynamic approach offers substantial flexibility in balancing rate-distortion (R-D) performance and computational cost. Experiments demonstrate that DynaQuant achieves R-D performance comparable to full-precision models while significantly reducing computational and storage requirements, thereby enabling the practical deployment of advanced LIC on diverse hardware platforms.

DynaQuant: Dynamic Mixed-Precision Quantization for Learned Image Compression
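
As a small illustration of the quantization step the abstract above builds on, the sketch below applies uniform affine quantization with a scale and offset at several bit-widths. DynaQuant learns these parameters and selects bit-widths dynamically; here they are simply fitted to the tensor's range, which is an assumption made for the example, and the Distance-aware Gradient Modulator is not reproduced.

```python
import numpy as np

def quantize(x, scale, offset, bits):
    """Uniform affine quantization to `bits` bits, followed by dequantization."""
    levels = 2 ** bits - 1
    codes = np.clip(np.round((x - offset) / scale), 0, levels)
    return codes * scale + offset

def fit_scale_offset(x, bits):
    """Stand-in for learned parameters: match the tensor's min/max range."""
    offset = float(x.min())
    scale = (float(x.max()) - offset) / (2 ** bits - 1)
    return scale, offset

latent = np.random.default_rng(0).normal(size=10_000)
errors = {}
for bits in (2, 4, 8):
    scale, offset = fit_scale_offset(latent, bits)
    reconstructed = quantize(latent, scale, offset, bits)
    errors[bits] = float(np.mean((latent - reconstructed) ** 2))
```

The reconstruction error drops as the bit-width grows, which is the trade-off a per-layer bit-width selector has to balance against compute and storage. During training, the rounding step has zero gradient almost everywhere, which is why a Straight-Through Estimator or a replacement for it is needed.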

Hierarchical clustering is a fundamental machine-learning technique for grouping data points into dendrograms.
However, existing hierarchical clustering methods encounter two primary challenges: 1) most methods specify dendrograms without a global objective;
2) graph-based methods often neglect the significance of graph structure, optimizing objectives on complete or static predefined graphs.
In this work, we propose **Hyp**erbolic **C**ontinuous **S**tructural **E**ntropy neural networks, namely HypCSE, for structure-enhanced continuous hierarchical clustering.
Our key idea is to map data points into hyperbolic space and minimize the relaxed continuous structural entropy (SE) on structure-enhanced graphs.
Specifically, we encode graph vertices in hyperbolic space using hyperbolic graph neural networks and minimize approximate SE defined on graph embeddings.
To make the SE objective differentiable for optimization, we reformulate it into a function using the lowest common ancestor (LCA) on trees and then relax it into continuous SE (CSE) by analogy between hyperbolic graph embeddings and partitioning trees.
To ensure a graph structure that effectively captures the hierarchy of data points for CSE calculation, we employ a graph structure learning (GSL) strategy that updates the graph structure during training.
Extensive experiments on seven datasets demonstrate the superior performance of HypCSE.

Hyperbolic Continuous Structural Entropy for Hierarchical Clustering
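
For readers unfamiliar with structural entropy, the sketch below computes the discrete two-level version for a small graph: each non-root tree node α contributes −(g_α / vol(G)) · log2(vol(α) / vol(parent of α)), where g_α counts edges leaving α. This is the standard discrete definition from structural information theory, not HypCSE's continuous relaxation; the LCA reformulation, hyperbolic embeddings, and graph structure learning are not shown. The example graph and partitions are made up.

```python
import math
from collections import defaultdict

def structural_entropy(edges, partition):
    """Two-level structural entropy (bits) of an undirected, unweighted graph.

    `partition` maps each vertex to a module id; the partitioning tree has
    a root, one node per module, and one leaf per vertex.
    """
    degree = defaultdict(int)
    module_vol = defaultdict(int)  # sum of degrees inside each module
    module_cut = defaultdict(int)  # number of edges leaving each module
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if partition[u] != partition[v]:
            module_cut[partition[u]] += 1
            module_cut[partition[v]] += 1
    for vertex, d in degree.items():
        module_vol[partition[vertex]] += d
    total_vol = sum(degree.values())

    entropy = 0.0
    # Module nodes: parent is the root, whose volume is the whole graph.
    for module, vol in module_vol.items():
        entropy -= module_cut[module] / total_vol * math.log2(vol / total_vol)
    # Leaves: the cut of a single vertex is its degree; parent is its module.
    for vertex, d in degree.items():
        entropy -= d / total_vol * math.log2(d / module_vol[partition[vertex]])
    return entropy

# Two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
by_triangle = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
mixed = {0: "a", 3: "a", 1: "b", 4: "b", 2: "c", 5: "c"}
se_good = structural_entropy(edges, by_triangle)
se_bad = structural_entropy(edges, mixed)
```

The partition that follows the two triangles yields lower entropy than the one that splits them, which is the sense in which minimizing SE favors trees that respect graph structure.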

Knowledge Tracing (KT) aims to dynamically model a student’s mastery of knowledge concepts based on their historical learning interactions. Most current methods rely on single-point estimates, which cannot distinguish true ability from outbursts or carelessness, creating ambiguity in judging mastery. To address this issue, we propose a Knowledge Mastery-State Disambiguation for Knowledge Tracing model (KeenKT), which represents a student’s knowledge state at each interaction using a Normal-Inverse-Gaussian (NIG) distribution, thereby capturing the fluctuations in student learning behaviors. Furthermore, we design an NIG-distance-based attention mechanism to model the dynamic evolution of the knowledge state. In addition, we introduce a diffusion-based denoising reconstruction loss and a distributional contrastive learning loss to enhance the model’s robustness. Extensive experiments on six public datasets demonstrate that KeenKT outperforms state-of-the-art KT models in terms of prediction accuracy and sensitivity to behavioral fluctuations. The proposed method yields a maximum AUC improvement of 5.85% and a maximum ACC improvement of 6.89%.

KeenKT: Knowledge Mastery-State Disambiguation for Knowledge Tracing

Large Reasoning Language Models (LRMs) have recently shown remarkable performance in complex reasoning tasks, but their extensive reasoning chains incur substantial computational overhead. 
To address this challenge, we propose Outlier-aware Reasoning Conciseness Adaptive Merge (ORCA), a novel plug-and-play model merging framework that leverages outlier activation patterns to fuse base models with reasoning models. ORCA introduces three key innovations: (1) adaptive alignment that reduces conflicts between disparate activation patterns during merging, (2) outlier-guided allocation that assigns merging coefficients proportional to each layer's reasoning importance as indicated by outlier concentrations, and (3) dynamic probe-based adjustment that adapts merging coefficients during inference based on input-specific activation characteristics. These strategies allow seamless integration into existing merging pipelines while creating unified models that maintain reasoning accuracy with significantly reduced response verbosity. Comprehensive evaluation across six benchmarks using Qwen and LLaMA models shows ORCA reduces average response length by 55% while improving accuracy by 2.4% to 5.7% over existing methods. Code is provided in the supplemental material.

Outlier Matters: Efficient Long-to-Short Reasoning via Outlier-Guided Model Merging
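
The outlier-guided allocation described above can be pictured as per-layer weight interpolation whose coefficient grows with how many outlier activations a layer shows. The sketch below is a generic illustration under that reading, not ORCA's procedure: the outlier measure (fraction of activations beyond k standard deviations), the scaling rule, and all names are assumptions, and the alignment and probe-based adjustment steps are omitted.

```python
import numpy as np

def outlier_ratio(activations, k=3.0):
    """Fraction of activations more than k standard deviations from the mean."""
    z = np.abs(activations - activations.mean()) / activations.std()
    return float(np.mean(z > k))

def merge_layers(base, reasoning, activations, max_coeff=0.9):
    """Per-layer interpolation: (1 - a) * base + a * reasoning."""
    ratios = np.array([outlier_ratio(act) for act in activations])
    # Scale coefficients so the layer with the most outliers gets max_coeff.
    coeffs = max_coeff * ratios / ratios.max() if ratios.max() > 0 else np.zeros_like(ratios)
    merged = [(1 - a) * wb + a * wr for a, wb, wr in zip(coeffs, base, reasoning)]
    return merged, coeffs

rng = np.random.default_rng(0)
base = [rng.normal(size=(4, 4)) for _ in range(3)]
reasoning = [w + rng.normal(scale=0.1, size=w.shape) for w in base]
# Synthetic activations drawn from heavier-tailed distributions in later layers.
activations = [rng.standard_t(df, size=5000) for df in (30, 5, 3)]
merged, coeffs = merge_layers(base, reasoning, activations)
```

A layer with coefficient near 0 stays close to the base model and one near `max_coeff` stays close to the reasoning model, so the coefficient vector is the knob that trades verbosity against reasoning ability in this simplified picture.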

Object detection in sonar images is a key technology in underwater detection systems. Compared to natural images, sonar images contain fewer texture details and are more susceptible to noise, making it difficult for non-experts to distinguish subtle differences between classes. As a result, they cannot provide precise annotations for sonar images. Therefore, designing effective object detection methods for sonar images with extremely limited labels is particularly important. To address this, we propose a teacher-student framework called RSOD, which aims to fully learn the characteristics of sonar images and develop a pseudo-label strategy suitable for these images to mitigate the impact of limited labels. First, RSOD calculates a reliability score by assessing the consistency of the teacher's predictions across different views. To leverage this score, we introduce an object mixed pseudo-label method to tackle the shortage of labeled data in sonar images. Finally, we optimize the performance of the student by implementing a reliability-guided adaptive constraint. By taking full advantage of unlabeled data, the student can perform well even in situations with extremely limited labels. Notably, on the UATD dataset, our method, using only 5% of labeled data, achieves results competitive with those of our baseline algorithm trained on 100% labeled data. We also collected a new dataset to provide more valuable data for research in the field of sonar.
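
One simple way to picture a reliability score based on cross-view consistency is to compare the teacher's box on an image with its box on a flipped copy, mapped back to the original frame. The abstract above does not say which views or which agreement measure RSOD uses, so the horizontal flip, the IoU measure, and all names and box values below are assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def unflip(box, image_width):
    """Map a box predicted on a horizontally flipped image back to the original frame."""
    x1, y1, x2, y2 = box
    return [image_width - x2, y1, image_width - x1, y2]

def reliability(box_original_view, box_flipped_view, image_width):
    """Score in [0, 1]: agreement between the two teacher predictions."""
    return iou(box_original_view, unflip(box_flipped_view, image_width))

# Hypothetical teacher outputs for one object in a 100-pixel-wide image.
consistent = reliability([10, 20, 40, 60], [60, 20, 90, 60], image_width=100)
inconsistent = reliability([10, 20, 40, 60], [50, 20, 80, 60], image_width=100)
```

A score like this could then weight or filter pseudo-labels so that predictions the teacher cannot reproduce across views contribute less to the student's loss.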
