Singapore

Infrared video has been of great interest in visual tasks under challenging environments, but often suffers from severe atmospheric turbulence and compression degradation. Existing video super-resolution (VSR) methods either neglect the inherent modality gap between infrared and visible images or fail to restore turbulence-induced distortions. Directly cascading turbulence mitigation (TM) algorithms with VSR methods leads to error propagation and accumulation due to the decoupled modeling of degradation between turbulence and resolution. We introduce HATIR, a Heat-Aware Diffusion for Turbulent InfraRed Video Super-Resolution, which injects heat-aware deformation priors into the diffusion sampling path to jointly model the inverse process of turbulent degradation and structural detail loss. Specifically, HATIR constructs a Phasor-Guided Flow Estimator, rooted in the physical principle that thermally active regions exhibit consistent phasor responses over time, enabling reliable turbulence-aware flow to guide the reverse diffusion process. To ensure the fidelity of structural recovery under nonuniform distortions, a Turbulence-Aware Decoder is proposed to selectively suppress unstable temporal cues and enhance edge-aware feature aggregation via turbulence gating and structure-aware attention. We built FILR-IVSR, the first dataset for turbulent infrared VSR, comprising paired LR-HR sequences from a FLIR T1050sc camera ($1024 \times 768$) spanning 645 diverse scenes with varying camera and object motion conditions. The dataset is intended to encourage future research in infrared VSR.

AAAI 2026

HATIR: Heat-Aware Diffusion for Turbulent Infrared Video Super-Resolution

infrared image

cv: low level & physics-based vision

low level vision

technical paper

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held in Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, exhibit programs, and a range of other activities to be announced.


Pan-sharpening aims to generate high-resolution multispectral (HRMS) images by integrating a high-resolution panchromatic (PAN) image with its corresponding low-resolution multispectral (MS) image. To achieve effective fusion, it is crucial to fully exploit the complementary information between the two modalities. Traditional CNN-based methods typically rely on channel-wise concatenation with fixed convolutional operators, which limits their adaptability to diverse spatial and spectral variations. While cross-attention mechanisms enable global interactions, they are computationally inefficient and may dilute fine-grained correspondences, making it difficult to capture complex semantic relationships. Recent advances in the Multimodal Diffusion Transformer (MMDiT) architecture have demonstrated impressive success in image generation and editing tasks. Unlike cross-attention, MMDiT employs in-context conditioning to facilitate more direct and efficient cross-modal information exchange. In this paper, we propose MMMamba, a cross-modal in-context fusion framework for pan-sharpening, with the flexibility to support image super-resolution in a zero-shot manner. Built upon the Mamba architecture, our design ensures linear computational complexity while maintaining strong cross-modal interaction capacity. Furthermore, we introduce a novel multimodal interleaved (MI) scanning mechanism that facilitates effective information exchange between the PAN and MS modalities. Extensive experiments demonstrate the superior performance of our method compared to existing state-of-the-art (SOTA) techniques across multiple tasks and benchmarks.

MMMamba: A Versatile Cross-Modal in Context Fusion Framework for Pan-Sharpening and Zero-Shot Image Enhancement

Researchers strategically choose where to submit their work in order to maximize its impact, and these publication decisions in turn determine venues’ impact factors. To analyze how individual publication choices both respond to and shape venue impact, we introduce a game-theoretic framework—coined the *Publication Choice Problem*—that captures this two‐way interplay. We show the existence of a pure-strategy equilibrium in the Publication Choice Problem and its uniqueness under binary researcher types. Our characterizations of the equilibrium properties offer insights about what publication behaviors better indicate a researcher's impact level and reveal how the disproportionate scaling of high-impact and low-impact researchers can result in the fluctuation in the impact of publication venues. Through equilibrium analysis, we further investigate how labeling top papers with "spotlight" affects the impact factor of venues in the research community.

The Publication Choice Problem

Large language models now draft news, legal analyses, and software code with human-level fluency. At the same time, regulations such as the EU AI Act mandate that each synthetic passage carry an imperceptible, machine-verifiable mark for provenance. Conventional logit-based watermarks satisfy this requirement by selecting a pseudorandom green vocabulary at every decoding step and boosting its logits, yet the random split can exclude the highest-probability token and thus erode fluency. WaterMod mitigates this limitation through a probability-aware modular rule. The vocabulary is first sorted in descending model probability; the resulting ranks are then partitioned by the residue $\text{rank}\bmod k$, which distributes adjacent (and therefore semantically similar) tokens across different classes. A fixed bias of small magnitude is applied to one selected class. In the zero-bit setting ($k=2$), an entropy-adaptive gate selects either the even or the odd parity as the green list. Because the top two ranks fall into different parities, this choice embeds a detectable signal while guaranteeing that at least one high-probability token remains available for sampling. In the multi-bit regime ($k>2$), the current payload digit $d$ selects the color class whose ranks satisfy $\text{rank} \bmod k = d$. Biasing the logits of that class embeds exactly one base-$k$ digit (equivalently $\log_{2}k$ bits) per decoding step, thereby enabling fine-grained provenance tracing. The same modular arithmetic therefore supports both binary attribution and rich payloads. Experimental results demonstrate that WaterMod consistently attains strong watermark detection performance while maintaining generation quality in both zero-bit and multi-bit settings. This robustness holds across a range of tasks, including natural language generation, mathematical reasoning, and code synthesis. Our code and data are available at https://github.com/Shinwoo-Park/WaterMod.
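
The rank-partitioning rule described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `rank_classes` and `bias_logits`, the bias value `delta=2.0`, and the toy logits are assumptions made for illustration, and the entropy-adaptive gate and payload scheduling are not modelled (the selected class is simply passed in as `digit`).

```python
def rank_classes(logits, k):
    """Map each token id to its color class, rank mod k (rank 0 = most probable)."""
    # Sort token ids by descending logit; ties keep the lower token id first.
    order = sorted(range(len(logits)), key=lambda tok: -logits[tok])
    classes = [0] * len(logits)
    for rank, tok in enumerate(order):
        classes[tok] = rank % k
    return classes


def bias_logits(logits, k, digit, delta=2.0):
    """Add a fixed bias `delta` (hypothetical value) to the class selected by `digit`."""
    classes = rank_classes(logits, k)
    return [x + delta if c == digit else x for x, c in zip(logits, classes)]


logits = [1.0, 3.0, 2.0, 0.5]             # toy 4-token vocabulary
print(rank_classes(logits, k=2))           # parity classes for the zero-bit case
print(bias_logits(logits, k=2, digit=1))   # boost the odd-rank class
```

With $k=2$ the two most probable tokens always land in different classes, so whichever parity is boosted, one of them stays in the favored set.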

WaterMod: Modular Token-Rank Partitioning for Probability-Balanced LLM Watermarking

Federated learning (FL) allows for collaborative model training while preserving data privacy, but its distributed nature makes it vulnerable to poisoning attacks. Existing defense methods typically rely on using gradients from multiple clients to define a trusted region, selecting only the trustworthy updates (good gradients) within this region for aggregation. Mainstream defense boundaries are categorized as hard boundaries, soft boundaries, and semi-soft boundaries. However, we argue that even good gradients within these boundaries can still be exploited by attackers to poison the model. To tackle this challenge, we introduce a boundary-adaptive attack method that leverages the directional properties of optimization techniques to derive baseline poisoned gradients. Through iterative perturbation, it generates seemingly innocent gradients that subtly deviate from the global model. Our extensive study on 3 benchmark datasets and 13 mainstream defensive mechanisms confirms that the proposed attack poses a severe threat to the integrity and security of federated learning practices, despite the proliferation of robust federated learning methods.

Good Gradients Poison Your Model: Evading Defenses in Federated Learning via Boundary-adaptive Perturbation

Session-based recommendation (SBR) aims to predict anonymous users' next interaction based on their interaction sessions. In practical recommendation scenarios, low-exposure items constitute the majority of interactions, creating a long-tail distribution that severely compromises recommendation diversity. Existing approaches attempt to address this issue by promoting tail items but incur accuracy degradation, exhibiting a "see-saw" effect between long-tail and accuracy performance. We attribute this conflict to session-irrelevant noise within the tail item set, which existing long-tail approaches fail to identify and constrain effectively. To resolve this fundamental conflict, we propose HID (Hybrid Intent-based Dual Constraint Framework), a plug-and-play framework that transforms the conventional "see-saw" into a "win-win" relationship by introducing hybrid intent-based dual constraints. Two key innovations are incorporated in this framework: (i) Hybrid Intent Learning, where we reformulate the intent extraction strategies by employing attribute-aware spectral clustering to reconstruct the item-to-intent mapping. Furthermore, session-irrelevant noise is discriminated by assigning both target and noise intents to each session. (ii) Intent Constraint Loss, where we propose two novel constraint paradigms regarding diversity and accuracy to regulate the representation learning process, and unify the two optimization objectives into a single loss. Extensive experiments across multiple SBR models and datasets demonstrate that HID can enhance both long-tail performance and recommendation accuracy, establishing new state-of-the-art performance in long-tail recommender systems.

Bid Farewell to Seesaw: Towards Accurate Long-Tail Session-Based Recommendation via Dual Constraints of Hybrid Intents

Real-world systems evolve in continuous time according to their underlying causal relationships, yet their dynamics are often unknown. Existing approaches to learning such dynamics typically either discretize time, leading to poor performance on irregularly sampled data, or ignore the underlying causality. We propose CADYT, a novel method for causal discovery on dynamical systems that addresses both of these challenges. In contrast to state-of-the-art causal discovery methods that model the problem using discrete-time Dynamic Bayesian networks, our formulation is grounded in Difference-based causal models, which allow milder assumptions for modeling the continuous nature of the system. CADYT leverages exact Gaussian Process inference for modeling the continuous-time dynamics, which is more aligned with the underlying dynamical process. We propose a practical instantiation that identifies the causal structure via a greedy search guided by the Algorithmic Markov Condition and Minimum Description Length principle. Our experiments show that CADYT outperforms state-of-the-art methods on both regularly and irregularly sampled data, discovering causal networks closer to the true underlying dynamics.

Causal Structure Learning for Dynamical Systems with Theoretical Score Analysis

Streamlining constraints (or streamliners, for short) narrow the search space, enhancing the speed and feasibility of solving complex constraint satisfaction problems. Traditionally, streamliners were crafted manually or generated through systematically combined atomic constraints with high-effort offline testing. Our approach utilizes the generative capabilities of Large Language Models (LLMs) to propose effective streamliners for problems specified in the MiniZinc constraint programming language and integrates feedback to the LLM with quick empirical tests for validation. Evaluated across seven diverse constraint satisfaction problems, our method achieves substantial runtime reductions. We compare the results to obfuscated and disguised variants of the problem to see whether the results depend on LLM memorization. We also analyze whether longer offline runs improve the quality of streamliners and whether the LLM can propose good combinations of streamliners.

Generating Streamlining Constraints with Large Language Models

3D medical image fusion (MIF) and segmentation (MIS) are critical and inherently synergistic tasks in medical image analysis. However, integrating them remains highly challenging, since effective collaborative paradigms are still scarce and their optimization objectives fundamentally diverge. Moreover, existing continual learning techniques are unable to achieve truly advanced performance for both tasks using a shared weight. To address these challenges, we propose M²-CoFS, a unified model capable of jointly handling both tasks. Our core contribution is a “network-guided network learning” paradigm designed to break the task boundaries. We model the weight spaces of MIF and MIS as high-dimensional manifolds and use a lightweight neural network to implicitly construct a shared manifold. Interestingly, this network yields a unified weight for both tasks. To ensure the shared manifold retains the intrinsic geometry of both original manifolds, we embed manifold distances into the loss function of this network as a constraint. Additionally, we design a tailored three-stage training paradigm for this core contribution. Stage I focuses on independent task optimization for high-quality weights; Stage II aims to reduce the parameter-space distance between tasks via our cross-task weight adaptation strategy; Stage III applies our core innovation. Experimental results show that M²-CoFS consistently outperforms state-of-the-art comparison models on both MIF and MIS.

Breaking Task Boundaries: A Unified Model for 3D Medical Image Fusion and Segmentation Guided by Manifold Perspective

A fundamental use of knowledge bases (KBs) is query answering, i.e.,
retrieving the information entailed by the KB in response to a user
query. When both the KB and the query are specified as logical
formulae, the standard form of answer provided to users is the set
of all certain answers (CAs): tuples of constants that satisfy the
formula defining the query in every model of the logical theory
defining the KB.

Despite their wide adoption, CAs are known to be just a lossy
representation of the information that a KB and a query provide.
While several alternative answer languages have been proposed in the
literature, no general consensus has emerged on the most suitable
approach to query answering over ontological KBs, as each language
comes with its own limitations.

To address some of these issues, we introduce Regularly Recurrent
Answers (RRAs), a novel answer language for queries over
ontological KBs based on regular expressions. RRAs support the
representation of infinite sets of tuples of constants via a simple
(and arguably well understood) generation mechanism. We show that
RRAs can capture a fundamental fragment of the certain information
entailed by unions of conjunctive queries and DL-Lite KBs, making
them a strong candidate for informative query answering
settings. Our contributions include the formal definition of RRAs, a
proof of their informativeness, and a study of the computational
complexity of the query answering problem using RRAs.

Expressive Recursive Answers for Ontological Knowledge Bases

Hard negatives are essential for training effective retrieval models. Hard-negative mining typically relies on ranking documents using cross-encoders or static embedding models based on similarity metrics such as cosine distance. Hard negative mining becomes challenging for biomedical and scientific domains due to the difficulty in distinguishing between source and hard negative documents. However, referenced documents naturally share contextual relevance with the source document but are not duplicates, making them well-suited as hard negatives. In this work, we propose BiCA: Biomedical Dense Retrieval with Citation-Aware Hard Negatives, an approach for hard-negative mining by utilizing citation links in 20,000 PubMed articles for improving a domain-specific small dense retriever. We fine-tune the GTE_small and GTE_Base models using these citation-informed negatives and observe consistent improvements in zero-shot dense retrieval using nDCG@10 for both in-domain and out-of-domain tasks on BEIR and outperform baselines on long-tailed topics in LoTTE using Success@5. Our findings highlight the potential of leveraging document link structure to generate highly informative negatives, enabling state-of-the-art performance with minimal fine-tuning and demonstrating a path towards highly data-efficient domain adaptation.
