Singapore

With the emergence of large multimodal models, dual-encoder alignment via contrastive learning has seen a resurgence. However, the escalating model size demands effective Parameter-Efficient Fine-Tuning (PEFT). While LoRA is a promising inference-free alternative to adapters, we find that its naive application to multimodal tasks causes a severe rank imbalance, favoring the text modality and FFN layers. To address this, we propose HALoRA (Hierarchical Allocation LoRA), which introduces a component-wise budget allocator to ensure balanced fine-tuning across both modalities and their internal components. This is complemented by a gradient-approximated initialization to accelerate convergence. With only half the parameters of adapters, HALoRA achieves superior or competitive performance in retrieval and zero-shot classification. Our work presents a more principled approach to multimodal LoRA and uncovers an intriguing asymmetry in vision-language alignment, paving the way for future research. Code is made available.

AAAI 2026

HALoRA: Low-Rank Adaptation with Hierarchical Budget Allocation for Efficient Vision-Language Alignment
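As a rough illustration of the low-rank update that LoRA-style methods rely on, the sketch below merges a frozen weight `W` with a rank-`r` product `B @ A` and shows how a fixed parameter budget maps to different ranks for differently shaped layers. It is not HALoRA's allocator: `rank_for_budget`, the budget of 12288 parameters, and the layer shapes are hypothetical choices made for this example only.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_merge(W, B, A, scale=1.0):
    """Return W + scale * (B @ A), the effective weight after a LoRA update."""
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


def lora_param_count(d_out, d_in, rank):
    """Trainable parameters in B (d_out x rank) plus A (rank x d_in)."""
    return rank * (d_out + d_in)


def rank_for_budget(budget, d_out, d_in):
    """Largest rank whose two factors fit in `budget` parameters (hypothetical helper)."""
    return budget // (d_out + d_in)


# Frozen 4x3 weight with a rank-2 update; B starts at zero, a common LoRA default,
# so the merged weight equals W before any training step.
W = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0], [3.0, 0.0, 1.0], [0.0, 2.0, 0.0]]
A = [[0.5, -0.5, 1.0], [1.0, 0.0, 0.0]]
B = [[0.0, 0.0] for _ in range(4)]
merged = lora_merge(W, B, A)

# Illustrative shapes only: a square attention projection and a wider FFN projection.
attn_rank = rank_for_budget(12288, 768, 768)
ffn_rank = rank_for_budget(12288, 3072, 768)
```

Under these assumed shapes the same budget yields a different rank per component, which is the kind of bookkeeping a component-wise allocator has to make explicit.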

ml: efficient ml / green ai

ml: large multimodal models (lmms)

ml: multimodal learning

technical paper

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held in Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, exhibit programs, and a range of other activities to be announced.



Recent approaches for few-shot 3D point cloud semantic segmentation typically require a two-stage learning process, i.e., a pre-training stage followed by a few-shot training stage. While effective, these methods over-rely on pre-training, which hinders model flexibility and adaptability. In addition, current approaches focus on visual information in the support set and neglect or do not fully exploit other useful data, such as textual annotations. This inadequate utilization of support information impairs the performance of the model and restricts its zero-shot ability. To address these limitations, we present a novel pre-training-free network, named **E**fficient **P**oint Cloud Semantic **Seg**mentation for **F**ew- and **Z**ero-shot scenarios (EPSegFZ). EPSegFZ incorporates three key components: a **Pro**totype-**E**nhanced **R**egisters **A**ttention (**ProERA**) module and a **D**ual **R**elative **P**ositional **E**ncoding (**DRPE**)-based cross-attention mechanism, which together improve feature extraction and build accurate query-prototype correspondences without pre-training, and a **L**anguage-**G**uided **P**rototype **E**mbedding (**LGPE**) module that leverages textual information from the support set to improve few-shot performance and enable zero-shot inference. Extensive experiments show that our method outperforms the state-of-the-art method by 5.68% and 3.82% on the S3DIS and ScanNet benchmarks, respectively.

EPSegFZ: Efficient Point Cloud Semantic Segmentation for Few- and Zero-Shot Scenarios with Language Guidance

Accurate traffic forecasting plays a vital role in intelligent transportation systems, enabling applications such as congestion control, route planning, and urban mobility optimization. However, traffic forecasting remains challenging due to two key factors: (1) complex spatial dependencies arising from dynamic interactions between road segments and traffic sensors across the network, and (2) the coexistence of multi-scale periodic patterns (e.g., daily and weekly patterns driven by human routines) with irregular fluctuations caused by unpredictable events (e.g., accidents, weather, or construction). To tackle these challenges, we propose **HyperD** (Hybrid Periodic Decoupling), a novel framework that decouples traffic data into **periodic** and **residual components**. The periodic component is handled by the **Hybrid Periodic Representation Module**, which extracts fine-grained daily and weekly patterns using learnable periodic embeddings and spatial-temporal attention. The residual component, which captures non-periodic, high-frequency fluctuations, is modeled by the **Frequency-Aware Residual Representation Module**, which applies a complex-valued MLP in the frequency domain. To enforce semantic separation between the two components, we further introduce a **Dual-View Alignment Loss**, which aligns low-frequency information with the periodic branch and high-frequency information with the residual branch. Extensive experiments on four real-world traffic datasets demonstrate that HyperD achieves state-of-the-art prediction accuracy, while offering superior robustness under disturbances and improved computational efficiency compared to existing methods.

HyperD: Hybrid Periodicity Decoupling Framework for Traffic Forecasting
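To make the periodic/residual split concrete, here is a minimal sketch that uses a plain per-slot average as the periodic component and treats whatever is left as the residual. This is a classical seasonal-mean stand-in, not HyperD's learnable embeddings or its frequency-domain MLP; the function name `decompose` and the toy series are assumptions made for illustration.

```python
def decompose(series, period):
    """Split a series into a repeating per-slot mean and the leftover residual.

    Assumes len(series) >= period so that every slot has at least one sample.
    """
    if len(series) < period:
        raise ValueError("series must cover at least one full period")
    slots = [[] for _ in range(period)]
    for t, value in enumerate(series):
        slots[t % period].append(value)
    # Average of all observations that fall on the same position in the cycle.
    profile = [sum(s) / len(s) for s in slots]
    periodic = [profile[t % period] for t in range(len(series))]
    residual = [v - p for v, p in zip(series, periodic)]
    return periodic, residual


# Two cycles of a period-3 pattern, with an irregular spike in the last step.
flow = [1.0, 2.0, 3.0, 1.0, 2.0, 9.0]
periodic, residual = decompose(flow, period=3)
```

The spike is shared between the slot average and the residual here, which hints at why a learned decomposition with an explicit separation loss can be preferable to a fixed average.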

In hard-label black-box adversarial attacks, where only the top-1 predicted label is accessible, the prohibitive query complexity poses a major obstacle to practical deployment. In this paper, we focus on optimizing a representative class of attacks that search for the optimal ray direction yielding the minimal $\ell_p$-norm perturbation required to push a benign image into the adversarial region. Inspired by Nesterov’s Accelerated Gradient (NAG), we propose a momentum-based algorithm, `ARS-OPT`, which proactively estimates the gradient at a future position inferred from accumulated momentum. We provide a theoretical analysis of its convergence behavior, showing that `ARS-OPT` enables more accurate directional updates and achieves faster, more stable optimization. Furthermore, we derive the step size analytically, eliminating the need for expensive line-search procedures. To further accelerate convergence, we incorporate surrogate-model-based priors into `ARS-OPT`'s gradient estimation, resulting in `PARS-OPT` with enhanced performance. The superiority of our approach is supported by rigorous theoretical analysis under mild assumptions. Extensive experiments on ImageNet and CIFAR-10 demonstrate that our method surpasses thirteen state-of-the-art approaches in query efficiency. The source code will be released online.

Improving the Convergence Rate of Ray Search Optimization for Query-Efficient Hard-Label Attacks
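The look-ahead idea behind Nesterov's Accelerated Gradient can be sketched on a smooth toy objective: the gradient is evaluated at the position the accumulated momentum is about to reach, not at the current point. This sketch uses an exact gradient and a fixed step size; it does not reproduce `ARS-OPT`'s query-based gradient estimation or its analytically derived step size, and `nag_minimize`, the quadratic objective, and the hyperparameters are illustrative assumptions.

```python
def nag_minimize(grad, x0, lr=0.1, momentum=0.9, steps=200):
    """Minimise a function with Nesterov momentum, given its gradient callable."""
    x = list(x0)
    velocity = [0.0] * len(x)
    for _ in range(steps):
        # Look ahead to where the current momentum would carry the iterate.
        lookahead = [xi + momentum * vi for xi, vi in zip(x, velocity)]
        g = grad(lookahead)
        # Update the velocity with the gradient taken at the look-ahead point.
        velocity = [momentum * vi - lr * gi for vi, gi in zip(velocity, g)]
        x = [xi + vi for xi, vi in zip(x, velocity)]
    return x


# Toy objective f(x) = sum((x_i - c_i)^2), whose minimiser is c.
target = [1.0, -2.0]


def quadratic_grad(x):
    return [2.0 * (xi - ci) for xi, ci in zip(x, target)]


solution = nag_minimize(quadratic_grad, [0.0, 0.0])
```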

Cross-Domain Recommendation (CDR) transfers user preferences from a source domain to alleviate data sparsity in a target domain. While disentangling representations into domain-specific and shared components is a common approach, existing methods overlook user preference heterogeneity and item appeal heterogeneity. To this end, we propose **DPGCDR**, a **D**ual-**P**erspective **G**roup-aware **CDR** method that learns symmetric group-aware representations from both the user and the item perspective. Conceptually, DPGCDR dynamically clusters users into groups and items into themes, then symmetrically disentangles user preferences into group-specific and cross-group shared components, and item attributes into theme-specific and cross-theme shared components. We propose a two-stage training scheme: 1) an initial warm-up stage learns preliminary representations to dynamically cluster users and items into group and theme structures, which generalize cross-domain scenarios into multi-group disentanglement analogous to multi-domain settings; 2) a fusion-based aggregation stage integrates these group/theme-specific components into unified global representations. In addition, an information-theoretic alignment regularizer ensures consistency and discriminability between global shared and group/theme-specific representations, facilitating effective knowledge transfer by explicitly modeling and preserving the inherent multi-group structure within cross-domain interactions. Extensive experiments show DPGCDR achieves state-of-the-art performance, with significant gains of up to 25% in HR@10 over baselines on datasets with heterogeneous interaction structures. Further analyses confirm our dynamic clustering mechanism effectively adapts to underlying data patterns, enabling fine-grained cross-domain knowledge transfer.

Dual-Perspective Disentanglement: Learning Symmetric Group-Aware Representations for Cross-Domain Recommendation

Large language models (LLMs) have demonstrated impressive performance on natural language tasks, but their decision-making processes remain largely opaque. Existing explanation methods either suffer from limited faithfulness to the model's reasoning or produce explanations that are difficult for humans to understand. To address these challenges, we propose **ProtoSurE**, a novel prototype-based surrogate framework that provides faithful and understandable explanations for LLMs. ProtoSurE trains an interpretable-by-design surrogate model that aligns with the target LLM while utilizing sentence-level prototypes as understandable concepts. Extensive experiments show that ProtoSurE consistently outperforms state-of-the-art explanation methods across diverse LLMs and datasets. Importantly, ProtoSurE demonstrates strong data efficiency, requiring relatively few training examples to achieve good performance, making it practical for real-world applications. Code is available in the Appendix.

Making Sense of LLM Decisions: A Prototype-based Framework for Explainable Classification

We investigate the problem of synthesizing strategies that guarantee the successful execution of a high-level nondeterministic agent program $\delta$ in Golog within a nondeterministic first-order basic action theory considering the environment as adversarial. Our approach constructs a symbolic program graph that captures the control flow of $\delta$ independently of the domain, enabling strategy synthesis through the cross product of the program graph with the domain model. We formally relate graph-based transitions to standard Golog semantics and provide a synthesis procedure that is sound though incomplete (in general, the problem is undecidable, given that we have a first-order representation of the state). We also extend the framework to handle the case where the environment's possible behaviors are specified by a Golog program.

Strategic Reasoning over Golog Programs in the Nondeterministic Situation Calculus

Predicting the popularity of user-generated content (UGC) is a crucial but challenging task in social media analysis. While existing retrieval-augmented models enhance predictions by supplying rich contextual information, they remain limited by a fundamental precision-recall dilemma: enlarging the retrieval set increases coverage but introduces noisy, irrelevant context that harms prediction. In this work, we propose a unified framework that learns to retrieve, filter, and predict. Central to our approach is a Mixture-of-Logits based retrieval module that replaces static similarity metrics with a dynamic, multi-faceted scoring function, enabling the retrieval to be directly optimized by the prediction objective. An uncertainty-aware filter then performs differentiable subset selection and refines the selected representations using the information bottleneck principle. Finally, to enhance predictive robustness, we introduce a confidence-weighted test-time perturbation strategy. By learning to retrieve UGCs that are beneficial for prediction and filtering out uncertain ones, our framework provides more relevant and reliable context. Extensive experiments demonstrate that the proposed framework achieves state-of-the-art performance, consistently outperforming strong baselines.

Learning to Curate Context: Jointly Optimizing Retrieval and Prediction for Multimodal Social Media Popularity

While large language models (LLMs) leverage both knowledge and reasoning during inference, the capacity to distinguish between them plays a pivotal role in model analysis, interpretability, and development. Inspired by dual-system cognitive theory, we propose a cognition attribution framework to decouple the contribution of knowledge and reasoning. In particular, the cognition of LLMs is decomposed into two distinct yet complementary phases: knowledge retrieval (Phase 1) and reasoning adjustment (Phase 2). To separate these phases, LLMs are prompted to generate answers under two different cognitive modes, fast thinking and slow thinking, respectively. The performance under different cognitive modes is analyzed to quantify the contribution of knowledge and reasoning. This framework is applied to 15 LLMs across 3 datasets. Results reveal: (1) Reasoning adjustment is domain-specific, benefiting reasoning-intensive domains (e.g., mathematics, physics, and chemistry) and potentially impairing knowledge-intensive domains. (2) Parameter scaling improves both knowledge and reasoning, with knowledge improvements being more pronounced. Additionally, parameter scaling makes LLM reasoning significantly more prudent, while only moderately more intelligent. (3) Knowledge primarily resides in lower network layers, while reasoning operates in higher layers. Our framework not only helps understand LLMs from a "decoupling" perspective, but also provides new insights into existing research, including scaling laws, hierarchical knowledge editing, and limitations of small-scale-LLM reasoning.

Decoupling Knowledge and Reasoning in LLMs: An Exploration Using Cognitive Dual-System Theory
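One simple way to read the two-phase decomposition is as arithmetic on accuracies: fast-thinking accuracy stands in for knowledge retrieval, and the change when switching to slow thinking stands in for reasoning adjustment. The abstract does not state the paper's exact metrics, so the formula in `decouple` and all accuracy values below are hypothetical and serve only to show the shape of the analysis.

```python
def decouple(acc_fast, acc_slow):
    """Return (knowledge, reasoning_adjustment) under an assumed additive reading.

    knowledge            = accuracy when answering directly (fast thinking)
    reasoning_adjustment = accuracy gained or lost by deliberate reasoning
    """
    knowledge = acc_fast
    reasoning_adjustment = acc_slow - acc_fast
    return knowledge, reasoning_adjustment


# Made-up (fast, slow) accuracies per domain, not results from the paper.
scores = {
    "mathematics": (0.40, 0.65),
    "history": (0.80, 0.75),
}
report = {domain: decouple(fast, slow) for domain, (fast, slow) in scores.items()}
```

With these invented numbers the adjustment is positive for the reasoning-intensive domain and negative for the knowledge-intensive one, mirroring the qualitative pattern described in finding (1).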

Ensuring the safety of AI-enabled systems, particularly in high-stakes domains such as autonomous driving and healthcare, has become increasingly critical. Traditional formal verification tools fall short when faced with systems that embed both opaque, black-box AI components and complex stochastic dynamics. To address these challenges, we introduce LUCID (Learning-enabled Uncertainty-aware Certification of stochastIc Dynamical systems), a verification engine for certifying safety of black-box stochastic dynamical systems from a finite dataset of random state transitions. As such, LUCID is the first known tool capable of establishing quantified safety guarantees for such systems. Thanks to its modular architecture and extensive documentation, LUCID is designed for easy extensibility. 

LUCID employs a data-driven methodology rooted in control barrier certificates, which are learned directly from system transition data, to ensure formal safety guarantees. We use conditional mean embeddings to embed data into a reproducing kernel Hilbert space (RKHS), where an RKHS ambiguity set is constructed that can be inflated to robustify the result to out-of-distribution behavior. 

A key innovation within LUCID is its use of a finite Fourier kernel expansion to reformulate a semi-infinite non-convex optimization problem into a tractable linear program. The resulting spectral barrier allows us to leverage the fast Fourier transform to generate the relaxed problem efficiently, offering a scalable yet distributionally robust framework for verifying safety. LUCID thus offers a robust and efficient verification framework, able to handle the complexities of modern black-box systems while providing formal guarantees of safety. These unique capabilities are demonstrated on challenging benchmarks.

LUCID: Learning-Enabled Uncertainty-Aware Certification of Stochastic Dynamical Systems

In this work, we propose a novel framework for the logical specification of non-Markovian rewards in Markov Decision Processes (MDPs) with large state spaces. Our approach leverages Linear Temporal Logic Modulo Theories over finite traces (LTLf$^{MT}$), a more expressive extension of classical temporal logic in which predicates are first-order formulas of arbitrary first-order theories rather than simple Boolean variables. This enhanced expressiveness enables the specification of complex tasks over unstructured and heterogeneous data domains, promoting a unified and reusable framework that eliminates the need for manual predicate encoding. However, the increased expressive power of LTLf$^{MT}$ introduces additional theoretical and computational challenges compared to standard LTLf specifications. We address these challenges from a theoretical standpoint, identifying a fragment of LTLf$^{MT}$ that is tractable but sufficiently expressive for reward specification in an infinite-state-space context. From a practical perspective, we introduce a method based on reward machines and Hindsight Experience Replay (HER) to translate first-order logic specifications and address reward sparsity. We evaluate this approach in a continuous-control setting using Non-Linear Arithmetic Theory, showing that it enables natural specification of complex tasks. Experimental results show that a tailored implementation of HER is fundamental in solving tasks with complex goals.
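A reward machine can be pictured as a small automaton whose transitions are guarded by predicates over the raw observation, so that the reward depends on the history of the trace and not only on the current state. The sketch below hand-codes such a machine for an assumed task ("eventually x + y > 5, and afterwards eventually x * y < 0"); it does not perform the translation from LTLf$^{MT}$ formulas and does not include HER. The class name, state labels, and trace are all hypothetical.

```python
class RewardMachine:
    """Finite automaton that emits a reward when it first reaches its accepting state."""

    def __init__(self, transitions, initial, accepting):
        # transitions maps a state to a list of (predicate, next_state) pairs.
        self.transitions = transitions
        self.initial = initial
        self.accepting = accepting
        self.state = initial

    def reset(self):
        self.state = self.initial

    def step(self, obs):
        """Advance on one observation and return the reward for this step."""
        if self.state == self.accepting:
            return 0.0  # task already completed; no further reward
        for predicate, next_state in self.transitions.get(self.state, []):
            if predicate(obs):
                self.state = next_state
                break
        return 1.0 if self.state == self.accepting else 0.0


# Guards are arithmetic predicates over continuous values, the second one non-linear.
machine = RewardMachine(
    transitions={
        "u0": [(lambda o: o["x"] + o["y"] > 5, "u1")],
        "u1": [(lambda o: o["x"] * o["y"] < 0, "u2")],
    },
    initial="u0",
    accepting="u2",
)

trace = [{"x": 1, "y": 1}, {"x": 4, "y": 2}, {"x": 3, "y": 1}, {"x": -1, "y": 2}]
rewards = [machine.step(obs) for obs in trace]
```

The reward arrives only at the final step, which illustrates the sparsity that hindsight-style relabelling is meant to mitigate.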
