Singapore

Existing Image-Text Sentiment Analysis (ITSA) methods may suffer from inconsistent intra-modal and inter-modal sentiment relationships. Therefore, we develop a method that balances before fusing to solve the issue of vision-language imbalance intra-modal and inter-modal sentiment relationships; that is, a Semi-Push-Pull Supervised Contrastive Learning (SPP-SCL) method is proposed. Specifically, the method is implemented using a novel two-step strategy, namely first using the proposed intra-modal supervised contrastive learning to pull the relationships between the intra-modal and then performing a well-designed conditional execution statement. If the statement result is false, our method will perform the second step, which is inter-modal supervised contrastive learning to push away the relationships between inter-modal. The two-step strategy will balance the intra-modal and inter-modal relationships to achieve the purpose of relationship consistency and finally perform cross-modal feature fusion for sentiment analysis and detection. Experimental studies on three public image-text sentiment and sarcasm detection datasets demonstrate that SPP-SCL significantly outperforms state-of-the-art methods by a large margin and is more discriminative in sentiment. Codes will be released soon on GitHub.

AAAI 2026

SPP-SCL: Semi-Push-Pull Supervised Contrastive Learning for Image-Text Sentiment Analysis and Beyond

social networks

affective computing

multimodal learning

sentiment analysis

poster

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held in Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, and exhibit programs, and a range of other activities to be announced.<br><br>

To access this event page, you need to log in with the **email address you registered with**. <br>Access credentials will be sent to your email from Underline -  subject line "Welcome to AAAI 2026". Please be sure to check your spam email folder if you do not see an email confirmation right away.

Please log in

To access this event page, you are required to register.
Please complete your registration to continue.

We recommend reading [**the registration information**](https://aaai.org/conference/aaai/aaai-26/registration/) first.

**Online Registration Form**: https://aaai.getregistered.net/conference-2026 

Registration Required

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held in Singapore EXPO from January 20 to January 27, 2026.

In open-vocabulary mobile manipulation (OVMM), task success often hinges on the selection of an appropriate base placement for the robot. Existing approaches typically navigate to proximity-based regions without considering affordances, resulting in frequent manipulation failures. We propose Affordance-Guided Coarse-to-Fine Exploration, a zero-shot framework for base placement that integrates semantic understanding from vision-language models (VLMs) with geometric feasibility through an iterative optimization process. Our method constructs cross-modal representations, namely Affordance RGB and Obstacle Map+, to align semantics with spatial context. This enables reasoning that extends beyond the egocentric limitations of RGB perception. To ensure interaction is guided by task-relevant affordances, we leverage coarse semantic priors from VLMs to guide the search toward task-relevant regions and refine placements with geometric constraints, thereby reducing the risk of convergence to local optima. Evaluated on five diverse open-vocabulary mobile manipulation tasks, our system achieves an 85% success rate, significantly outperforming classical geometric planners and VLM-based methods. This demonstrates the promise of affordance-aware and multimodal reasoning for generalizable, instruction-conditioned planning in OVMM.

Affordance-Guided Coarse-to-Fine Exploration for Base Placement in Open-Vocabulary Mobile Manipulation

Class-agnostic 3D instance segmentation tackles the challenging task of segmenting all object instances, including previously unseen ones, without semantic class reliance. Current methods struggle with generalization due to the scarce annotated 3D scene data or noisy 2D segmentations. While synthetic data generation offers a promising solution, existing 3D scene synthesis methods fail to simultaneously satisfy geometry diversity, context complexity, and layout reasonability, each essential for this task. To address these needs, we propose an Adapted 3D Scene Synthesis pipeline for class-agnostic 3D Instance SegmenTation, termed as **ASSIST-3D**, to synthesize proper data for model generalization enhancement. Specifically, ASSIST-3D features three key innovations, including 1) **Heterogeneous Object Selection** from extensive 3D CAD asset collections, incorporating randomness in object sampling to maximize geometric and contextual diversity; 2) **Scene Layout Generation** through LLM-guided spatial reasoning combined with depth-first search for reasonable object placements; and 3) **Realistic Point Cloud Construction** via multi-view RGB-D image rendering and fusion from the synthetic scenes, closely mimicking real-world sensor data acquisition. Experiments on ScanNetV2, ScanNet++, and S3DIS benchmarks demonstrate that models trained with ASSIST-3D-generated data significantly outperform existing methods. Further comparisons underscore the superiority of our purpose-built pipeline over existing 3D scene synthesis approaches.

ASSIST-3D: Adapted Scene Synthesis for Class-Agnostic 3D Instance Segmentation

Temporal Graph Neural Networks (TGNNs) aim to capture the evolving structure and timing of interactions in dynamic graphs. Although many models incorporate time through encodings or architectural design, they often compute attention over entangled node and edge representations, failing to reflect their distinct temporal behaviors. Node embeddings evolve slowly as they aggregate long-term structural context, while edge features reflect transient, timestamped interactions (e.g. messages, trades, or transactions). This mismatch results in semantic attention blurring, where attention weights cannot distinguish between slowly drifting node states and rapidly changing, information-rich edge interactions. As a result, models struggle to capture fine-grained temporal dependencies and provide limited transparency into how temporal relevance is computed. This paper introduces KEAT (Kernelized Edge Attention for Temporal Graphs), a novel attention formulation that modulates edge features using a family of continuous-time kernels, including Laplacian, RBF, and learnable MLP variant. KEAT preserves the distinct roles of nodes and edges, and integrates seamlessly with both Transformer-style (e.g., DyGFormer) and message-passing (e.g., TGN) architectures. It achieves up to 18\% MRR improvement over the recent DyGFormer and 7\% over TGN on link prediction tasks, enabling more accurate, interpretable and temporally aware message passing in TGNNs.

Kernelized Edge Attention: Addressing Semantic Attention Blurring in Temporal Graph Neural Networks

In cooperative Multi-Agent Reinforcement Learning (MARL), the subgroup-wise learning is employed to assign sub-tasks to agents towards the enhancement of team collaboration. However, the present work is dependent on manually defined allocation criteria, which hinders its capacity to adapt to environmental changes promptly, and also relaxes communication restrictions, thereby constraining the application of algorithms in a range of fields. In order to address these issues, the Autonomous Partner Selection (APS) framework is proposed, which offers an implicit grouping mechanism in an autonomous way. Each agent is capable of autonomously selecting cooperative partners and integrating their own observations with those of partners to harmonise the cooperative behaviour during the training stage. With a view to strictly restricting communication, the intention encoder is trained through information distillation, which enables agents to selectively take more cooperative actions based solely on local observations. Meanwhile, in order to circumvent potential conflicts engendered by homogenization behaviour, we employ a contrastive learning strategy to the cooperative intention generated by agents, thereby ensuring that the behavioural tendencies exhibited by different individuals remain as diverse as possible. Finally, extensive comparative experiments on the StarCraft Multi-Agent Challenge and Google Research Football are conducted. The results demonstrate that APS exhibits superior performance in comparison to the state-of-the-art algorithms across a range of tasks, and agents can adapt their grouping strategies in accordance with the environment to facilitate enhanced cooperation.

Autonomous Partner Selection for Cooperative Multi-Agent Reinforcement Learning

Combining Mixture of Experts (MoE) with Low-Rank Adaptation (LoRA) has shown promising efficiency in multi-task instruction tuning for Large Language Models (LLMs). While existing routing schemes for such MoE systems employ auxiliary functions to ensure both expert selection certainty and workload balance among experts, they are hindered by two critical challenges: (1) Existing methods overlook the evolving cross-expert relationships across layers, leading to inefficient expert utilization. (2) The auxiliary functions fail to incorporate cross-task semantic characteristics during expert assignment, leading to suboptimal task adaptation. To address these challenges, we propose $\textbf{H}$ybrid r$\textbf{o}$u$\textbf{t}$ing for a $\textbf{M}$ixture $\textbf{o}$f LoRA $\textbf{E}$xperts ($\textbf{HotMoE}$), a novel multi-task instruction tuning framework that adapts hierarchical routing to the distinct characteristics of different LLM layers. First, we design a $\textit{hybrid routing module}$. In lower layers, expert-expert attention facilitates cross-task collaboration and generalization.
In higher layers, token-expert attention enables precise alignment between task semantics and specialized experts. 
Second, we introduce a $\textit{similarity-guided auxiliary loss module}$ to regularize routing decisions by exploiting hidden state similarities. This loss synergistically reinforces expert specialization without sacrificing certainty of expert selection by promoting cohesive activation patterns among semantically related tasks while sharpening distinctions between conflicting ones. Experiments across two multi-task instruction tuning scenarios covering seven NLP benchmarks demonstrate that HotMoE consistently outperforms all baselines, improving Mean Relative Difference by up to 1.68\% with only 3.1\% of trainable parameters.

Hybrid Routing for a Mixture of LoRA Experts

In modern Computer-Aided Design (CAD), parametric sketches play a crucial role by capturing both the geometric structure and design intent through constraints. However, existing deep learning–based sketch methods remain restricted to simple geometric primitives and limited constraint types, hindering their application to complex real-world engineering tasks. To address this gap, we introduce the UniSketch dataset, comprising 3,836,290 sketches. It offers a comprehensive and diverse collection of 7 types of geometric primitives and 23 types of 2D constraints, all represented as unified vector sequences suitable for deep learning applications. Leveraging the UniSketch dataset, we propose a unified multi-task Transformer framework as a true foundation model for parametric sketch modeling, supporting diverse core tasks like image-to-sketch generation, constraint prediction, and unconditional sketch synthesis. Furthermore, the generated sketches can be efficiently converted to CAD-compatible formats, enabling seamless integration with industrial CAD system for re-editing and reusing. The experimental results show that UniSketch outperforms existing methods in multiple tasks, demonstrating its versatility and practical value in industrial CAD applications.

UniSketch: A Unified Framework for Parametric Sketch Generation and Constraint Prediction

Prevalent pre-training strategies for Brain-Computer Interfaces (BCIs) are often constrained by spatio-temporal entanglement. This critical issue arises from processing multi-channel Electroencephalography (EEG) signals as monolithic sequences, which intertwines the signal's temporal dynamics with its spatial topography and hinders the learning of robust and generalizable representations. To address this, we introduce BraSTORM, a framework that explicitly disentangles EEG data into separate temporal and spatial streams at the input level. Two streams are processed by parallel encoders trained with a composite dual-objective: a masked signal reconstruction loss captures fine-grained, intra-modal details, while a cross-modal contrastive loss enforces high-level semantic alignment. Extensive fine-tuning experiments on six benchmarks covering three major BCI downstream tasks—Emotion Recognition, Sleep Staging, and Motor Imagery—demonstrate that BraSTORM achieves state-of-the-art performance. Our findings validate that resolving spatio-temporal entanglement at the input level can be a competitive pre-training framework for the BCI field.

BraSTORM: A Dual-Branch Self-Supervised Framework for EEG Representation Learning via Input-Level Spatio-Temporal Decomposition

Reinforcement Learning from Human Feedback (RLHF) is a methodology that aligns agent behavior with human preferences by utilizing signals such as scalar rewards, comparative preferences, or physical demonstrations from human users. We introduce NEURO-LOOP, a framework for Reinforcement Learning from Neural Feedback that leverages passive Brain-Computer Interfaces (BCI) to infer user assessments directly from brain activity. We present and release a novel dataset of functional near-infrared spectroscopy (fNIRS) recordings collected from 25 human participants observing agent behavior in three different domains: a Pick-and-Place Robot task, Lunar Lander, and Flappy Bird. We train classifiers to predict levels of agent performance (optimal, sub-optimal, or poor) from windows of preprocessed fNIRS feature vectors, achieving an average F1 score of 67% for binary classification and 46% for multi-class models across conditions. We also train regressors to predict the degree of deviation between the agent's chosen action and a set of near-optimal actions, providing a continuous measure of performance. To evaluate cross-subject generalization, we use a leave-one-subject-out approach to demonstrate that fine-tuning a pre-trained model with a small sample of witheld, subject-specific data increases average F1 scores by 17% for binary classification and 41% for multi-class models. Our work demonstrates that mapping implicit neural feedback to agent performance is feasible, laying the foundation for integrating brain data into future RLHF systems.

Towards Reinforcement Learning from Neural Feedback: Mapping fNIRS Signals to Agent Performance

We introduce FIXME, the first end-to-end, open-source, and large-scale benchmark for evaluating Large Language Models (LLMs) in hardware design functional verification (FV). Comprising 747 tasks derived from real-world hardware designs, FIXME spans five core FV sub-sets: specification comprehension, reference model generation, testbench generation, assertion design, and RTL debugging. To ensure high data quality, we developed an AI-human collaborative framework for agile data curation and annotation. This process resulted in 25,000 lines of verified RTL, 35,000 lines of enhanced testbenches, and over 1,200 SystemVerilog Assertions. Furthermore, through expert-guided optimization within the multi-agent aided flow, we achieved a remarkable 45.57% improvement in average functional coverage, underscoring the benchmark's robustness. Through evaluation of state-of-the-art LLMs like GPT-4.1, FIXME identifies key limitations and provides actionable insights, advancing the potential of LLM-driven automation in hardware design functional verification.

FIXME: Towards End-to-End Benchmarking of LLM-Aided Design Verification

Retrieval-Augmented Generation (RAG) has been demonstrated to effectively mitigate the knowledge recency issue in Large Language Models (LLMs) while significantly reducing hallucinations. However, existing RAG methods exhibit insufficient capability in modeling reasoning paths for complex multi-hop reasoning tasks. While Reinforcement Learning (RL) has demonstrated success in enhancing model reasoning ability , token-level RL frameworks exhibit inherent limitations in maintaining coherent reasoning trajectories. This approach remains susceptible to the compounding accumulation of contextual errors during the retrieval process, ultimately resulting in erroneous output generation. To address this challenge, we propose Chain Progressive Search (CP-Search), a novel two-stage training framework designed to enhance the model's retrieval capability in complex scenarios. This framework models the entire retrieval process as a Retrieval-level Markov Decision Process , systematically optimizing the model's retrieval behavior at each step of the chained retrieval. Specifically, CP-Search first constructs a retrieval-cognitive behavioral dataset and employs Supervised Fine-Tuning (SFT) to endow the model with cognitive behaviors for searching. More importantly, by introducing a dense progressive process reward in reinforcement learning training, CP-Search significantly improves the model's reasoning consistency and feedback correction ability in chained retrieval. Experiments conducted on multiple multi-hop datasets demonstrate that CP-Search significantly outperforms existing RAG methods in complex multi-hop reasoning tasks.

Downloads

Next from AAAI 2026

Affordance-Guided Coarse-to-Fine Exploration for Base Placement in Open-Vocabulary Mobile Manipulation

Stay up to date with the latest Underline news!

PRESENTATIONS

CONFERENCES

COMPANY

RESOURCES

.css-70qvj9{display:-webkit-box;display:-webkit-flex;display:-ms-flexbox;display:flex;-webkit-align-items:center;-webkit-box-align:center;-ms-flex-align:center;align-items:center;}Downloads

Next from AAAI 2026

Affordance-Guided Coarse-to-Fine Exploration for Base Placement in Open-Vocabulary Mobile Manipulation

Stay up to date with the latest Underline news!

PRESENTATIONS

CONFERENCES

COMPANY

RESOURCES

Downloads