Despite rapid progress in large language models (LLMs), even sub-billion-parameter systems perform at chance level on challenging natural language inference (NLI) benchmarks such as Adversarial Natural Language Inference (ANLI), while training larger models is often impractical under limited computational resources. We address this parameter-efficiency bottleneck in NLI with a Complex-Vector Token Representation that explicitly decouples each token from its context, and a Token-Context Attention mechanism that updates each token based on the most informative contextual semantics. On ANLI, a 0.8B-parameter Token-Context Attention model achieves higher parameter efficiency (accuracy per parameter) than all 1B-parameter and comparable 0.8B-parameter self-attention baselines; it also degrades less under FGSM and PGD adversarial attacks and transfers better to SNLI in zero- and few-shot settings. These results suggest that explicitly disentangling token and context offers a viable alternative to standard self-attention for NLI tasks.
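To make the decoupling idea concrete, here is a minimal sketch of one plausible reading of the abstract: each token is a complex vector whose real part carries the token's own semantics and whose imaginary part carries its contextual semantics, and attention updates the token part from the context parts. The function name, the scoring rule, and the update rule are all assumptions for illustration, not the paper's actual implementation.

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of scores
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def token_context_attention(tokens):
    """Hypothetical sketch of token-context attention.

    tokens: list of complex vectors (lists of complex numbers), where
    the real part is assumed to hold the token embedding and the
    imaginary part the context embedding. Each token's real part is
    updated by attending over the context (imaginary) parts of all
    positions; the context parts are left unchanged.
    """
    n = len(tokens)
    d = len(tokens[0])
    out = []
    for q in tokens:
        # score each position by token-to-context similarity
        scores = [sum(q[i].real * k[i].imag for i in range(d)) for k in tokens]
        w = softmax(scores)
        new_vec = []
        for i in range(d):
            # mix the most relevant contextual semantics into the token part
            ctx = sum(w[j] * tokens[j][i].imag for j in range(n))
            new_vec.append(complex(q[i].real + ctx, q[i].imag))
        out.append(new_vec)
    return out
```

The point of the sketch is the explicit separation: a standard self-attention layer entangles token identity and context in one vector, whereas here the two components remain addressable throughout the update.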