Large Language Models (LLMs) have demonstrated impressive capabilities in code generation. Like human programmers, LLMs call high-level APIs/libraries to program efficiently. However, this shortcut may keep LLMs from learning essential algorithmic reasoning, encouraging rote memorization of API usage instead. As a result, LLMs often struggle to generalize to new or domain-specific algorithms that lack ready-made library support. In this work, we propose ARBench, a novel benchmark for evaluating LLMs’ ability to generate machine learning algorithms from scratch rather than merely invoking high-level APIs. ARBench emphasizes algorithmic reasoning and implementation, distinguishing genuine understanding from superficial API usage, and covers both fundamental and advanced machine learning tasks to rigorously assess current LLMs’ capacity to implement these algorithms from scratch. Our evaluation reveals the strengths and weaknesses of state-of-the-art LLMs in algorithmic reasoning and generalization, offering valuable insights to guide future research and development. The code and data will be released after the paper is accepted.
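To make the distinction concrete, the kind of task the abstract describes might look as follows: instead of calling `sklearn.cluster.KMeans`, a model would be asked to implement k-means itself using only basic array operations. This is a hypothetical illustration, not an actual ARBench task; the function name and setup are our own.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal k-means clustering written from scratch with NumPy only,
    i.e. without invoking a high-level library such as scikit-learn."""
    rng = np.random.default_rng(seed)
    # Initialize centroids by sampling k distinct points from X.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assign each point to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned points;
        # keep the old centroid if a cluster ends up empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged
        centroids = new_centroids
    return labels, centroids
```

A benchmark in this spirit would then check the implementation's behavior (e.g. that well-separated clusters are recovered) rather than its ability to reproduce a library call.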