In the pretraining of Large Language Models (LLMs), synthetic data has emerged as a scalable alternative source of high-quality training data. This is particularly beneficial in low-resource language settings, where the benefits of recent LLMs have been unevenly distributed across languages. In this work, we present a systematic study on the generation and evaluation of synthetic multilingual pretraining data for Indic languages, constructing a large-scale synthetic dataset, BhashaKritika, comprising 540B tokens generated with 5 different techniques across 10 languages. We explore the impact of grounding generation in documents, personas, and topics. We analyze how language choice, both in prompt instructions and in document grounding, affects data quality, and we compare translations of English content with content generated natively in Indic languages. To support scalable and language-sensitive evaluation, we introduce a modular quality evaluation pipeline that integrates script and language detection, metadata consistency checks, n-gram repetition analysis, and perplexity-based filtering using KenLM models. Our framework enables robust quality control across diverse scripts and linguistic contexts. Empirical results from model training runs reveal key trade-offs in generation strategies and highlight best practices for constructing effective multilingual corpora. This work contributes practical insights for designing pretraining recipes in low-resource and script-diverse settings, particularly in the Indian context.
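As a rough illustration of how the stages of such a pipeline might compose (the paper's actual implementation is not shown here), the sketch below chains a Unicode-range script check, an n-gram repetition ratio, and KenLM perplexity filtering. The thresholds, the model path, and all function names are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of a modular quality-filter chain in the spirit of the
# pipeline described above. Thresholds, the KenLM model path, and all
# function names are illustrative assumptions, not the paper's code.
from collections import Counter

import kenlm  # pip install kenlm; assumes a pre-trained per-language LM


def devanagari_fraction(text: str) -> float:
    """Share of alphabetic characters in the Devanagari block (U+0900-U+097F).

    A per-script check like this catches documents whose script does not
    match their language metadata; other Indic scripts would use their
    own Unicode ranges.
    """
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    return sum("\u0900" <= c <= "\u097f" for c in letters) / len(letters)


def ngram_repetition_ratio(text: str, n: int = 3) -> float:
    """Fraction of word n-grams that are repeats; high values flag degenerate text."""
    tokens = text.split()
    if len(tokens) < n:
        return 0.0
    ngrams = [tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]
    repeated = sum(count - 1 for count in Counter(ngrams).values() if count > 1)
    return repeated / len(ngrams)


def passes_quality_filters(
    text: str,
    lm: kenlm.Model,
    min_script_frac: float = 0.8,    # hypothetical threshold
    max_repetition: float = 0.2,     # hypothetical threshold
    max_perplexity: float = 1500.0,  # hypothetical; tuned per language in practice
) -> bool:
    """Run the stages in sequence; a document must pass all of them."""
    if devanagari_fraction(text) < min_script_frac:
        return False
    if ngram_repetition_ratio(text) > max_repetition:
        return False
    # KenLM perplexity: lower means more fluent under the n-gram LM.
    return lm.perplexity(text) <= max_perplexity


# Illustrative usage with a Hindi LM (the path is a placeholder):
# lm = kenlm.Model("hi.arpa")
# kept = [doc for doc in documents if passes_quality_filters(doc, lm)]
```

Running the cheap script and repetition checks before the KenLM scoring keeps the pipeline inexpensive at corpus scale, since most malformed documents are rejected without ever loading them into the language model.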