Singapore

Obtaining high-quality labeled datasets for e-commerce product information extraction remains challenging and costly. We present a systematic framework for generating trustworthy synthetic product data using Large Language Models (LLMs), introducing controlled modification strategies with built-in governance mechanisms: attribute-preserving modification, controlled negative example generation, and systematic attribute removal. Our approach implements responsible generation through brand anonymization, multi-stage validation, and semantic consistency enforcement. Human evaluation of 2,000 synthetic products demonstrates high quality (99.6% natural language, 96.5% valid attributes, 94.2% consistency). Downstream evaluation shows synthetic data matches real data performance (60.5% vs 60.8% accuracy), with hybrid configurations reaching 68.8% accuracy while reducing annotation costs by up to three orders of magnitude. Our framework provides a cost-effective, scalable solution for responsible synthetic data generation in resource-constrained scenarios, with quantitative metrics demonstrating maintained lexical diversity (TTR: 0.83 vs 0.84) and semantic fidelity (0.86 cosine similarity).

AAAI 2026

Attribute-Aware Controlled Product Generation with LLMs for E-commerce

workshop paper
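The abstract above reports lexical diversity as a type-token ratio (TTR) and semantic fidelity as a cosine similarity. As a rough illustration of what these two quantities measure, here is a minimal sketch. The tokenizer (lowercased alphanumeric runs) and the plain-list vector inputs are assumptions made for this example; the paper's own tokenization and embedding model are not stated on this page.

```python
import math
import re


def type_token_ratio(text: str) -> float:
    """Unique tokens divided by total tokens (assumed simple tokenizer)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def cosine_similarity(u, v) -> float:
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)
```

In practice the vectors passed to `cosine_similarity` would be sentence embeddings of a real product description and its synthetic counterpart.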

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held at Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, exhibit programs, and a range of other activities to be announced.



The increased availability of genetic data has transformed genomics research, but raised many privacy concerns regarding its handling due to its sensitive nature. This work explores the use of language models (LMs) for the generation of synthetic genetic mutation profiles, leveraging differential privacy (DP) for the protection of sensitive genetic data. We empirically evaluate the privacy guarantees of our DP modes by introducing a novel **Biologically-Informed Hybrid Membership Inference Attack** (biHMIA), which combines traditional black box MIA with contextual genomics metrics for enhanced attack power. Our experiments show that both small and large transformer GPT-like models are viable synthetic variant generators for *small-scale genomics*, and that our hybrid attack leads, on average, to higher adversarial success compared to traditional metric-based MIAs.

Biologically-Informed Hybrid Membership Inference Attacks on Generative Genomic Models

The proliferation of AI-generated video technologies poses challenges to information integrity. While recent benchmarks advance AIGC video detection, they overlook a critical factor: many state-of-the-art generative models embed digital watermarks in outputs, and detectors may partially rely on these patterns. To evaluate this influence, we present RobustSora, a benchmark designed to assess watermark robustness in AIGC video detection. We systematically construct a dataset of 6,500 videos comprising four types: Authentic-Clean (A-C), Authentic-Spoofed with fake watermarks (A-S), Generated-Watermarked (G-W), and Generated-DeWatermarked (G-DeW). Our benchmark introduces two evaluation tasks: Task-I tests performance on watermark-removed AI videos, while Task-II assesses false alarm rates on authentic videos with fake watermarks. Experiments with ten models spanning specialized AIGC detectors, transformer architectures, and MLLM approaches reveal performance variations of 2-8pp under watermark manipulation. Transformer-based models show consistent moderate dependency (6-8pp), while MLLMs exhibit diverse patterns (2-8pp). These findings indicate partial watermark dependency and highlight the need for watermark-aware training strategies. RobustSora provides essential tools to advance robust AIGC detection research.

RobustSora: De-Watermarked Benchmark for Robust AI-Generated Video Detection

Large Language Models (LLMs) have shown advanced capabilities in tasks like counterfactual generation and style transfer using prompt strategies. However, previous strategies lacked detailed instructions, limiting effectiveness. To address this, we introduce Compare&Generate, an algorithm inspired by human comparison, where minimal instructions lead to substantial learning. Specifically, our method incorporates an objective function that quantitatively assesses alignment with the task goal and the content relevance in the output. Then, it constructs comparison pairs based on previous generation assessments and prompts the model to reconsider how to optimize its output. Through comparison, the model focuses on the critical aspects of the task objective and refines its outputs accordingly. We benchmark our method with single-instruction as well as iterative refinement approaches across three natural language generation tasks. Experimental results show that our approach outperforms other related methods; for instance, it surpasses its single-instruction base by 17% and a state-of-the-art refinement approach by 7% on IMDB datasets in generated label accuracy, highlighting the effectiveness of using comparisons in prompts to enhance LLMs.

Improving Synthetic Data Generation with LLMs through Strategic Comparisons

Knowledge Graphs (KGs) enable applications in various domains such as semantic search, recommendation systems, and natural language processing. KGs are often incomplete, missing entities and relations, an issue addressed by Knowledge Graph Completion (KGC) methods that predict missing elements. Different evaluation metrics, such as Mean Reciprocal Rank (MRR), Mean Rank (MR), and Hit@k (e.g., Hit@1), are commonly used to assess the performance of such KGC models. A major challenge in evaluating KGC models, however, lies in comparing their performance across multiple datasets and metrics. A model may outperform others on one dataset but underperform on another, making it difficult to determine overall superiority. Moreover, even within a single dataset, different metrics such as MRR and Hit@1 can yield conflicting rankings, where one model excels in MRR while another performs better in Hit@1, further complicating model selection for downstream tasks. These inconsistencies hinder holistic comparisons and highlight the need for a unified meta-metric that integrates performance across all metrics and datasets to enable a more reliable and interpretable evaluation framework. To address this need, we propose KG Evaluation based on Distance from Average Solution (EDAS), a robust and interpretable meta-metric that synthesizes model performance across multiple datasets and diverse evaluation criteria into a single normalized score ($M_i \in [0,1]$). Unlike traditional metrics that focus on isolated aspects of performance, EDAS offers a global perspective that supports more informed model selection and promotes fairness in cross-dataset evaluation. Experimental results on benchmark datasets such as FB15k-237 and WN18RR demonstrate that EDAS effectively integrates multi-metric, multi-dataset performance into a unified ranking, offering a consistent, robust, and generalizable framework for evaluating KGC models.

KG-EDAS: A Meta-Metric Framework for Evaluating Knowledge Graph Completion Models
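To make the idea of a single normalized score per model more concrete, the sketch below follows the generic EDAS procedure from multi-criteria decision making: compute the average value per criterion, measure each model's positive and negative distance from that average, and combine the two into a score in [0, 1]. This is an illustration of the textbook method under stated assumptions, not necessarily the exact formulation used in the paper. The function name, the criterion weights, and the `benefit` flags (True when higher is better, as for MRR and Hit@1; False when lower is better, as for MR) are all choices made for this example, and criterion averages are assumed to be positive.

```python
def edas_scores(matrix, weights, benefit):
    """Generic EDAS appraisal scores.

    matrix[i][j] is the value of model i on criterion j (a metric on a
    dataset); weights[j] is the criterion weight; benefit[j] is True if
    higher is better. Assumes every criterion average is positive.
    """
    n, m = len(matrix), len(matrix[0])
    avg = [sum(row[j] for row in matrix) / n for j in range(m)]

    sum_pos, sum_neg = [], []
    for row in matrix:
        pos = neg = 0.0
        for j in range(m):
            # Signed relative distance from the average solution,
            # oriented so that positive always means "better".
            diff = (row[j] - avg[j]) if benefit[j] else (avg[j] - row[j])
            diff /= avg[j]
            pos += weights[j] * max(0.0, diff)
            neg += weights[j] * max(0.0, -diff)
        sum_pos.append(pos)
        sum_neg.append(neg)

    max_pos, max_neg = max(sum_pos), max(sum_neg)
    scores = []
    for pos, neg in zip(sum_pos, sum_neg):
        norm_pos = pos / max_pos if max_pos > 0 else 0.0
        norm_neg = 1.0 - neg / max_neg if max_neg > 0 else 1.0
        scores.append((norm_pos + norm_neg) / 2.0)
    return scores
```

A call such as `edas_scores([[0.36, 0.27, 150], [0.33, 0.24, 200], [0.30, 0.21, 250]], [1/3, 1/3, 1/3], [True, True, False])` treats the three columns as MRR, Hit@1, and MR for three hypothetical models; the numbers are made up for illustration.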

Dialogue State Tracking (DST) requires precise extraction of structured information from multi-domain conversations, a task where Large Language Models (LLMs) struggle despite their impressive general capabilities. We present GEM (Graph-Enhanced Mixture-of-Experts), a novel framework that combines language models and graph-structured dialogue understanding with ReAct agent-based reasoning for superior DST performance. Our approach dynamically routes between specialized experts: a Graph Neural Network that captures dialogue structure and turn-level dependencies, and a finetuned T5-Small encoder-decoder for sequence modeling, coordinated by an intelligent gating network. For complex value generation tasks, we integrate ReAct agents that perform structured reasoning over dialogue context. On MultiWOZ 2.2, GEM achieves 65.19% Joint Goal Accuracy, substantially outperforming end-to-end LLM approaches (best: 38.43%) and surpassing state-of-the-art (SOTA) methods including TOATOD (63.79%), D3ST (58.70%), and Diable (56.48%). Our graph-enhanced mixture-of-experts architecture with ReAct integration demonstrates that combining structured dialogue representation with dynamic expert routing and agent-based reasoning provides a powerful paradigm for dialogue state tracking, achieving superior accuracy while maintaining computational efficiency through selective expert activation.

GEM: Graph-Enhanced Mixture-of-Experts with ReAct Agents for Dialogue State Tracking
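The headline number in the abstract above is Joint Goal Accuracy (JGA). As commonly defined for dialogue state tracking, a turn counts as correct only if the entire predicted dialogue state matches the reference state exactly. The following is a minimal sketch of that definition; representing each state as a dictionary of slot-value pairs and the example slot names are assumptions for illustration, and any value normalization used in the actual MultiWOZ evaluation is omitted.

```python
def joint_goal_accuracy(predicted_states, gold_states):
    """Fraction of turns whose full predicted state equals the gold state.

    Each state is a dict mapping slot names to values (assumed format).
    """
    if len(predicted_states) != len(gold_states):
        raise ValueError("need one predicted state per gold state")
    if not gold_states:
        return 0.0
    exact = sum(1 for p, g in zip(predicted_states, gold_states) if p == g)
    return exact / len(gold_states)


# Hypothetical two-turn dialogue: the second turn gets one slot wrong,
# so the whole turn counts as incorrect.
gold = [
    {"hotel-area": "north"},
    {"hotel-area": "north", "hotel-stars": "4"},
]
pred = [
    {"hotel-area": "north"},
    {"hotel-area": "north", "hotel-stars": "3"},
]
jga = joint_goal_accuracy(pred, gold)
```

Because one wrong slot invalidates the whole turn, JGA is considerably stricter than per-slot accuracy.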

As data grows in day-to-day life, businesses and other stakeholders need to analyze it to make better predictions. Traditionally, relational data has been a source of various insights, but increasing computational power and the need to understand deeper relationships between entities call for new techniques. To this end, graph data analysis has become a powerful tool for understanding data, as it allows more realistic and flexible modelling of complex relationships. Recently, Graph Neural Networks (GNNs) have shown great promise in various applications, such as social network analysis, recommendation systems, and drug discovery. However, the data can be subjected to adversarial attacks, whether during training (poisoning attacks) or during testing (evasion attacks), which can manipulate the outcome of the GNN model. Therefore, it is crucial to make GNNs robust to such attacks. Existing robustness methods are computationally demanding and perform poorly as attack intensity increases. This paper presents a computationally efficient framework, pLAPGNN, based on the weighted p-Laplacian for making GNNs robust. Empirical evaluation on real datasets establishes the efficacy and efficiency of the proposed method.

Enhancing Robustness of Graph Neural Networks through p-Laplacian

Many real-world systems, from neural circuits to economic networks, exhibit feedback loops that are best represented as directed cyclic graphs (DCGs). Yet most scalable causal discovery methods either impose hard acyclicity or rely on global backpropagation, making them unsuitable for feedback-rich settings. We propose PreCyc, a predictive coding framework for causal structure learning that combines node-wise energy minimisation with a soft acyclicity surrogate and sparsity regularisation. The algorithm alternates local state inference and weight updates, avoiding reverse-mode differentiation while remaining scalable to larger graphs. Our analysis shows convergence to a stationary point under standard smoothness assumptions, and we clarify the distinction between local error signals for data fit and the global nature of acyclicity enforcement. Experiments on synthetic Erdős–Rényi, Watts–Strogatz, and scale-free graphs, as well as the 279-node C. elegans connectome, demonstrate competitive performance in both structure recovery and cycle identification compared with state-of-the-art cyclic causal discovery methods. While the current implementation focuses on linear structural equation models with observational equilibrium data, PreCyc establishes predictive coding as a principled and scalable foundation for causal discovery in feedback-rich systems.

Predictive Coding Causal Discovery for Directed Cyclic Graphs

Prior work on node classification shows that Graph Neural Networks (GNNs) can learn transferable representations of graph properties when those properties are consistent across graphs. For a fixed graph, one would then expect GNNs trained for link prediction to learn a representation consistent with that learnt for node classification. We show this intuition does not hold in the general case. Instead, we find that popular link prediction models can learn a trivial mini-batch dependent heuristic, enabled by batch normalisation layers, to solve the edge classification task. When correcting for this, we observe increased alignment of network representation with node-class relevant features, suggesting the network has learnt a graph representation that better aligns with the underlying graph's properties. Our findings suggest that standard link prediction training may be leading us to overestimate link predictors' ability to learn a generalised representation of a graph that is consistent across tasks.

Mini-Batch Class Composition Bias in Link Prediction

Knowledge graph completion aims to infer unknown information in a knowledge graph that is incomplete, due to noisy or missing data. Geographic knowledge graphs, which are typically derived from crowd-sourced data, are often incomplete, making geographic knowledge graph completion an important problem. Most current methods for knowledge graph completion are generic, and do not account for the spatial nature of geographic knowledge graphs. The few methods that are tailored to geographic knowledge graphs are computationally expensive or are designed for a closed-world setting, which is not practical in the geography domain. We study this problem by evaluating existing state-of-the-art standard and geo-specific knowledge graph completion methods on a large dataset of geographic knowledge graphs. Our findings reveal that these methods perform poorly, leaving an open problem for the AI and graphs community. To aid in further research, we suggest some possible areas of work that we believe could lead to fruitful developments for this problem.

An Experimental Analysis of Geographic Knowledge Graph Completion Methods

Chain-of-thought (CoT) prompting enables Large Language Models to solve complex problems, but deploying these models safely requires reliable confidence estimates, a capability where existing methods suffer from poor calibration and severe overconfidence on incorrect predictions. We propose Enhanced Dirichlet+Topology Risk (EDTR), a novel decoding strategy that combines topological analysis with Dirichlet-based uncertainty quantification to measure LLM confidence across multiple reasoning paths. EDTR treats each CoT as a vector in high-dimensional space and extracts eight topological risk features capturing the geometric structure of reasoning distributions: tighter, more coherent clusters indicate higher confidence while dispersed, inconsistent paths signal uncertainty. We evaluate EDTR against three state-of-the-art calibration methods across four diverse reasoning benchmarks spanning olympiad-level mathematics (AIME), grade school math (GSM8K), commonsense reasoning, and stock price prediction. EDTR achieves 41% better calibration than competing methods with an average ECE of 0.287 and the best overall composite score of 0.672, while notably achieving perfect accuracy on AIME and exceptional calibration on GSM8K with an ECE of 0.107, domains where baselines exhibit severe overconfidence. Our work provides a geometric framework for understanding and quantifying uncertainty in multi-step LLM reasoning, enabling more reliable deployment where calibrated confidence estimates are essential.
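Calibration in the abstract above is reported as Expected Calibration Error (ECE). The standard binned estimate groups predictions by stated confidence and takes the size-weighted average gap between each bin's accuracy and its mean confidence. Below is a minimal sketch of that standard estimate; the choice of ten equal-width bins and the function name are assumptions, and the paper's exact binning scheme is not given on this page.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted mean of |accuracy - mean confidence| per bin.

    confidences: floats in [0, 1]; correct: booleans of the same length.
    Uses equal-width bins; a confidence of exactly 1.0 goes in the last bin.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have equal length")
    total = len(confidences)
    if total == 0:
        return 0.0

    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(accuracy - mean_conf)
    return ece
```

A model that claims 0.9 confidence on four answers but gets only three right contributes a gap of roughly 0.15, which is the kind of overconfidence the abstract describes.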
