Singapore

Chart Question Answering (CQA) evaluates Multimodal Large Language Models (MLLMs) on visual understanding and reasoning over chart data. However, existing benchmarks mostly test surface-level parsing, such as reading labels and legends, while overlooking deeper scientific reasoning. We propose DomainCQA, a framework for constructing domain-specific CQA benchmarks that emphasize both visual comprehension and knowledge-intensive reasoning. It integrates complexity-aware chart selection, multitier QA generation, and expert validation. Applied to astronomy, DomainCQA yields AstroChart, a benchmark of 1,690 QA pairs over 482 charts, exposing persistent weaknesses in fine-grained perception, numerical reasoning, and domain knowledge integration across 21 MLLMs. Fine-tuning on AstroChart improves performance across fundamental and advanced tasks. Pilot QA sets in biochemistry, economics, medicine, and social science further demonstrate DomainCQA’s generality. Together, our results establish DomainCQA as a unified pipeline for constructing and augmenting domain-specific chart reasoning benchmarks.

AAAI 2026

DomainCQA: Crafting Knowledge-Intensive QA from Domain-Specific Charts

(large) language models

question answering

technical paper

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held in Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, and exhibit programs, and a range of other activities to be announced.<br><br>

To access this event page, you need to log in with the **email address you registered with**. <br>Access credentials will be sent to your email from Underline -  subject line "Welcome to AAAI 2026". Please be sure to check your spam email folder if you do not see an email confirmation right away.

Please log in

To access this event page, you are required to register.
Please complete your registration to continue.

We recommend reading [**the registration information**](https://aaai.org/conference/aaai/aaai-26/registration/) first.

**Online Registration Form**: https://aaai.getregistered.net/conference-2026 

Registration Required

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held in Singapore EXPO from January 20 to January 27, 2026.

Existing state-of-the-art image tokenization methods leverage diverse semantic features from pre-trained vision models for additional supervision, to expand the distribution of latent representations and thereby improve the quality of image reconstruction and generation.
These methods employ a locally supervised approach for semantic supervision, which limits the uniformity of semantic distribution. However, VA-VAE proves that a more uniform feature distribution yields better generation performance.
In this work, we introduce a Global Perspective Tokenizer (GloTok), which utilizes global relational information to model a more uniform semantic distribution of tokenized features.
Specifically, a codebook-wise histogram relation learning method is proposed to transfer the semantics, which are modeled by pre-trained models on the entire dataset, to the semantic codebook.
Then, we design a residual learning module which recovers the fine-grained details to minimize the reconstruction error caused by quantization.
Through the above design, GloTok delivers more uniformly distributed semantic latent representations, which facilitates the training of autoregressive (AR) models for generating high-quality images without requiring direct access to pre-trained models during the training process.
Experiments on the standard ImageNet-1k benchmark clearly show that our proposed method achieves state-of-the-art reconstruction performance and generation quality.

GloTok: Global Perspective Tokenizer for Image Reconstruction and Generation

Core-selecting combinatorial auctions are popular auction designs that constrain prices to eliminate the incentive for any group of bidders---with the seller---to renegotiate for a better deal. They help overcome the low-revenue issues of classical combinatorial auctions. We introduce a new class of core-selecting combinatorial auctions that leverage bidder information available to the auction designer. We model such information through constraints on the joint type space of the bidders---these are constraints on bidders' private valuations that are known to hold by the auction designer before bids are elicited. First, we show that type space information can overcome the well-known impossibility of incentive-compatible core-selecting combinatorial auctions. We present a revised and generalized version of that impossibility result that depends on how much information is conveyed by the type spaces. We then devise a new family of core-selecting combinatorial auctions and show that they minimize the sum of bidders' incentives to deviate from truthful bidding. We develop new constraint generation techniques---and build upon existing quadratic programming techniques---to compute core prices, and conduct experiments to evaluate the incentive, revenue, fairness, and computational merits of our new auctions. Our new core-selecting auctions directly improve upon existing designs that have been used in many high-stakes auctions around the world. We envision that they will be a useful addition to any auction designer's toolkit.

Weakest Bidder Types and New Core-Selecting Combinatorial Auctions

Underwater Instance Segmentation (UIS), integrating pixel-level understanding and instance-level discrimination, is a pivotal technology in marine resource exploration and ecological protection. In recent years, large-scale pretrained visual foundation models, exemplified by DINO, have advanced rapidly and demonstrated remarkable performance on complex downstream tasks. In this paper, we demonstrate that DINO can serve as an effective feature learner for UIS, and we introduce DiveSeg, a novel framework built upon two insightful components: (1) The AquaStyle Aligner, designed to embed underwater color style features into the DINO fine-tuning process, facilitating better adaptation to the underwater domain. (2) The ObjectPrior Prompter, which incorporates binary segmentation-based prompts to deliver object-level priors, provides essential guidance for instance segmentation task that requires both object- and instance-level reasoning. We conduct thorough experiments on the popular UIIS and USIS10K datasets, and the results show that DiveSeg achieves the state-of-the-art performance.

Empowering DINO Representations for Underwater Instance Segmentation via Aligner and Prompter

Quantization is critical for efficiently deploying large language models (LLMs). Yet conventional methods remain hardware-agnostic, limited to bit-width constraints, and do not account for intrinsic circuit characteristics such as the timing behaviors and energy profiles of Multiply-Accumulate (MAC) units. This disconnect from circuit-level behavior limits the ability to exploit available timing margins and energy-saving opportunities, reducing the overall efficiency of deployment on modern accelerators. To address these limitations, we propose HALO, a versatile framework for Hardware-Aware Post-Training Quantization (PTQ). Unlike traditional methods, HALO explicitly incorporates detailed hardware characteristics, including critical-path timing and power consumption, into its quantization approach. HALO strategically selects weights with low critical-path-delays enabling higher operational frequencies and dynamic frequency scaling without disrupting the architecture's dataflow. Remarkably, HALO achieves these improvements with only a few dynamic voltage and frequency scaling (DVFS) adjustments, ensuring simplicity and practicality in deployment. Additionally, by reducing switching activity within the MAC units, HALO effectively lowers energy consumption. Evaluations on accelerators such as Tensor Processing Units (TPUs) and Graphics Processing Units (GPUs) demonstrate that HALO significantly enhances inference efficiency, achieving average performance improvements of 270% and energy savings of 51% over baseline quantization methods, all with minimal impact on accuracy.

HALO: Hardware-Aware Quantization with Low Critical-Path-Delay Weights for LLM Acceleration

Hierarchical data pervades diverse machine learning applications, including natural language processing, computer vision, and social network analysis. Hyperbolic space, characterized by its negative curvature, has demonstrated strong potential in such tasks due to its capacity to embed hierarchical structures with minimal distortion. Previous evidence indicates that the hyperbolic representation capacity can be further enhanced through kernel methods. However, existing hyperbolic kernels still suffer from mild geometric distortion or lack adaptability. This paper addresses these issues by introducing a curvature-aware de Branges–Rovnyak space, a reproducing kernel Hilbert space (RKHS) that is isometric to a Poincar$\'$e ball. We design an adjustable multiplier to select the appropriate RKHS corresponding to the hyperbolic space with any curvature adaptively. Building on this foundation, we further construct a family of adaptive hyperbolic kernels, including the novel adaptive hyperbolic radial kernel, whose learnable parameters modulate hyperbolic features in a task-aware manner. Extensive experiments on visual and language benchmarks demonstrate that our proposed kernels outperform existing hyperbolic kernels in modeling hierarchical dependencies.

Adaptive Hyperbolic Kernels: Modulated Embedding in de Branges-Rovnyak Spaces

We propose an extension of Description Logics (DLs) with generic concepts
and conditional axioms. Inspired by object-oriented languages, generic
concepts allow a compact definition of concepts with similar structures.
For example, one can define a generic concept Owner[X] to describe objects
that own another object from X, and later use a specific replacement of the
parameter X, such as Owner[Pet] representing pet owners. Conditional axioms
can be used to set bounds on the values that replace the generic parameters.
For example, we could restrict replacements of X in a concept Keeper[X] to
only subconcepts of Pet. As the set of possible parameter replacements can
be infinite and even uncountable, the generic extensions are, in general,
undecidable. To identify decidable generic DLs, we focus on the case of
terminologies, requiring that variables are only used in definitions of
generic concepts. We formulate restrictions that allow a reduction
of generic entailment to classical entailment and further conditions that
ensure decidability.

Extending Description Logics with Generic Concepts – the Case of Terminologies

In semi-decentralized federated learning, devices primarily rely on device-to-device communication but occasionally interact with a central server. Periodically, a sampled subset of devices uploads their local models to the server, which computes an aggregate model. The server can then either (i) share this aggregate model only with the sampled clients (sampled-to-sampled, S2S) or (ii) broadcast it to all clients (sampled-to-all, S2A). Despite their practical significance, a rigorous theoretical and empirical comparison of these two strategies remains absent.
We address this gap by analyzing S2S and S2A within a unified convergence framework that accounts for key system parameters: sampling rate, server aggregation frequency, and network connectivity. Our results---both analytical and experimental---reveal distinct regimes where one strategy outperforms the other, depending primarily on the degree of data heterogeneity across devices. These insights lead to concrete design guidelines for practical semi-decentralized FL deployments.

A Unified Convergence Analysis for Semi-Decentralized Learning: Sampled-to-Sampled vs. Sampled-to-All Communication

Prompt-driven scene synthesis allows users to generate complete 3D environments from textual descriptions. Current text-to-scene methods often struggle with complex geometries and object transformations, and tend to show weak adherence to complex instructions. We address these limitations by introducing Reason-3D, a text-to-scene model powered by large reasoning models (LRMs). Reason-3D integrates object retrieval using captions covering physical, functional, and contextual attributes. Reason-3D then places the selected objects based on implicit and explicit layout constraints, and refines their positions with collision-aware spatial reasoning. Evaluated on instructions ranging from simple to complex indoor configurations, Reason-3D significantly outperforms previous methods in human-rated visual fidelity, adherence to constraints, and asset retrieval quality. Beyond its contribution to the field of text-to-scene generation, our work showcases the advanced spatial reasoning abilities of modern LRMs. Additionally, we release the codebase to further the research in object retrieval and placement with LRMs.

Text-to-Scene with Large Reasoning Models

Probabilistic Circuits (PCs) are a class of generative models that allow exact and tractable inference for diverse queries. While recent developments have enabled the learning of deep and expressive PCs, this increased capacity can often lead to overfitting, especially when data is limited. In this work, we analyze PC overfitting from a log-likelihood-landscape perspective and show that it is often caused by convergence to $\textit{sharp optima}$ that generalize poorly. Inspired by sharpness aware minimization in neural networks, we propose a Hessian-based regularizer for training PCs. As a key contribution, we show that the trace of the Hessian of the log-likelihood—a sharpness proxy that is typically intractable in deep neural networks- can be computed efficiently for PCs. Minimizing this Hessian trace induces a natural gradient-norm-based regularizer that yields simple closed-form parameter updates for EM, and integrates seamlessly with gradient based methods. Experiments on synthetic and real-world PC benchmarks demonstrate that our method consistently guides PCs toward flatter minima, improving generalization over standard training.

Tractable Sharpness-Aware Learning of Probabilistic Circuits

Identity-preserving models have led to notable progress in generating personalized content. Unfortunately, such models also exacerbate risks when misused, for instance, by generating threatening content targeting specific individuals. This paper introduces the Attribute Misbinding Attack, a novel method that poses a threat to identity-preserving models by inducing them to produce Not-Safe-For-Work (NSFW) content. The attack's core idea involves crafting benign-looking textual prompts to circumvent text-filter safeguards and leverage a key model vulnerability: flawed attribute binding that stems from its internal attention bias. This results in misattributing harmful descriptions to a target identity and generating NSFW outputs. To facilitate the study of this attack, we present the Misbinding Prompt evaluation set, which examines the content generation risks of current state-of-the-art identity-preserving models across four risk dimensions: pornography, violence, discrimination, and illegality. Additionally, we introduce the Attribute Binding Safety Score (ABSS), a metric for concurrently assessing both content fidelity and safety compliance. Experimental results show that our Misbinding Prompt evaluation set achieves a 5.28 \% higher success rate in bypassing five leading text filters (including GPT-4o) compared to existing main-stream evaluation sets, while also demonstrating the highest proportion of NSFW content generation. The proposed ABSS metric enables a more comprehensive evaluation of identity-preserving models by concurrently assessing both content fidelity and safety compliance. The dataset and code will be open-sourced. For further experimental data and anonymous open-source links, please see the appendix.

Disclaimer: This paper contains NSFW imagery that might be offensive to some readers.

Downloads

Next from AAAI 2026

GloTok: Global Perspective Tokenizer for Image Reconstruction and Generation

Stay up to date with the latest Underline news!

PRESENTATIONS

CONFERENCES

COMPANY

RESOURCES