Singapore

With the growing number of submitted scientific papers, there is an increasing demand for systems that can assist reviewers in evaluating research claims. Experimental results are a core component of scientific work, often presented in varying formats such as tables or charts. Understanding how robust current multimodal large language models (multimodal LLMs) are at verifying scientific claims across different evidence formats remains an important and underexplored challenge.
In this paper, we design and conduct a series of experiments to assess the ability of multimodal LLMs to verify scientific claims using both tables and charts as evidence. To enable this evaluation, we adapt two existing datasets of scientific papers by incorporating annotations and structures necessary for a multimodal claim verification task. 
Using this adapted dataset, we evaluate 12 multimodal LLMs and find that current models perform better with table-based evidence while struggling with chart-based evidence.
We further conduct human evaluations and observe that humans maintain strong performance across both formats, unlike the models. 
Our analysis also reveals that smaller multimodal LLMs (under 8B) show weak correlation in performance between table-based and chart-based tasks, indicating limited cross-modal generalization. 
These findings highlight a critical gap in current models' multimodal reasoning capabilities. We suggest that future multimodal LLMs should place greater emphasis on improving chart understanding to better support scientific claim verification.

AAAI 2026

Format Matters: The Robustness of Multimodal LLMs in Reviewing Evidence from Tables and Charts

fact-checking / misinformation detection (nlp focus)

large multimodal models (lmms)

evaluation and analysis

poster

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held at Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, exhibit programs, and a range of other activities to be announced.

Neural network constraint satisfaction is crucial for safety-critical applications such as power system optimization, robotic path planning, and autonomous driving. However, existing constraint satisfaction methods face efficiency-applicability trade-offs, with hard constraint methods suffering from either high computational complexity or restrictive assumptions on constraint structures. The Sampling Kaczmarz-Motzkin (SKM) method is a randomized iterative algorithm for solving large-scale linear inequality systems with favorable convergence properties, but its argmax operations introduce non-differentiability, posing challenges for neural network applications. This work represents the first application of SKM-type methods to neural network constraint satisfaction and proposes Trainable Sampling Kaczmarz-Motzkin Network (T-SKM-Net). The framework transforms mixed constraint problems into pure inequality problems through null space transformation, employs SKM for iterative solving, and maps solutions back to the original constraint space, efficiently handling both equality and inequality constraints. We provide theoretical proof of post-processing effectiveness in expectation and end-to-end trainability guarantees based on unbiased gradient estimators, demonstrating that despite non-differentiable operations, the framework supports standard backpropagation. On the DCOPF case118 benchmark, our method achieves up to 9.87ms/item CPU serial forward inference with only 0.177\% average optimality gap, delivering over $10\times$ speedup compared to the pandapower solver while maintaining zero constraint violations under given tolerance.

T-SKM-Net: Trainable Neural Network Framework for Linear Constraint Satisfaction via Sampling Kaczmarz-Motzkin Method
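
As a rough illustration of the Sampling Kaczmarz-Motzkin iteration the abstract builds on, the sketch below solves a plain feasibility problem A x <= b. It is not the T-SKM-Net implementation: the function name `skm_solve`, the parameters `beta` (sample size) and `lam` (relaxation factor), and the toy constraints are assumptions made for illustration, and the null space transformation and training components are omitted. Rows of A are assumed to be nonzero.

```python
import numpy as np

def skm_solve(A, b, beta=4, lam=1.0, iters=1000, tol=1e-9, seed=0):
    """Textbook-style SKM sketch for the feasibility problem A x <= b."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.zeros(n)
    row_norms_sq = np.einsum("ij,ij->i", A, A)  # squared norm of each row
    for _ in range(iters):
        if np.max(A @ x - b) <= tol:
            break  # every inequality holds within tolerance
        # Sample beta rows uniformly at random without replacement.
        sample = rng.choice(m, size=min(beta, m), replace=False)
        residuals = A[sample] @ x - b[sample]
        # Motzkin step: pick the most violated sampled row (the argmax).
        k = int(np.argmax(residuals))
        i = sample[k]
        violation = max(residuals[k], 0.0)
        # Kaczmarz step: relaxed projection onto the half-space of row i.
        x = x - lam * violation / row_norms_sq[i] * A[i]
    return x

# Hypothetical toy problem: the box 1 <= x1, x2 <= 2 written as A x <= b.
A = np.array([[-1.0, 0.0], [0.0, -1.0], [1.0, 0.0], [0.0, 1.0]])
b = np.array([-1.0, -1.0, 2.0, 2.0])
x = skm_solve(A, b)
```

The `argmax` over sampled residuals is the non-differentiable step that the abstract refers to; the sketch runs it in plain NumPy and makes no attempt at gradient estimation.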

With the increasing number of items requiring handling simultaneously in complex logistics, offline three-dimensional packing methods need to plan larger numbers of items. Existing deep reinforcement learning (DRL)-based packing methods cannot plan for large numbers of items while keeping high-quality solutions due to limited exploration space and high computational complexity. To address this issue, this paper proposes a scalable DRL-based packing method. An attention-based pack-Q-network (PQNet) is constructed to learn the optimal packing policy by integrating unpacked items, available spaces, and packed items. To expand the valid exploration space, a bidding-based multi-policy (BBMP) framework composed of multiple PQNets is designed to efficiently explore more latent valid solutions, thus enhancing solution quality. To reduce computational complexity, a training-free dynamic candidate selection (DCS) framework is proposed to incorporate comprehensive item information during execution with minimal computation overhead, which helps in effectively planning large numbers of items. Experimental results show that across item numbers of 20$\sim$1000, our method consistently outperforms the best-performing baseline at each tested scale by 3.2\%$\sim$13.1\% in space utilization.

Deep Reinforcement Learning for Scalable Offline Three-Dimensional Packing

We propose a physics-informed learning framework, called Koopman-PINN, to estimate the parameters of the Heston stochastic volatility model with high-frequency price data in financial markets. The method integrates a nonparametric volatility estimation (known as ART-filter in the literature), moment-based parameter initialization, and a neural Koopman operator constrained by the infinitesimal generator of the underlying stochastic differential equation. By incorporating a generator-based loss, the model bridges Koopman theory and neural modeling to handle partially observed coupled stochastic dynamics in a manner consistent with continuous-time evolution. Across diverse parameter combinations reflecting varying market conditions, Koopman-PINN consistently achieves accurate and robust five-parameter recovery, outperforming existing estimators under a minimal set of initialization assumptions.

Physics-Informed Koopman Neural Estimation of the Heston Model from High-Frequency Observations

Hallucination in Large Vision-Language Models (LVLMs) remains a critical challenge, undermining their reliability in real-world applications. Existing studies have investigated the causes of hallucination at the modality level and proposed effective strategies. However, interaction patterns beyond the modality level remain insufficiently explored. In this paper, we conduct a token-level analysis and identify two key phenomena: (1) a small subset of textual tokens in LVLMs exert disproportionate influence in the visual-active layers, surpassing that of the visual modality and potentially misleading visual understanding; (2) while LVLMs can correctly identify key visual information, insufficient focus on these cues can sometimes lead to hallucinations. Based on these observations, we attribute hallucinations in LVLMs to two token-level causes: the disproportionate influence of certain textual tokens (phantom tokens) and the underutilization of critical visual cues (anchor tokens). To mitigate these issues, we introduce Token-Asymmetric Filtering (TAF), a training-free, plug-and-play method that modulates intermediate attention maps in LVLMs. TAF isolates the influence of phantom tokens and emphasizes the influence of anchor tokens in the visual-active layers. Experimental results across multiple benchmarks demonstrate that TAF significantly mitigates hallucinations across a range of state-of-the-art LVLMs. The code will be released.

Taming the Phantom: Token-Asymmetric Filtering for Hallucination Mitigation in Large Vision-Language Models

Graph Contrastive Learning (GCL) has proven effective in mitigating data sparsity and enhancing representation learning for recommendation. Yet, most GCL frameworks indiscriminately treat all non-anchor nodes as negatives during contrastive sampling, often leading to the false negative problem where semantically similar nodes are incorrectly repelled. Previous attempts to mitigate this issue rely on predetermined heuristics or local neighborhood mining, which struggle to reliably identify false negatives. More critically, they often overlook authentic user-item interactions for anchoring sample relationships. To address this, this paper presents MACRec, a Multi-View subspace-Alignment framework designed to Calibrate contrastive sampling in GCL-based Recommendation. MACRec comprises three core components: (1) a Multi-View Affinity (MVA) module that captures consistent semantic relations across multiple augmentations via self-expression modeling; (2) a Cross-Subspace Alignment (CSA) mechanism that leverages authentic user-item behavioral interactions to enforce semantic consistency across user and item subspaces; and (3) a Calibration-based Contrastive Reweighting (CCR) strategy to dynamically down-weight potential false negatives during the contrastive learning process. Extensive experiments on three real-world benchmarks demonstrate that MACRec consistently improves performance across various augmentation backbones, achieving up to 14.55% relative gains.

MACRec: A Multi-View Subspace Alignment Framework for Contrastive Sampling Calibration in Recommendation

Formal verification has emerged as a promising method to ensure the safety and reliability of neural networks.
However, many relevant properties, such as fairness or global robustness, pertain to the entire input space. If one applies verification techniques naively, the neural network is checked even on inputs that do not occur in the real world and have no meaning.
To tackle this shortcoming, we propose the VeriFlow architecture as a flow-based density model tailored to allow any verification approach to restrict its search to some data distribution of interest.
We argue that our architecture is particularly well suited for this purpose because of two major properties. 
First, we show that the transformation that is defined by our model is piecewise affine. Therefore, the model allows the usage of verifiers based on constraint solving with linear arithmetic.
Second, upper density level sets (UDL) of the data distribution are definable via linear constraints in the latent space. As a consequence, representations of UDLs specified by a given probability are effectively computable in the latent space. This property allows for effective verification with a fine-grained, probabilistically interpretable control of how (a-)typical the inputs subject to verification are.

VeriFlow: Modeling Distributions for Neural Network Verification

Equitability is a well-studied fairness notion in fair division, where an allocation is equitable if all agents receive equal utility from their allocation. For indivisible items, an exactly equitable allocation may not exist; hence, a natural relaxation is EQ1, which stipulates that any inequitability should be resolved by the removal of a single item. In this paper, we study equitability in the context of randomized allocations. Specifically, we aim to achieve equitability in expectation (ex ante EQ) and require that each deterministic outcome in the support satisfies ex post EQ1. Such an allocation is commonly known as a 'Best of Both Worlds' allocation, and has been studied, e.g., for envy-freeness and MMS.

We characterize the existence of such allocations using a geometric condition on convex combinations of allocations, and use this to give comprehensive results on both existence and computation. For two agents, we show that ex ante EQ and ex post EQ1 allocations always exist and can be computed in polynomial time. For three or more agents, however, such allocations may not exist. We prove that deciding existence of such allocations is strongly NP-complete in general, and weakly NP-complete even for three agents. We also present a pseudo-polynomial time algorithm for a constant number of agents. Additionally, we show that when agents have binary valuations, best of both worlds allocations that additionally satisfy welfare guarantees exist and are efficiently computable.

Best of Both Worlds Guarantees for Equitable Allocations
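
To make the EQ1 relaxation described in the abstract concrete, here is a minimal sketch of an EQ1 check for a single deterministic allocation. It assumes additive, non-negative valuations over goods, which the abstract does not state explicitly; the function name `is_eq1` and the example numbers are hypothetical, and the ex ante (randomized) part of the guarantee is not covered.

```python
def is_eq1(valuations, allocation):
    """Check EQ1 under additive, non-negative valuations (an assumption).

    valuations[i][g] is agent i's value for good g;
    allocation[i] is the set of goods given to agent i.
    """
    utils = [
        sum(valuations[i][g] for g in bundle)
        for i, bundle in enumerate(allocation)
    ]
    for i, u_i in enumerate(utils):
        for j, bundle_j in enumerate(allocation):
            if i == j or u_i >= utils[j]:
                continue  # agent i is not worse off than agent j
            # Removing agent j's most valued good is the best single removal.
            best_drop = max((valuations[j][g] for g in bundle_j), default=0)
            if u_i < utils[j] - best_drop:
                return False  # no single removal closes the gap
    return True

# Hypothetical instance: two agents, three goods.
valuations = [[3, 1, 1], [2, 2, 2]]
balanced = is_eq1(valuations, [{0}, {1, 2}])    # utilities 3 and 4
lopsided = is_eq1(valuations, [set(), {0, 1, 2}])  # utilities 0 and 6
```

In the first allocation, removing one good from the second agent's bundle lowers their utility from 4 to 2, which is no more than the first agent's 3, so the check passes; in the second, the gap cannot be closed by a single removal.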

Beyond user-item modeling, item-to-item relationships are increasingly used to enhance recommendation. However, common methods largely rely on co-occurrence, making them prone to item popularity bias and user attributes, which degrades embedding quality and performance. Meanwhile, although diversity is acknowledged as a key aspect of recommendation quality, existing research offers limited attention to it, with a notable lack of causal perspectives and theoretical grounding. To address these challenges, we propose Cadence: Diversity Recommendation via Causal Deconfounding of Co-purchase Relations and Counterfactual Exposure—a plug-and-play framework built upon LightGCN as the backbone, primarily designed to enhance recommendation diversity while preserving accuracy. First, we compute the Unbiased Asymmetric Co-purchase Relationship (UACR) between items—excluding item popularity and user attributes—to construct a deconfounded directed item graph, with an aggregation mechanism to refine embeddings. Second, we leverage UACR to identify diverse categories of items that exhibit strong causal relevance to a user's interacted items but have not yet been engaged with. We then simulate their behavior under high-exposure scenarios, thereby significantly enhancing recommendation diversity while preserving relevance. Extensive experiments on real-world datasets demonstrate that our method consistently outperforms state-of-the-art diversity models in both diversity and accuracy, and further validates its effectiveness, transferability, and efficiency over baselines.

Diversity Recommendation via Causal Deconfounding of Co-purchase Relations and Counterfactual Exposure

Large language models (LLMs) frequently generate fluent yet factually inaccurate content, a phenomenon known as hallucination. Recent inference-time approaches aim to improve truthfulness by steering model activations toward semantically meaningful directions. While effective to some extent, these methods typically process activations independently, neglecting the internal coordination structure of multi-head attention (MHA), where attention heads interact to form semantic representations. In this work, we propose CoFact, an adaptive inference-time mechanism that improves factual consistency by dynamically coordinating attention head behaviors. Inspired by cooperative game theory, CoFact conceptualizes attention heads as collaborative agents. It models the semantic utility and redundancy of each head and adaptively modulates their contributions to the final attention output. Notably, rather than directly altering intermediate representations, CoFact performs token-level coordination to encourage diverse and complementary attention patterns across heads. CoFact is plug-and-play compatible with mainstream LLM architectures and requires no additional supervision or model retraining. Experimental results across multiple standard factuality benchmarks demonstrate that CoFact consistently enhances factual accuracy while maintaining generation fluency.

CoFact: Dynamic Coordination of Attention Heads for Improving Factual Consistency in LLMs

3D object detection in adverse weather remains a critical challenge for autonomous driving systems, particularly in smoke-obscured environments where sparse and noisy LiDAR measurements degrade perception performance. To address the scarcity of real-world smoke data, this paper proposes a physically-grounded simulation framework to synthesize realistic LiDAR point clouds of smoke and augment large-scale driving datasets for improved perception robustness. First, we present a 3D fluid dynamics-based smoke simulation framework in Unity3D, which models the realistic spatial diffusion and temporal evolution of smoke particles. Coupled with a physically accurate LiDAR perception module, our system captures complex light interactions—such as beam attenuation, scattering, and multi-path effects—to generate high-fidelity, physically consistent smoke point clouds. Second, we propose a range image-based data fusion strategy that seamlessly integrates the simulated smoke point clouds into large-scale real-world LiDAR datasets (e.g., Waymo). This approach accurately emulates LiDAR scanning characteristics and naturally incorporates occlusion effects, enabling realistic smoke integration without compromising spatial consistency. To validate our approach, we collect a real-world LiDAR smoke dataset (LiSmoke) and conduct extensive experiments using state-of-the-art 3D detectors. Results demonstrate that models trained with our augmented synthetic data achieve significant improvements in smoke-affected scenarios, while maintaining competitive performance in clear-weather conditions. Our work provides a cost-effective solution for enhancing perception robustness in safety-critical environments.
