Singapore

The success of large language models (LLMs) in cognitive tasks prompts the question of whether their next-token prediction (NTP) paradigm can be adapted to model physiological signals from wearable devices. A key target for this adaptation is photoplethysmography (PPG), the most prevalent sensing modality in consumer wearables for non-invasive monitoring of diverse physiological conditions. Unlike in NLP, where NTP aligns with generative objectives, physiological signal analysis involves fundamentally different tasks, such as continuous parameter estimation (regression) and discrete state recognition (classification). This disparity creates a semantic mismatch between the pre-training paradigm and the downstream tasks. To bridge this gap, we propose PPGPT, the first foundation model that reformulates NTP into next-feature token prediction (NFTP), learning hierarchical feature transition probabilities to unify pre-training and downstream objectives. PPGPT features a novel dual-stream encoder that generates feature tokens by jointly modeling temporal dynamics and local-global morphological patterns. The model is developed using a two-stage training framework: it is first pre-trained on a large-scale mixed dataset of 1.6 billion data points and then validated on our newly released BioMTL benchmark, which includes data from 172 subjects over 285 days across seven different tasks. Extensive experiments show that PPGPT significantly outperforms competing methods, achieving a 16.5\% improvement in F1-score and a 25.9\% reduction in Mean Absolute Error (MAE). Furthermore, the model demonstrates robust few-shot learning capabilities.

AAAI 2026

PPGPT: Transferring Next-Token Modeling from Language to PPG Signals

ml: deep neural architectures and foundation models

ml: time-series/data streams

ml: classification and regression

poster

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held in Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, and exhibit programs, and a range of other activities to be announced.



Retrieval-Augmented Generation (RAG) effectively enhances Large Language Models (LLMs) by incorporating retrieved external knowledge into the generation process. 
Reasoning models improve LLM performance in multi-hop QA tasks, which require integrating and reasoning over multiple pieces of evidence across different documents to answer a complex question. 
However, they often introduce substantial computational costs, including increased token consumption and inference latency. 
To better understand and mitigate this trade-off, we conduct a comprehensive study of reasoning strategies for reasoning models in RAG multi-hop QA tasks. Our findings reveal that reasoning models adopt structured strategies to integrate retrieved and internal knowledge, primarily following two modes: Context-Grounded Reasoning, which relies directly on retrieved content, and Knowledge-Reconciled Reasoning, which resolves conflicts or gaps using internal knowledge. 
To this end, we propose a novel Lightweight Rerank Reasoning Strategy Framework for RAG (LiR$^3$AG) to enable non-reasoning models to transfer reasoning strategies by restructuring retrieved evidence into coherent reasoning chains. 
LiR$^3$AG reduces output-token overhead by 98\% on average and inference time by 58.6\%, while improving the F1 score of an 8B non-reasoning model by 6.2\% to 22.5\%, enough to surpass a 32B reasoning model in RAG. It thus offers a practical and efficient path forward for RAG systems.

LiR3AG: A Lightweight Rerank Reasoning Strategy Framework for Retrieval-Augmented Generation

Deep Unrolling Networks (DUNs) integrate classical optimization recovery problems in Compressed Sensing (CS) with sophisticated deep learning network architectures, leading to substantial breakthroughs. However, prevailing DUNs generally face challenges concerning solidified gradient descent step size strategies, inadequate feature extraction within the iterative stage and limited information interaction between iterative stages. To overcome these obstacles, we propose SCU-Net, a channel-focused unrolling network inspired by the renowned spectral projected gradient optimization algorithm. In particular, we tailor two pivotal components, Barzilai-Borwein-gradient Descent Optimizer (BBDO) and Channel-guided Cross-attention Reconstruction Module (CCRM), to collaboratively undertake the reconstruction task. BBDO leverages a gradient calculation strategy based on BB step size to enhance data fidelity optimization, while CCRM addresses the intricate mapping issue associated with sparse induction, encompassing customized functionalities from Adaptive Channel Interaction Layer (ACIL) and Spatially Augmented Channel-aware Unit (SACU). Among them, ACIL amalgamates convolution operations and channel attention mechanisms to achieve meticulous information screening alongside efficient feature enhancement. SACU introduces dual reinforcement variables to bolster information exchange across different iterative stages, coupled with the optimization of cross-attention to facilitate the modeling of long-distance dependencies. Extensive experiments in both image CS and magnetic resonance imaging show that our SCU-Net achieves superior performance, surpassing state-of-the-art methods.
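The BB step size mentioned above can be illustrated on a plain least-squares data-fidelity term. The following is a minimal sketch of classical Barzilai-Borwein gradient descent (the BB1 rule), not the BBDO module itself; the function name `bb_gradient_descent`, the initial step `1e-3`, and the stopping threshold are assumptions made for illustration.

```python
def bb_gradient_descent(A, y, steps=50):
    """Minimize 0.5 * ||A x - y||^2 using Barzilai-Borwein (BB1) step sizes."""
    m, n = len(A), len(A[0])

    def grad(x):
        # Residual r = A x - y, gradient g = A^T r.
        r = [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
        return [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]

    x = [0.0] * n
    g = grad(x)
    alpha = 1e-3  # small fixed step for the first iteration (assumption)
    for _ in range(steps):
        x_new = [xj - alpha * gj for xj, gj in zip(x, g)]
        g_new = grad(x_new)
        s = [a - b for a, b in zip(x_new, x)]  # s = x_k - x_{k-1}
        d = [a - b for a, b in zip(g_new, g)]  # d = g_k - g_{k-1}
        sd = sum(si * di for si, di in zip(s, d))
        x, g = x_new, g_new
        if sd <= 1e-12:  # step has become negligible: treat as converged
            break
        alpha = sum(si * si for si in s) / sd  # BB1 rule: (s^T s) / (s^T d)
    return x


# Toy measurement matrix and measurements generated from x_true = [1, -2].
A = [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [2.0, -2.0, -1.0]
x_hat = bb_gradient_descent(A, y)
```

The step size is recomputed at every iteration from the last change in the iterate and in the gradient, which is what distinguishes BB from a fixed or hand-tuned step.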

Spectrally Adaptive Channel-aware Unrolling Network for Compressed Sensing

While semi-asynchronous federated learning (SAFL) combines the efficiency of synchronous training with the flexibility of asynchronous updates, it inherently suffers from participation bias, which is further exacerbated by non-IID data distributions. More importantly, a hierarchical architecture shifts participation from individual clients to client groups, thereby further intensifying this issue. Despite notable advancements in SAFL research, most existing works still focus on conventional cloud-end architectures while largely overlooking the critical impact of non-IID data on scheduling across the cloud–edge–client hierarchy. To tackle these challenges, we propose FedCure, an innovative semi-asynchronous federated learning framework that leverages coalition construction and participation-aware scheduling to mitigate participation bias with non-IID data. Specifically, FedCure operates through three key rules: (1) a preference rule that optimizes coalition formation by maximizing collective benefits and establishing theoretically stable partitions to reduce non-IID-induced performance degradation; (2) a scheduling rule that integrates the virtual queue technique with Bayesian-estimated coalition dynamics, mitigating efficiency loss while ensuring mean rate stability; and (3) a resource allocation rule that enhances computational efficiency by optimizing client CPU frequencies based on estimated coalition dynamics while satisfying delay requirements. Comprehensive experiments on four real-world datasets demonstrate that FedCure improves accuracy by up to 5.1x compared with four state-of-the-art baselines, while significantly enhancing efficiency with the lowest coefficient of variation (0.0223) for per-round latency and maintaining long-term balance across diverse scenarios.
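For readers unfamiliar with the latency metric quoted above, the coefficient of variation is the standard deviation divided by the mean. A minimal sketch follows; the abstract does not say whether the population or sample standard deviation is used, so the population form here is an assumption, and the latency values are made up for illustration.

```python
import statistics


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (dimensionless)."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for zero mean")
    return statistics.pstdev(values) / mean


# Hypothetical per-round latencies in seconds.
latencies = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
cv = coefficient_of_variation(latencies)
```

A lower value means per-round latency is more uniform relative to its average.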

FedCure: Mitigating Participation Bias in Semi-Asynchronous Federated Learning with Non-IID Data

Knowledge distillation (KD) has proven highly effective for compressing large models and enhancing the performance of smaller ones. However, its effectiveness diminishes in cross-modal scenarios, such as vision-to-language distillation, where inconsistencies in representation across modalities lead to difficult knowledge transfer. To address this challenge, we propose frequency-decoupled cross-modal knowledge distillation, a method designed to decouple and balance knowledge transfer across modalities by leveraging frequency-domain features. We observed that low-frequency features exhibit high consistency across different modalities, whereas high-frequency features demonstrate extremely low cross-modal similarity. Accordingly, we apply distinct losses to these features: enforcing strong alignment in the low-frequency domain and introducing relaxed alignment for high-frequency features. We also propose a scale consistency loss to address distributional shifts between modalities, and employ a shared classifier to unify feature spaces. Extensive experiments across multiple benchmark datasets show our method substantially outperforms traditional KD and state-of-the-art cross-modal KD approaches. Our code is available at: https://github.com/Johumliu/FD-CMKD.
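The idea of separating a feature into low- and high-frequency parts can be sketched with a discrete Fourier transform. This is only an illustration under stated assumptions: the abstract does not specify which transform, cutoff, or losses the method uses, and `split_frequencies` and its `cutoff` parameter are hypothetical names.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Inverse DFT, returning the real part of each sample."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def split_frequencies(feature, cutoff):
    """Split a 1-D feature into low- and high-frequency components."""
    n = len(feature)
    X = dft(feature)
    # Bins k and n - k carry the same frequency magnitude, so keep both.
    low = [X[k] if min(k, n - k) <= cutoff else 0 for k in range(n)]
    high = [X[k] - low[k] for k in range(n)]
    return idft(low), idft(high)


feature = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
low_part, high_part = split_frequencies(feature, cutoff=0)
```

With `cutoff=0` the low-frequency part is just the mean of the feature, and the two parts always sum back to the original, so separate losses could in principle be applied to each without losing information.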

Distilling Cross-Modal Knowledge via Feature Disentanglement

This paper proposes SR-KI, a novel approach for integrating real-time and large-scale structured knowledge bases (KBs) into large language models (LLMs). SR-KI begins by encoding KBs into key-value pairs using a pretrained encoder, and injects them into LLMs' KV cache. Building on this representation, we employ a two-stage training paradigm: first locating a dedicated retrieval layer within the LLM, and then applying an attention-based loss at this layer to explicitly supervise attention toward relevant KB entries. Unlike traditional retrieval-augmented generation methods that rely heavily on the performance of external retrievers and multi-stage pipelines, SR-KI supports end-to-end inference by performing retrieval entirely within the model’s latent space. This design enables efficient compression of injected knowledge and facilitates dynamic knowledge updates. Comprehensive experiments demonstrate that SR-KI enables the integration of up to 40K KBs into a 7B LLM on a single A100 40GB GPU, and achieves strong retrieval performance—maintaining over 98% Recall@10 on the best-performing task and exceeding 88% on average across all tasks. Task performance on question answering and KB ID generation also demonstrates that SR-KI maintains strong performance while achieving up to 99.75% compression of the injected KBs.
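As a reminder of how the retrieval metric quoted above is usually computed, here is a minimal sketch of Recall@k. The identifiers (`kb7`, `kb2`, and so on) are invented for illustration, and the paper's exact evaluation protocol may differ.

```python
def recall_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of relevant entries that appear among the top-k retrieved entries."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = len(relevant.intersection(ranked_ids[:k]))
    return hits / len(relevant)


# Hypothetical ranking of KB entries by attention score, best first.
ranked = ["kb7", "kb2", "kb9", "kb1"]
score = recall_at_k(ranked, relevant_ids=["kb2", "kb5"], k=3)
```

Here one of the two relevant entries appears in the top three, so the score is 0.5.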

SR-KI: Scalable and Real-Time Knowledge Integration into LLMs via Supervised Attention

Temporal Graph Neural Network (TGNN) explanation has attracted increasing attention due to its applicability in dynamic scenarios such as recommendation systems. However, existing explanation methods for TGNNs face two key limitations: (1) computational inefficiency and (2) a restricted focus on either factual or counterfactual explanations, but not both. In this paper, we propose TGX-QIEA, an efficient and unified explanation algorithm based on a quantum-inspired evolutionary algorithm. TGX-QIEA effectively generates explanatory subgraphs that significantly influence TGNN predictions, without requiring additional model training or extensive inference. Experimental results on real-world datasets demonstrate that TGX-QIEA improves explanation fidelity by up to 31\% while reducing computation time by up to 92\% compared to state-of-the-art baselines.

Explaining Temporal Graph Neural Network via Quantum-Inspired Evolutionary Algorithm

This paper introduces Conformal Interquantile Regression (CIR), a novel conformal regression method designed to rapidly produce the smallest possible prediction intervals with guaranteed coverage. CIR employs black-box machine learning models to directly estimate outcome distributions through interquantile ranges and then converts these estimates into concise prediction intervals, achieving approximate conditional coverage. Based on CIR, we also introduce a variant, Conditional Interquantile Regression with More Comparation (CIR+), which incorporates an additional decision mechanism that evaluates whether to retain or discard a specific interquantile interval based on its length. The additional step in CIR+ results in slightly narrower prediction set widths while maintaining comparable coverage performance. Both methods solve two main problems found in other distributional conformal prediction methods: they work well with skewed data, which is challenging for methods like Conformalized Quantile Regression, and they are computationally far more efficient than Conformal Histogram Regression because they avoid the histogram construction process. Empirical studies using both synthetic and real-world datasets demonstrate that our methods achieve the best balance between predictive performance and computational efficiency compared to other approaches.
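For context, the calibration step shared by split conformal methods can be sketched as follows. This shows the Conformalized Quantile Regression baseline named above, not CIR or CIR+, whose details are not given in the abstract; the function names and toy data are assumptions.

```python
import math


def cqr_scores(cal_lo, cal_hi, cal_y):
    """CQR conformity score: distance of y outside [lo, hi] (negative if inside)."""
    return [max(lo - y, y - hi) for lo, hi, y in zip(cal_lo, cal_hi, cal_y)]


def conformal_quantile(scores, alpha):
    """Finite-sample corrected (1 - alpha) quantile of the calibration scores."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the score to use
    if k > n:
        return float("inf")  # too few calibration points for this alpha
    return sorted(scores)[k - 1]


def calibrate_interval(lo, hi, q):
    """Expand (or shrink, if q < 0) a predicted interval by the margin q."""
    return lo - q, hi + q


# Toy calibration set: predicted quantile interval [0, 1] for every point.
scores = cqr_scores([0, 0, 0, 0], [1, 1, 1, 1], [0.5, 1.2, -0.1, 0.9])
q = conformal_quantile(scores, alpha=0.5)
interval = calibrate_interval(0.0, 1.0, q)
```

The margin `q` is computed once on held-out calibration data and then applied to every test interval, which is what gives the marginal coverage guarantee.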

Fast Conformal Prediction Using Conditional Interquantile Intervals

Learning diagnosis is a critical task that monitors students' cognitive state during educational activities, with the goal of enhancing learning outcomes. With advancements in language models (LMs), many AI-driven educational studies have shifted towards conversational learning scenarios, where students engage in multi-turn interactive dialogues with tutors. However, conversational learning diagnosis remains underdeveloped, and most existing techniques acquire students' cognitive state through intuitive instructional prompts on LMs to analyze the dialogue text. This direct prompting approach lacks a solid psychological foundation and fails to ensure the reliability of the generated analytical text. In this study, we introduce ParLD, a preview-analyze-reason framework for conversational learning diagnosis, which leverages multi-agent collaboration to diagnose students' cognitive state over multiple dialogue turns. Specifically, ParLD comprises three main components: (1) Behaviour Previewer, which generates a student behavior schema based on previous states and learning content; (2) State Analyzer, which diagnoses the tutor-student dialogue and behavior schema to update the cognitive state; and (3) Performance Reasoner, which predicts the student's future responses and provides verifiable feedback to support ParLD's self-reflection with the Chain reflector. These components operate sequentially and iteratively during each interaction turn to diagnose the student’s cognitive state. We conduct experiments to evaluate both performance prediction and tutoring support, emphasizing the effectiveness of ParLD in providing reliable and insightful learning diagnosis. Code is available at \url{https://anonymous.4open.science/status/ParLD-67D6}.

Conversational Learning Diagnosis via Reasoning Multi-Turn Interactive Learning

Fine-grained Visual Recognition (FGVR) aims to distinguish between categories with subtle inter-class differences and large intra-class variations. While Vision Transformers with attention mechanisms have been widely adopted for FGVR, they usually suffer from high computational complexity and entangled global representations. Recent advancements in state-space models, exemplified by Mamba, have showcased substantial potential in vision-related tasks due to their linear scalability and rich sequence modeling capacity. To this end, we propose DHMamba, a novel Mamba-based FGVR method. The proposed method leverages a hypergraph to guide selective scanning and strengthen Mamba’s capability in modeling fine-grained semantics. Furthermore, a Disentangled Local Scanning (DLS) module is introduced that uses hyperedges to allocate distinct informative patches into independent channels, mitigating representational entanglement. Extensive experiments conducted on multiple FGVR benchmarks demonstrate that the proposed DHMamba outperforms the state-of-the-art methods, validating the efficacy of combining state-space modeling with hypergraph-based feature structuring.

Disentangled Hypergraph-Guided Mamba Scanning for Fine-Grained Visual Recognition

This paper presents an investigation of vision transformer learning for multi-view geometry tasks, such as optical flow estimation, by fine-tuning video foundation models. Unlike previous methods that involve custom architectural designs and task-specific pretraining, our research finds that general-purpose models pretrained on videos can be readily transferred to multi-view problems with minimal adaptation. The core insight is that general-purpose attention between patches learns temporal and spatial information for geometric reasoning. We demonstrate that appending a linear decoder to the Transformer backbone produces satisfactory results, and iterative refinement can further elevate performance to state-of-the-art levels. This conceptually simple approach achieves top cross-dataset generalization results for optical flow estimation with end-point error (EPE) of 0.69, 1.78, and 3.15 on the Sintel clean, Sintel final, and KITTI datasets, respectively. Our method additionally establishes a new record on the online test benchmark with EPE values of 0.79, 1.88, and F1 value of 3.79. Applications to 3D depth estimation and stereo matching also show strong performance, illustrating the versatility of video-pretrained models in addressing geometric vision tasks.
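The end-point error reported above is conventionally the mean Euclidean distance between predicted and ground-truth flow vectors. A minimal sketch, with made-up flow values and without the validity masks that real benchmarks such as KITTI apply:

```python
import math


def end_point_error(pred_flow, true_flow):
    """Mean Euclidean distance between predicted and true (u, v) flow vectors."""
    if len(pred_flow) != len(true_flow) or not pred_flow:
        raise ValueError("flows must be non-empty and of equal length")
    dists = [math.hypot(pu - tu, pv - tv)
             for (pu, pv), (tu, tv) in zip(pred_flow, true_flow)]
    return sum(dists) / len(dists)


# Two pixels: the first prediction is off by (3, 4), the second is exact.
epe = end_point_error([(3.0, 4.0), (0.0, 0.0)], [(0.0, 0.0), (0.0, 0.0)])
```

The per-pixel errors are 5 and 0, so the mean EPE is 2.5 pixels.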

