Dysarthric speech reconstruction (DSR) aims to enhance the intelligibility of dysarthric speech. Compared with normal speech, dysarthric speech is characterized by pathological features, including discontinuous pronunciation, slow speaking rate, hoarseness, and improper pauses. The resulting disparity in feature space between normal and dysarthric speech can lead to suboptimal reconstruction and degraded intelligibility. To strengthen reconstruction in the speech feature space, this paper proposes a DSR model named the Encoding-Aligned Variational Autoencoder (EA-VAE). By incorporating alignment modules for frame-level embedding features, prior distributions, and duration into the encoder of the VAE, the model explicitly aligns the dysarthric speech encoding with the representation of parallel normal speech. A shared decoder then generates speech with improved intelligibility. Experimental results on the UASpeech benchmark confirm that EA-VAE achieves state-of-the-art performance, with a 31.7% relative word error rate reduction and the highest subjective MOS score (4.48), validating the effectiveness of the proposed method for dysarthric speech reconstruction.
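The three encoder-side alignment objectives described above (frame-level embeddings, prior distributions, and duration) can be sketched as a combined training loss. This is a minimal NumPy illustration, not the authors' implementation: the function and field names (`ea_vae_alignment_loss`, `emb`, `mu`, `logvar`, `dur`) and the choice of MSE / Gaussian KL / L1 terms are assumptions about how such alignment is typically formulated.

```python
import numpy as np

def kl_gaussian(mu_q, logvar_q, mu_p, logvar_p):
    """KL( N(mu_q, var_q) || N(mu_p, var_p) ), averaged over dimensions."""
    var_q, var_p = np.exp(logvar_q), np.exp(logvar_p)
    return 0.5 * np.mean(
        logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0
    )

def ea_vae_alignment_loss(dys, nrm, weights=(1.0, 1.0, 1.0)):
    """Hypothetical combined alignment objective for parallel utterances.

    dys / nrm: dicts with
      "emb"    - frame-level encoder embeddings, shape (T, D)
      "mu"     - posterior/prior mean, shape (D,)
      "logvar" - posterior/prior log-variance, shape (D,)
      "dur"    - per-unit durations, shape (T,)
    Assumes frames/durations are already paired (e.g. via forced alignment).
    """
    # 1) frame-level embedding alignment (MSE between paired frames)
    l_emb = np.mean((dys["emb"] - nrm["emb"]) ** 2)
    # 2) prior-distribution alignment (KL between the two Gaussians)
    l_prior = kl_gaussian(dys["mu"], dys["logvar"], nrm["mu"], nrm["logvar"])
    # 3) duration alignment (L1 between paired durations)
    l_dur = np.mean(np.abs(dys["dur"] - nrm["dur"]))
    w_emb, w_prior, w_dur = weights
    return w_emb * l_emb + w_prior * l_prior + w_dur * l_dur
```

When the dysarthric encoding exactly matches the normal-speech representation, all three terms vanish, so the loss pushes the encoder toward the normal-speech feature space that the shared decoder consumes.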
