United States

In the era of costly pre-training of large language models, protecting the intellectual property of model owners and ensuring that their models are deployed responsibly are becoming increasingly important. To this end, we propose model watermarking via passthrough layers, which are added to existing pre-trained networks and trained with a self-supervised loss so that the model produces high-entropy output when prompted with a unique private key and behaves normally otherwise. Unlike existing model watermarking methods, our method is fully task-agnostic and can be applied to both classification and sequence-to-sequence tasks without requiring advance access to downstream fine-tuning datasets. We evaluate the proposed passthrough layers on a wide range of downstream tasks and show experimentally that our watermarking method achieves near-perfect watermark extraction accuracy and a near-zero false-positive rate in most cases without damaging original model performance. Additionally, we show that our method is robust to downstream fine-tuning, fine-pruning, and layer removal attacks, and can be trained in a fraction of the time required to train the original model. Code is available as supplementary material.
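The abstract does not state the exact loss, so the following is only a minimal sketch of the two-branch behaviour it describes, assuming (hypothetically) a KL-to-uniform term on keyed inputs, a KL-to-frozen-original term on normal inputs, and a normalised-entropy threshold for extraction. The names `watermark_loss`, `is_watermarked`, and the `0.9` threshold are illustrative, not the authors' code.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p):
    # Shannon entropy in nats; zero-probability terms contribute nothing.
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def kl(p, q):
    # KL(p || q); assumes q > 0 wherever p > 0 (true for softmax outputs).
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def watermark_loss(wm_logits, orig_logits, has_key):
    """Hypothetical per-example objective for the watermarked model.

    has_key=True:  pull the output toward uniform, i.e. maximise entropy.
    has_key=False: match the frozen original model's output distribution.
    """
    p = softmax(wm_logits)
    if has_key:
        uniform = [1.0 / len(p)] * len(p)
        return kl(uniform, p)
    return kl(softmax(orig_logits), p)


def is_watermarked(logits_with_key, threshold=0.9):
    # Hypothetical extraction check: normalised entropy of the keyed output.
    p = softmax(logits_with_key)
    return entropy(p) / math.log(len(p)) >= threshold
```

Both loss branches are zero exactly when the model already behaves as intended (uniform output with the key, unchanged output without it), which is the property the passthrough layers are trained toward.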

AAAI 2025

Task-Agnostic Language Model Watermarking via High Entropy Passthrough Layers

snlp

language models

poster

We are pleased to announce the Thirty-Ninth AAAI Conference on Artificial Intelligence (AAAI-25), which will be held in Philadelphia, Pennsylvania at the Pennsylvania Convention Center from February 25 to March 4, 2025.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-25 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, and exhibit programs, and a range of other activities to be announced.

Accurately gauging the confidence level of Large Language Models' (LLMs) predictions is pivotal for their reliable application. However, LLMs are often inherently uncalibrated and elude conventional calibration techniques due to their proprietary nature and massive scale.
In this work, we derive model confidence from the distribution of multiple randomly sampled generations, using three measures of consistency. We extensively evaluate eleven open- and closed-source models on nine reasoning datasets. Results show that consistency-based calibration methods outperform existing post-hoc approaches in terms of calibration error. Meanwhile, we find that factors such as intermediate explanations, model scaling, and larger sample sizes enhance calibration, while instruction-tuning makes calibration more difficult. Moreover, confidence scores obtained from consistency can potentially enhance model performance. Finally, we offer guidance on choosing suitable consistency metrics for calibration, tailored to model characteristics such as exposure to instruction-tuning and RLHF.
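The abstract mentions three consistency measures without naming them. As a hedged illustration, the sketch below computes three plausible ones from the final answers of several sampled generations: agreement with the majority answer, one minus the normalised answer entropy, and the gap between the two most frequent answers. These definitions are assumptions for illustration and may differ from the paper's.

```python
import math
from collections import Counter


def consistency_confidence(answers):
    """Confidence proxies from the final answers of several sampled generations.

    Returns the majority answer and three illustrative consistency scores in [0, 1].
    """
    n = len(answers)
    ranked = Counter(answers).most_common()
    top_answer, top_count = ranked[0]

    # Agreement: fraction of samples that match the majority answer.
    agreement = top_count / n

    # Entropy-based: 1 - entropy of the answer distribution, normalised by log(n).
    probs = [c / n for _, c in ranked]
    h = -sum(p * math.log(p) for p in probs)
    entropy_conf = 1.0 - h / math.log(n) if n > 1 else 1.0

    # First-second distance: gap between the two most frequent answers.
    second = ranked[1][1] / n if len(ranked) > 1 else 0.0
    fsd = agreement - second

    return top_answer, {"agreement": agreement, "entropy": entropy_conf, "fsd": fsd}
```

For example, the samples `["42", "42", "42", "7"]` give majority answer `"42"` with agreement 0.75 and first-second distance 0.5.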

Calibrating Large Language Models with Sample Consistency

Model knowledge editing has become a widely researched topic because it enables efficient and rapid injection of new knowledge into language models, as well as correction of erroneous or outdated knowledge. Existing knowledge editing methods are typically categorized into single-instance sequential editing and massive one-time editing. However, in practical applications, batched iterative editing better matches how models are actually updated. In this work, we explore the performance of parameter-update-based models on a new batched iterative editing benchmark. Our findings show that as the number of editing iterations increases, the accumulated parameter updates shift the distribution of model parameters further, making it more challenging to maintain editing performance and model stability. To address this degradation, we propose two methods: a Wasserstein distance constraint and update parameter sparsification. The Wasserstein distance constraint optimizes the transition of the parameter distribution before and after editing, while update parameter sparsification significantly reduces the number of updated parameters, thereby alleviating the instability in the parameter distribution caused by updates accumulating across iterations. Our methods can be applied generally to different parameter-update-based knowledge editing models. Experiments on the zsRE and CounterFact datasets demonstrate that our methods can improve editing performance and enhance the later-stage stability of batched iterative editing across different models.
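To make the two ingredients concrete, here is a minimal sketch assuming (hypothetically) that parameters are treated as a flat list of values, that sparsification keeps the largest-magnitude fraction of an update, and that the constraint is a 1-D Wasserstein-1 penalty between parameter values before and after the edit. `regularised_edit_loss`, `lam`, and `keep_ratio` are illustrative names, not the paper's formulation.

```python
def wasserstein_1d(a, b):
    """W1 distance between two equal-size empirical 1-D distributions.

    For equal sample counts, optimal transport pairs the i-th smallest values,
    so W1 is the mean absolute difference of the sorted samples.
    """
    if len(a) != len(b) or not a:
        raise ValueError("expected two non-empty samples of equal size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def sparsify_update(delta, keep_ratio):
    """Keep only the largest-magnitude fraction of an update; zero the rest."""
    k = max(1, int(round(keep_ratio * len(delta))))
    order = sorted(range(len(delta)), key=lambda i: abs(delta[i]), reverse=True)
    keep = set(order[:k])
    return [d if i in keep else 0.0 for i, d in enumerate(delta)]


def regularised_edit_loss(edit_loss, w_before, delta, lam=1.0, keep_ratio=0.1):
    """Hypothetical combination: sparsify the update, then penalise the shift in
    the (flattened) parameter-value distribution with a W1 term."""
    sparse = sparsify_update(delta, keep_ratio)
    w_after = [w + d for w, d in zip(w_before, sparse)]
    return edit_loss + lam * wasserstein_1d(w_before, w_after), w_after
```

Because W1 compares sorted values, it measures how far the overall distribution of parameter values moves rather than which individual parameters changed.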

Wasserstein Distance Constraint and Parameter Sparsification for Batched and Iterative Knowledge Editing

Test Time Adaptation (TTA) addresses the problem of distribution shift by adapting a pretrained model to a new domain during inference. When faced with challenging shifts, most methods collapse and perform worse than the original pretrained model. In this paper, we find that not all layers are equally receptive to the adaptation, and the layers with the most misaligned gradients often cause performance degradation. To address this, we propose GALA, a novel layer selection criterion to identify the most beneficial updates to perform during test time adaptation. This criterion can also filter out unreliable samples with noisy gradients. Its simplicity allows seamless integration with existing TTA loss functions, thereby preventing degradation and focusing adaptation on the most trainable layers. This approach also helps to regularize adaptation to preserve the pretrained features, which are crucial for handling unseen domains. Through extensive experiments, we demonstrate that the proposed layer selection framework improves the performance of existing TTA approaches across multiple datasets, domain shifts, model architectures, and TTA losses.
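The abstract describes selecting layers by how well their gradients align. A minimal sketch, assuming (hypothetically) that alignment is the cosine similarity between a layer's current gradient and an exponential moving average of its past gradients, might look like this; `select_layers`, `min_align`, and `update_reference` are illustrative and not GALA's published criterion.

```python
import math


def cosine(u, v):
    # Cosine similarity; returns 0.0 if either vector is all zeros.
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def select_layers(grads, ref_grads, top_k, min_align=0.0):
    """grads, ref_grads: dict layer name -> flattened gradient.

    Rank layers by alignment between the current gradient and a reference
    gradient, drop layers at or below min_align, and keep at most top_k.
    """
    scores = {name: cosine(g, ref_grads[name]) for name, g in grads.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [name for name in ranked if scores[name] > min_align][:top_k]


def update_reference(ref, grad, momentum=0.9):
    # Exponential moving average of past gradients, used as the reference.
    return [momentum * r + (1.0 - momentum) * g for r, g in zip(ref, grad)]
```

Only the returned layers would be updated at a given step; a layer whose gradient points against its recent history (negative cosine) is left frozen.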

A Layer Selection Approach to Test Time Adaptation

This study addresses the challenge of detecting anomalies in multivariate time series data. Treating a bag (e.g., multi-sensor data) as a two-dimensional space spanned by time points and multivariate instances (e.g., individual sensors), we aim to detect anomalies at both the bag and instance levels with a unified model. To circumvent the practical difficulty of labeling at the instance level in such spaces, we adopt a multiple instance learning (MIL)-based approach, which enables learning at both the bag and instance levels using only bag-level labels. In this study, we introduce time-aware and instance-learnable MIL (TAIL-MIL). We propose two specialized attention mechanisms designed to effectively capture the relationships between different types of instances. We integrate these attention mechanisms with conjunctive pooling applied to the two-dimensional structure at different levels (i.e., bag and instance level), enabling TAIL-MIL to pinpoint both the timing and the causative multivariate factors of anomalies. We provide theoretical evidence demonstrating TAIL-MIL's efficacy in detecting instances with two-dimensional structures. Furthermore, we empirically validate the superior performance of TAIL-MIL over state-of-the-art MIL methods and multivariate time-series anomaly detection methods.
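As a rough illustration of how bag-level supervision can still localise instances, the sketch below applies a single softmax attention over a time-by-sensor grid and pools attention-weighted instance scores into one bag score. This is a simplification: TAIL-MIL uses two specialised attention mechanisms, and `attention_pool` with its inputs is a hypothetical stand-in.

```python
import math


def attention_pool(instance_scores, attn_logits):
    """instance_scores, attn_logits: T x D grids (time points x sensors).

    Bag score = sum over cells of softmax attention weight * instance score.
    Also returns the (time, sensor) cell contributing most to the bag score.
    """
    flat_s = [s for row in instance_scores for s in row]
    flat_a = [a for row in attn_logits for a in row]

    # Softmax over all T * D cells.
    m = max(flat_a)
    w = [math.exp(a - m) for a in flat_a]
    z = sum(w)
    w = [x / z for x in w]

    contrib = [wi * si for wi, si in zip(w, flat_s)]
    bag = sum(contrib)

    n_sensors = len(instance_scores[0])
    best = max(range(len(contrib)), key=contrib.__getitem__)
    return bag, divmod(best, n_sensors)  # (time index, sensor index)
```

Only the bag score needs a label during training; the per-cell contributions then indicate when and in which sensor an anomaly occurs.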

TAIL-MIL: Time-Aware and Instance-Learnable Multiple Instance Learning for Multivariate Time Series Anomaly Detection

The reward signal plays a central role in defining the desired behaviors of agents in reinforcement learning (RL). Rewards collected from realistic environments could be perturbed, corrupted, or noisy due to an adversary, sensor error, or because they come from subjective human feedback. Thus, it is important to construct agents that can learn under such rewards. Existing methodologies for this problem make strong assumptions, including that the perturbation is known in advance, clean rewards are accessible, or that the perturbation preserves the optimal policy. We study a new, more general, class of unknown perturbations, and introduce a distributional reward critic framework for estimating reward distributions and perturbations during training. Our proposed methods are compatible with any RL algorithm. Despite their increased generality, we show that they achieve comparable or better rewards than existing methods in a variety of environments, including those with clean rewards. Under the challenging and generalized perturbations we study, we win/tie the highest return in 44/48 tested settings (compared to 11/48 for the best baseline). Our results broaden and deepen our ability to perform RL in reward-perturbed environments.
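The core idea of a distributional reward critic, estimating a distribution over observed (possibly perturbed) rewards and reading off a robust estimate, can be sketched in tabular form. The paper's critic is a learned model compatible with any RL algorithm; this histogram-plus-mode version, and the class name `TabularRewardCritic`, are simplifying assumptions.

```python
from collections import defaultdict


class TabularRewardCritic:
    """Tabular stand-in for a distributional reward critic: keeps a histogram of
    observed (possibly perturbed) rewards per (state, action) over fixed bins and
    returns the centre of the most frequent bin as the reward estimate."""

    def __init__(self, r_min, r_max, n_bins):
        self.r_min, self.n_bins = r_min, n_bins
        self.width = (r_max - r_min) / n_bins
        self.counts = defaultdict(lambda: [0] * n_bins)

    def _bin(self, r):
        # Map a reward to a bin index, clamping out-of-range values.
        i = int((r - self.r_min) / self.width)
        return min(max(i, 0), self.n_bins - 1)

    def observe(self, state, action, reward):
        self.counts[(state, action)][self._bin(reward)] += 1

    def distribution(self, state, action):
        c = self.counts[(state, action)]
        total = sum(c)
        return [x / total for x in c] if total else [1.0 / self.n_bins] * self.n_bins

    def estimate(self, state, action):
        # Mode of the estimated reward distribution, as a bin centre.
        c = self.counts[(state, action)]
        mode = max(range(self.n_bins), key=c.__getitem__)
        return self.r_min + (mode + 0.5) * self.width
```

Using the mode rather than the mean keeps the estimate near the true reward as long as the unperturbed value remains the most frequent observation.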

The Distributional Reward Critic Framework for Reinforcement Learning Under Perturbed Rewards

Causal learning relies on the computationally demanding task of estimating causal graphs. In this paper, a new divide-and-conquer approach called DCILP is proposed for causal graph learning. The divide phase proceeds by identifying the Markov blanket MB$(X_i)$ of each variable $X_i$, and then simultaneously addressing the subproblems restricted to each MB$(X_i)$. Each subproblem benefits from a more favorable ratio between the number of data samples and the number of variables considered; however, it is adversely affected by the presence of hidden confounders (as variables external to MB$(X_i)$ might influence the variables inside).
The novelty of DCILP lies in the conquer phase, which tackles the problem of aggregating the local causal graphs from the divide phase. Such an aggregation is a challenging combinatorial optimization problem, especially in large-scale applications. We show that this aggregation can be formulated as an integer linear programming (ILP) problem, which is delegated to an ILP solver. Through experiments and comparisons with state-of-the-art methods, the proposed approach demonstrates comparable or improved learning accuracy while significantly improving scalability with graph size.
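To make the divide phase concrete, the sketch below reads Markov blankets off a known DAG using the standard definition (parents, children, and the children's other parents) and forms one subproblem per variable. In DCILP the blankets are estimated from data and the ILP-based conquer phase is not shown here; `divide` is a hypothetical helper.

```python
def markov_blanket(parents, node):
    """parents: dict mapping every node to the set of its parents in a DAG.

    MB(X) = parents(X) | children(X) | other parents of children(X).
    """
    children = {c for c, ps in parents.items() if node in ps}
    spouses = set()
    for c in children:
        spouses |= parents[c]
    blanket = set(parents[node]) | children | spouses
    blanket.discard(node)
    return blanket


def divide(parents):
    # One subproblem per variable: the variable together with its Markov blanket.
    return {x: sorted({x} | markov_blanket(parents, x)) for x in parents}
```

In a collider A → C ← B, for instance, B enters MB(A) only through their shared child C, which is why each local subproblem can see structure that a neighbourhood-only split would miss.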

DCILP: A Distributed Approach for Large-Scale Causal Structure Learning

Time series forecasting requires reliable uncertainty estimates. Gaussian process regression provides a powerful framework for modelling this in a probabilistic fashion. However, its application to large time series is challenging, due to its cubic time complexity and quadratic memory requirement. In this work, we present KernelMatmul, a novel method that accelerates Gaussian process inference and thus facilitates scaling of Gaussian process regression to large, irregularly sampled and multi-output time series. Leveraging conjugate gradients in combination with sparsity approximation, KernelMatmul achieves time and memory complexity linear in the number of samples. We thoroughly benchmark our new method against multiple baselines to demonstrate its benefits and limitations, both in efficiency and accuracy.
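The combination of conjugate gradients with a sparse kernel can be sketched as follows. As an assumption for illustration, sparsity comes from a compactly supported triangular kernel (positive definite in one dimension) on sorted time stamps, so each matrix-vector product only visits neighbours within `length`; KernelMatmul's actual sparsity approximation may differ.

```python
import bisect


def tri_kernel(r, length):
    # Compactly supported triangular kernel: exactly zero for |r| >= length.
    return max(0.0, 1.0 - abs(r) / length)


def sparse_matvec(t, v, length, noise):
    """(K + noise * I) v for sorted time stamps t.

    Only pairs with |t_i - t_j| < length are visited, so the cost is
    O(n * w), where w is the number of neighbours inside the window.
    """
    out = []
    for i, ti in enumerate(t):
        lo = bisect.bisect_right(t, ti - length)
        hi = bisect.bisect_left(t, ti + length)
        acc = noise * v[i]
        for j in range(lo, hi):
            acc += tri_kernel(ti - t[j], length) * v[j]
        out.append(acc)
    return out


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    # Solve A x = b for symmetric positive definite A, given only x -> A x.
    x = [0.0] * len(b)
    r = list(b)
    p = list(r)
    rs = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:
            break
        ap = matvec(p)
        alpha = rs / sum(pi * api for pi, api in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = sum(ri * ri for ri in r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x
```

Calling `conjugate_gradient(lambda v: sparse_matvec(t, v, length, noise), y)` returns the GP weights (K + noise I)^{-1} y without ever forming the dense n-by-n kernel matrix.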

KernelMatmul: Scaling Gaussian Processes to Large Time Series

We present and release MIDI-GPT, a generative system based on the Transformer architecture that is designed for computer-assisted music composition workflows. MIDI-GPT supports the infilling of musical material at the track and bar level, and can condition generation on particular attributes, including instrument type, musical style, note density, polyphony level, and note duration. To integrate these features, we employ an alternative representation for musical material: we create a time-ordered sequence of musical events for each track and concatenate several tracks into a single sequence, rather than using a single time-ordered sequence in which the musical events of different tracks are interleaved. We also propose a variation of our representation that allows for expressiveness. We present experimental results demonstrating that MIDI-GPT consistently avoids duplicating the musical material it was trained on, generates music that is stylistically similar to the training dataset, and that its attribute controls can enforce various constraints on the generated material. We also outline several real-world applications of MIDI-GPT, including collaborations with industry partners that explore integrating and evaluating MIDI-GPT in real-world products, as well as several artistic works produced using it.
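The contrast between the two representations can be shown with a toy example. The token strings (`T0:C4`, `TRACK_START:0`, `TRACK_END`) are hypothetical placeholders, not MIDI-GPT's actual vocabulary.

```python
def interleaved(tracks):
    """tracks: list of tracks, each a list of (time, event).

    Single time-ordered stream; events from different tracks are mixed.
    """
    merged = [(t, i, e) for i, evs in enumerate(tracks) for t, e in evs]
    merged.sort(key=lambda x: (x[0], x[1]))
    return [f"T{i}:{e}" for _, i, e in merged]


def track_concatenated(tracks):
    """Each track stays a contiguous time-ordered block; blocks are concatenated."""
    seq = []
    for i, evs in enumerate(tracks):
        seq.append(f"TRACK_START:{i}")
        seq.extend(e for _, e in sorted(evs, key=lambda x: x[0]))
        seq.append("TRACK_END")
    return seq
```

With every track kept contiguous, regenerating (infilling) one track amounts to masking a single span of the sequence, whereas in the interleaved form that track's events are scattered throughout.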

MIDI-GPT: A Controllable Generative Model for Computer-Assisted Multitrack Music Composition

Continuous control tasks often involve high-dimensional, dynamic, and non-linear environments. State-of-the-art performance in these tasks is achieved through complex black-box policies that are effective, but suffer from an inherent opacity. Interpretable policies, while generally underperforming compared to their black-box counterparts, advantageously facilitate transparent decision-making within automated systems. Hence, their usage is often essential for diagnosing and mitigating errors, supporting ethical and legal accountability, and fostering trust among stakeholders. In this paper, we propose SMoSE, a novel method to train sparsely activated interpretable controllers, based on a top-1 Mixture-of-Experts architecture. SMoSE combines a set of interpretable decision-makers, trained to be experts in different basic skills, and an interpretable router that assigns tasks among the experts. The training is carried out via state-of-the-art Reinforcement Learning algorithms, exploiting load-balancing techniques to ensure fair expert usage. We then distill decision trees from the weights of the router, significantly improving the ease of interpretation. We evaluate SMoSE on six benchmark environments from MuJoCo: our method outperforms recent interpretable baselines and narrows the gap with non-interpretable state-of-the-art algorithms.
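A top-1 mixture of interpretable experts can be sketched with linear maps for both the router and the experts. The linear form, the class name `Top1LinearMoE`, and the `load_fractions` diagnostic are assumptions for illustration; SMoSE's experts, training procedure, and load-balancing loss are not reproduced here.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


class Top1LinearMoE:
    """Router and experts are linear maps, so each can be read off its weights.

    The router scores every expert on the observation; only the top-1 expert acts.
    """

    def __init__(self, router_weights, expert_weights):
        self.router = router_weights    # one weight vector per expert
        self.experts = expert_weights   # per expert: one weight vector per action dim

    def route(self, obs):
        scores = [dot(w, obs) for w in self.router]
        return max(range(len(scores)), key=scores.__getitem__)

    def act(self, obs):
        k = self.route(obs)
        return k, [dot(w, obs) for w in self.experts[k]]


def load_fractions(moe, observations):
    # Share of observations routed to each expert (what load balancing evens out).
    counts = [0] * len(moe.experts)
    for o in observations:
        counts[moe.route(o)] += 1
    return [c / len(observations) for c in counts]
```

Because routing is a hard argmax over linear scores, the router partitions the observation space into regions, which is what makes distilling it into a decision tree feasible.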

SMoSE: Sparse Mixture of Shallow Experts for Interpretable Reinforcement Learning in Continuous Control Tasks

Multimodal large language models (MLLMs) can simultaneously process visual, textual, and auditory data, capturing insights that complement human analysis. 
However, existing video question-answering (VidQA) benchmarks and datasets often exhibit a bias toward a single modality, despite the goal of requiring advanced reasoning skills that integrate diverse modalities to answer the queries.

In this work, we introduce the modality importance score (MIS) to identify such bias. It is designed to assess which modality embeds the necessary information to answer the question. 
Additionally, we propose an innovative method using state-of-the-art MLLMs to estimate the modality importance, which can serve as a proxy for human judgments of modality perception.
With this MIS, we demonstrate the presence of unimodal bias and the scarcity of genuinely multimodal questions in existing datasets. 
We further validate the modality importance score with multiple ablation studies to evaluate the performance of MLLMs on permuted feature sets. 
Our results indicate that current models do not effectively integrate information due to modality imbalance in existing datasets. 
Our proposed MLLM-derived MIS can guide the curation of modality-balanced datasets that advance multimodal learning and enhance MLLMs' capabilities to understand and utilize synergistic relations across modalities.
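The abstract does not give the MIS formula. As a hypothetical illustration only, the sketch below scores a modality by its average marginal effect on answer correctness when it is added to each subset of the remaining modalities, with `correct` standing in for, e.g., whether an MLLM answers correctly from that subset. This is not necessarily the paper's definition.

```python
from itertools import combinations


def modality_importance(modalities, correct):
    """correct: callable(frozenset of modalities) -> 0 or 1.

    Importance of m = average gain in correctness from adding m to every
    subset of the other modalities (a Shapley-style marginal, unweighted).
    """
    scores = {}
    for m in modalities:
        others = [x for x in modalities if x != m]
        gains = []
        for r in range(len(others) + 1):
            for subset in combinations(others, r):
                s = frozenset(subset)
                gains.append(correct(s | {m}) - correct(s))
        scores[m] = sum(gains) / len(gains)
    return scores
```

A question answerable from video alone gives video a score of 1.0 and the other modalities 0.0 (unimodal bias), whereas a question that needs both video and audio splits importance between them.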
