United States

We present *generative clustering* (GC) for clustering a set of documents, $\mathbf{X}$, by using texts $\mathbf{Y}$ generated by large language models (LLMs) instead of by clustering the original documents $\mathbf{X}$.  Because LLMs provide probability distributions, the similarity between two documents can be rigorously defined in an information-theoretic manner by the KL divergence. We also propose a natural, novel clustering algorithm by using importance sampling. We show that GC outperforms any previous clustering method, often by a large margin. Furthermore, we show an application to generative document retrieval in which documents are indexed via hierarchical clustering and our method improves the retrieval accuracy.
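As a rough illustration of the information-theoretic similarity mentioned above, the sketch below computes the KL divergence between two discrete distributions. It assumes each document has already been turned into a probability vector over a shared, small set of generated texts; the vectors `doc_a` and `doc_b` and both helper functions are hypothetical and are not taken from the paper, and the importance-sampling estimator itself is not shown.

```python
import math


def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            # eps guards against division by zero when q assigns no mass.
            total += pi * math.log(pi / max(qi, eps))
    return total


# Hypothetical probabilities of three generated texts under two documents.
doc_a = normalize([0.6, 0.3, 0.1])
doc_b = normalize([0.2, 0.3, 0.5])

print(kl_divergence(doc_a, doc_b))
print(kl_divergence(doc_b, doc_a))
```

The two printed values differ because KL divergence is asymmetric, so a clustering method built on it has to fix which argument plays the role of the document and which the role of the cluster centroid.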

AAAI 2025

Information-Theoretic Generative Clustering of Documents

poster

We are pleased to announce the Thirty-Ninth AAAI Conference on Artificial Intelligence (AAAI-25), which will be held in Philadelphia, Pennsylvania at the Pennsylvania Convention Center from February 25 to March 4, 2025.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-25 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, and exhibit programs, and a range of other activities to be announced.

### [Invited Speakers](https://aaai.org/conference/aaai/aaai-25/aaai-25-invited-speakers/)

Register [here](https://aaai.org/conference/aaai/aaai-25/registration/)

Tabular data plays a vital role in various real-world scenarios and finds extensive applications. Although recent deep tabular models have shown remarkable success, they still struggle to handle data distribution shifts, leading to performance degradation when testing distributions change. To remedy this, a robust tabular model must adapt to generalize to unknown distributions during testing. In this paper, we investigate the problem of fully test-time adaptation (FTTA) for tabular data, where the model is adapted using only the testing data. We identify three key challenges: the existence of label and covariate distribution shifts, the lack of effective data augmentation, and the sensitivity of adaptation, which render existing FTTA methods ineffective for tabular data. To this end, we propose the Fully Test-time Adaptation for Tabular data, namely FTAT, which enables FTTA methods to robustly optimize the label distribution of predictions, adapt to shifted covariate distributions, and dynamically adapt the model for various tasks and models. We conduct comprehensive experiments on six benchmark datasets, which are evaluated using three metrics. The experimental results demonstrate that FTAT outperforms state-of-the-art methods by a margin.

Fully Test-time Adaptation for Tabular Data

While novel gradient-based attacks are continuously proposed to improve the optimization of adversarial examples, each is shown to outperform its predecessors using different experimental setups, implementations, and computational budgets, leading to biased and unfair comparisons.
In this work, we overcome this issue by proposing *AttackBench*, i.e., an attack evaluation framework that evaluates the effectiveness of each attack (along with its different library implementations) under the same maximum available computational budget. To this end, we (i) define a novel *optimality* metric that quantifies how close each attack is to the optimal solution (empirically estimated by ensembling all attacks), and (ii) limit the maximum number of forward and backward queries that each attack can execute on the target model. 
Our extensive experimental analysis compares more than $100$ attack implementations over $800$ different configurations, considering both CIFAR-10 and ImageNet models, and shows that only a few attack implementations outperform all the remaining approaches. These findings suggest that novel defenses should be evaluated against different attacks than those normally used in the literature to avoid overly optimistic robustness evaluations.
We release *AttackBench* as a publicly available benchmark that will be continuously updated with new attack implementations to maintain an up-to-date ranking of the best gradient-based attacks.

AttackBench: Evaluating Gradient-based Attacks for Adversarial Examples

In recent years, the application of transformer-based models in time-series forecasting has received significant attention.
While often demonstrating promising results, the transformer architecture encounters challenges in fully exploiting the temporal relations within time series data due to its attention mechanism.
In this work, we design e**X**ponential **Patch** (xPatch for short), a novel dual-stream architecture that utilizes exponential decomposition.
Inspired by the classical exponential smoothing approaches, xPatch introduces the innovative seasonal-trend exponential decomposition module.
Additionally, we propose a dual-flow architecture that consists of an MLP-based linear stream and a CNN-based non-linear stream.
This model investigates the benefits of employing patching and channel-independence techniques within a non-transformer model.
Finally, we develop a robust arctangent loss function and a sigmoid learning rate adjustment scheme, which prevent overfitting and boost forecasting performance.
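To make the idea of a seasonal-trend split driven by exponential smoothing more concrete, here is a minimal sketch using a classical exponential moving average (EMA). It is an assumption-laden illustration rather than the xPatch module itself: the function name `ema_decompose`, the smoothing factor `alpha`, and the choice to treat the residual as the seasonal part are all hypothetical.

```python
def ema_decompose(series, alpha=0.3):
    """Split a series into an EMA trend and a residual ("seasonal") part.

    trend[0] = series[0]
    trend[t] = alpha * series[t] + (1 - alpha) * trend[t - 1]
    seasonal[t] = series[t] - trend[t]
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        return [], []
    trend = [float(series[0])]
    for x in series[1:]:
        trend.append(alpha * x + (1.0 - alpha) * trend[-1])
    seasonal = [x - t for x, t in zip(series, trend)]
    return trend, seasonal


# Toy series: a slow upward drift plus an alternating pattern.
values = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0, 13.0, 15.0]
trend, seasonal = ema_decompose(values, alpha=0.3)
print(trend)
print(seasonal)
```

In a dual-stream design, each of the two components could then be passed to a different stream; how xPatch routes them is not specified in the abstract.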

xPatch: Dual-Stream Time Series Forecasting with Exponential Seasonal-Trend Decomposition

For general users, training a neural network from scratch is usually challenging and labor-intensive. Fortunately, neural network zoos enable them to find a well-performing model for direct use or for fine-tuning in their local environments. Although current model retrieval solutions attempt to convert neural network models into vectors to avoid the complex multiple inference processes required for model selection, it is still difficult to choose a suitable model due to inaccurate vectorization and biased correlation alignment between the query dataset and models. From the perspective of knowledge consistency, i.e., whether the knowledge possessed by the model can meet the needs of query tasks, we propose a model retrieval scheme, named Know2Vec, that acts as a black-box retrieval proxy for model zoos. Know2Vec first accesses models via a black-box interface in advance, capturing vital decision knowledge from the models while ensuring their privacy. It then employs an effective encoding technique to transform this knowledge into precise model vectors. Next, it maps the user's query task to a knowledge vector by probing the semantic relationships within query samples.
Furthermore, the proxy ensures knowledge consistency between the query vector and the model vectors within their alignment space, which is optimized through supervised learning with diverse loss functions, so that it can identify the most suitable model for a given task at inference time. Extensive experiments show that Know2Vec achieves superior retrieval accuracy over state-of-the-art methods in diverse neural network retrieval tasks.

Know2Vec: A Black-Box Proxy for Neural Network Retrieval

Advanced Deep Neural Networks (DNNs) perform well on high-quality images, but their performance decreases dramatically on degraded images. Data augmentation is commonly used to alleviate this problem, but using too much perturbed data can seriously degrade performance on pristine images. To tackle this challenge, we take our cue from the spatial coincidence assumption about human visual perception, i.e., multiscale and varying receptive fields are required for understanding pristine and degraded images. Correspondingly, we propose a novel plug-and-play network architecture, dubbed Quality-Adaptive Receptive Fields (QuARF), to automatically select the optimal receptive fields based on the quality of the input image. To this end, we first design a multi-kernel convolutional block, which comprises multiscale continuous receptive fields. Afterward, we design a quality-adaptive routing network to predict the significance of each kernel, based on the quality features extracted from the input image. In this way, QuARF automatically selects the optimal inference route for each image. To further boost efficiency and effectiveness, the input feature map is split into multiple groups, with each group independently learning its quality-adaptive routing parameters. We apply QuARF to a variety of DNNs and conduct experiments on both discriminative and generative tasks, including semantic segmentation, image translation, and restoration. Thorough experimental results show that QuARF significantly and robustly improves the performance on degraded images, and outperforms data augmentation in most cases. Our code will be released after peer review.

QuARF: Quality-Adaptive Receptive Fields for Degraded Image Perception

Molecular design inherently involves the optimization of multiple conflicting objectives, such as enhancing bio-activity and ensuring synthesizability. Evaluating these objectives often requires resource-intensive computations or physical experiments. Current molecular design methodologies typically approximate the Pareto set using a limited number of molecules. In this paper, we present an innovative approach, called Multi-Objective Molecular Design through Learning Latent Pareto Set (MLPS). MLPS initially utilizes an encoder-decoder model to seamlessly transform the discrete chemical space into a continuous latent space. We then employ local Bayesian optimization models to efficiently search for local optimal solutions (i.e., molecules) within predefined trust regions. Using surrogate objective values derived from these local models, we train a global Pareto set learning model to understand the mapping between direction vectors (called “preferences”) in the objective space and the entire Pareto set in the continuous latent space. Both the global Pareto set learning model and local Bayesian optimization models collaborate to discover high-quality solutions and adapt the trust regions dynamically. Our work is an effective endeavor towards learning the Pareto set for multi-objective molecular design, providing decision-makers with the capability to fine-tune their preferences and thoroughly explore the Pareto set. Experimental results demonstrate that MLPS achieves state-of-the-art performance across various multi-objective scenarios, encompassing diverse objective types and varying numbers of objectives. The effectiveness of MLPS was further validated through real-world challenges in discovering antifungal peptides with low toxicity and high activity.
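For readers unfamiliar with the term, the Pareto set referred to above is the set of candidates that no other candidate beats on every objective at once. The sketch below filters a finite list of candidates down to its non-dominated subset, assuming all objectives are to be maximized. It is only a brute-force illustration of the concept: the candidate scores are made up, and MLPS learns the Pareto set in a latent space rather than enumerating it this way.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points):
    """Return the points that are not dominated by any other point (maximization)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (bio-activity, synthesizability) scores for five molecules.
candidates = [
    (0.9, 0.2),
    (0.5, 0.5),
    (0.2, 0.9),
    (0.4, 0.4),  # dominated by (0.5, 0.5)
    (0.1, 0.1),  # dominated by every other candidate
]
front = pareto_front(candidates)
print(front)
```

The three surviving candidates represent different trade-offs between the two objectives, which is what a preference vector would select among.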

Multi-Objective Molecular Design Through Learning Latent Pareto Set

Continual Learning (CL) for malware classification tackles the rapidly evolving nature of malware threats and the frequent emergence of new types. Generative Replay (GR)-based CL systems utilize a generative model to produce synthetic versions of past data, which are then combined with new data to retrain the primary model. Traditional machine learning techniques in this domain often struggle with catastrophic forgetting, where a model's performance on old data degrades over time.

In this paper, we introduce a GR-based CL system that employs Generative Adversarial Networks (GANs) with feature matching loss to generate high-quality malware samples. Additionally, we implement innovative selection schemes for replay samples based on the model’s hidden representations.

Our comprehensive evaluation across Windows and Android malware datasets in a class-incremental learning scenario -- where new classes are introduced continuously over multiple tasks -- demonstrates substantial performance improvements over previous methods. For example, our system achieves an average accuracy of 55\% on Windows malware samples, significantly outperforming other GR-based models by 28\%. This study provides practical insights for advancing GR-based malware classification systems.

MalCL: Leveraging GAN-Based Generative Replay to Combat Catastrophic Forgetting in Malware Classification

The learnware paradigm aims to establish a market of numerous pre-trained machine learning models, enabling users to reuse existing helpful models for their tasks instead of starting from scratch. Each learnware in the market is a well-established model submitted by its developer, associated with a specification generated by the learnware market. The specification characterizes the specialty of the corresponding model, enabling it to be identified accurately for new task requirements. Existing specification generation methods are mostly based on the Reduced Kernel Mean Embedding (RKME) technique, which generates the specification by seeking a reduced set in the Reproducing Kernel Hilbert Space (RKHS), determined by the kernel function, that approximates the feature distribution of the training data. However, such RKME-based specification methods utilize only the feature information while leaving the label information, which is capable of providing rich semantic characterization, untouched. Furthermore, the quality of the generated specification heavily relies on the choice of kernel, which makes it difficult to adapt to all real-world scenarios. In this paper, to overcome the above limitations, we propose a novel specification approach named Lane, i.e., Label-Aware Neural Embedding. In Lane, the neural embedding space is utilized to replace the RKHS, effectively circumventing the step of kernel selection and thereby addressing the dependency on kernels in existing RKME-based specification methods. More importantly, Lane uses the label information as additional supervision to enhance the generation process, resulting in specifications of superior quality. Extensive experiments demonstrate the effectiveness and superiority of the proposed Lane approach in the learnware paradigm.

Learnware Specification via Label-Aware Neural Embedding

Lately, deep generative models have achieved excellent results when learning pre-defined, static data distributions. However, their performance in continual learning degrades because of catastrophic forgetting. In this paper, we study unsupervised generative modelling in a more realistic continual learning scenario in which class and task information is absent during both the training and inference phases. We address the challenges raised by this configuration by proposing a novel memory management approach, derived from the biological perspective that the brain can quickly remember temporary information while gradually preserving essential information whenever necessary. To this end, the proposed memory approach consists of a temporary memory system, which stores given data examples, and a dynamic expansion memory system, which gradually preserves those samples that are crucial for long-term memorization. A novel memory expansion mechanism is then proposed, which employs optimal transport distances between the statistics of memorized samples and each newly seen datum. The optimal transport mechanism is represented by a memory expansion signal by means of the Sinkhorn scaling algorithm, which preserves a diversity of samples using a compact memory capacity. The memory approach does not need to interact with the model's training process and can be optimized independently in both supervised and unsupervised learning without any modifications. Moreover, we propose a novel dynamic model expansion mechanism to automatically increase the model's capacity whenever necessary, which can deal with infinite data streams and further improve the model's performance. Experimental results show that the proposed approach achieves state-of-the-art performance in both supervised and unsupervised learning.
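The Sinkhorn scaling algorithm named above can be sketched in a few lines for two small histograms. This is the generic entropy-regularized optimal transport routine, not the authors' memory expansion mechanism; the function names, the regularization strength `reg`, the iteration count, and the toy cost matrix are all assumptions made for illustration.

```python
import math


def sinkhorn(a, b, cost, reg=0.5, n_iters=200):
    """Entropy-regularized transport plan between histograms a and b.

    Alternately rescales rows and columns of K = exp(-cost / reg) so that
    the plan's row sums approach a and its column sums approach b.
    """
    n, m = len(a), len(b)
    kernel = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def transport_cost(plan, cost):
    """Total cost of moving mass according to the plan."""
    return sum(p * c for prow, crow in zip(plan, cost) for p, c in zip(prow, crow))


# Toy problem: two source bins, two target bins, unit cost to switch bins.
source = [0.7, 0.3]
target = [0.4, 0.6]
cost_matrix = [[0.0, 1.0], [1.0, 0.0]]

plan = sinkhorn(source, target, cost_matrix)
print(plan)
print(transport_cost(plan, cost_matrix))
```

A distance of this kind between a new datum and the stored samples could serve as a signal for deciding whether the datum adds diversity to the memory; the exact form of that signal is not given in the abstract.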

Continual Unsupervised Generative Modelling via Online Optimal Transport

Effective performance in vision-language tasks is fundamentally dependent on strong multimodal alignment. The CoCa model marks notable progress in this area by combining contrastive learning with image captioning into a single framework. However, its reliance on global representations and the one-way flow of information from images to text limits its capacity to accurately reconstruct visual content from textual descriptions. To address these limitations, we propose BiMAC, a novel framework that enables bidirectional interactions between images and text at both global and local levels. Our model includes a text-driven visual reconstruction (TDVR) component that not only reconstructs visual content but also uses visual cues to generate corresponding textual descriptions. This bidirectional mechanism improves the integration of visual and textual data. We also implement a text-region alignment to select relevant image patches for interaction, effectively avoiding information clutter and ensuring precise mappings between images and text. BiMAC demonstrates superior performance across image-text understanding tasks, including retrieval, captioning, and classification.
