United States

This paper proposes a novel $k$-medoids approximation algorithm that handles large-scale datasets with reasonable computational time and memory complexity. We develop a local-search algorithm that iteratively improves the medoid selection based on an estimate of the $k$-medoids objective. A single batch of size $m \ll n$ provides the estimate, which reduces the required memory and the number of pairwise dissimilarity computations to $\mathcal{O}(mn)$, compared with $\mathcal{O}(n^2)$ for most $k$-medoids baselines. We obtain theoretical results showing that a batch of size $m=\mathcal{O}(\log(n))$ is sufficient to guarantee, with high probability, the same performance as the original local-search algorithm. Multiple experiments conducted on real datasets of various sizes and dimensions show that our algorithm performs comparably to state-of-the-art methods such as FasterPAM and BanditPAM++ with a drastically reduced running time.

AAAI 2025

OneBatchPAM: A Fast and Frugal K-Medoids Algorithm
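
The batch-estimated local search described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `batch_kmedoids`, uniform batch sampling, Euclidean dissimilarity, and the naive first-improvement swap loop are all assumptions made for illustration. The one idea taken from the abstract is that a single batch of $m$ points supplies an $n \times m$ dissimilarity matrix from which every candidate swap is scored.

```python
import numpy as np

def batch_kmedoids(X, k, m, max_iter=50, seed=0):
    """Swap-based local search for k-medoids with a batch-estimated objective.

    Illustrative only: uniform sampling, Euclidean distance and the naive
    swap evaluation are assumptions, not details taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    batch = rng.choice(n, size=m, replace=False)
    # n x m dissimilarities: every candidate medoid against the batch only.
    D = np.linalg.norm(X[:, None, :] - X[None, batch, :], axis=2)
    medoids = rng.choice(n, size=k, replace=False)

    def estimate(meds):
        # Mean distance from each batch point to its nearest medoid.
        return D[meds].min(axis=0).mean()

    best = estimate(medoids)
    for _ in range(max_iter):
        improved = False
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids.copy()
                trial[pos] = cand
                cost = estimate(trial)
                if cost < best - 1e-12:  # accept any strictly improving swap
                    medoids, best, improved = trial, cost, True
        if not improved:
            break
    return medoids, best

# Three well-separated synthetic clusters of 50 points each.
data_rng = np.random.default_rng(1)
X = np.vstack([data_rng.normal(c, 0.1, size=(50, 2)) for c in (0.0, 5.0, 10.0)])
medoids, cost = batch_kmedoids(X, k=3, m=30)
```

Only the `D` matrix is ever stored, so memory grows with $mn$ rather than $n^2$; the swap loop itself is deliberately unoptimized.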

clustering

poster

We are pleased to announce the Thirty-Ninth AAAI Conference on Artificial Intelligence (AAAI-25), which will be held in Philadelphia, Pennsylvania at the Pennsylvania Convention Center from February 25 to March 4, 2025.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-25 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, exhibit programs, and a range of other activities to be announced.

### [Invited Speakers](https://aaai.org/conference/aaai/aaai-25/aaai-25-invited-speakers/)

Register [here](https://aaai.org/conference/aaai/aaai-25/registration/)



Compositional generalization is crucial for artificial intelligence agents tackling intricate reasoning over vision and language (V&L) problems. While neuro-symbolic methods have demonstrated potential in understanding compositional structures, they face challenges such as the need for symbolic domain representations that typically involve a set of predefined predicates, difficulties in deriving domain predicates from raw data, and the requirement for differentiable operations to compose primitive concepts. To address these issues, we propose NeSyCoCo, which is built on the existing neuro-symbolic frameworks that leverage large language models (LLMs) for obtaining symbolic representations of the domain and map them to differentiable neural computations for V&L reasoning. Our approach a) augments the natural language inputs with their dependency structure to improve the accuracy of symbolic representations, b) utilizes distributed word representations for handling the variety of linguistically motivated logical predicates that are linked to neural modules, and c) utilizes soft composition of normalized predicate scores for better semantic alignment between symbolic compositions and differentiable operations. NeSyCoCo achieves state-of-the-art results on the ReaSCAN and CLEVR-CoGenT compositional generalization benchmarks, as well as the CLEVR vision-language benchmark. It also maintains high accuracy with new, similar concepts in the CLEVR-SYN benchmark.

NeSyCoCo: A Neuro-Symbolic Concept Composer for Compositional Generalization

Real-time object detection is critical for the decision-making process for many real-world applications, such as collision avoidance and path planning in autonomous driving. This work presents an innovative real-time streaming perception method, Transtreaming, which addresses the challenge of real-time object detection with dynamic computational delay. The core innovation of Transtreaming lies in its adaptive delay-aware transformer, which can concurrently predict multiple future frames and select the output that best matches the real-world present time, compensating for any system-induced computation delays.
The proposed model outperforms the existing state-of-the-art methods, even in single-frame detection scenarios, by leveraging a transformer-based methodology. It demonstrates robust performance across a range of devices, from powerful V100 to modest 2080Ti, achieving the highest level of perceptual accuracy on all platforms. Unlike most state-of-the-art methods that struggle to complete computation within a single frame on less powerful devices, Transtreaming meets the stringent real-time processing requirements on all kinds of devices. The experimental results emphasize the system's adaptability and its potential to significantly improve the safety and reliability for many real-world systems, such as autonomous driving.
Our code is available at https://anonymous.4open.science/r/Transtreaming-7333.

Transtreaming: Adaptive Delay-aware Transformer for Real-time Streaming Perception
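
The output-selection step mentioned in this abstract (predict several future frames, then emit the one that best matches the present time) can be sketched in a few lines. The transformer itself is not modeled here; `select_prediction`, the dictionary layout, and the nearest-offset rule are hypothetical choices for illustration, not the paper's mechanism.

```python
def select_prediction(predictions, elapsed):
    """Pick the prediction whose target time offset best matches the delay.

    predictions: dict mapping a future time offset (seconds after capture)
                 to the output predicted for that offset.
    elapsed:     measured time between frame capture and output emission.
    Nearest-offset matching is an assumption of this sketch.
    """
    best_offset = min(predictions, key=lambda offset: abs(offset - elapsed))
    return best_offset, predictions[best_offset]

# Placeholder outputs for the current frame and three future frames at 30 FPS.
frame_period = 1 / 30
predictions = {i * frame_period: f"boxes@t+{i}" for i in range(4)}

# A measured delay of 55 ms is closest to the two-frames-ahead prediction.
offset, output = select_prediction(predictions, elapsed=0.055)
```

On a slower device the measured `elapsed` grows, so a later prediction is selected without any change to the model.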

Recent vision-language foundation models still frequently produce outputs misaligned with their inputs, as evidenced by object hallucination in captioning and prompt misalignment in text-to-image generation. To build a reliable system, recent research has begun to explore methods for identifying misaligned elements, both to enhance interpretability and to improve model performance. However, current approaches mainly rely on large foundation models or human annotations, limiting scalability due to substantial computational and resource costs. This work proposes a novel approach for detecting dense misalignments from pre-trained CLIP, specifically focusing on pinpointing misaligned words between images and text. We carefully revamp the gradient-based attribution computation method, enabling negative gradients of individual text tokens to indicate misalignment. Then, we propose F-CLIPScore, which aggregates misaligned attributions to detect fine-grained misalignments. We evaluate our method on various dense misalignment detection benchmarks, covering various image and text domains and misalignment types. Our experiments show that our method achieves state-of-the-art performance among zero-shot models and competitive performance with fine-tuned models while maintaining superior efficiency. Our qualitative examples show that our method is uniquely able to detect entity-level objects, intangible objects, and attributes that existing methods cannot easily detect. We present comprehensive ablation studies and analyses to elucidate the strengths and limitations of our proposed approach.

Extract Free Dense Misalignment from CLIP

In many critical applications, sensitive data is inherently distributed and cannot be centralized due to privacy concerns. A wide range of federated learning approaches have been proposed to train models locally at each client without sharing their sensitive local data. Most of these approaches share either local model parameters or probabilistic predictions (soft labels) on a public dataset, or a combination of both. However, this still discloses private information and restricts local models to those that can be trained using gradient-based methods. To reduce the amount of shared information, we propose sharing only definitive class labels (hard labels) on a public unlabeled dataset. Clients then use a consensus of these shared labels as pseudo-labels for their local training. This federated co-training approach empirically enhances privacy without compromising model quality. Additionally, it allows the use of local models that are not suited for parameter aggregation in traditional federated learning, such as gradient-boosted decision trees, rule ensembles, and random forests.

Little Is Enough: Boosting Privacy by Sharing Only Hard Labels in Federated Semi-Supervised Learning
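
The label-sharing step in this abstract can be illustrated with a small sketch: each client contributes only hard labels on the public unlabeled set, and a consensus over them becomes the pseudo-labels. The abstract does not specify how the consensus is formed, so the majority vote, the tie-breaking rule, and the name `consensus_labels` are assumptions of this example.

```python
import numpy as np

def consensus_labels(hard_labels, num_classes):
    """Majority vote over clients' hard labels on the public dataset.

    hard_labels: int array of shape (num_clients, num_public_samples).
    Ties go to the smallest class index (an arbitrary choice in this sketch).
    """
    hard_labels = np.asarray(hard_labels)
    # votes[c, j] counts the clients that assigned class c to public sample j.
    votes = np.stack([(hard_labels == c).sum(axis=0) for c in range(num_classes)])
    return votes.argmax(axis=0)

# Three clients label four public samples; only these integers are shared.
shared = np.array([
    [0, 1, 2, 1],
    [0, 1, 1, 2],
    [1, 1, 2, 0],
])
pseudo = consensus_labels(shared, num_classes=3)
```

Because only class indices are exchanged, a client can train any local model on `pseudo`, including ones without gradients such as random forests.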

Today we rely on networks that are created and maintained by smart devices. For such networks, there is no governing central authority; instead, the network structure is shaped by the decisions of selfish intelligent agents. A key property of such communication networks is that they should be easy to navigate for routing data. For this, a common approach is greedy routing, where every device simply routes data to a neighbor that is closer to the respective destination.

Networks of intelligent agents can be analyzed via a game-theoretic approach and in the last decades many variants of network creation games have been proposed and analyzed. In this paper we present the first game-theoretic network creation model that incorporates greedy routing, i.e., the strategic agents in our model are embedded in some metric space and strive for creating a network among themselves where all-pairs greedy routing is enabled. Besides this, the agents optimize their connection quality within the created network by aiming for greedy routing paths with low stretch.

For our model, we analyze the existence of (approximate) equilibria and the computational hardness in different underlying metric spaces. For example, we characterize the set of equilibria in 1-2-metrics and tree metrics and show that Nash equilibria always exist. For Euclidean space, the setting which is most relevant in practice, we prove that equilibria are not guaranteed to exist but that the well-known $\Theta$-graph construction yields networks having a low stretch that are game-theoretically almost stable. For general metric spaces, we show that approximate equilibria exist where the approximation factor depends on the cost of maintaining any link.

Strategic Network Creation for Enabling Greedy Routing
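
Greedy routing and stretch, as used in this abstract, can be made concrete with a short sketch in the Euclidean plane. It makes two assumptions beyond the text: among the neighbors that are closer to the destination, the closest one is chosen, and stretch is taken as the routed path length divided by the direct distance between source and destination. The function names and the example network are hypothetical.

```python
import math

def greedy_route(points, adj, src, dst):
    """Forward to the neighbor closest to dst until dst is reached.

    Returns the path as a list of nodes, or None when the current node has
    no neighbor strictly closer to dst (greedy routing gets stuck).
    """
    path, cur = [src], src
    while cur != dst:
        nxt = min(adj[cur], key=lambda v: math.dist(points[v], points[dst]),
                  default=None)
        if nxt is None or (math.dist(points[nxt], points[dst])
                           >= math.dist(points[cur], points[dst])):
            return None
        path.append(nxt)
        cur = nxt
    return path

def stretch(points, path):
    """Routed path length divided by the direct distance between endpoints."""
    length = sum(math.dist(points[a], points[b]) for a, b in zip(path, path[1:]))
    return length / math.dist(points[path[0]], points[path[-1]])

points = {0: (0, 0), 1: (1, 1), 2: (2, 0), 3: (1, -3)}
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}

path = greedy_route(points, adj, 0, 2)    # routes via node 1
path_stretch = stretch(points, path)

# Without node 1, node 0 has no neighbor closer to node 2, so routing fails.
sparse = {0: [3], 1: [], 2: [3], 3: [0, 2]}
stuck = greedy_route(points, sparse, 0, 2)
```

The distance to the destination strictly decreases at every hop, so the loop always terminates. A network enables all-pairs greedy routing when `greedy_route` succeeds for every ordered pair of nodes.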

While machine learning models have proven effective across various scenarios, it is widely acknowledged that many models are vulnerable to adversarial attacks. Recently, numerous efforts in adversarial defense have emerged. Among them, certified defense is well known for its theoretical guarantees against arbitrary adversarial perturbations on input within a certain range (e.g., an $l_2$ ball). However, most existing works in this line struggle to generalize their certified robustness to other data domains with distribution shifts. This issue is rooted in the difficulty of eliminating the negative impact of spurious correlations on robustness in different domains. To address this problem, in this work, we propose a novel certified defense framework, GLEAN, which incorporates a causal perspective into the generalization problem in certified defense. More specifically, our framework integrates a certifiable causal factor learning component to disentangle the causal relations and spurious correlations between input and label, and thereby exclude the negative effect of spurious correlations on defense. On top of that, we design a causally certified defense strategy to handle adversarial attacks on latent causal factors. In this way, our framework is not only robust against malicious noises on data in the training distribution but also can generalize its robustness across domains with distribution shifts. Extensive experiments on benchmark datasets validate the superiority of our framework in certified robustness generalization in different data domains. Code is available in the supplementary materials.

Certified Causal Defense with Generalizable Robustness

Out-of-distribution (OOD) detection, which determines whether a given sample is part of the in-distribution (ID) or not, has recently been explored through generative model-based outlier synthesis, especially with diffusion models. Nonetheless, existing diffusion models often produce outliers that are considerably distant from the ID in pixel space, showing limited efficacy for capturing subtle distinctions between ID and OOD. To address these issues, we propose a novel framework, Semantic Outlier generation via Nuisance Awareness (SONA), which directly utilizes informative pixel-space ID images in diffusion models. As a result, the generated outliers achieve two crucial properties: (i) they closely resemble the ID mainly in nuisances, while (ii) they represent discriminative semantic information. To facilitate the separate effect on semantics and nuisances, we introduce SONA guidance, providing region-specific guidance. Extensive experiments demonstrate the effectiveness of our framework, achieving an impressive AUROC of 87% on near-OOD datasets, which surpasses the performance of baseline methods by a significant margin of approximately 6%.

Diffusion-based Semantic Outlier Generation via Nuisance Awareness for Out-of-Distribution Detection

Due to their superior ability to model global dependencies, transformers and their variants have become the primary choice in Masked Time-series Modeling (MTM) for the time-series classification task. In this paper, we show experimentally that existing transformer-based MTM methods encounter two under-explored issues when dealing with time-series data: (1) they encode features by performing long-dependency ensemble averaging, which easily results in rank collapse and feature homogenization as the layers go deeper; (2) they exhibit distinct priorities in fitting different frequency components contained in the time series, inevitably leading to spectrum energy imbalance in the encoded features. To tackle these issues, we propose an auxiliary content-aware balanced decoder (CBD) to optimize the encoding quality in the spectrum space within the masked modeling scheme. Specifically, the CBD iterates over a series of fundamental blocks, and thanks to two tailored units, each block can progressively refine the masked representation by adjusting the interaction pattern based on local content variations of the time series and learning to recalibrate the energy distribution across different frequency components. Moreover, a dual-constraint loss is devised to enhance the mutual optimization of the vanilla decoder and our CBD. Extensive experimental results on ten time-series classification datasets show that our method nearly surpasses a range of baselines. Meanwhile, a series of explanatory results are presented to clarify the behavior of our method.

Content-aware Balanced Spectrum Encoding in Masked Modeling for Time Series Classification

We present Connected-Component (CC)-Metrics, a novel semantic segmentation evaluation protocol that aligns existing semantic segmentation metrics with a multi-instance detection scenario in which each connected component matters. We motivate this setup with the common medical scenario of semantic metastases segmentation in a full-body PET/CT. We show how existing semantic segmentation metrics suffer from a bias towards larger connected components, contradicting the clinical assessment of scans in which tumor size and clinical relevance are uncorrelated. To rebalance existing segmentation metrics, we propose to evaluate them on a per-component basis, thus giving each tumor the same weight irrespective of its size. To match predictions to ground-truth segments, we employ a proximity-based matching criterion, evaluating common metrics locally at the component of interest. Using this approach, we break free of biases introduced by large metastases for overlap-based metrics such as Dice or Surface Dice. CC-Metrics also improves distance-based metrics such as Hausdorff Distances, which are uninformative for small changes that do not influence the maximum or $95^{\text{th}}$ percentile, and avoids pitfalls introduced by directly combining counting-based metrics with overlap-based metrics, as is done in Panoptic Quality. We will make the code for CC-Metrics publicly available.

Every Component Counts: Rethinking the Measure of Success for Medical Semantic Segmentation in Multi-Instance Segmentation Tasks
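
The size bias described in this abstract, and the per-component remedy, can be illustrated on a toy 2D mask. This is a sketch under stated assumptions, not the CC-Metrics code: the "proximity-based matching criterion" is read here as assigning every pixel to its nearest ground-truth component, components are 4-connected, and the names `label_components` and `per_component_dice` are hypothetical.

```python
from collections import deque
import numpy as np

def label_components(mask):
    """4-connected component labelling of a 2D binary mask (labels from 1)."""
    labels = np.zeros(mask.shape, dtype=int)
    count = 0
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue
        count += 1
        labels[start] = count
        queue = deque([start])
        while queue:
            r, c = queue.popleft()
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if (0 <= nr < mask.shape[0] and 0 <= nc < mask.shape[1]
                        and mask[nr, nc] and not labels[nr, nc]):
                    labels[nr, nc] = count
                    queue.append((nr, nc))
    return labels, count

def per_component_dice(pred, gt):
    """Mean Dice over ground-truth components, each scored in its own region."""
    pred, gt = np.asarray(pred, dtype=bool), np.asarray(gt, dtype=bool)
    labels, count = label_components(gt)
    if count == 0:
        return float("nan")
    fg = np.argwhere(gt)                  # coordinates of ground-truth pixels
    fg_labels = labels[gt]                # component id of each of them
    grid = np.argwhere(np.ones_like(gt))  # every pixel coordinate
    # Brute-force nearest ground-truth pixel; suitable for small masks only.
    d2 = ((grid[:, None, :] - fg[None, :, :]) ** 2).sum(axis=2)
    region = fg_labels[d2.argmin(axis=1)].reshape(gt.shape)
    scores = []
    for comp in range(1, count + 1):
        p = pred & (region == comp)       # predictions matched to this component
        g = labels == comp
        scores.append(2 * (p & g).sum() / (p.sum() + g.sum()))
    return float(np.mean(scores))

# One large lesion (9 pixels) found perfectly, one single-pixel lesion missed.
gt = np.zeros((6, 6), dtype=bool)
gt[0:3, 0:3] = True
gt[5, 5] = True
pred = np.zeros((6, 6), dtype=bool)
pred[0:3, 0:3] = True

global_dice = 2 * (pred & gt).sum() / (pred.sum() + gt.sum())  # 18 / 19
cc_dice = per_component_dice(pred, gt)                         # (1 + 0) / 2
```

The global Dice stays close to 1 although one of the two lesions is missed entirely, while the per-component average drops to 0.5 because both lesions carry equal weight.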

As one of the key technologies leading to Artificial General Intelligence (AGI), Large Language Models (LLMs) have achieved remarkable accomplishments. Exploring the capabilities of LLMs is crucial for scientific research, and many studies propose new challenges from various aspects to explore the capability boundaries of LLMs. This paper attempts to push the challenges of information understanding, synthesis, and reasoning to the extreme, in order to explore the boundaries of more advanced cognitive capabilities in LLMs. We define this as the task of High-Level Cognition (HLC), which involves obtaining high-level conclusions from low-level and fragmented foundational information. To evaluate HLC, we construct a dataset based on soccer matches. Experiments and analysis on this dataset show that current state-of-the-art LLMs lack the ability to solve the task of HLC, because their performance is equivalent to random level. However, fine-tuning Llama3-8B-Instruct yields improvements of 14.4%, 48.1%, and 19.4% over random level in three types of evaluation tasks. This indicates that LLMs have great potential to solve the task of HLC.
