
Distributed optimization is the standard way of speeding up machine learning training, and most of the research in the area focuses on distributed first-order, gradient-based methods. Yet there are settings where some computationally bounded nodes may not be able to implement first-order, gradient-based optimization, even though they could still contribute to joint optimization tasks.
In this paper, we initiate the study of hybrid decentralized optimization, studying settings where nodes with zeroth-order and first-order optimization capabilities co-exist in a distributed system, and attempt to jointly solve an optimization task over some data distribution. We essentially show that, under reasonable parameter settings, such a system can not only withstand noisier zeroth-order agents but can even benefit from integrating such agents into the optimization process, rather than ignoring their information. At the core of our approach is a new analysis of distributed optimization with noisy and possibly-biased gradient estimators, which may be of independent interest. Our results hold for both convex and non-convex objectives. Experimental results on standard optimization tasks confirm our analysis, showing that hybrid first-zeroth order optimization can be practical, even when training deep neural networks.
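To make the hybrid setting concrete, the following is a minimal sketch rather than the paper's algorithm. It assumes a toy quadratic objective, a two-point Gaussian-smoothing estimator for the zeroth-order nodes, and pairwise gossip averaging as the communication pattern; the function names and all parameter values (`hybrid_run`, `mu`, `lr`, the node counts) are hypothetical choices made for illustration.

```python
import random


def loss(x):
    # Toy objective f(x) = 0.5 * ||x||^2 (an assumption for illustration only).
    return 0.5 * sum(v * v for v in x)


def first_order_grad(x):
    # Exact gradient of the toy objective, as a first-order node would compute it.
    return list(x)


def zeroth_order_grad(x, rng, mu=1e-3):
    # Two-point estimator: uses only function evaluations, no gradients.
    # g = (f(x + mu * u) - f(x)) / mu * u, with u a standard Gaussian direction.
    u = [rng.gauss(0.0, 1.0) for _ in x]
    shifted = [xi + mu * ui for xi, ui in zip(x, u)]
    scale = (loss(shifted) - loss(x)) / mu
    return [scale * ui for ui in u]


def hybrid_run(n_first=2, n_zeroth=2, dim=5, steps=200, lr=0.05, seed=0):
    rng = random.Random(seed)
    # Every node keeps its own copy of the model.
    models = [[1.0] * dim for _ in range(n_first + n_zeroth)]
    for _ in range(steps):
        # Local step: each node uses whichever estimator it can afford.
        for i, x in enumerate(models):
            g = first_order_grad(x) if i < n_first else zeroth_order_grad(x, rng)
            models[i] = [xi - lr * gi for xi, gi in zip(x, g)]
        # Communication step (assumed): one random pair averages its models.
        a, b = rng.sample(range(len(models)), 2)
        avg = [(p + q) / 2 for p, q in zip(models[a], models[b])]
        models[a], models[b] = avg, list(avg)
    # Report the loss of the average model across all nodes.
    mean = [sum(col) / len(models) for col in zip(*models)]
    return loss(mean)


if __name__ == "__main__":
    print("initial loss:", loss([1.0] * 5))
    print("final loss:  ", hybrid_run())
```

Setting `n_zeroth=0` gives a first-order-only baseline, so the two configurations can be compared on the same toy problem.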

AAAI 2025

Hybrid Decentralized Optimization: Leveraging Both First- and Zeroth-Order Optimizers for Faster Convergence

distributed machine learning

federated learning

poster

We are pleased to announce the Thirty-Ninth AAAI Conference on Artificial Intelligence (AAAI-25), which will be held in Philadelphia, Pennsylvania at the Pennsylvania Convention Center from February 25 to March 4, 2025.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-25 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, and exhibit programs, and a range of other activities to be announced.





In many critical applications, sensitive data is inherently distributed and cannot be centralized due to privacy concerns. A wide range of federated learning approaches have been proposed to train models locally at each client without sharing their sensitive local data. Most of these approaches share either local model parameters or probabilistic predictions (soft labels) on a public dataset, or a combination of both. However, this still discloses private information and restricts local models to those that can be trained using gradient-based methods. To reduce the amount of shared information, we propose sharing only definitive class labels (hard labels) on a public unlabeled dataset. Clients then use a consensus of these shared labels as pseudo-labels for their local training. This federated co-training approach empirically enhances privacy without compromising model quality. Additionally, it allows the use of local models that are not suited for parameter aggregation in traditional federated learning, such as gradient-boosted decision trees, rule ensembles, and random forests.
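The label-sharing step described above can be sketched as follows. The abstract only says that clients use "a consensus" of the shared hard labels; the majority vote and the smallest-label tie-break below are assumptions, and `consensus_labels` is a hypothetical helper name.

```python
from collections import Counter


def consensus_labels(client_predictions):
    """Combine per-client hard labels on a shared public dataset.

    client_predictions[i][j] is the class label that client i predicts
    for public example j. Only these labels are shared, never model
    parameters or soft probabilities.
    """
    n_examples = len(client_predictions[0])
    pseudo_labels = []
    for j in range(n_examples):
        votes = Counter(preds[j] for preds in client_predictions)
        top = max(votes.values())
        # Assumed tie-break: pick the smallest label among the most voted.
        pseudo_labels.append(min(label for label, c in votes.items() if c == top))
    return pseudo_labels


if __name__ == "__main__":
    shared = [
        [0, 1, 1],  # client A
        [0, 1, 2],  # client B
        [1, 1, 2],  # client C
    ]
    print(consensus_labels(shared))
```

Because only labels are exchanged, each client is free to train any local model on the resulting pseudo-labels, including tree-based models that have no parameters to average.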

Little Is Enough: Boosting Privacy by Sharing Only Hard Labels in Federated Semi-Supervised Learning

Today we rely on networks that are created and maintained by smart devices. For such networks, there is no governing central authority; instead, the network structure is shaped by the decisions of selfish intelligent agents. A key property of such communication networks is that they should be easy to navigate for routing data. For this, a common approach is greedy routing, where every device simply routes data to a neighbor that is closer to the respective destination.
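Greedy routing as described above can be illustrated with a short sketch. It assumes nodes embedded in the Euclidean plane and, among the neighbors that are strictly closer to the destination, forwards to the closest one; that particular choice and the name `greedy_route` are assumptions for illustration.

```python
import math


def greedy_route(points, neighbors, source, target):
    """Return the greedy routing path from source to target, or None if stuck.

    points maps a node to its coordinates; neighbors maps a node to the
    nodes it has links to.
    """
    path = [source]
    current = source
    while current != target:
        here = math.dist(points[current], points[target])
        # Only neighbors strictly closer to the target are admissible.
        closer = [v for v in neighbors[current]
                  if math.dist(points[v], points[target]) < here]
        if not closer:
            return None  # local minimum: greedy routing fails on this network
        current = min(closer, key=lambda v: math.dist(points[v], points[target]))
        path.append(current)
    return path


if __name__ == "__main__":
    points = {"a": (0, 0), "b": (1, 0), "c": (2, 0), "d": (0, 5)}
    neighbors = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b"], "d": ["a"]}
    print(greedy_route(points, neighbors, "d", "c"))
```

The loop always terminates because the distance to the target strictly decreases with every hop. A network enables all-pairs greedy routing when no source and target pair makes this function return `None`.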

Networks of intelligent agents can be analyzed via a game-theoretic approach and in the last decades many variants of network creation games have been proposed and analyzed. In this paper we present the first game-theoretic network creation model that incorporates greedy routing, i.e., the strategic agents in our model are embedded in some metric space and strive for creating a network among themselves where all-pairs greedy routing is enabled. Besides this, the agents optimize their connection quality within the created network by aiming for greedy routing paths with low stretch.

For our model, we analyze the existence of (approximate) equilibria and the computational hardness in different underlying metric spaces. For example, we characterize the set of equilibria in 1-2-metrics and tree metrics and show that Nash equilibria always exist. For Euclidean space, the setting which is most relevant in practice, we prove that equilibria are not guaranteed to exist but that the well-known $\Theta$-graph construction yields networks having a low stretch that are game-theoretically almost stable. For general metric spaces, we show that approximate equilibria exist where the approximation factor depends on the cost of maintaining any link.

Strategic Network Creation for Enabling Greedy Routing

While machine learning models have proven effective across various scenarios, it is widely acknowledged that many models are vulnerable to adversarial attacks. Recently, numerous efforts in adversarial defense have emerged. Among them, certified defense is well known for its theoretical guarantees against arbitrary adversarial perturbations on input within a certain range (e.g., an $l_2$ ball). However, most existing works in this line struggle to generalize their certified robustness to other data domains with distribution shifts. This issue is rooted in the difficulty of eliminating the negative impact of spurious correlations on robustness in different domains. To address this problem, we propose a novel certified defense framework, GLEAN, which incorporates a causal perspective into the generalization problem in certified defense. More specifically, our framework integrates a certifiable causal factor learning component to disentangle the causal relations and spurious correlations between input and label, and thereby exclude the negative effect of spurious correlations on defense. On top of that, we design a causally certified defense strategy to handle adversarial attacks on latent causal factors. In this way, our framework is not only robust against malicious noise on data in the training distribution but can also generalize its robustness across domains with distribution shifts. Extensive experiments on benchmark datasets validate the superiority of our framework in certified robustness generalization in different data domains. Code is available in the supplementary materials.

Certified Causal Defense with Generalizable Robustness

Out-of-distribution (OOD) detection, determining whether a given sample is part of the in-distribution (ID) or not, has been newly explored by a generative model-based outlier synthesizing approach, especially with diffusion models. Nonetheless, existing diffusion models often produce outliers that are considerably distant from the ID in pixel-space, showing limited efficacy for capturing subtle distinctions between ID and OOD. To address these issues, we propose a novel framework, Semantic Outlier generation via Nuisance Awareness (SONA), which directly utilizes informative pixel-space ID images in diffusion models. Thereby, the generated outliers achieve two crucial properties: (i) they closely resemble the ID mainly in nuisances, while (ii) represent discriminative semantic information. To facilitate the separate effect on semantics and nuisances, we introduce SONA guidance, providing region-specific guidance. Extensive experiments demonstrate the effectiveness of our framework, achieving an impressive AUROC of 87% on near-OOD datasets, which surpasses the performance of baseline methods by a significant margin of approximately 6%.

Diffusion-based Semantic Outlier Generation via Nuisance Awareness for Out-of-Distribution Detection

Owing to their strong ability to capture global dependencies, transformers and their variants have become the primary choice in Masked Time-series Modeling (MTM) for time-series classification. In this paper, we show experimentally that existing transformer-based MTM methods encounter two under-explored issues when dealing with time-series data: (1) they encode features by performing long-dependency ensemble averaging, which easily results in rank collapse and feature homogenization as the layers go deeper; (2) they exhibit distinct priorities in fitting the different frequency components contained in the time series, inevitably leading to a spectrum energy imbalance in the encoded features. To tackle these issues, we propose an auxiliary content-aware balanced decoder (CBD) to optimize the encoding quality in the spectrum space within the masked modeling scheme. Specifically, the CBD iterates over a series of fundamental blocks, and thanks to two tailored units, each block can progressively refine the masked representation by adjusting the interaction pattern based on local content variations of the time series and learning to recalibrate the energy distribution across different frequency components. Moreover, a dual-constraint loss is devised to enhance the mutual optimization of the vanilla decoder and our CBD. Extensive experimental results on ten time-series classification datasets show that our method nearly surpasses a range of baselines. Meanwhile, a series of explanatory results are presented to clarify the behavior of our method.

Content-aware Balanced Spectrum Encoding in Masked Modeling for Time Series Classification

We present Connected-Component (CC)-Metrics, a novel semantic segmentation evaluation protocol, targeted to align existing semantic segmentation metrics to a multi-instance detection scenario in which each connected component matters. We motivate this setup in the common medical scenario of semantic metastases segmentation in a full-body PET/CT. We show how existing semantic segmentation metrics suffer from a bias towards larger connected components, contradicting the clinical assessment of scans in which tumor size and clinical relevance are uncorrelated. To rebalance existing segmentation metrics, we propose to evaluate them on a per-component basis, thus giving each tumor the same weight irrespective of its size. To match predictions to ground-truth segments, we employ a proximity-based matching criterion, evaluating common metrics locally at the component of interest. Using this approach, we break free of biases introduced by large metastases for overlap-based metrics such as Dice or Surface Dice. CC-Metrics also improves distance-based metrics such as Hausdorff Distances, which are uninformative for small changes that do not influence the maximum or $95^{\text{th}}$ percentile, and avoids pitfalls introduced by directly combining counting-based metrics with overlap-based metrics, as is done in Panoptic Quality. We will make the code for CC-Metrics publicly available.
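The size bias described above can be reproduced on a tiny 2D example. The sketch below is not the authors' implementation: it assumes 4-connectivity, assigns every pixel to its nearest ground-truth component by brute-force Euclidean distance as a stand-in for the proximity-based matching, and averages Dice over the resulting regions. The helper names are hypothetical, and the ground truth is assumed to contain at least one foreground pixel.

```python
from collections import deque


def label_components(mask):
    # Label 4-connected foreground components with ids 1..count.
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not labels[r][c]:
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


def global_dice(pred, gt):
    # Standard Dice over the whole image.
    tp = sum(p * g for pr, gr in zip(pred, gt) for p, g in zip(pr, gr))
    total = sum(map(sum, pred)) + sum(map(sum, gt))
    return 2 * tp / total


def per_component_dice(pred, gt):
    # Dice evaluated locally around each ground-truth component, then averaged.
    labels, count = label_components(gt)
    h, w = len(gt), len(gt[0])
    gt_pixels = [(r, c, labels[r][c]) for r in range(h) for c in range(w) if labels[r][c]]
    tp = [0] * (count + 1)
    n_pred = [0] * (count + 1)
    n_gt = [0] * (count + 1)
    for r in range(h):
        for c in range(w):
            # Assumed matching rule: the pixel belongs to the nearest component.
            _, owner = min(((r - y) ** 2 + (c - x) ** 2, k) for y, x, k in gt_pixels)
            tp[owner] += pred[r][c] * gt[r][c]
            n_pred[owner] += pred[r][c]
            n_gt[owner] += gt[r][c]
    scores = [2 * tp[k] / (n_pred[k] + n_gt[k]) for k in range(1, count + 1)]
    return sum(scores) / count


if __name__ == "__main__":
    gt = [[1, 1, 1, 0, 0, 0],
          [1, 1, 1, 0, 0, 1],
          [1, 1, 1, 0, 0, 0]]
    # The prediction finds the large lesion but misses the single-pixel one.
    pred = [[1, 1, 1, 0, 0, 0],
            [1, 1, 1, 0, 0, 0],
            [1, 1, 1, 0, 0, 0]]
    print("global Dice:       ", global_dice(pred, gt))
    print("per-component Dice:", per_component_dice(pred, gt))
```

In this example the missed small component barely moves the global score, while the per-component average counts it as one of two equally weighted lesions.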

Every Component Counts: Rethinking the Measure of Success for Medical Semantic Segmentation in Multi-Instance Segmentation Tasks

As one of the key technologies leading to Artificial General Intelligence (AGI), Large Language Models (LLMs) have achieved remarkable accomplishments. Exploring the capabilities of LLMs is crucial for scientific research, and many studies propose new challenges from various aspects to explore the capability boundaries of LLMs. This paper attempts to push the challenges of information understanding, synthesis, and reasoning to the extreme, in order to explore the boundaries of more advanced cognitive capabilities in LLMs. We define this as the task of High-Level Cognition (HLC), which involves obtaining high-level conclusions from low-level and fragmented foundational information. To evaluate HLC, we construct a dataset based on soccer matches. Experiments and analysis on this dataset show that current state-of-the-art LLMs lack the ability to solve the HLC task, as their performance is at random level. However, fine-tuning Llama3-8B-Instruct yields improvements of 14.4%, 48.1%, and 19.4% over random level in three types of evaluation tasks. This indicates that LLMs have great potential to solve the HLC task.

Can Large Language Models Derive High-Level Cognition from Low-Level and Fragmented Foundational Information?

Large Language Models (LLMs) are susceptible to malicious influence by cyber attackers through intrusions such as adversarial, backdoor, and embedding inversion attacks. In response, the burgeoning field of LLM Security aims to study and defend against such threats. Thus far, the majority of works in this area have focused on monolingual English models; however, emerging research suggests that multilingual LLMs may be more vulnerable to various attacks than their monolingual counterparts. While previous work has investigated embedding inversion over a small subset of European languages, it is challenging to extrapolate these findings to languages from different linguistic families and with differing scripts. To this end, we explore the security of multilingual LLMs in the context of embedding inversion attacks and investigate cross-lingual and cross-script inversion across 20 languages, spanning over 8 language families and 12 scripts. Our findings indicate that languages written in Arabic script and Cyrillic script are particularly vulnerable to embedding inversion, as are languages within the Indo-Aryan language family. We further observe that inversion models tend to suffer from language confusion, sometimes greatly reducing the efficacy of an attack. Accordingly, we systematically explore this bottleneck for inversion models, uncovering predictable patterns which could be leveraged by attackers. Ultimately, this study aims to further the field's understanding of the outstanding security vulnerabilities facing multilingual LLMs and raise awareness for the languages most at risk of negative impact from these attacks.

Against All Odds: Overcoming Typology, Script, and Language Confusion in Multilingual Embedding Inversion Attacks

The ability to estimate temporal relationships is critical for both animals and artificial agents. Cognitive science and neuroscience provide remarkable insights into behavioral and neural aspects of temporal credit assignment. In particular, scale-invariance of learning dynamics observed in behavior and supported by neural data is one of the key principles that governs animal perception: if the temporal relationships in the environment rescale, the number of trials required for learning will remain constant. Here we integrate a computational neuroscience model of scale-invariant memory into deep reinforcement learning (RL) agents. We first provide a theoretical analysis and then demonstrate through experiments that such agents can learn robustly across a wide range of temporal scales, unlike agents built with commonly used recurrent memory architectures such as LSTM. This result illustrates that building computational principles from neuroscience and cognitive science into deep neural networks can result in systems that are flexible learners, similar to humans.

Deep Reinforcement Learning with Time-Scale Invariant Memory

In this paper, we draw an analogy between processing natural languages and processing multivariate event streams from vehicles in order to predict when and what error pattern is most likely to occur in the future for a given car. Our approach leverages the temporal dynamics and contextual relationships of our event data from a fleet of cars. Event data is composed of discrete values of error codes as well as continuous values such as time and mileage. By modelling this data with two causal transformers, we can anticipate critical vehicle failures and malfunctions before they happen. Thus, we introduce CarFormer, a transformer model trained via a new self-supervised learning strategy, and Epredictor, an autoregressive transformer decoder model capable of predicting when and what error pattern will most likely occur after a given error code appears. Despite the challenges of high cardinality of event types, their unbalanced frequency of appearance, and limited labeled data, our experimental results demonstrate the excellent predictive ability of our novel model. Specifically, on sequences averaging 160 error codes, our model, given only half of the error codes, achieves an 80% F1 score for predicting what error pattern will occur and an average absolute error of $58.4 \pm 13.2$ h when forecasting the time of occurrence, thus enabling confident predictive maintenance and enhancing vehicle safety.
