United States

As AI agents are increasingly adopted to collaborate on complex objectives, ensuring the security of multi-agent systems becomes crucial. The risk of security breaches in these systems creates a fundamental trade-off between increasing protective measures and maintaining collaborative effectiveness.

To study these security risks and trade-offs, we create simulations of agents collaborating on assigned tasks. We focus on scenarios where an attacker compromises one agent, using it to steer the entire system towards misaligned outcomes by corrupting other agents. In this context, we observe "infectious jailbreaks": the multi-hop spreading of malicious prompts. To mitigate this risk, we evaluate several strategies: two "vaccination" approaches that insert false memories of safely handling malicious inputs into the agents' memory stream, and two versions of a generic safety prompt strategy.

We find that while these mitigation strategies significantly reduce the likelihood of infectious jailbreaks, they differentially impact the collaboration capabilities of the multi-agent system. Our findings demonstrate a general trade-off between security and collaborative efficiency in multi-agent systems, providing insights for designing more secure yet effective AI collaborations.
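The infection-and-vaccination setup described above can be illustrated with a toy simulation. This is not the authors' simulation framework: the `Agent` class, the `"kind": "vaccine"` memory entries, the fixed per-message infection probability `base_p`, and the multiplicative `vaccine_factor` are all hypothetical modeling assumptions. They only show how one compromised agent can spread a jailbreak over multiple hops and how inserted "safe handling" memories can dampen that spread.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    memory: list = field(default_factory=list)  # toy memory stream of past interactions
    infected: bool = False


def vaccinate(agent, malicious_prompt):
    """Insert a false memory of having safely refused a malicious input."""
    agent.memory.append({"kind": "vaccine", "input": malicious_prompt,
                         "response": "I declined and flagged this instruction."})


def infection_probability(agent, base_p, vaccine_factor):
    """Toy assumption: each vaccine memory scales susceptibility by vaccine_factor."""
    doses = sum(1 for m in agent.memory if m.get("kind") == "vaccine")
    return base_p * vaccine_factor ** doses


def simulate(agents, edges, patient_zero, base_p, vaccine_factor, rounds, seed=0):
    """Spread a jailbreak along directed message edges for a number of rounds."""
    rng = random.Random(seed)
    by_name = {a.name: a for a in agents}
    by_name[patient_zero].infected = True  # the attacker compromises one agent
    for _ in range(rounds):
        newly_infected = []
        for src, dst in edges:
            sender, receiver = by_name[src], by_name[dst]
            if sender.infected and not receiver.infected:
                if rng.random() < infection_probability(receiver, base_p, vaccine_factor):
                    newly_infected.append(receiver)
        # Apply infections only after the round, so each round is exactly one hop.
        for agent in newly_infected:
            agent.infected = True
    return sorted(a.name for a in agents if a.infected)
```

For a chain `A -> B -> C -> D` with `base_p=1.0`, one round infects only `B`, three rounds reach every agent, and vaccinating `B`, `C`, and `D` with `vaccine_factor=0.0` confines the jailbreak to `A`. The toy model does not capture the collaboration cost that the abstract reports for these defenses.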

AAAI 2025

Multi-Agent Security Tax: Trading Off Security and Collaboration Capabilities in Multi-Agent Systems


technical paper

We are pleased to announce the Thirty-Ninth AAAI Conference on Artificial Intelligence (AAAI-25), which will be held in Philadelphia, Pennsylvania at the Pennsylvania Convention Center from February 25 to March 4, 2025.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-25 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, exhibit programs, and a range of other activities to be announced.




We consider infinite-horizon Markov Decision Processes where parameters, such as transition probabilities, are unknown and estimated from data. The popular distributionally robust approach to addressing this parameter uncertainty can sometimes be overly conservative. In this paper, we use the recently proposed Bayesian risk Markov Decision Process (BR-MDP) formulation to address parameter (or epistemic) uncertainty in MDPs. To solve the infinite-horizon BR-MDP with a class of convex risk measures, we propose a computationally efficient approach, approximate bilevel difference convex programming (ABDCP). The optimization is performed offline and produces the optimal policy, represented as a finite-state controller with desirable performance guarantees. We also demonstrate the empirical performance of the infinite-horizon BR-MDP formulation and the proposed algorithms.
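To make the Bayesian-risk idea concrete, the sketch below evaluates a fixed policy under posterior uncertainty about its transition matrix. It draws transition matrices from a Dirichlet posterior built from observed transition counts, computes the policy's discounted cost under each draw, and summarizes the resulting distribution with CVaR, one example of a convex risk measure. This is a simplified, static illustration only: it does not implement ABDCP, finite-state controllers, or the infinite-horizon BR-MDP recursion, and the function names, the uniform Dirichlet prior, and the choice of CVaR are assumptions.

```python
import math
import random


def sample_transition_matrix(counts, rng, prior=1.0):
    """One posterior draw per row: Dirichlet(counts + prior) via normalized Gammas."""
    P = []
    for row in counts:
        g = [rng.gammavariate(n + prior, 1.0) for n in row]
        total = sum(g)
        P.append([x / total for x in g])
    return P


def policy_cost(P, cost, gamma, iters=500):
    """Iterative policy evaluation V <- c + gamma * P V (converges for gamma < 1)."""
    n = len(cost)
    V = [0.0] * n
    for _ in range(iters):
        V = [cost[s] + gamma * sum(P[s][t] * V[t] for t in range(n)) for s in range(n)]
    return V


def cvar_upper(values, tail):
    """CVaR of a cost: mean of the worst (largest) ceil(tail * n) samples."""
    k = max(1, math.ceil(tail * len(values)))
    worst = sorted(values, reverse=True)[:k]
    return sum(worst) / k


def bayesian_risk_cost(counts, cost, gamma, start, tail=0.1, n_samples=200, seed=0):
    """Return (CVaR, posterior mean) of the discounted cost from state `start`."""
    rng = random.Random(seed)
    totals = [policy_cost(sample_transition_matrix(counts, rng), cost, gamma)[start]
              for _ in range(n_samples)]
    return cvar_upper(totals, tail), sum(totals) / len(totals)
```

Because CVaR averages only the worst posterior draws, it is never below the posterior mean; the gap between the two reflects how much the epistemic uncertainty in the transition counts matters for this policy.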

Approximate Bilevel Difference Convex Programming for Bayesian Risk Markov Decision Processes

Human-object contact (HOT) detection aims to accurately identify the areas where humans and objects come into contact. Current methods often fail to account for scenarios in which objects occlude the view, resulting in inaccurate identification of contact areas. To tackle this problem, we propose PIHOT, a perspective-interaction HOT detector that uses a depth-map generation model to provide depth information for humans and objects relative to the camera, thereby preventing false interaction detection. Furthermore, we use mask dilation and object restoration techniques to restore texture details in occluded areas, refine the boundaries between objects, and enhance the perception of humans interacting with objects. Moreover, a spatial-awareness perception module is designed to focus on features close to the points of contact. Experimental results show that PIHOT achieves state-of-the-art performance on three benchmark datasets for HOT detection. Compared to the most recent method, DHOT, our method achieves average improvements of 13%, 27.5%, 16%, and 18.5% on the SC-Acc., C-Acc., mIoU, and wIoU metrics, respectively.
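Mask dilation, mentioned above as part of restoring occluded regions, grows a binary mask outward so that boundary pixels are included. The sketch below shows standard binary dilation with a square structuring element, together with the intersection-over-union (IoU) of two binary masks, the quantity that mIoU averages. It uses plain nested lists to stay dependency-free; the square neighborhood, the `radius` parameter, and how PIHOT actually applies dilation are assumptions, not details taken from the paper.

```python
def dilate(mask, radius=1):
    """Binary dilation of a 2-D 0/1 mask with a (2*radius+1)^2 square element."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                # Switch on every in-bounds pixel within `radius` of a foreground pixel.
                for dy in range(-radius, radius + 1):
                    for dx in range(-radius, radius + 1):
                        yy, xx = y + dy, x + dx
                        if 0 <= yy < h and 0 <= xx < w:
                            out[yy][xx] = 1
    return out


def iou(a, b):
    """Intersection over union of two same-shaped binary masks."""
    pairs = [(p, q) for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b)]
    inter = sum(1 for p, q in pairs if p and q)
    union = sum(1 for p, q in pairs if p or q)
    return inter / union if union else 1.0  # two empty masks count as a perfect match
```

Dilating a single interior pixel once yields a 3x3 block, so its IoU with the original one-pixel mask is 1/9; this is why even small dilations can shift overlap-based metrics noticeably on thin contact regions.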

Precision-Enhanced Human-Object Contact Detection via Depth-Aware Perspective Interaction and Object Texture Restoration

It is widely known that state-of-the-art machine learning models, including vision and language models, can be seriously compromised by adversarial perturbations. It is therefore increasingly relevant to develop capabilities to certify their performance in the presence of the most effective adversarial attacks. Our paper offers a new approach to certify the performance of machine learning models in the presence of adversarial attacks with population-level risk guarantees. In particular, we introduce the notion of an $(\alpha,\zeta)$-safe machine learning model. We propose a hypothesis testing procedure, based on the availability of a calibration set, to derive statistical guarantees that the probability of declaring a machine learning model safe (i.e., its adversarial population risk is less than $\alpha$) when it is in fact unsafe (i.e., its adversarial population risk is higher than $\alpha$) is less than $\zeta$. We also propose Bayesian optimization algorithms to efficiently determine, with statistical guarantees, whether a machine learning model is $(\alpha,\zeta)$-safe in the presence of an adversarial attack. We apply our framework to a range of machine learning models, including various sizes of vision Transformer (ViT) and ResNet models, subjected to a variety of adversarial attacks, such as PGDAttack, MomentumAttack, GenAttack, and BanditAttack, to illustrate the operation of our approach. Importantly, we show that ViTs are generally more robust to adversarial attacks than ResNets, and that ViT-large is more robust than smaller models. Our approach goes beyond existing empirical adversarial risk-based certification guarantees: it formulates rigorous (and provable) performance guarantees that can be used to satisfy regulatory requirements mandating the use of state-of-the-art technical tools.
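The decision rule above, declaring a model $(\alpha,\zeta)$-safe only when the calibration data give strong evidence against the null hypothesis that its adversarial risk is at least $\alpha$, can be sketched with a Hoeffding-based p-value. This choice of p-value is an assumption for illustration: PROSAC's actual test statistic and its Bayesian optimization over attack hyperparameters are not reproduced here. The sketch assumes per-example adversarial losses in $[0, 1]$ (for example, 0/1 indicators of whether an attack fooled the model) computed on an i.i.d. calibration set for a single attack configuration.

```python
import math


def hoeffding_p_value(losses, alpha):
    """p-value for H0: adversarial risk >= alpha, given losses in [0, 1]."""
    n = len(losses)
    r_hat = sum(losses) / n
    if r_hat >= alpha:
        return 1.0  # no evidence that the risk is below alpha
    # Hoeffding: under H0, P(r_hat <= alpha - t) <= exp(-2 n t^2).
    return math.exp(-2.0 * n * (alpha - r_hat) ** 2)


def is_alpha_zeta_safe(losses, alpha, zeta):
    """Declare the model safe only if H0 is rejected at level zeta."""
    return hoeffding_p_value(losses, alpha) <= zeta
```

With 1000 calibration points and no successful attacks, `is_alpha_zeta_safe([0] * 1000, alpha=0.1, zeta=0.05)` returns `True`; with only 10 such points the p-value is $e^{-0.2} \approx 0.82$ and the model cannot be declared safe, which shows how the guarantee depends on calibration-set size.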

PROSAC: Provably Safe Certification for Machine Learning Models under Adversarial Attacks

Time series data, prevalent in fields such as medicine, e-commerce, and finance, is used for forecasting, for example predicting next quarter's product demand based on past trends. However, some problems require causal models to answer questions like "What would the product demand have been without a specific intervention (e.g., suppressing products with slower delivery times from the search results)?"
Such questions require causal models to estimate unobserved counterfactual outcomes. In this paper, we propose a novel Graph Causal Forecasting (GCF) model that predicts unobserved demand by leveraging the relationship of a product with other similar products in the marketplace (spatial aspect), along with the change in demand over time for each product (temporal aspect). The core idea is to estimate the counterfactual outcome using a synthetic control unaffected by the treatment. Our approach uses an RGCN and dilated-CNN based network, which leverages domain knowledge to automatically design a synthetic control during training. Using GCF for our demand forecasting problem, we achieve 75.3% lower MAPE compared to the baseline. We use the forecasted values to recommend high-demand products; on our business metric (discussed later), which tracks the quality of these recommendations, we achieve a significant jump of 61.2%. Moreover, GCF adds 67.8% more high-demand products to the marketplace compared to the existing model in production.
Deployment of GCF in 2023 led to a +1399 bps improvement in the number of products with a view from customers and a +310 bps improvement in the number of products with a sale. We also compare GCF with state-of-the-art forecasting methods on semi-synthetic data, created by simulating a treatment on the open-source METR-LA traffic dataset. We achieve 30% lower MSE than TGCN, a time series forecasting approach, and 30% and 25% lower MSE than CRN and the Google Causal Impact model, respectively, both of which are causal forecasting approaches.
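The synthetic-control idea referenced above can be sketched in its classical form: choose nonnegative weights summing to one over untreated "donor" series so that their weighted combination matches the treated series before the intervention, then reuse those weights after the intervention to predict the counterfactual. This is the textbook construction, not GCF's RGCN and dilated-CNN network, which learns the control during training; the exponentiated-gradient solver, the step size, and the function names are assumptions chosen to keep the example dependency-free.

```python
import math


def fit_synthetic_control(treated_pre, donors_pre, steps=500, lr=0.5):
    """Simplex-constrained least squares via exponentiated gradient descent."""
    k, T = len(donors_pre), len(treated_pre)
    w = [1.0 / k] * k  # start from uniform weights on the simplex
    for _ in range(steps):
        # Residual of the weighted donor combination in each pre-treatment period.
        resid = [sum(w[j] * donors_pre[j][t] for j in range(k)) - treated_pre[t]
                 for t in range(T)]
        # Gradient of the mean squared error with respect to each weight.
        grad = [2.0 / T * sum(resid[t] * donors_pre[j][t] for t in range(T))
                for j in range(k)]
        # Multiplicative update keeps weights positive; renormalize onto the simplex.
        w = [w[j] * math.exp(-lr * grad[j]) for j in range(k)]
        total = sum(w)
        w = [x / total for x in w]
    return w


def counterfactual(weights, donors_post):
    """Predicted untreated outcome for each post-treatment period."""
    return [sum(wj * d[t] for wj, d in zip(weights, donors_post))
            for t in range(len(donors_post[0]))]
```

For instance, if the treated series equals one quarter of donor `[1, 2, 3, 4]` plus three quarters of donor `[2, 2, 2, 2]` before treatment, the fitted weights approach `[0.25, 0.75]`, and the gap between the observed post-treatment series and `counterfactual(...)` estimates the treatment effect.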

GCF: Estimating Unobserved Demand Using Graph Causal Forecasting

Cotton is a critical agricultural product and industrial raw material, playing a key role in national economies and people's living conditions, particularly in developing countries. However, cotton picking and processing often result in contamination by various foreign fibers, such as hair, hemp rope, plastic film, and polypropylene rope. These contaminants are difficult to remove during textile processing and tend to break into small fragments, significantly reducing the quality of cotton products and negatively impacting the cotton industry. In this paper, we present XCotton, an AI-enabled hardware-software integrated system for identifying and removing foreign fibers. Our system has been deployed in actual cotton production environments in multiple regions in China, Central Asia, and Africa. XCotton achieves a cleaning efficiency of 1000 kg/h, a 43% improvement, with only 14 kWh of energy consumption (63% less). Moreover, XCotton brings significant business value to its manufacturer and clients. XCotton not only enhances the quality of cotton products but also contributes to the value-adding and upgrading of the cotton industry in developing regions, supporting economic growth and improving living conditions.

XCotton: Advancing AI-Enabled Hardware/Software Integrated System for Foreign Fiber Cleaning

In the domain of merchant-oriented risk control decisions within e-commerce, balancing the effectiveness of risk management with merchant satisfaction remains a critical challenge. Strict risk control strategies, while effectively mitigating risks, often lead to increased merchant dissatisfaction. Conversely, loose policies could enhance the merchant experience but raise the likelihood of incidents, potentially incurring substantial financial losses. Additionally, determining personalized risk control strategies for different merchants to achieve optimal overall risk management effectiveness is crucial. Given the high uncertainty in the outcomes of different risk control decisions, manual strategy allocation and real-time adjustments are commonly implemented in practice, leading to significant human and resource costs. In this work, we present a novel automated risk control decision framework that utilizes unbiased data-driven decision-making and dynamic optimization to automate the allocation and adjustment of risk control strategies. Our proposed solution adapts to various online business requirements, demonstrating exceptional risk management performance and significantly reducing overall costs. This approach has been extensively deployed and validated in Alibaba's risk control operations, achieving large-scale automated risk control decisions.

Adaptive Merchant-Centric Risk Control via Unbiased Decision-Making and Dynamic Optimization in E-Commerce

Recent years have witnessed tremendous successes of learning for sequential decision-making, and in particular, Reinforcement Learning (RL). Prominent application examples include playing Go and video games, robotics, autonomous driving, and, recently, large language models. Most such success stories naturally involve "multi-agents". Hence, there has been surging research interest in advancing Multi-Agent Learning in Dynamic Environments, particularly multi-agent RL (MARL), an area in which my research has led and made significant contributions. My work has established both the sample and computational complexities of learning in Stochastic Games, the most fundamental model of MARL, and advocated a unique Economics perspective on independent learning in Stochastic Games. My work has also initiated recent studies of distributed and networked MARL, with applications in robust adversarial RL, offline RL, and robotics. This paper surveys my notable contributions along this journey of developing the foundations of multi-agent learning in dynamic environments.

Foundations of Multi-Agent Learning in Dynamic Environments: Where Reinforcement Learning Meets Strategic Decision-Making

We introduce LLM Stinger, a novel approach that leverages Large Language Models (LLMs) to automatically generate adversarial suffixes for jailbreak attacks. Unlike traditional methods, which require complex prompt engineering or white-box access, LLM Stinger uses a reinforcement learning (RL) loop to fine-tune an attacker LLM, generating new suffixes based on existing attacks for harmful questions from the HarmBench benchmark. Our method significantly outperforms existing red-teaming approaches (we compared against 15 of the latest methods), achieving a +57.2% improvement in Attack Success Rate (ASR) on LLaMA2-7B-chat and a +50.3% ASR increase on Claude 2, both models known for their extensive safety measures. Additionally, we achieved a 94.97% ASR on GPT-3.5 and 99.4% on Gemma-2B-it, demonstrating the robustness and adaptability of LLM Stinger across open and closed-source models.

LLM Stinger: Jailbreaking LLMs Using RL Fine-Tuned LLMs (Student Abstract)

The quality of interactions between parents and children is a critical factor in child development. Recent years have seen programs to improve parenting behaviors through evidence-based approaches, such as attachment-based interventions. A vital element of these programs is assessing the quality of parenting behaviors via video recordings of parent-child interactions, which is often time-intensive. In our previous work, we explored machine learning models to predict expert ratings of parenting behaviors from video recordings of semi-structured parent-child play. However, the large set of low-level multimodal features offered few explainable insights, which created barriers to communicating with domain experts and to improving the models further. In this work, we developed a machine learning pipeline that combines sparse multiple canonical correlation analysis with causal discovery techniques to uncover explainable causal relationships between nine categories of behavioral features and the quality ratings of parent-child interactions. This approach offers valuable insights into the otherwise black-box models and contributes to the growing body of work on transparent and trustworthy machine learning models of parenting behaviors.

Causal Explanation of Quality of Parent-Child Interactions with Multimodal Behavioral Features (Student Abstract)

Knowledge bases traditionally require manual optimization to ensure reasonable performance when answering queries. We build on previous neurosymbolic approaches by improving the training of an embedding model for logical statements that maximizes similarity between unifying atoms and minimizes similarity of non-unifying atoms. In particular, we evaluate different approaches to training this model.
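The training signal described above depends on knowing which pairs of atoms unify. The sketch below labels atom pairs with a simple unification check and scores an embedding pair with a contrastive-style loss that rewards high cosine similarity for unifying atoms and penalizes similarity above a margin for non-unifying ones. It is a hypothetical illustration, not the authors' training procedure: it assumes function-free atoms written as `(predicate, args)` tuples, Prolog-style variables that start with an uppercase letter, variables already standardized apart between the two atoms, and a made-up `pair_loss` objective.

```python
import math


def is_var(term):
    """Assumed convention: variables start with an uppercase letter."""
    return term[:1].isupper()


def unifies(a, b):
    """Unification test for function-free atoms given as (predicate, args)."""
    (pred_a, args_a), (pred_b, args_b) = a, b
    if pred_a != pred_b or len(args_a) != len(args_b):
        return False
    subst = {}

    def walk(term):
        # Follow variable bindings until reaching a constant or an unbound variable.
        while is_var(term) and term in subst:
            term = subst[term]
        return term

    for x, y in zip(args_a, args_b):
        x, y = walk(x), walk(y)
        if x == y:
            continue
        if is_var(x):
            subst[x] = y
        elif is_var(y):
            subst[y] = x
        else:
            return False  # two different constants cannot unify
    return True


def cosine(u, v):
    dot = sum(p * q for p, q in zip(u, v))
    return dot / (math.sqrt(sum(p * p for p in u)) * math.sqrt(sum(q * q for q in v)))


def pair_loss(emb_a, emb_b, unify, margin=0.0):
    """Pull unifying atoms together; push non-unifying ones below the margin."""
    s = cosine(emb_a, emb_b)
    return 1.0 - s if unify else max(0.0, s - margin)
```

For example, `parent(X, bob)` and `parent(alice, Y)` unify, while `likes(X, X)` and `likes(alice, bob)` do not, because `X` cannot be bound to two different constants.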
