New Zealand

State-of-the-art multi-agent reinforcement learning has achieved remarkable success in recent years. The success has been mainly based on the assumption that all teammates perfectly cooperate to optimize a global objective in order to achieve a common goal. While this may be true in the ideal case, these approaches could fail in practice, since in multi-agent systems (MAS), any agent may be a potential source of failure. In this presentation, we focus on resilience in cooperative MAS and propose an Antagonist-Ratio Training Scheme (ARTS) by reformulating the original target MAS as a mixed cooperative-competitive game between a group of protagonists, which represent agents of the target MAS, and a group of antagonists, which represent failures in the MAS. While the protagonists can learn robust policies to ensure resilience against failures, the antagonists can learn malicious behavior to provide an adequate test suite for other MAS. We empirically evaluate ARTS in a cyber-physical production domain and show the effectiveness of ARTS w.r.t. resilience and testing capabilities.


AAMAS 2020

Learning and Testing Resilience in Cooperative Multi-Agent Systems

2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics.

**Whova App** 
Stay in touch with your fellow conference attendees via the [Whova App](https://whova.com/portal/webapp/nacon_202106/)

**Conference Structure**
https://2021.naacl.org/blog/conference-structure/

**Walkthrough video of how to navigate NAACL 2021** 

Please take a moment to view this video explaining how to navigate the platform, attend sessions, and network with other attendees. 


<figure class="video_container">
  <iframe src="https://screencast-o-matic.com/watch/crhwbGVh3vx?v=6&ff=1&title=0&controls=1" width=640  height=350  frameborder="0" allowfullscreen="true"> </iframe>
</figure>

<br>
<br>

NAACL 2021

technical paper

AAMAS is the leading scientific conference for research in autonomous agents and multi-agent systems. The AAMAS conference series was initiated in 2002 as the merging of three respected scientific meetings: the International Conference on Multi-Agent Systems (ICMAS), the International Workshop on Agent Theories, Architectures, and Languages (ATAL), and the International Conference on Autonomous Agents (AA). The aim of the joint conference is to provide a single, high-profile, internationally-respected archival forum for scientific research in the theory and practice of autonomous agents and multi-agent systems.

Browse keynotes, discussions, panels and over 300 presentations.



Multi-agent resource allocation is an important and well-studied problem within AI and economics. It is generally assumed that the quantity of each resource is known a priori. However, in many real-world problems, such as the production of renewable energy which is typically weather dependent, the exact amount of each resource may not be known at the time of decision making. In this paper we investigate fair division of a homogeneous divisible resource where the available amount is given by a probability distribution.
Specifically, we study the notion of ex-ante envy-freeness, where, in expectation, agents weakly prefer their allocation over every other agent's allocation. We analyse the trade-off between fairness and social welfare. We show that allocations satisfying ex-ante envy-freeness can result in higher social welfare compared to those satisfying ex-post envy-freeness. Nevertheless, the price of envy-freeness is at least $\Omega(n)$, where $n$ is the number of agents, and this is tight under concave valuation functions. Principally, we show that the problem of optimising ex-ante social welfare subject to ex-ante envy-freeness is NP-hard in the strong sense. Finally, we devise an integer program to calculate the optimal ex-ante envy-free allocation for linear satiable valuation functions.


Fair Allocation of Resources with Uncertain Availability
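The difference between ex-ante and ex-post envy-freeness described in the abstract above can be illustrated with a minimal sketch. It assumes a finite set of scenarios for the available amount, each with a known probability, and an allocation table giving every agent's share in every scenario. The function names, the data layout, and the tolerance parameter are hypothetical choices made for this illustration and are not taken from the paper.

```python
from typing import Callable, List, Sequence

Valuation = Callable[[float], float]


def expected_value(v: Valuation, probs: Sequence[float],
                   amounts: Sequence[float]) -> float:
    """Expected value of v over the scenario distribution."""
    return sum(p * v(x) for p, x in zip(probs, amounts))


def is_ex_ante_envy_free(valuations: List[Valuation], probs: Sequence[float],
                         allocation: List[List[float]],
                         tol: float = 1e-9) -> bool:
    """allocation[s][i] is the amount agent i receives in scenario s.

    Ex-ante envy-freeness: in expectation, every agent weakly prefers
    its own share to the share of every other agent.
    """
    n = len(valuations)
    for i, v in enumerate(valuations):
        own = expected_value(v, probs, [row[i] for row in allocation])
        for j in range(n):
            other = expected_value(v, probs, [row[j] for row in allocation])
            if own + tol < other:
                return False
    return True


def is_ex_post_envy_free(valuations: List[Valuation],
                         allocation: List[List[float]],
                         tol: float = 1e-9) -> bool:
    """Ex-post envy-freeness: no envy in any single realized scenario."""
    for row in allocation:
        for i, v in enumerate(valuations):
            if any(v(row[i]) + tol < v(row[j]) for j in range(len(row))):
                return False
    return True


# Two agents with identical linear valuations, two equally likely scenarios.
# Each scenario hands the whole resource (2 units) to a different agent.
linear = lambda x: x
probs = [0.5, 0.5]
allocation = [[2.0, 0.0], [0.0, 2.0]]
ex_ante = is_ex_ante_envy_free([linear, linear], probs, allocation)
ex_post = is_ex_post_envy_free([linear, linear], allocation)
```

In this toy instance the allocation is envy-free in expectation but not in either realized scenario, which is the kind of gap between the two notions that the abstract refers to.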

As part of the DARPA SocialSim challenge, we address the problem of predicting behavioral phenomena including information spread involving hundreds of thousands of users across three major linked social networks: Twitter, Reddit and GitHub. Our approach develops a framework for data-driven agent simulation that begins with a discrete-event simulation of the environment populated with generic, flexible agents, then optimizes the decision model of the agents by combining a number of machine learning classification problems. The ML problems predict when an agent will take a certain action in its world and are designed to combine aspects of the agents, gathered from historical data, with dynamic aspects of the environment including the resources, such as tweets, that agents interact with at a given point in time. In this way, each of the agents makes individualized decisions based on their environment, neighbors and history during the simulation, although global simulation data is used to learn accurate generalizations. This approach showed the best performance of all participants in the DARPA challenge across a broad range of metrics. We describe the performance of models both with and without machine learning on measures of cross-platform information spread defined both at the level of the whole population and at the community level. The best-performing model overall combines learned agent behaviors with explicit modeling of bursts in global activity. Because of the general nature of our approach, it is applicable to a range of prediction problems that require modeling individualized, situational agent behavior from trace data that combines many agents.


Massive Cross-Platform Simulations of Online Social Networks

**Please click on the button below to see this lecture on SlidesLive:**

[![](https://assets.underline.io/uploads/markdown_image/1/image/08e10ad349922f32e7322b77b8df9019.png)](https://slideslive.com/38946802/boston-dynamics)

**Abstract:**

In epidemiology, exploring innovative modeling tools for accurately analyzing epidemic diffusion is becoming a major challenge, given the myriad of real-world aspects to capture. Typically, equation-based models, such as SIS and SIR, are used to study the propagation of diseases over a population. Improved approaches also include human-mobility patterns as network information to describe contacts among individuals. However, there is still a need to incorporate into these models information about different types of contagion, geographical information, human habits, and environmental properties.
In this paper, we propose a novel approach that takes into account: 1. direct and indirect epidemic contagion pathways to explore the dynamics of the epidemic, 2. the times of possible contagions, and 3. human-mobility patterns. We combine these three features exploiting time-varying hypergraphs, and we embed this model into a design-methodology for agent-based models (ABMs), able to improve the correctness in the epidemic estimations of classical contact-network approaches. We further describe a diffusion algorithm suitable for our design-methodology and adaptable to the peculiarities of any disease spreading policies and models.
Finally, we tested our methodology by developing an ABM, realizing the SIS epidemic compartmental model, for simulating an epidemic propagation over a population of individuals. We experimented with the model using real user-mobility data from the location-based social network Foursquare, and we demonstrated the high-impact of temporal direct and indirect contagion pathways.


A Design-Methodology for Epidemic Dynamics via Time-Varying Hypergraphs
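As a rough illustration of the idea in the abstract above, the following sketch runs an SIS process over a time-varying hypergraph. It is a simplification under stated assumptions: each hyperedge is taken to be a group of individuals sharing a location during one time step, a single infection probability `beta` covers all contagion pathways, and `gamma` is the per-step recovery probability. The function names and the schedule format are hypothetical and do not reproduce the paper's diffusion algorithm.

```python
import random
from typing import List, Sequence, Set


def sis_step(infected: Set[int], hyperedges: Sequence[Set[int]],
             beta: float, gamma: float, rng: random.Random) -> Set[int]:
    """One synchronous SIS step over the hyperedges active at this time step."""
    newly_infected: Set[int] = set()
    for edge in hyperedges:
        if infected & edge:  # the group contains at least one infected member
            for node in edge - infected:
                if rng.random() < beta:
                    newly_infected.add(node)
    # Infected individuals recover (become susceptible again) with prob. gamma.
    recovered = {node for node in infected if rng.random() < gamma}
    return (infected - recovered) | newly_infected


def simulate(initial: Set[int], schedule: Sequence[Sequence[Set[int]]],
             beta: float, gamma: float, seed: int = 0) -> List[Set[int]]:
    """Return the infected set after each time step of the schedule.

    schedule[t] lists the hyperedges that are active at time step t.
    """
    rng = random.Random(seed)
    infected = set(initial)
    history = []
    for hyperedges in schedule:
        infected = sis_step(infected, hyperedges, beta, gamma, rng)
        history.append(set(infected))
    return history


# Individuals 0, 1, 2 meet at step 0; individuals 2 and 3 meet at step 1.
schedule = [[{0, 1, 2}], [{2, 3}]]
history = simulate({0}, schedule, beta=1.0, gamma=0.0)
```

With `beta=1.0` and `gamma=0.0` the run is deterministic, which makes the effect of the temporal ordering of contacts easy to see: individual 3 can only be reached because individual 2 was infected one step earlier.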

Collective learning can be greatly enhanced when agents effectively exchange knowledge with their peers. In particular, recent work studying agents that learn to teach other teammates has demonstrated that action advising accelerates team-wide learning. However, the prior work has simplified the learning of advising policies by using simple function approximations and only considered advising with primitive (low-level) actions, limiting the scalability of learning and teaching to complex domains. This paper introduces a novel learning-to-teach framework, called hierarchical multiagent teaching (HMAT), that improves scalability to complex environments by using the deep representation for student policies and by advising with more expressive extended action sequences over multiple levels of temporal abstraction. Our empirical evaluations demonstrate that HMAT improves team-wide learning progress in large, complex domains where previous approaches fail. HMAT also learns teaching policies that can effectively transfer knowledge to different teammates with knowledge of different tasks, even when the teammates have heterogeneous action spaces.

Learning Hierarchical Teaching Policies for Cooperative Agents

In classical elections, voters only submit their ballot once, whereas, in iterative voting, the ballots may be changed iteratively. Following the work by Wilczynski [2019], we consider the case where a social network represents an underlying structure between the voters, meaning that each voter can see her neighbors’ ballots. In addition, there is a polling agency, which publicly announces the result for the initial vote. This paper investigates the manipulative power of the polling agency. Previously, Wilczynski [2019] studied constructive manipulation for the plurality rule. We introduce destructive manipulation and extend the study to the veto rule. Several restricted variants are considered with respect to their parameterized complexity. The theoretical results are complemented by experiments using different heuristics.

Manipulation of Opinion Polls to Influence Iterative Elections
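The two voting rules named in the abstract above, plurality and veto, can be sketched as follows. The sketch assumes that each ballot is a full ranking from most to least preferred and that ties are broken lexicographically by candidate name; both are assumptions made for this illustration, and the function names are hypothetical.

```python
from typing import Dict, Sequence, Tuple

Ballot = Tuple[str, ...]


def plurality_winner(ballots: Sequence[Ballot],
                     candidates: Sequence[str]) -> str:
    """Plurality: each ballot gives one point to its top-ranked candidate."""
    scores: Dict[str, int] = {c: 0 for c in candidates}
    for ranking in ballots:
        scores[ranking[0]] += 1
    # Highest score wins; ties are broken lexicographically (an assumption).
    return min(candidates, key=lambda c: (-scores[c], c))


def veto_winner(ballots: Sequence[Ballot], candidates: Sequence[str]) -> str:
    """Veto: each ballot vetoes its bottom-ranked candidate."""
    vetoes: Dict[str, int] = {c: 0 for c in candidates}
    for ranking in ballots:
        vetoes[ranking[-1]] += 1
    # Fewest vetoes wins; ties are broken lexicographically (an assumption).
    return min(candidates, key=lambda c: (vetoes[c], c))


candidates = ["a", "b", "c"]
ballots = [
    ("a", "b", "c"),
    ("a", "c", "b"),
    ("b", "c", "a"),
    ("c", "b", "a"),
    ("b", "c", "a"),
]
plurality = plurality_winner(ballots, candidates)
veto = veto_winner(ballots, candidates)
```

On this profile the two rules disagree: candidate `a` ties for the most first places and wins plurality by tie-breaking, but is ranked last by three voters, so `b` wins under veto.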

Off Policy Deep Reinforcement Learning with Analogous Disentangled Exploration

Reinforcement learning (RL), like any on-line learning method, inevitably faces the exploration-exploitation dilemma. When a learning algorithm requires as few data samples as possible, it is called sample efficient. The design of sample-efficient algorithms is an important area of research. Interestingly, all currently known provably efficient model-free RL algorithms utilize the same well-known principle of optimism in the face of uncertainty. We unite these existing algorithms into a single general model-free optimistic RL framework. We show how this facilitates the design of new optimistic model-free RL algorithms by simplifying the analysis of their efficiency. Finally, we propose one such new algorithm and demonstrate its performance in an experimental study.

Generalized Optimistic Q-Learning with Provable Efficiency
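The principle of optimism in the face of uncertainty mentioned in the abstract above can be illustrated with a generic tabular sketch. This is not the framework or the algorithm proposed in the paper: it is a plain Q-learning variant that assumes discounted rewards bounded by `r_max`, initializes every value at its upper bound, and adds a count-based bonus that shrinks with visits. The class name, the `1/n` learning rate, and the bonus form are illustrative assumptions.

```python
import math
from collections import defaultdict
from typing import Hashable


class OptimisticQLearner:
    """Tabular Q-learning with optimistic initialization and a visit bonus."""

    def __init__(self, n_actions: int, gamma: float = 0.9,
                 r_max: float = 1.0, bonus_scale: float = 1.0) -> None:
        self.n_actions = n_actions
        self.gamma = gamma
        # Upper bound on any discounted return when rewards lie in [0, r_max].
        self.q_max = r_max / (1.0 - gamma)
        self.bonus_scale = bonus_scale
        # Unvisited state-action pairs start at the optimistic upper bound.
        self.q = defaultdict(lambda: [self.q_max] * n_actions)
        self.counts = defaultdict(lambda: [0] * n_actions)

    def act(self, state: Hashable) -> int:
        """Greedy action; optimism alone drives exploration."""
        values = self.q[state]
        return max(range(self.n_actions), key=values.__getitem__)

    def update(self, state: Hashable, action: int, reward: float,
               next_state: Hashable) -> None:
        self.counts[state][action] += 1
        n = self.counts[state][action]
        alpha = 1.0 / n  # sample-average learning rate (an assumption)
        bonus = self.bonus_scale / math.sqrt(n)  # shrinks as visits grow
        target = reward + bonus + self.gamma * max(self.q[next_state])
        new_value = (1.0 - alpha) * self.q[state][action] + alpha * target
        self.q[state][action] = min(self.q_max, new_value)


learner = OptimisticQLearner(n_actions=2, gamma=0.5, r_max=1.0, bonus_scale=0.0)
first_action = learner.act("s")         # both values are optimistic; picks 0
learner.update("s", first_action, reward=0.0, next_state="s")
second_action = learner.act("s")        # the untried action now looks better
```

After one disappointing outcome, the tried action's estimate drops below the optimistic value of the untried one, so the greedy policy explores without any explicit randomization.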

A patrol of robot teams, where the robots are required to repeatedly visit a target area, is a useful tool in detecting an adversary trying to penetrate. In this work we examine the Closed Perimeter Patrol problem, in which the robots travel along a closed perimeter and the adversary is aware of the robots' patrol policy. The goal is to maximize the probability of penetration detection. Previous work dealt with symmetric tracks, in which all parts of the track have similar properties, and suggested non-deterministic patrol schemes, characterized by a uniform policy along the entire area. We consider more realistic scenarios of asymmetric tracks, with various parts of the track having different properties, and suggest a patrol policy with a non-uniform policy along different points of the track. We compare the achievements of both models and show the advantage of the non-uniform model. We further explore methods to efficiently calculate the attributes needed to maximize the probability of penetration detection and compare their implementation in various scenarios.


Non-Uniform Policies for Multi-Robot Asymmetric Perimeter Patrol in Adversarial Domains

Multi-Agent Path Finding (MAPF) plays an important role in many real-life applications where autonomous agents must coordinate to reach their goals without collisions. MAPF problems often take place in structured environments that are usually assumed to be static and known in advance. In this paper, we introduce C-MAPF, i.e., MAPF in Configurable environments, a novel variant of the MAPF problem in which the environment is configurable, namely its structure and topology can be controlled within some given constraints. Consider, for instance, a warehouse logistics application: the environment can be changed (at least to some degree) by the managers of the warehouse, for example by re-arranging the positions of the shelves or by removing or adding temporary walls. We study the properties of the C-MAPF problem and we devise two algorithms for solving it, both based on Conflict-Based Search (CBS), a state-of-the-art MAPF algorithm. First, we present Parallel CBS (P-CBS), that searches for a solution by simultaneously considering all the possible configurations of the environment. We then present Abstract CBS (A-CBS), an extended version of the CBS algorithm that solves C-MAPF problems by introducing a new type of conflict on the allowable configurations of the environment. We prove that our solvers are both complete and optimal and we experimentally assess their performance in different settings.


Multi-Agent Path Finding in Configurable Environments
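Conflict-Based Search, on which both solvers in the abstract above are built, repeatedly looks for the first conflict between the agents' current paths. A minimal sketch of that check for two agents is given below. It assumes paths are lists of grid cells indexed by time step and that an agent waits at its final cell once its path ends; it covers only the standard vertex and edge (swap) conflicts, not the configuration conflicts that A-CBS introduces. All names are hypothetical.

```python
from typing import List, Optional, Tuple

Cell = Tuple[int, int]


def position_at(path: List[Cell], t: int) -> Cell:
    """Cell occupied at time t; the agent waits at its last cell afterwards."""
    return path[min(t, len(path) - 1)]


def first_conflict(path_a: List[Cell], path_b: List[Cell]) -> Optional[tuple]:
    """Return the earliest conflict between two paths, or None.

    A vertex conflict is ("vertex", t, cell): both agents occupy cell at t.
    An edge conflict is ("edge", t, (cell_from, cell_to)): the agents swap
    cells between t and t + 1; the edge is given from agent a's perspective.
    """
    horizon = max(len(path_a), len(path_b))
    for t in range(horizon):
        a_now, b_now = position_at(path_a, t), position_at(path_b, t)
        if a_now == b_now:
            return ("vertex", t, a_now)
        if t + 1 < horizon:
            a_next = position_at(path_a, t + 1)
            b_next = position_at(path_b, t + 1)
            if a_now == b_next and b_now == a_next:
                return ("edge", t, (a_now, a_next))
    return None


# Two agents crossing a corridor in opposite directions meet in the middle.
vertex_case = first_conflict([(0, 0), (0, 1), (0, 2)],
                             [(0, 2), (0, 1), (0, 0)])
# Two adjacent agents trying to swap places.
edge_case = first_conflict([(0, 0), (0, 1)], [(0, 1), (0, 0)])
```

In a full CBS solver, each conflict found this way would be resolved by branching on constraints that forbid one agent or the other from using the contested cell or edge at that time.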

In many settings where multiple agents interact, the optimal choices for each agent depend heavily on the choices of the others.
These coupled interactions are well-described by a general-sum differential game, in which players have differing objectives, the state evolves in continuous time, and optimal play may be characterized by one of many equilibrium concepts, e.g., a Nash equilibrium.
Often, problems admit multiple equilibria.
From the perspective of a single agent in such a game, this multiplicity of solutions can introduce uncertainty about how other agents will behave.
This paper proposes a general framework for resolving ambiguity between equilibria by reasoning about the equilibrium other agents are aiming for.
We demonstrate this framework in simulations of a multi-player human-robot navigation problem that yields two main conclusions:
First, by inferring which equilibrium humans are operating at, the robot is able to predict trajectories more accurately, and second, by discovering and aligning itself to this equilibrium the robot is able to reduce the cost for all players.
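The multiplicity of equilibria discussed above can be seen even in a static two-player matrix game, which is a much simpler setting than the differential games the abstract studies. The sketch below enumerates pure-strategy Nash equilibria of a bimatrix game by checking best responses. The function name and the example payoffs are assumptions made for this illustration.

```python
from typing import List, Sequence, Tuple


def pure_nash_equilibria(payoff_row: Sequence[Sequence[float]],
                         payoff_col: Sequence[Sequence[float]]
                         ) -> List[Tuple[int, int]]:
    """Return all (row, col) action pairs that are pure Nash equilibria.

    payoff_row[r][c] and payoff_col[r][c] are the payoffs of the row and
    column player when they play actions r and c respectively.
    """
    rows, cols = len(payoff_row), len(payoff_row[0])
    equilibria = []
    for r in range(rows):
        for c in range(cols):
            # Neither player can gain by deviating unilaterally.
            row_best = all(payoff_row[r][c] >= payoff_row[r2][c]
                           for r2 in range(rows))
            col_best = all(payoff_col[r][c] >= payoff_col[r][c2]
                           for c2 in range(cols))
            if row_best and col_best:
                equilibria.append((r, c))
    return equilibria


# Two agents approaching each other each pick a passing side (0 or 1).
# They avoid a collision, and both get payoff 1, only when the choices match.
passing = [[1, 0], [0, 1]]
equilibria = pure_nash_equilibria(passing, passing)
```

Both matching choices are equilibria, so an agent that does not know which one its counterpart is aiming for faces exactly the kind of ambiguity the abstract describes.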


