FRESH: Interactive Reward Shaping in High-Dimensional State Spaces using Human Feedback
Reinforcement learning has been successful in training autonomous agents to accomplish goals in complex environments. Although it has been applied in many settings, including robotics and computer games, human players often obtain higher rewards in some environments than reinforcement learning algorithms. This is especially true of high-dimensional state spaces where the reward obtained by the agent is sparse or extremely delayed. In this paper, we introduce the FRESH (Feedback-based REward SHaping) framework, which effectively integrates feedback signals supplied by a human operator with deep reinforcement learning algorithms in high-dimensional state spaces. During training, a human operator is presented with trajectories from a replay buffer and provides feedback on states and actions in the trajectory. To generalize feedback signals provided by the human operator to previously unseen states and actions at test time, we use a feedback neural network. We use an ensemble of neural networks with a shared network architecture to represent model uncertainty and the confidence of the neural network in its output. The output of the feedback neural network is converted to a shaping reward that is added to the reward provided by the environment. We evaluate our approach on the Bowling and Skiing Atari games in the Arcade Learning Environment. Although human experts have achieved high scores in these environments, state-of-the-art deep reinforcement learning algorithms perform poorly. We observe that FRESH achieves much higher scores than these algorithms in both environments. FRESH also achieves a 21.4% higher score than a human expert in Bowling and does as well as an expert in Skiing.
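To make the reward-shaping step concrete, the listing below is a minimal Python/PyTorch sketch of how an ensemble of feedback networks could produce a shaping reward that is added to the environment reward. The names (FeedbackNet, shaping_reward), the ensemble size, the confidence threshold, and the mapping from feedback to a signed shaping term are illustrative assumptions, not the exact design used by FRESH.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FeedbackNet(nn.Module):
    # One member of the feedback ensemble: maps a (state, action) pair to a
    # score in [0, 1] approximating the human operator's feedback.
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + n_actions, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
            nn.Sigmoid(),
        )

    def forward(self, state, action_onehot):
        return self.net(torch.cat([state, action_onehot], dim=-1)).squeeze(-1)

def shaping_reward(ensemble, state, action_onehot, conf_threshold=0.1, scale=1.0):
    # Average the ensemble's predictions; if the members disagree too much
    # (high standard deviation, i.e. low confidence), fall back to zero shaping.
    with torch.no_grad():
        preds = torch.stack([member(state, action_onehot) for member in ensemble])
    mean, std = preds.mean(dim=0), preds.std(dim=0)
    confident = std < conf_threshold
    # Map feedback in [0, 1] to a signed shaping term in [-scale, scale].
    return torch.where(confident, scale * (2.0 * mean - 1.0), torch.zeros_like(mean))

# Usage: augment the environment reward before the reinforcement learning update.
state_dim, n_actions = 8, 4
ensemble = [FeedbackNet(state_dim, n_actions) for _ in range(5)]
state = torch.randn(1, state_dim)
action = F.one_hot(torch.tensor([2]), n_actions).float()
env_reward = 0.0
total_reward = env_reward + shaping_reward(ensemble, state, action).item()

In this sketch, the ensemble's disagreement (standard deviation across members) serves as a simple proxy for model uncertainty: the shaping term is applied only when the members agree, which is one plausible way to realize the confidence mechanism described above.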