technical paper

AAMAS 2020

May 11, 2020


Work-in-progress: Corrected Self Imitation Learning via Demonstrations

DOI: 10.48448/fgvf-wt42

While reinforcement learning (RL) agents have the remarkable ability to learn by interacting with their environments, this process is often slow and data-inefficient. Because environment interaction is typically expensive, many approaches have been studied to speed up RL. One popular method is to leverage human knowledge via imitation learning (IL), in which a demonstrator provides an example of the desired behavior and the agent seeks to imitate it. In this work-in-progress, we propose a new way of integrating IL and deep RL, which we call corrected self-imitation learning, in which an agent provided with a demonstration learns faster than an agent without one. Our method does not increase the number of environment interactions compared to a baseline RL method, and it works well even when the demonstrator is not an expert. We evaluate our method on the Atari game Ms. Pac-Man and achieve promising results indicating that our method has the potential to speed up deep RL algorithms.
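The abstract does not spell out the algorithm, but it highlights one key constraint: demonstrations must help learning without adding environment interactions. A common way to satisfy that constraint in off-policy deep RL is to mix demonstration transitions into each training batch drawn from the replay buffer, so updates imitate the demonstrator while still learning from the agent's own experience. The sketch below is a hypothetical illustration of that general idea, not the paper's method; the class name, `demo_fraction` parameter, and transition format are all assumptions for the example.

```python
import random


class DemoReplayBuffer:
    """Replay buffer holding both agent-collected and demonstrator-provided
    transitions. Sampling mixes a fixed share of demonstration data into
    every batch, so demonstrations influence updates without requiring any
    extra environment interaction (illustrative sketch, not the paper's
    actual algorithm)."""

    def __init__(self, demo_fraction=0.25, seed=0):
        self.agent_data = []      # transitions collected by the agent
        self.demo_data = []       # transitions supplied by the demonstrator
        self.demo_fraction = demo_fraction
        self.rng = random.Random(seed)

    def add_agent(self, transition):
        self.agent_data.append(transition)

    def add_demo(self, transition):
        self.demo_data.append(transition)

    def sample(self, batch_size):
        """Sample a batch containing roughly `demo_fraction` demonstration
        transitions; the remainder comes from the agent's own experience."""
        n_demo = min(len(self.demo_data), int(batch_size * self.demo_fraction))
        n_agent = batch_size - n_demo
        batch = self.rng.choices(self.demo_data, k=n_demo) if n_demo else []
        batch += self.rng.choices(self.agent_data, k=n_agent)
        return batch


if __name__ == "__main__":
    buf = DemoReplayBuffer(demo_fraction=0.25, seed=1)
    for i in range(10):
        buf.add_demo(("demo", i))     # hypothetical demonstrator transitions
        buf.add_agent(("agent", i))   # hypothetical agent transitions
    batch = buf.sample(8)
    print(len(batch), sum(1 for t in batch if t[0] == "demo"))
```

With a batch size of 8 and `demo_fraction=0.25`, each batch carries two demonstration transitions; a non-expert demonstrator is only one minority voice in every update, which loosely matches the robustness-to-imperfect-demonstrations property the abstract claims.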


