New Zealand

Kunikazu Kobayashi, Koji Nakano, Takashi Kuremoto and Masanao Obayashi (February 1st 2010). Objective-based Reinforcement Learning System for Cooperative Behavior Acquisition, Application of Machine Learning, Yagang Zhang, IntechOpen, DOI: 10.5772/8615. 

Yoshiyuki Tajima and Takehisa Onisawa (February 1st 2010). Model-based Reinforcement Learning with Model Error and Its Application, Application of Machine Learning, Yagang Zhang, IntechOpen, DOI: 10.5772/8607.

This presentation shows and evaluates a family of AlphaZero value targets, subsuming previous variants and introducing AlphaZero with greedy backups (A0GB).
Current state-of-the-art algorithms for playing board games use sample-based planning, such as Monte Carlo Tree Search (MCTS), combined with deep neural networks (NN) to approximate the value function. These algorithms, of which AlphaZero is a prominent example, are computationally extremely expensive to train, due to their reliance on many neural network evaluations. This limits their practical performance.
We improve the training process of AlphaZero by using more effective training targets for the neural network. We introduce a family of training targets, covering the original AlphaZero training target as well as the soft-Z and A0C variants from the literature. We demonstrate that A0GB, using a specific new value target from this family, is able to find the optimal policy in a small tabular domain, whereas the original AlphaZero target fails to do so.
In addition, we show that soft-Z, A0C and A0GB achieve better performance and faster training than the original AlphaZero target on two benchmark board games (Connect-Four and Breakthrough).
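The abstract names the value targets but does not define them. As a rough illustration only, the sketch below contrasts three generic ways of producing a scalar value target for a position: the final game outcome, the visit-weighted mean of the values backed up through the root's children, and the mean value of the most-visited child. The `ChildStats` class, the function names, and the example numbers are assumptions, and none of the functions should be read as the paper's definition of soft-Z, A0C or A0GB.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChildStats:
    """Hypothetical search statistics for one child of the root node."""
    visits: int
    total_value: float  # sum of backed-up values, from the root player's view

    @property
    def q(self) -> float:
        # Mean backed-up value; 0.0 for an unvisited child.
        return self.total_value / self.visits if self.visits else 0.0


def outcome_target(z: float) -> float:
    """Target taken from the final game outcome (e.g. +1 win, -1 loss)."""
    return z


def mean_search_target(children: List[ChildStats]) -> float:
    """Visit-weighted mean of the child values at the root."""
    total_visits = sum(c.visits for c in children)
    if total_visits == 0:
        return 0.0
    return sum(c.total_value for c in children) / total_visits


def greedy_search_target(children: List[ChildStats]) -> float:
    """Mean value of the most-visited child, ignoring exploratory visits."""
    best = max(children, key=lambda c: c.visits)
    return best.q


# Illustrative numbers: one well-explored good move, one exploratory bad move.
children = [ChildStats(visits=80, total_value=40.0),
            ChildStats(visits=20, total_value=-10.0)]
mean_value = mean_search_target(children)      # (40 - 10) / 100
greedy_value = greedy_search_target(children)  # 40 / 80
```

In this toy case the exploratory visits pull the mean-based target below the value of the move the search actually prefers, which is the kind of gap a greedy target avoids.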


AAMAS 2020

Value targets in off-policy AlphaZero: a new greedy backup

reinforcement learning; sample-based planning; alphazero; mcts

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-23 is the Thirty-Seventh AAAI Conference on Artificial Intelligence. The theme of this conference is to create collaborative bridges within and beyond AI. Like previous AAAI conferences, AAAI-23 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, and exhibit programs, and two new activities: a Bridge Program and a Lab Program. Many of these activities are tailored to the theme of bridges and all are selected according to the highest standards, with additional programs for students and young researchers. 
AAAI is providing you with a conference planner, which you can use to help organize your itinerary of activities. This includes talks to attend in person, talks to attend remotely, breaks with colleagues and your sightseeing activities. To access this conference planner, please go to [https://aaai-2023.takemobi.io](https://aaai-2023.takemobi.io).

In order to access this site, you need to register. If you haven't already, please register [here](https://aaai.org/Conferences/AAAI-23/registration/).


AAAI 2023



technical paper

AAMAS is the leading scientific conference for research in autonomous agents and multi-agent systems. The AAMAS conference series was initiated in 2002 as the merging of three respected scientific meetings: the International Conference on Multi-Agent Systems (ICMAS), the International Workshop on Agent Theories, Architectures, and Languages (ATAL), and the International Conference on Autonomous Agents (AA). The aim of the joint conference is to provide a single, high-profile, internationally-respected archival forum for scientific research in the theory and practice of autonomous agents and multi-agent systems.




Realistically modelling the behaviour and interaction of heterogeneous road users (pedestrians and vehicles) in mixed-traffic zones (a.k.a. shared spaces) is challenging. The dynamic nature of the environment, the heterogeneity of transport modes, and the absence of classical traffic rules make realistic microscopic traffic simulation a hard problem. Existing multi-agent-based simulations of shared spaces largely use an expert-based approach, combining a symbolic (e.g. rule-based) modelling and reasoning paradigm (e.g. using BDI representations of beliefs and plans) with hand-crafted encoding of the actual decision logic. More recently, deep learning (DL) models have been widely used to derive and predict trajectories based on, e.g., video data. In-depth studies comparing these two kinds of approaches are missing. In this work, we propose an expert-based model called GSFM, which combines the Social Force Model and game theory, and a DL model called LSTM-DBSCAN, which uses Long Short-Term Memory networks and density-based clustering for multi-agent trajectory prediction. We create a common framework to run these two models in parallel to guarantee a fair comparison. Real-world mixed-traffic data from shared spaces of different layouts are used to calibrate/train and evaluate the models. The empirical results imply that both models can generate realistic predictions, but they differ in how they handle collisions and mimic heterogeneous behaviour. Through a thorough study, we draw conclusions about their respective strengths and weaknesses.


Trajectory Modelling in Shared Spaces: Expert-Based vs. Deep Learning Approach?

This presentation introduces a new cooperative multi-agent approach for segmenting brain Magnetic Resonance Images (MRIs). MRIs are manually processed by human radiology experts for the identification of many diseases and the monitoring of their evolution. However, such a task is time-consuming and depends on expert decisions, which can be affected by many factors. Therefore, much research has been and is still being conducted to automate MRI processing, mainly MRI segmentation. The approach presented in this paper, without any parametrization or prior knowledge, uses a set of situated agents that interact locally to segment images in two main phases: the detection of discontinuities and the detection of similarities. An implementation of this approach was tested on phantom brain MR images to assess the results and demonstrate its efficiency. Experimental results show a Dice coefficient of at least 89% as the noise and the intensity non-uniformity increase.
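The Dice coefficient reported above measures the overlap between a predicted segmentation and a reference one, as twice the size of their intersection divided by the sum of their sizes. A minimal sketch follows, assuming binary masks flattened to sequences of 0/1 labels; the function name and the toy masks are illustrative and not taken from the paper.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice = 2 * |pred AND truth| / (|pred| + |truth|) for binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        # Both masks are empty; treat this as perfect agreement (a convention).
        return 1.0
    return 2.0 * overlap / total


# Toy 2x3 masks, flattened row by row.
pred_mask = [1, 1, 0,
             0, 1, 0]
true_mask = [1, 0, 0,
             0, 1, 1]
score = dice_coefficient(pred_mask, true_mask)  # 2 * 2 / (3 + 3)
```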


A Cooperative Approach Based on Local Detection of Similarities and Discontinuities for Brain MR Images Segmentation

**Abstract:**

Distributed agent-based simulations often suffer from an imbalance in computational load, leading to a suboptimal use of resources. This happens when part of the computational resources sits idle, waiting for another process to finish. Self-adaptive load-balancing algorithms have been developed to use these resources more efficiently. These algorithms are typically implemented ad hoc, making re-usability and maintenance difficult. In this work, we present a generic self-adaptive framework. This methodology is evaluated with the Acsim framework on two simulations: a micro-traffic simulation and a cellular automata simulation. For each of these scenarios a scalable and adaptive load-balancing algorithm is implemented, showing significant improvements in execution time of the simulation.
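To make the notion of load imbalance concrete, the sketch below measures imbalance as the ratio of the busiest worker's load to the mean load and then greedily migrates work units from the busiest worker to the idlest one. This is a generic illustration, not the Acsim algorithm: the worker names, the per-unit costs, the `tolerance` threshold, and the migration rule are all assumptions.

```python
from typing import Dict, List


def imbalance(loads: List[float]) -> float:
    """Ratio of the busiest worker's load to the mean load (1.0 = balanced)."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 1.0


def rebalance(partitions: Dict[str, List[float]],
              tolerance: float = 1.1,
              max_moves: int = 100) -> Dict[str, List[float]]:
    """Greedily move work units from the busiest to the idlest worker."""
    parts = {worker: list(costs) for worker, costs in partitions.items()}
    for _ in range(max_moves):
        loads = {worker: sum(costs) for worker, costs in parts.items()}
        if imbalance(list(loads.values())) <= tolerance:
            break
        busiest = max(loads, key=loads.get)
        idlest = min(loads, key=loads.get)
        gap = loads[busiest] - loads[idlest]
        # Only move a unit that cannot make the receiver the new bottleneck.
        candidates = [c for c in parts[busiest] if c <= gap / 2]
        if not candidates:
            break
        unit = max(candidates)
        parts[busiest].remove(unit)
        parts[idlest].append(unit)
    return parts


# Hypothetical per-agent costs assigned to three workers.
partitions = {"w0": [4.0, 3.0, 2.0, 1.0], "w1": [1.0], "w2": [1.0]}
balanced = rebalance(partitions)
final_loads = [sum(costs) for costs in balanced.values()]
```

A self-adaptive scheme would re-run such a step periodically during the simulation, using measured rather than assumed costs.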


Adaptivity in distributed agent-based simulation

Although plenty of qualitative logical frameworks have been proposed to evaluate and model trust in multi-agent settings, these approaches generally ignore reasoning about quantitative aspects such as degrees of trust. In this paper, we address this limitation from the modelling and verification perspectives. We start by constructing TCTLG, a logical language to represent the quantitative aspect of trust, and present a set of its reasoning postulates. Moreover, we develop and implement a new symbolic model checking algorithm and open source tool for quantifying the relationships among the interacting agents. Finally, we investigate the complexity and evaluate our approach using a case study in the health care domain.


Computationally Grounded Quantitative Trust with Time

Agent-based simulations of social media platforms often need to be run for many repetitions at large scale. Often, researchers must compromise between available computational resources (memory, run-time), the scale of the simulation, and the quality of its predictions.

As a step to support this process, we present a systematic exploration of simplifications of agent simulations across a number of dimensions suitable for social media studies. Simplifications explored include sub-sampling, implementing agents representing teams or groups of users, simplifying agent behavior, and simplifying the environment.

We also propose a tool that helps apply simplifications to a simulation model, and helps find simplifications that approximate the behavior of the full-scale simulation within computational resource limits.

We present experiments in two social media domains, GitHub and Twitter, using data both to design agents and to test simulation predictions against ground truth. Sub-sampling agents often provides a simple and effective strategy in these domains, particularly in combination with simplifying agent behavior, yielding up to an order of magnitude improvement in run-time with little or no loss in predictive power. Moreover, some simplifications improve performance over the full-scale simulation by removing noise.

We describe domain characteristics that may indicate the most effective simplification strategies and discuss heuristics for automatic exploration of simplifications.
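The sub-sampling simplification can be pictured as running the simulation on a random fraction of the agents and rescaling aggregate counts back to the full population. The sketch below is a minimal illustration under that assumption; the function names, the fixed seed, and the linear rescaling are hypothetical and not the tool described in the abstract.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def subsample_agents(agents: Sequence[T], fraction: float, seed: int = 0) -> List[T]:
    """Draw a random subset of agents without replacement."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if not agents:
        return []
    rng = random.Random(seed)  # fixed seed keeps repeated runs comparable
    k = max(1, round(len(agents) * fraction))
    return rng.sample(list(agents), k)


def scale_up(count: float, fraction: float) -> float:
    """Rescale an aggregate count from the sample to the full population."""
    return count / fraction


# Hypothetical population of 1000 agent ids, simulated at 10% scale.
population = list(range(1000))
sample = subsample_agents(population, fraction=0.1)
estimated_events = scale_up(25, fraction=0.1)  # 25 events seen in the sample
```

Linear rescaling is only reasonable for quantities that grow with the number of agents; interaction effects between agents may need a different correction.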


Optimization of Large-scale Agent-based Simulations through Automated Abstraction and Simplification

The Smart City and Internet-of-Things revolutions enable the collection of various types of data in real-time through sensors. This data can be used to improve the decision tools and simulations used by city planners. This work presents a new framework for real-time traffic simulation integrating an agent-based methodology with live CCTV and other sensor data while respecting the privacy regulations. The framework simulates traffic flows of pedestrians, vehicles and bicycles and their interactions. The approach has been applied in Liverpool (NSW, Australia) showing promising preliminary results and can easily ingest additional sensor data, e.g. air quality.


Towards Agent-Based Traffic Simulation Using Live Data from Sensors for Smart Cities

This presentation introduces a teaching model BC-MDP to cultivate a population by tweaking the environment dynamics. The proposed BC-MDP combines the Behaviour Cultivation core with the recent advances of Curriculum MDPs. This allows BC-MDP to address several shortcomings of the classical BC, while preserving its strengths, such as the freedom from the teacher-learner value alignment. Our model exploits the knowledge of the learner population adaptation process to induce and proliferate a desired behaviour throughout the population. We experimentally show its effectiveness, and retention of key positive features of both BC and Curriculum-MDP.


Teaching Multiple Learning Agents by Environment-Dynamics Tweaks

Norms are behavioral expectations in communities. Online communities are also expected to abide by the rules and regulations that are expressed in the code of conduct of a system. Even though community authorities continuously prompt their users to follow the regulations, hate speech and abusive language are observed to be on the rise. In this paper, we quantify and analyze the patterns of violations of normative behaviour among users posting comments on Stack Overflow (SO), a well-known technical question-answer site for professional and enthusiast programmers. Even though the site is dedicated to technical problem solving and debugging, hate speech and offensive comments make the community "toxic". Identifying and minimising the various patterns of norm violations in different SO communities would make the community less toxic, so that it can engage more effectively in its goal of knowledge sharing. Moreover, through automatic detection of such comments, moderators can warn the authors so that violations are less likely to be repeated, thereby improving the reputation of the site and community. Based on the comments extracted from two different data sources on SO, this work first presents a taxonomy of norms that are violated. Second, it demonstrates the sanctions for certain norm violations. Third, it proposes a recommendation system that can be used to warn users that they are about to violate a norm. This can help achieve norm adherence in online communities.


Norm violation identification in online communities: A study of Stack Overflow comments

We report a previously unidentified issue with model-free, value-based approaches to multiobjective reinforcement learning in the context of environments with stochastic state transitions. An example multiobjective Markov Decision Process (MOMDP) is used to demonstrate that under such conditions these approaches may be unable to discover the policy which maximises the Scalarised Expected Return, and in fact may converge to a Pareto-dominated solution. We discuss several alternative methods which may be more suitable for maximising SER in MOMDPs with stochastic transitions.
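The Scalarised Expected Return applies the utility function to the expected vector return, so with a nonlinear utility the value of a policy depends jointly on every branch of a stochastic transition. The toy calculation below contrasts this with taking the expectation of the per-episode utility. The two-outcome distribution and the `min` utility are illustrative assumptions and are not the MOMDP used in the paper.

```python
from typing import Callable, List, Tuple

Return = Tuple[float, ...]
Outcome = Tuple[float, Return]  # (probability, vector return)


def expected_return(outcomes: List[Outcome]) -> Return:
    """Component-wise expectation of the vector return."""
    n = len(outcomes[0][1])
    return tuple(sum(p * r[i] for p, r in outcomes) for i in range(n))


def scalarised_expected_return(outcomes: List[Outcome],
                               utility: Callable[[Return], float]) -> float:
    """SER: utility of the expected return, u(E[R])."""
    return utility(expected_return(outcomes))


def expected_scalarised_return(outcomes: List[Outcome],
                               utility: Callable[[Return], float]) -> float:
    """Expectation of the per-episode utility, E[u(R)]."""
    return sum(p * utility(r) for p, r in outcomes)


# A policy whose stochastic transition yields one of two lopsided returns.
outcomes = [(0.5, (10.0, 0.0)), (0.5, (0.0, 10.0))]
utility = min  # hypothetical nonlinear utility: value of the worst objective

ser = scalarised_expected_return(outcomes, utility)  # min(5.0, 5.0)
esr = expected_scalarised_return(outcomes, utility)  # 0.5 * 0.0 + 0.5 * 0.0
```

Because the SER here is only achieved by averaging across both branches, an agent that evaluates actions within a single branch cannot see the quantity it is meant to maximise.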

A Demonstration of Issues with Value-Based Multiobjective Reinforcement Learning Under Stochastic State Transitions

Autonomous agents (AA) will increasingly be interacting with us in our daily lives. While we want the benefits attached to AAs, it is essential that their behavior is aligned with our values and norms. Hence, an AA will need to estimate the values and norms of the humans it interacts with, which is not a straightforward task when solely observing an agent's behavior. This paper analyses to what extent an AA is able to estimate the values and norms of a simulated human agent (SHA) based on its actions in the ultimatum game. We present two methods to reduce ambiguity in profiling the SHAs: one based on search space exploration and another based on counterfactual analysis. We found that both methods are able to increase the confidence in estimating human values and norms, but differ in their applicability, the latter being more efficient when the number of interactions with the agent is to be minimized. These insights are useful to improve the alignment of AAs with human values and norms.
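For readers unfamiliar with the ultimatum game: a proposer offers a share of a fixed amount, and the responder either accepts, so the split is paid out, or rejects, so both receive nothing. The sketch below shows how an observer could narrow down a responder's acceptance threshold by choosing offers that halve the remaining interval. It is a simplified stand-in for profiling, not either of the paper's two methods; the threshold-based responder, the integer offers, and the assumption that the threshold lies between 1 and the total are all hypothetical.

```python
from typing import Tuple


def respond(offer: int, threshold: int) -> bool:
    """Hypothetical responder: accepts any offer at or above its threshold."""
    return offer >= threshold


def payoffs(total: int, offer: int, accepted: bool) -> Tuple[int, int]:
    """(proposer, responder) payoffs; a rejection leaves both with nothing."""
    return (total - offer, offer) if accepted else (0, 0)


def estimate_threshold(total: int, true_threshold: int) -> Tuple[int, int]:
    """Narrow the threshold by bisection; returns (estimate, interactions).

    Assumes the responder rejects an offer of 0 and accepts the full total,
    i.e. the threshold lies in 1..total.
    """
    low, high = 0, total  # largest rejected offer, smallest accepted offer
    interactions = 0
    while high - low > 1:
        offer = (low + high) // 2
        interactions += 1
        if respond(offer, true_threshold):
            high = offer
        else:
            low = offer
    return high, interactions


estimate, interactions = estimate_threshold(total=10, true_threshold=3)
```

Choosing informative offers keeps the number of interactions logarithmic in the size of the offer range, which matters when interactions with the human are costly.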

