
Model-based reinforcement learning algorithms make decisions by building and utilizing a model of the environment. However, none of the existing algorithms attempts to infer the dynamics of a state-action pair from known state-action pairs before visiting it sufficiently many times. We propose a new model-based method called Greedy Inference Model (GIM) that infers the unknown dynamics from known dynamics based on the internal spectral properties of the environment. In other words, GIM can "learn by analogy". We further introduce a new exploration strategy which ensures that the agent rapidly and evenly visits unknown state-action pairs. GIM is much more computationally efficient than state-of-the-art model-based algorithms, as the number of dynamic programming operations is independent of the environment size. Under mild conditions, GIM can also achieve lower sample complexity than methods that do not infer. Experimental results demonstrate the effectiveness and efficiency of GIM in a variety of real-world tasks.


AAMAS 2020

Can Agents Learn by Analogy? An Inferable Model for PAC Reinforcement Learning

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-23 is the Thirty-Seventh AAAI Conference on Artificial Intelligence. The theme of this conference is to create collaborative bridges within and beyond AI. Like previous AAAI conferences, AAAI-23 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, and exhibit programs, and two new activities: a Bridge Program and a Lab Program. Many of these activities are tailored to the theme of bridges and all are selected according to the highest standards, with additional programs for students and young researchers. 


AAAI 2023



technical paper

AAMAS is the leading scientific conference for research in autonomous agents and multi-agent systems. The AAMAS conference series was initiated in 2002 as the merging of three respected scientific meetings: the International Conference on Multi-Agent Systems (ICMAS), the International Workshop on Agent Theories, Architectures, and Languages (ATAL), and the International Conference on Autonomous Agents (AA). The aim of the joint conference is to provide a single, high-profile, internationally-respected archival forum for scientific research in the theory and practice of autonomous agents and multi-agent systems.




While game-theoretic models and algorithms have been developed to combat illegal activities, such as poaching and over-fishing, in green security domains, none of the existing work considers the crucial aspect of community engagement: community members are recruited by law enforcement agencies as informants and can provide valuable tips, e.g., the location of ongoing illegal activities, to assist patrols. We fill this gap and (i) introduce a novel two-stage security game model for community engagement, with a bipartite graph representing the informant-attacker social network and a level-$\kappa$ response model for attackers inspired by cognitive hierarchy; (ii) provide complexity results and exact, approximate, and heuristic algorithms for selecting informants and allocating patrollers against level-$\kappa$ ($\kappa<\infty$) attackers; (iii) provide a novel algorithm to find the optimal defender strategy against level-$\infty$ attackers, which converts the problem of optimizing a parameterized fixed-point to a bi-level optimization problem, where the inner level is just a linear program, and the outer level has only a linear number of variables and a single linear constraint. We also evaluate the algorithms through extensive experiments.

Green Security Game with Community Engagement

We study a general multi-dueling bandit problem, where an agent compares multiple options simultaneously and aims to minimize the regret due to selecting suboptimal arms. This setting generalizes the traditional two-dueling bandit problem and finds many real-world applications involving subjective feedback on multiple options. We start with the two-dueling bandit setting and propose two efficient algorithms, DoublerBAI and MultiSBM-Feedback. DoublerBAI provides a generic schema for translating known results on best arm identification algorithms to the dueling bandit problem, and achieves a regret bound of $O(\ln T)$. MultiSBM-Feedback not only has an optimal $O(\ln T)$ regret, but also reduces the constant factor by almost half compared to benchmark results. Then, we consider the general multi-dueling case and develop an efficient algorithm, MultiRUCB. Using a novel finite-time regret analysis for the general multi-dueling bandit problem, we show that MultiRUCB also achieves an $O(\ln T)$ regret bound and that the bound tightens as the capacity of the comparison set increases. Based on both synthetic and real-world datasets, we empirically demonstrate that our algorithms outperform existing algorithms.


Dueling Bandits: From Two-dueling to Multi-dueling

Signalized intersections are managed by controllers that assign right of way (green, yellow, and red lights) to non-conflicting directions. Optimizing the actuation policy of such controllers is expected to alleviate traffic congestion and its adverse impact. Given such a safety-critical domain, the affiliated actuation policy is required to be interpretable in a way that can be understood and regulated by a human. This presentation demonstrates and gives an overview of how we approached solving such a problem.

Learning an Interpretable Traffic Signal Control Policy

Distributed Constraint Optimization Problems (DCOPs) are a powerful tool to model multi-agent coordination problems that are distributed by nature. The formulation is suitable for problems where variables are discrete and constraint utilities are represented in tabular form. However, many real-world applications have continuous variables, so tabular forms cannot accurately represent their constraint utilities. To overcome this limitation, researchers have proposed the Continuous DCOP (C-DCOP) model, which extends DCOPs to continuous variables. However, existing approaches usually restrict the form of the constraint utilities and come without quality guarantees. Therefore, in this paper, we (i) propose an exact algorithm to solve a specific subclass of C-DCOPs; (ii) propose an approximation method with quality guarantees to solve general C-DCOPs; (iii) propose additional C-DCOP algorithms that are more scalable; and (iv) empirically show that our algorithms outperform existing state-of-the-art C-DCOP algorithms when given the same communication limitations.

New Algorithms for Continuous Distributed Constraint Optimization Problems

Recent years have witnessed a tremendous improvement of deep reinforcement learning. However, a challenging problem is that an agent may suffer from inefficient exploration, particularly for on-policy methods. Previous exploration methods either rely on complex structure to estimate the novelty of states, or incur sensitive hyper-parameters causing instability. We propose an efficient exploration method, Multi-Path Policy Optimization (MPPO), which does not incur high computation cost and ensures stability. MPPO maintains an efficient mechanism that effectively utilizes a population of diverse policies to enable better exploration, especially in sparse environments. We also give a theoretical guarantee of the stable performance. We build our scheme upon two widely-adopted on-policy methods, the Trust-Region Policy Optimization algorithm and Proximal Policy Optimization algorithm. We conduct extensive experiments on several MuJoCo tasks and their sparsified variants to fairly evaluate the proposed method. Results show that MPPO significantly outperforms state-of-the-art exploration methods in terms of both sample efficiency and final performance. 


Multi-Path Policy Optimization

In this work, we consider a student-project-resource matching-allocation problem, where students have preferences over projects and the projects have preferences over students. Students are matched many-to-one to projects, and indivisible resources are allocated many-to-one to projects, whose capacities are endogenously determined by the resources allocated to them. Traditionally, this problem is decomposed into two separate problems: (1) resources are allocated to projects based on expectations (resource allocation problem), and (2) students are matched to projects based on the capacities determined in the previous problem (matching problem). Although both problems are well understood, if the expectations used in the first are incorrect, we obtain a suboptimal outcome. Thus, this problem must be solved as a whole without dividing it into two parts. We show that no strategyproof mechanism satisfies fairness (i.e., no student has justified envy) and weak efficiency requirements on students' welfare. Given this impossibility result, we develop a new strategyproof mechanism that strikes a good balance between fairness and efficiency, and we assess it experimentally.


Game Theoretic Analysis for Two-Sided Matching with Resource Allocation

The ability to cooperate is one of the key features of many multi-agent systems. In this paper, we extend the well-known model of graph-restricted games due to Myerson to signed graphs, where the link between any two players may be either positive or negative. Hence, in our model, it is possible to explicitly define not only that some players are friends (as in Myerson’s model) but also that some other players are enemies. As such, our games can express a wider range of situations, e.g., animosities between political parties. We say that a coalition is feasible if every two players are connected by a path of positive edges and no two players are connected by a negative edge. We define the value for signed graph games using the axiomatic approach that closely follows the celebrated characterisation of the Myerson value. Furthermore, we propose an algorithm for computing an arbitrary semivalue, including the one proposed by us. Moreover, we consider signed graph games with a priori defined alliances (unions) between players and propose an algorithm for the extension of the Owen value to this setting.


Signed Graph Games: Coalitional Games with Friends, Enemies and Allies
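The feasibility condition stated in the abstract above can be illustrated with a short check. This is only a sketch under assumptions that the abstract does not spell out: the function name `is_feasible` and the edge-list representation are hypothetical, the connecting positive paths are assumed to stay inside the coalition, and the empty coalition is treated as trivially feasible.

```python
def is_feasible(coalition, positive, negative):
    """Check the feasibility condition for a coalition in a signed graph.

    positive / negative are lists of undirected edges (u, v).
    Assumption: connecting paths must stay inside the coalition.
    """
    members = set(coalition)

    # No two members may be joined by a negative edge.
    for u, v in negative:
        if u in members and v in members:
            return False

    if not members:
        return True  # assumption: the empty coalition is trivially feasible

    # Build the positive subgraph induced by the coalition.
    adjacency = {player: set() for player in members}
    for u, v in positive:
        if u in members and v in members:
            adjacency[u].add(v)
            adjacency[v].add(u)

    # Depth-first search: every member must be reachable via positive edges.
    start = next(iter(members))
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen == members


# Hypothetical example: a positive path 1-2-3-4 with players 1 and 3 as enemies.
positive_edges = [(1, 2), (2, 3), (3, 4)]
negative_edges = [(1, 3)]
print(is_feasible({1, 2}, positive_edges, negative_edges))
print(is_feasible({1, 2, 3}, positive_edges, negative_edges))
```

In this example, {1, 2} passes both conditions, while {1, 2, 3} fails because players 1 and 3 share a negative edge.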

We study the well-known Sequential Posted Pricing scheme with one item, under the Bayesian setting in which each participating agent's value for the item is drawn from her own value distribution, which is known to the auctioneer as prior information. Agents arrive at the auction market sequentially, and each is offered a take-it-or-leave-it price. The goal of the auctioneer is to maximize her expected revenue.
This family of mechanisms has been proved to perform well compared to the optimal mechanism under the Bayesian framework in various settings, but nothing was previously known about the complexity of computing an optimal sequential posted pricing.

In this paper, we show that finding an optimal sequential posted pricing is NP-complete even when the value distributions are of support size three. For the upper bound, we introduce polynomial-time algorithms for the cases where the distributions are of support size at most two, or where the values are drawn from identical distributions. As a by-product, we also show that the same results hold for the order-oblivious posted pricing scheme, where the agents arrive in an adversarial order after the auctioneer posts the prices.
We also study constrained sequential posted pricing, where the auction runs for only a fixed number $\tau$ of rounds, and give polynomial-time algorithms when the distributions are of support size at most two. Moreover, we extend our algorithm to cases where the values decay with time or the item has several copies. To the best of our knowledge, this is the first result that fully characterizes the computational complexity of the sequential posted pricing family.


On the Complexity of Sequential Posted Pricing
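To make the sequential posted pricing setting above concrete, the following sketch evaluates a tiny instance by brute force. It is not the paper's algorithm: it enumerates every arrival order (exponential time, consistent with the hardness result) and, for each fixed order, picks prices by standard backward induction. The function names and the `(value, probability)` list representation of the distributions are assumptions made for illustration.

```python
from itertools import permutations


def best_prices_for_order(order, dists):
    """Backward induction for one fixed arrival order.

    dists[i] is a list of (value, probability) pairs for agent i.
    Returns (expected_revenue, prices), where prices[k] is offered to order[k].
    """
    future = 0.0  # expected revenue from the agents after the current one
    prices = []
    for agent in reversed(order):
        # Default: post an unreachable price, i.e. skip this agent.
        best_revenue, best_price = future, float("inf")
        # It suffices to try prices in the support of the distribution.
        for price, _ in dists[agent]:
            # The agent buys if and only if her value is at least the price.
            p_buy = sum(q for v, q in dists[agent] if v >= price)
            revenue = p_buy * price + (1.0 - p_buy) * future
            if revenue > best_revenue:
                best_revenue, best_price = revenue, price
        future = best_revenue
        prices.append(best_price)
    prices.reverse()
    return future, prices


def brute_force_spp(dists):
    """Try every arrival order; only sensible for a handful of agents."""
    best = (-1.0, None, None)
    for order in permutations(range(len(dists))):
        revenue, prices = best_prices_for_order(order, dists)
        if revenue > best[0]:
            best = (revenue, order, prices)
    return best


# Hypothetical instance: agent 0 always values the item at 1;
# agent 1 values it at 10 or 0 with equal probability.
example = [[(1, 1.0)], [(10, 0.5), (0, 0.5)]]
revenue, order, prices = brute_force_spp(example)
print(revenue, order, prices)
```

In this instance it is better to approach the high-variance agent first with a price of 10 and fall back on the certain buyer at price 1, which shows why the arrival order matters.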

The Belief Desire Intention (BDI) model of agency is a popular and mature paradigm for designing and implementing multiagent systems. There are several agent implementation platforms that follow the BDI model. In BDI systems, the agents typically have to pursue multiple goals, often concurrently. The way in which the agents commit to achieving their goals forms their intentions. There has been much work on scheduling the intentions of agents. However, most of this work has focused on scheduling the intentions of a single agent with no awareness or consideration of other agents that may be operating in the same environment. Such schedulers order the intentions of the single agent so as to maximise the total number of goals achieved. In this work, we investigate techniques for scheduling the intentions of an agent in a multiagent setting, where an agent is aware (or partially aware) of the intentions of other agents in the environment. We use a Monte Carlo Tree Search (MCTS) based approach and show that our intention-aware scheduler generates better outcomes in cooperative, neutral (selfish) and adversarial settings than the state-of-the-art schedulers that do not consider other agents' intentions.

Intention-Aware Multiagent Scheduling

Threshold task games (TTGs) are a class of cooperative games in which participants form coalitions to complete tasks associated with different rewards and thresholds for success. We provide efficient algorithms for computing approximately optimal coalition structures in TTGs. We also present non-trivial bounds on the cost of stability for this class. We put our theoretical results into practice: we design a web-based framework which allows human players to interact in a collaborative task-based model. Our analysis of human play in two different countries shows that players generally succeed in forming optimal coalition structures and converge to approximately stable payoff divisions.
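As a rough illustration of what an optimal coalition structure means in a threshold task game, the sketch below searches all partitions of a small set of agents. The modelling details are assumptions, since the abstract does not specify them: each agent has a numeric weight, each coalition completes at most one task, and a task pays its reward when the coalition's total weight reaches the task's threshold. The brute-force search is for illustration only and is not the efficient approximation algorithm the abstract refers to.

```python
def partitions(items):
    """Yield every partition of a list into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # Put `first` into one of the existing blocks ...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + part


def coalition_value(coalition, weights, tasks):
    """Best reward among tasks whose threshold the coalition meets (0 if none)."""
    total = sum(weights[agent] for agent in coalition)
    return max((reward for threshold, reward in tasks if total >= threshold),
               default=0)


def best_structure(weights, tasks):
    """Exhaustively find a coalition structure with maximum total reward."""
    best_value, best_partition = -1, None
    for part in partitions(list(weights)):
        value = sum(coalition_value(c, weights, tasks) for c in part)
        if value > best_value:
            best_value, best_partition = value, part
    return best_value, best_partition


# Hypothetical instance: tasks are (threshold, reward) pairs.
agent_weights = {"a": 3, "b": 2, "c": 2, "d": 1}
task_list = [(4, 10), (5, 11)]
value, structure = best_structure(agent_weights, task_list)
print(value, structure)
```

Here two coalitions of weight 4 each complete the smaller task, which beats pooling everyone for the single larger reward.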

