United States

Action-constrained reinforcement learning (RL) is a widely used approach in various real-world applications, such as scheduling in networked systems with resource constraints and control of a robot with kinematic constraints. While the existing projection-based approaches ensure zero constraint violation, they can suffer from the zero-gradient problem due to the tight coupling of the policy gradient and the projection, which results in sample-inefficient training and slow convergence. To tackle this issue, we propose a learning algorithm that decouples the action constraints from the policy parameter update by leveraging state-wise Frank-Wolfe and a regression-based policy update scheme. Moreover, we show that the proposed algorithm enjoys convergence and policy improvement properties in the tabular case, and that it generalizes the popular DDPG algorithm for action-constrained RL in the general case. Through experiments, we demonstrate that the proposed algorithm significantly outperforms the benchmark methods on a variety of control tasks.
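To make the Frank-Wolfe idea concrete, here is a minimal sketch of a generic Frank-Wolfe ascent loop over a box-shaped action set. It is not the authors' algorithm: the box constraint, the toy objective `Q(a) = -||a - a_star||^2`, the step-size schedule, and the names `lmo_box` and `frank_wolfe_step` are all assumptions chosen for illustration. The point it shows is that each update is a convex combination of two feasible actions, so no projection is needed to stay inside the constraint set.

```python
import numpy as np

def lmo_box(grad, low, high):
    """Linear maximization oracle over the box [low, high]: argmax_c <c, grad>."""
    return np.where(grad >= 0, high, low)

def frank_wolfe_step(action, grad, low, high, step_size):
    """One Frank-Wolfe ascent step. The result is a convex combination of two
    feasible points, so it remains feasible without any projection."""
    target = lmo_box(grad, low, high)
    return action + step_size * (target - action)

# Hypothetical toy objective: Q(a) = -||a - a_star||^2, whose unconstrained
# maximizer a_star lies outside the feasible box in the first coordinate.
a_star = np.array([2.0, -0.5])
low, high = np.array([-1.0, -1.0]), np.array([1.0, 1.0])

action = np.zeros(2)
for k in range(200):
    grad = -2.0 * (action - a_star)          # gradient of Q at the current action
    action = frank_wolfe_step(action, grad, low, high, step_size=2.0 / (k + 2))
```

In this toy problem the iterate approaches the constrained maximizer `[1.0, -0.5]` while every intermediate action stays inside the box.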

UAI 2021

Escaping from Zero Gradient: Revisiting Action-Constrained Reinforcement Learning via Frank-Wolfe Policy Optimization

continuous control

frank-wolfe policy optimization

action-constrained reinforcement learning

[![](https://assets.underline.io/uploads/markdown_image/1/image/21d13e03491d7e9acf6b8c46c7979c25.png)](https://docs.google.com/spreadsheets/d/1SCDDfkQk7es4J6RULKHVx1RzbB0Mutla--Sle5SVfb0/edit?usp=sharing)

**RoboCup 2021 will be Worldwide!**

We are excited to announce that RoboCup 2021 is a fully remote event with RoboCup competitions and activities taking place all over the world. Everyone is invited to participate in, or observe, the [competition leagues](https://2021.robocup.org/leagues) and the [symposium](https://2021.robocup.org/symposium)!

**RoboCup Symposium**

The 24th RoboCup International Symposium will be held in conjunction with RoboCup 2021 in a purely online setting. Scientific papers will be presented reporting innovative, original research with relevance to robotics and artificial intelligence. Within the described scope of topics we also encouraged submissions of high-quality overview articles, papers describing real-world research, and papers reporting theoretical results. In addition to the main track with regular research papers, there will be a special track focused on systems, data, and benchmarks featuring papers describing novel, open-source hardware and software systems, as well as datasets and benchmarks relevant to the community. In addition to research paper presentations, the RoboCup Symposium 2021 includes keynote talks by Dieter Fox, Jean-Paul Laumond, and Stefanie Tellex, and several interactive sessions.

**RoboCup Introduction Video**
<iframe width="995" height="516" src="https://www.youtube.com/embed/Cnh43cCsDKU" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>


Registration is required for entrance into RoboCup 2021. <br><br>
**Fees:** 
* Major Faculty or Symposium Faculty - US$50
* Major Student, Junior Mentor, Junior Student, or Symposium Student - US$20 

**Members of qualified teams:** *Please contact your league for your registration password and proceed to register with your team.*<br>
**All non-team members:** *Please register as "Symposium Only".*



technical paper

The Conference on Uncertainty in Artificial Intelligence (UAI) is one of the premier international conferences on research related to knowledge representation, learning, and reasoning in the presence of uncertainty. It is supported by the [Association for Uncertainty in Artificial Intelligence (AUAI).](https://www.auai.org/)

The 37th Conference on Uncertainty in Artificial Intelligence (UAI 2021) is held fully online. We have an exciting program consisting of keynote talks by [Susan Murphy](https://www.auai.org/uai2021/program?block=A), [Eric Horvitz](https://www.auai.org/uai2021/program?block=B), [Lenka Zdeborová](https://www.auai.org/uai2021/program?block=C), [Judea Pearl](https://www.auai.org/uai2021/program?block=D) and [Ankur Moitra](https://www.auai.org/uai2021/program?block=E), a town hall meeting, peer-reviewed papers, and three workshops. The main conference is single-track and each paper is showcased with a live component (either a live discussion and Q&A, or a live lightning talk). All papers are also featured in video presentations and in poster sessions in gather.town. The three parallel workshops on Friday allow for smaller meetings on current topics.

The walkthrough video below shows how you can navigate this site and use the different possibilities for interaction: UAIsland2021 on gather.town for posters and social interactions, the mementor platform for virtual mentoring sessions, and the chat and Q&A systems for discussions about talks and posters.

Have a great UAI 2021! 
<br>
<br>Cassio de Campos and Marloes Maathuis
<br>UAI 2021 Program Chairs
<br>uai2021programchairs@gmail.com

<figure class="video_container">
  <iframe src="https://screencast-o-matic.com/watch/cri0lsViAGK?v=6&ff=1&title=0&controls=1" width=526  height=226  frameborder="0" allowfullscreen="true"> </iframe>
</figure>


A ticket is required to access this conference. Please use the provided link to register for the event.

https://auai.org/uai2021/registration


Green security domains feature defenders who plan patrols in the face of uncertainty about the adversarial behavior of poachers, illegal loggers, and illegal fishers. Importantly, the deterrence effect of patrols on adversaries' future behavior makes patrol planning a sequential decision-making problem. Therefore, we focus on robust sequential patrol planning for green security following the minimax regret criterion, which has not been considered in the literature. We formulate the problem as a game between the defender and nature, which controls the parameter values of the adversarial behavior, and design an algorithm, MIRROR, to find a robust policy. MIRROR uses two reinforcement learning-based oracles and solves a restricted game considering limited defender strategies and parameter values. We evaluate MIRROR on real-world poaching data.

Robust Reinforcement Learning Under Minimax Regret for Green Security

High sample complexity remains a barrier to the application of reinforcement learning (RL), particularly in multi-agent systems. A large body of work has demonstrated that exploration mechanisms based on the principle of optimism under uncertainty can significantly improve the sample efficiency of RL in single agent tasks. This work seeks to understand the role of optimistic exploration in non-cooperative multi-agent settings. We will show that, in zero-sum games, optimistic exploration can cause the learner to waste time sampling parts of the state space that are irrelevant to strategic play, as they can only be reached through cooperation between both players. To address this issue, we introduce a formal notion of strategically efficient exploration in Markov games, and use this to develop two strategically efficient learning algorithms for finite Markov games. We demonstrate that these methods can be significantly more sample efficient than their optimistic counterparts.

Strategically Efficient Exploration in Competitive Multi-agent Reinforcement Learning

Max-pooling operations are a core component of deep learning architectures. In particular, they are part of most convolutional architectures used in machine vision, since pooling is a natural approach to pattern detection problems. However, these architectures are not well understood from a theoretical perspective. For example, we do not understand when they can be globally optimized, or what effect over-parameterization has on generalization. Here we perform a theoretical analysis of a convolutional max-pooling architecture, proving that it can be globally optimized, and can generalize well even for highly over-parameterized models. Our analysis focuses on a data generating distribution inspired by a pattern detection problem, where a "discriminative" pattern needs to be detected among "spurious" patterns. We empirically validate that CNNs significantly outperform fully connected networks in our setting, as predicted by our theoretical results.

An Optimization and Generalization Analysis for Max-Pooling Networks

Reinforcement learning policies based on deep neural networks are vulnerable to imperceptible adversarial perturbations to their inputs, in much the same way as neural network image classifiers. Recent work has proposed several methods to improve the robustness of deep reinforcement learning agents to adversarial perturbations based on training in the presence of these imperceptible perturbations (i.e. adversarial training). In this paper, we study the effects of adversarial training on the neural policy learned by the agent. In particular, we follow two distinct parallel approaches to investigate the outcomes of adversarial training on deep neural policies based on worst-case distributional shift and feature sensitivity. For the first approach, we compare the Fourier spectrum of minimal perturbations computed for both adversarially trained and vanilla trained neural policies. Via experiments in the OpenAI Atari environments we show that minimal perturbations computed for adversarially trained policies are more focused on lower frequencies in the Fourier domain, indicating a higher sensitivity of these policies to low frequency perturbations. For the second approach, we propose a novel method to measure the feature sensitivities of deep neural policies and we compare these feature sensitivity differences in state-of-the-art adversarially trained deep neural policies and vanilla trained deep neural policies. We believe our results can be an initial step towards understanding the relationship between adversarial training and different notions of robustness for neural policies.

Investigating Vulnerabilities of Deep Neural Policies

Inspired by recent developments on meta-learning with linear contextual bandit tasks, we study the benefit of feature learning in both the multi-task and meta-learning settings.

Multi-Task and Meta-Learning with Sparse Linear Bandits

Stochastic gradient MCMC methods, such as stochastic gradient Langevin dynamics (SGLD), employ fast but noisy gradient estimates to enable large-scale posterior sampling. Although we can easily extend SGLD to distributed settings, it suffers from two issues when applied to federated non-IID data. First, the variance of these estimates increases significantly. Second, delaying communication causes the Markov chains to diverge from the true posterior even for very simple models. To alleviate both these problems, we propose conducive gradients, a simple mechanism that combines local likelihood approximations to correct gradient updates. Notably, conducive gradients are easy to compute, and since we only calculate the approximations once, they incur negligible overhead. We apply conducive gradients to distributed stochastic gradient Langevin dynamics (DSGLD) and call the resulting method federated stochastic gradient Langevin dynamics (FSGLD). We demonstrate that our approach can handle delayed communication rounds, converging to the target posterior in cases where DSGLD fails. We also show that FSGLD outperforms DSGLD for non-IID federated data with experiments on metric learning and neural networks.
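As background for the abstract above, the following is a minimal sketch of plain, single-machine SGLD, not the federated FSGLD method and not the conducive-gradient correction it proposes. The Gaussian model, the prior variance, the step size, the burn-in length, and the function name `sgld_step` are all assumptions made for illustration. It shows the two ingredients the abstract refers to: a minibatch gradient estimate rescaled by `N / batch_size`, and Gaussian noise injected with variance equal to the step size.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: x_i ~ N(theta_true, 1) with a N(0, prior_var) prior on theta.
N, batch_size, theta_true = 1000, 50, 2.0
prior_var = 100.0
x = rng.normal(theta_true, 1.0, size=N)

def sgld_step(theta, step_size):
    """One SGLD update using a noisy minibatch estimate of the log-posterior gradient."""
    batch = rng.choice(x, size=batch_size, replace=False)
    grad_log_prior = -theta / prior_var
    # Rescale the minibatch sum so it is an unbiased estimate of the full-data gradient.
    grad_log_lik = (N / batch_size) * np.sum(batch - theta)
    noise = rng.normal(0.0, np.sqrt(step_size))
    return theta + 0.5 * step_size * (grad_log_prior + grad_log_lik) + noise

theta, samples = 0.0, []
for t in range(3000):
    theta = sgld_step(theta, step_size=1e-4)
    if t >= 1000:                      # discard the first 1000 iterations as burn-in
        samples.append(theta)

# Closed-form posterior mean for this conjugate Gaussian model, for comparison.
posterior_mean = np.sum(x) / (N + 1.0 / prior_var)
```

With a fixed step size the chain only approximates the posterior, so the average of `samples` should land near `posterior_mean` rather than match it exactly.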

Federated Stochastic Gradient Langevin Dynamics

Error-Correcting Output Codes (ECOCs) offer a principled approach for combining binary classifiers into multiclass classifiers. In this paper, we study the problem of designing optimal ECOCs to achieve both nominal and adversarial accuracy using Support Vector Machines (SVMs) and binary deep neural networks. We develop a scalable Integer Programming (IP) formulation to design minimal codebooks with desirable error correcting properties. Our work leverages the advances in IP solution techniques to generate codebooks with optimality guarantees. To achieve tractability, we exploit the underlying graph-theoretic structure of the constraint set. Particularly, the size of the constraint set can be significantly reduced using edge clique covers. Using this reduction technique along with Plotkin's bound in coding theory, we demonstrate that our approach is scalable to a large number of classes. The resulting codebooks achieve a high nominal accuracy relative to standard codebooks (e.g., one-vs-all, one-vs-one, and dense/sparse codes). Interestingly, our codebooks provide non-trivial robustness to white-box attacks without any adversarial training.
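The basic ECOC mechanism can be sketched as follows. This is standard nearest-codeword decoding, not the paper's integer programming codebook design; the 4-class, 7-bit codebook and the function name `hamming_decode` are hypothetical and chosen by hand so that every pair of codewords differs in 4 positions. Each column corresponds to one binary classifier, and a class is predicted by finding the codeword closest in Hamming distance to the vector of binary predictions.

```python
import numpy as np

def hamming_decode(codebook, bits):
    """Return the index of the codeword (row) closest to `bits` in Hamming distance."""
    distances = np.sum(codebook != bits, axis=1)
    return int(np.argmin(distances))

# Hypothetical codebook: one row per class, one column per binary classifier.
# Every pair of rows differs in exactly 4 positions, so any single bit error
# still leaves the correct codeword strictly closest.
codebook = np.array([
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1, 1, 0],
    [1, 0, 1, 0, 1, 0, 1],
])

# Binary classifier outputs for an example of class 2, with the last bit flipped.
noisy_bits = np.array([1, 1, 0, 0, 1, 1, 1])
predicted_class = hamming_decode(codebook, noisy_bits)
```

A larger minimum distance between rows tolerates more binary classifier errors, which is the property a codebook design procedure would aim to control.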

Integer Programming-based Error-Correcting Output Code Design for Robust Classification

Despite their numerous successes, there are many scenarios where adversarial risk metrics do not provide an appropriate measure of robustness. For example, test-time perturbations may occur in a probabilistic manner rather than being generated by an explicit adversary, while the poor train-test generalization of adversarial metrics can limit their usage to simple problems. Motivated by this, we develop a probabilistic robust risk framework, the statistically robust risk (SRR), which considers pointwise corruption distributions, as opposed to worst-case adversaries. The SRR provides a distinct and complementary measure of robust performance, compared to natural and adversarial risk. We show that the SRR admits estimation and training schemes which are as simple and efficient as for the natural risk: these simply require noising the inputs, but with principled derivation for exactly how and why this should be done. Furthermore, we demonstrate both theoretically and experimentally that it can provide superior generalization performance compared with adversarial risks, enabling application to high-dimensional datasets.

Statistically Robust Neural Network Classification

In reinforcement learning, agents that consider the context, or current state, when selecting source policies for transfer have been shown to outperform context-free approaches. However, existing approaches suffer from limitations, including sensitivity to sparse or delayed rewards and estimation errors in value functions. One important insight is that explicit learned models of the source dynamics, when available, could benefit contextual transfer in such settings. In this paper, we assume a family of tasks with shared sub-goals but different dynamics, and availability of estimated dynamics and policies for source tasks. To deal with possible estimation errors in dynamics, we introduce a novel Bayesian mixture-of-experts for learning state-dependent beliefs over source task dynamics that match the target dynamics using state transitions collected from the target task. The mixture is easy to interpret, demonstrates robustness to estimation errors in dynamics, and is compatible with most learning algorithms. We incorporate it into standard policy reuse frameworks and demonstrate its effectiveness on benchmarks from OpenAI gym.

Contextual Policy Transfer in Reinforcement Learning Domains via Deep Mixtures-of-Experts

A variety of dimensionality reduction techniques have been applied for computations involving large matrices. The underlying matrix is randomly compressed into a smaller one, while approximately retaining many of its original properties. As a result, much of the expensive computation can be performed on the small matrix. The sketching of positive semidefinite (PSD) matrices is well understood, but there are many applications where the related matrices are not PSD, including Hessian matrices in non-convex optimization and covariance matrices in regression applications involving complex numbers. In this paper, we present novel dimensionality reduction methods for non-PSD matrices, as well as their "square roots", which involve matrices with complex entries. We show how these techniques can be used for multiple downstream tasks. In particular, we show how to use the proposed matrix sketching techniques for both convex and non-convex optimization, $\ell_p$-regression for every $1 \leq p \leq \infty$, and vector-matrix-vector queries.
