United States

Reward-biased maximum likelihood estimation (RBMLE) is a classic principle in the adaptive control literature for tackling explore-exploit trade-offs. This paper studies the neural contextual bandit problem from a distributional perspective and proposes NeuralRBMLE, which leverages the likelihood of surrogate parametric distributions to learn the unknown reward distributions and thereafter adapts the RBMLE principle to achieve efficient exploration by properly adding a reward-bias term. NeuralRBMLE leverages the representation power of neural networks and directly encodes exploratory behavior in the parameter space, without constructing confidence intervals of the estimated rewards. We propose two variants of NeuralRBMLE algorithms: The first variant directly obtains the RBMLE estimator by gradient ascent, and the second variant simplifies RBMLE to a simple index policy through an approximation. We show that both algorithms achieve $\widetilde{\mathcal{O}}(\tilde{d}\sqrt{T})$ regret. Through extensive experiments, we demonstrate that the NeuralRBMLE algorithms achieve comparable or better empirical regrets than the state-of-the-art methods on real-world datasets with non-linear reward functions.
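To make the idea of a "simple index policy" with an added reward-bias term more concrete, the following is a minimal sketch for a linear reward model rather than the neural setting studied in the paper. Everything here is an assumption chosen for illustration: the ridge-regression estimate, the quadratic form of the bias term, the schedule `alpha = sqrt(t)`, and the names `rbmle_index` and `run_bandit`. It is not the NeuralRBMLE algorithm itself.

```python
import numpy as np

def rbmle_index(theta_hat, V_inv, contexts, alpha):
    """Estimated reward plus a reward-bias term for every arm (illustrative form)."""
    est = contexts @ theta_hat
    # x^T V^{-1} x for each row x of `contexts`
    quad = np.einsum("ij,jk,ik->i", contexts, V_inv, contexts)
    return est + 0.5 * alpha * quad

def run_bandit(theta_true, T=200, n_arms=5, noise=0.1, seed=0):
    """Toy linear contextual bandit driven by the index above (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    d = theta_true.shape[0]
    V = np.eye(d)        # regularized Gram matrix
    b = np.zeros(d)      # running sum of reward-weighted contexts
    for t in range(1, T + 1):
        contexts = rng.normal(size=(n_arms, d))
        V_inv = np.linalg.inv(V)
        theta_hat = V_inv @ b
        arm = int(np.argmax(rbmle_index(theta_hat, V_inv, contexts, np.sqrt(t))))
        x = contexts[arm]
        reward = x @ theta_true + noise * rng.normal()
        V += np.outer(x, x)
        b += reward * x
    return np.linalg.inv(V) @ b

theta_true = np.array([0.5, -0.3, 0.8])
print(run_bandit(theta_true))
```

Note that exploration enters only through the index: no confidence interval is built, and the bias term simply favors arms whose contexts are poorly covered by past data.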

AAAI 2023

Reward-Biased Maximum Likelihood Estimation for Neural Contextual Bandits: A Distributional Learning Perspective

ml: reinforcement learning theory

ml: deep learning theory

ml: online learning & bandits

ml: reinforcement learning algorithms

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-23 is the Thirty-Seventh AAAI Conference on Artificial Intelligence. The theme of this conference is to create collaborative bridges within and beyond AI. Like previous AAAI conferences, AAAI-23 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, and exhibit programs, and two new activities: a Bridge Program and a Lab Program. Many of these activities are tailored to the theme of bridges and all are selected according to the highest standards, with additional programs for students and young researchers. 

poster

Nested Rollout Policy Adaptation (NRPA) is an approach that uses online-learned policies in a nested structure. It has achieved strong results on a variety of difficult combinatorial optimization problems. In this paper, we propose Meta-NRPA, which combines optimal stopping theory with NRPA for warm-starting and significantly improves the performance of NRPA. We also present several exploratory techniques that enable NRPA to explore better. We demonstrate this on three notoriously difficult problems from telecommunication, transportation, and coding theory: Minimum Congestion Shortest Path Routing, Traveling Salesman Problem with Time Windows, and Snake-in-the-Box. We also improve the lower bounds of the Snake-in-the-Box problem for multiple dimensions.


Warm-Starting Nested Rollout Policy Adaptation with Optimal Stopping
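For readers unfamiliar with the nested structure mentioned in the abstract above, here is a minimal sketch of plain NRPA on a toy problem. It does not include the warm-starting or optimal-stopping components of Meta-NRPA. The toy task (choose 8 bits and maximize the number of ones), the move encoding `(position, move)`, and the parameter values are all assumptions made for illustration.

```python
import math
import random

MOVES = (0, 1)

def playout(policy, length, rng):
    """Level 0: sample one sequence with softmax probabilities from the policy."""
    seq = []
    for pos in range(length):
        weights = [math.exp(policy.get((pos, m), 0.0)) for m in MOVES]
        seq.append(rng.choices(MOVES, weights=weights)[0])
    return sum(seq), seq  # toy score: number of ones

def adapt(policy, seq, alpha=1.0):
    """Shift the policy toward `seq` with a softmax gradient step."""
    new = dict(policy)
    for pos, chosen in enumerate(seq):
        weights = [math.exp(policy.get((pos, m), 0.0)) for m in MOVES]
        z = sum(weights)
        for m, w in zip(MOVES, weights):
            new[(pos, m)] = new.get((pos, m), 0.0) - alpha * w / z
        new[(pos, chosen)] += alpha
    return new

def nrpa(level, policy, length, iters, rng):
    """Each level runs `iters` searches one level down and adapts to the best one."""
    if level == 0:
        return playout(policy, length, rng)
    best_score, best_seq = -1, None
    for _ in range(iters):
        score, seq = nrpa(level - 1, policy, length, iters, rng)
        if score >= best_score:
            best_score, best_seq = score, seq
        policy = adapt(policy, best_seq)
    return best_score, best_seq

score, seq = nrpa(level=2, policy={}, length=8, iters=20, rng=random.Random(0))
print(score, seq)
```

Each level keeps its own copy of the policy, so adaptations made at a lower level are discarded when that level returns; only the best sequence is passed upward.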

By adopting popular pixel-wise losses, existing methods for defocus deblurring rely heavily on well-aligned training image pairs. Although training pairs of ground-truth and blurry images are carefully collected, e.g., in the DPDD dataset, misalignment between training pairs is inevitable, making existing methods prone to deformation artifacts. In this paper, we propose a joint deblurring and reblurring learning (JDRL) framework for single image defocus deblurring with misaligned training pairs. Generally, JDRL consists of a deblurring module and a spatially invariant reblurring module, by which the deblurred result can be adaptively supervised by the ground-truth image to recover sharp textures while maintaining spatial consistency with the blurry image. First, in the deblurring module, a bi-directional optical flow-based deformation is introduced to tolerate spatial misalignment between the deblurred and ground-truth images. Second, in the reblurring module, the deblurred result is reblurred to be spatially aligned with the blurry image by predicting a set of isotropic blur kernels and weighting maps. Moreover, we establish a new single image defocus deblurring (SDD) dataset, further validating our JDRL and also benefiting future research. Our JDRL can be applied to boost defocus deblurring networks in terms of both quantitative metrics and visual quality on the DPDD, RealDOF, and our SDD datasets.

Learning Single Image Defocus Deblurring with Misaligned Training Pairs

Neural networks require careful weight initialization to prevent signals from exploding or vanishing. Existing initialization schemes solve this problem in specific cases by assuming that the network has a certain activation function or topology. It is difficult to derive such weight initialization strategies, and modern architectures therefore often use these same initialization schemes even though their assumptions do not hold. This paper introduces AutoInit, a weight initialization algorithm that automatically adapts to different neural network architectures. By analytically tracking the mean and variance of signals as they propagate through the network, AutoInit appropriately scales the weights at each layer to avoid exploding or vanishing signals. Experiments demonstrate that AutoInit improves the performance of convolutional, residual, and transformer networks across a range of activation function, dropout, weight decay, learning rate, and normalizer settings, and does so more reliably than data-dependent initialization methods. This flexibility allows AutoInit to initialize models for everything from small tabular tasks to large datasets such as ImageNet. Such generality turns out to be particularly useful in neural architecture search and in activation function discovery. In these settings, AutoInit initializes each candidate appropriately, making performance evaluations more accurate. AutoInit thus serves as an automatic configuration tool that makes the design of new neural network architectures more robust. The AutoInit package provides a wrapper around TensorFlow models and is available at https://github.com/cognizant-ai-labs/autoinit.

AutoInit: Analytic Signal-Preserving Weight Initialization for Neural Networks
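The idea of tracking signal variance analytically and scaling each layer's weights accordingly can be illustrated in a very restricted setting. The sketch below handles only bias-free dense layers followed by ReLU, assumes roughly Gaussian pre-activations, and uses plain NumPy; it is not the AutoInit package or its API, and the function names `init_relu_mlp` and `forward_variances` are hypothetical.

```python
import numpy as np

def init_relu_mlp(layer_sizes, input_second_moment=1.0, seed=0):
    """Pick each layer's weight scale so that pre-activations have variance ~1."""
    rng = np.random.default_rng(seed)
    q = input_second_moment  # tracked E[h^2] of the signal entering the layer
    weights = []
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        # Var(z) = fan_in * sigma^2 * q for zero-mean weights independent of h
        sigma = np.sqrt(1.0 / (fan_in * q))
        weights.append(rng.normal(0.0, sigma, size=(fan_in, fan_out)))
        # For z ~ N(0, 1), E[relu(z)^2] = 1/2 (assumed Gaussian pre-activation)
        q = 0.5
    return weights

def forward_variances(weights, x):
    """Empirical pre-activation variance at each layer for a batch x."""
    variances, h = [], x
    for W in weights:
        z = h @ W
        variances.append(float(z.var()))
        h = np.maximum(z, 0.0)
    return variances

x = np.random.default_rng(1).normal(size=(2000, 256))
weights = init_relu_mlp([256, 256, 256, 256])
print(forward_variances(weights, x))
```

The only architecture-specific knowledge is the line that updates `q`; a different activation would need a different analytic second moment there.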

With the development of advanced driver assistance systems (ADAS) and autonomous vehicles, conducting experiments in various scenarios has become an urgent need. Although capable of synthesizing photo-realistic street scenes, conventional image-to-image translation methods cannot produce coherent scenes due to the lack of 3D information. In this paper, a large-scale neural rendering method is proposed to synthesize the autonomous driving scene (READ), which makes it possible to generate large-scale driving scenes in real time on a PC through a variety of sampling schemes. In order to effectively represent driving scenarios, we propose an ω rendering network to learn neural descriptors from sparse point clouds. Our model can not only synthesize photo-realistic driving scenes but also stitch and edit them. The promising experimental results show that our model performs well in large-scale driving scenarios.


READ: Large-Scale Neural Scene Rendering for Autonomous Driving

Submodular maximization has attracted extensive attention due to its numerous applications in machine learning and artificial intelligence. Many real-world problems require maximizing multiple submodular objective functions at the same time. In such cases, a common approach is to select a representative subset of Pareto optimal solutions with different trade-offs among multiple objectives. To this end, in this paper, we investigate the regret ratio minimization (RRM) problem in multi-objective submodular maximization, which aims to find at most k solutions to best approximate all Pareto optimal solutions w.r.t. any linear combination of objective functions. We propose a novel HS-RRM algorithm by transforming RRM into HittingSet problems based on the notions of $\epsilon$-kernel and $\delta$-net, where any $\alpha$-approximation algorithm for single-objective submodular maximization is used as an oracle. We prove that the maximum regret ratio (MRR) of the output of HS-RRM is bounded by $1 - \alpha + O\big( (k-d)^{-\frac{2}{d-1}} \big)$, where d is the number of objectives, which improves upon the previous best-known bound of $1 - \alpha + O\big( (k-d)^{-\frac{1}{d-1}} \big)$ and is nearly asymptotically optimal for any fixed d. Experiments on real-world and synthetic data confirm that HS-RRM achieves lower MRRs than existing algorithms.

Improved Algorithm for Regret Ratio Minimization in Multi-Objective Submodular Maximization
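As a small illustration of the quantity being bounded in the abstract above, the sketch below computes a maximum regret ratio for a chosen subset of solutions over a finite set of linear weightings. It assumes the common definition of the regret ratio (one minus the best weighted value in the subset divided by the best weighted value overall) and approximates "any linear combination" by a grid of weights; the function name and the toy values are hypothetical, and this is not the HS-RRM algorithm.

```python
import numpy as np

def max_regret_ratio(values, subset, weights):
    """values: (n_solutions, d) objective values; subset: indices of kept solutions;
    weights: (n_weights, d) nonnegative weight vectors.
    Assumes the best weighted value is positive for every weight vector."""
    scores = weights @ values.T                  # (n_weights, n_solutions)
    best_overall = scores.max(axis=1)
    best_in_subset = scores[:, subset].max(axis=1)
    return float(np.max(1.0 - best_in_subset / best_overall))

# Three candidate solutions evaluated on d = 2 objectives (toy numbers).
values = np.array([[1.0, 0.0], [0.0, 1.0], [0.8, 0.8]])
lam = np.linspace(0.0, 1.0, 101)
weights = np.stack([lam, 1.0 - lam], axis=1)

print(max_regret_ratio(values, [2], weights))     # keep only the balanced solution
print(max_regret_ratio(values, [0, 1], weights))  # keep only the two extremes
```

In this toy case the single balanced solution gives a smaller maximum regret ratio than the two extreme solutions together, which is the kind of trade-off a representative subset is meant to capture.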

Privacy in AI has drawn attention from researchers and the general public in recent years. As one way to implement privacy-preserving AI, differentially private learning is a framework that enables AI models to use differential privacy (DP). To achieve DP in the learning process, existing algorithms typically limit the magnitude of gradients with a constant clipping threshold, which requires careful tuning due to its significant impact on model performance. As a solution to this issue, the recent works NSGD and Auto-S propose using normalization instead of clipping to avoid hyperparameter tuning. However, normalization-based approaches like NSGD and Auto-S rely on a monotonic weight function, which imposes excessive weight on small gradient samples and introduces extra deviation to the update. In this paper, we propose a Differentially Private Per-Sample Adaptive Clipping (DP-PSAC) algorithm based on a non-monotonic adaptive weight function, which guarantees privacy without the typical hyperparameter tuning process of constant clipping while significantly reducing the deviation between the update and the true batch-averaged gradient. We provide a rigorous theoretical convergence analysis and show that, with a convergence rate of the same order, the proposed algorithm achieves a lower non-vanishing bound, which is maintained over training iterations, compared with NSGD/Auto-S. In addition, through extensive experimental evaluation, we show that DP-PSAC outperforms or matches the state-of-the-art methods on multiple mainstream vision and language tasks.

Differentially Private Learning with Per-Sample Adaptive Clipping
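The two baseline strategies contrasted in the abstract above, constant clipping and normalization, can be sketched as per-sample reweighting followed by Gaussian noise. The code below is a minimal illustration under stated assumptions: the stabilizer `r` in the normalization, the helper names, and the toy gradients are hypothetical, no privacy accounting is performed, and the non-monotonic DP-PSAC weight function itself is not reproduced here.

```python
import numpy as np

def clip_per_sample(grads, C):
    """Constant clipping: rescale each row so its L2 norm is at most C."""
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, C / np.maximum(norms, 1e-12))
    return grads * scale

def normalize_per_sample(grads, r=0.01):
    """Normalization: divide each row by (norm + r); r is an assumed stabilizer."""
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    return grads / (norms + r)

def noisy_mean(bounded_grads, sensitivity, noise_multiplier, rng):
    """Sum the bounded per-sample gradients, add Gaussian noise, then average."""
    std = noise_multiplier * sensitivity
    noise = rng.normal(0.0, std, size=bounded_grads.shape[1])
    return (bounded_grads.sum(axis=0) + noise) / bounded_grads.shape[0]

grads = np.array([[3.0, 4.0], [0.3, 0.4]])   # one large and one small gradient
rng = np.random.default_rng(0)
print(np.linalg.norm(clip_per_sample(grads, C=1.0), axis=1))
print(np.linalg.norm(normalize_per_sample(grads), axis=1))
print(noisy_mean(clip_per_sample(grads, C=1.0), 1.0, 1.0, rng))
```

With normalization, the small gradient ends up with almost the same norm as the large one, which illustrates the concern about small-gradient samples receiving excessive weight.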

Jigsaw puzzle solving has recently become an emerging research area. The developed techniques have been widely used in applications beyond puzzle solving. This paper focuses on solving Jigsaw Puzzles with Large Eroded Gaps (JPwLEG). We formulate the puzzle reassembly as a combinatorial optimization problem and propose a Siamese-Discriminant Deep Reinforcement Learning (SD$^2$RL) to solve it. A Deep Q-network (DQN) is designed to visually understand the puzzles, which consists of two sets of Siamese Discriminant Networks, one set to perceive the pairwise relations between vertical neighbors and another set for horizontal neighbors. The proposed DQN considers not only the evidence from the incumbent fragment but also the support from its four neighbors. The DQN is trained using replay experience with carefully designed rewards to guide the search for a sequence of fragment swaps to reach the correct puzzle solution. Two JPwLEG datasets are constructed to evaluate the proposed method, and the experimental results show that the proposed SD$^2$RL significantly outperforms state-of-the-art methods.

Siamese-Discriminant Deep Reinforcement Learning for Solving Jigsaw Puzzles with Large Eroded Gaps

Constructing useful representations across a large number of tasks is a key requirement for sample-efficient intelligent systems. A traditional idea in multitask learning (MTL) is building a shared representation across tasks, which can then be adapted to new tasks by tuning the last layers. A desirable refinement of using a shared one-fits-all representation is to construct task-specific representations. To this end, the recent PathNet/muNet architectures represent individual tasks as pathways within a larger supernet. The subnetworks induced by pathways can be viewed as task-specific representations that are compositions of modules within the supernet's computation graph. This work explores the pathways proposal through the lens of statistical learning: We first develop novel generalization bounds for empirical risk minimization problems learning multiple tasks over multiple paths (Multipath MTL). In conjunction, we formalize the benefits of the resulting multipath representation when adapting to new downstream tasks. Our bounds are expressed in terms of Gaussian complexity, lead to tangible guarantees for the class of linear representations, and provide novel insights into the quality and benefits of a multipath representation. When the computation graph is a tree, Multipath MTL hierarchically clusters the tasks and builds cluster-specific representations. We provide further discussion and experiments for hierarchical MTL and rigorously identify the conditions under which Multipath MTL is provably superior to traditional MTL approaches with shallow supernets.

Provable Pathways: Learning Multiple Tasks over Multiple Paths

While person Re-identification (Re-ID) has progressed rapidly due to its wide real-world applications, it also carries a severe risk of leaking personal information from training data. Thus, this paper focuses on quantifying this risk by membership inference (MI) attack. Most existing MI attack algorithms focus on classification models, while Re-ID follows a totally different training and inference paradigm. Re-ID is a fine-grained recognition task with complex feature embedding, and the model outputs commonly used by existing MI attacks, such as logits and losses, are not accessible during inference. Since Re-ID focuses on modelling the relative relationship between image pairs instead of individual semantics, we conduct a formal and empirical analysis which validates that the distribution shift of the inter-sample similarity between the training and test sets is a critical criterion for Re-ID membership inference. As a result, we propose a novel membership inference attack method based on the inter-sample similarity distribution. Specifically, a set of anchor images is sampled to represent the similarity distribution conditioned on a target image, and a neural network with a novel anchor selection module is proposed to predict the membership of the target image. Our experiments validate the effectiveness of the proposed approach on both the Re-ID task and the conventional classification task.
