A fundamental challenge in visual reinforcement learning (RL) is achieving robust generalization across environments with varying visual distractions. Current RL methods struggle to generalize for two reasons: they cannot differentiate foreground from background features during data augmentation, and their Q-consistency mechanisms rely on outdated replay-buffer actions that drift from the current policy. In this paper, we present PQDA, a novel framework that addresses these generalization challenges through two key innovations: (1) Foreground-Background Decoupled Augmentation leverages Gaussian mixture model-based segmentation to efficiently generate and cache masks in the replay buffer, then applies differentiated augmentation strategies to foreground and background regions, enhancing data diversity while preserving task-relevant features. (2) Policy-Aligned Q-Consistency samples actions from the current policy for Q-regularization, enforcing policy alignment and achieving faster, more stable convergence. Notably, PQDA eliminates auxiliary tasks entirely through a unified architecture that co-optimizes the encoder and RL components directly. Extensive experiments on DMControl benchmarks (including our newly proposed CVDMC benchmark) and robotic manipulation tasks demonstrate that PQDA outperforms state-of-the-art methods in generalization. The code and the CVDMC benchmark will be released to facilitate reproducibility.
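To make the first idea concrete, here is a minimal numpy sketch of foreground-background decoupled augmentation: a simplified 1-D, 2-component Gaussian mixture fit by EM segments pixels into two groups (assuming the minority component is the task-relevant foreground), and a strong perturbation is then applied to the background only. The function names, the minority-component heuristic, and the noise-blending augmentation are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def gmm2_mask(gray, iters=20):
    """Fit a minimal 2-component 1-D Gaussian mixture by EM on pixel
    intensities and return a boolean foreground mask. Assumes the
    minority component is the foreground; such masks can be computed
    once and cached alongside transitions in the replay buffer."""
    x = gray.reshape(-1).astype(np.float64)
    mu = np.array([x.min(), x.max()])           # spread the initial means
    var = np.full(2, x.var() + 1e-6)
    pi = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: posterior responsibility of each component per pixel
        d = (x[:, None] - mu[None, :]) ** 2
        logp = -0.5 * (d / var + np.log(2 * np.pi * var)) + np.log(pi)
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixture weights, means, and variances
        nk = r.sum(axis=0) + 1e-12
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu[None, :]) ** 2).sum(axis=0) / nk + 1e-6
        pi = nk / nk.sum()
    labels = r.argmax(axis=1)
    return (labels == np.argmin(pi)).reshape(gray.shape)

def decoupled_augment(frame, mask, rng):
    """Apply a strong perturbation (here: blending in uniform noise) to
    background pixels only, leaving the masked foreground untouched."""
    out = frame.astype(np.float64).copy()
    noise = rng.uniform(0.0, 255.0, size=frame.shape)
    bg = ~mask
    out[bg] = 0.5 * out[bg] + 0.5 * noise[bg]
    return out
```

Because the mask is deterministic per frame, caching it in the buffer amortizes the segmentation cost over every augmentation of that frame.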
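The second idea can likewise be sketched: instead of regularizing Q-values at stale actions drawn from the replay buffer, the consistency term is evaluated at actions freshly sampled from the current policy. The toy linear critic and Gaussian actor below are hypothetical stand-ins for the learned networks, shown only to make the structure of the regularizer explicit.

```python
import numpy as np

def q_value(w, obs, act):
    """Toy linear critic: Q(s, a) = w . [s; a] (illustrative only)."""
    return float(w @ np.concatenate([obs, act]))

def current_policy(obs, rng):
    """Hypothetical stochastic actor: a fixed linear map of the
    observation plus Gaussian exploration noise."""
    return 0.1 * obs[:2] + 0.01 * rng.standard_normal(2)

def policy_aligned_q_consistency(w, obs, aug_obs, rng, n_samples=8):
    """Policy-aligned Q-consistency term: mean squared gap between Q on
    the clean and augmented observations, evaluated at actions sampled
    from the *current* policy rather than from the replay buffer."""
    gaps = []
    for _ in range(n_samples):
        a = current_policy(obs, rng)   # fresh action, aligned with the policy
        gaps.append((q_value(w, obs, a) - q_value(w, aug_obs, a)) ** 2)
    return float(np.mean(gaps))
```

Sampling from the current policy keeps the regularizer focused on the actions the agent will actually take, which is the claimed source of the faster, more stable convergence.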
