Offline Meta-Reinforcement Learning (OMRL) leverages pre-collected data to adapt to new tasks. Context-based methods infer task representations from contexts, i.e., segments of collected transitions. However, a context reflects both the task and the behavior policy that collected it. The mismatch between the behavior policy and the policy used at test time induces a context distribution shift, which yields poor task representations and degraded performance. This problem is exacerbated in data-limited settings. To address it, we propose a novel approach called Meta-Normalizing Flow (Meta-NF). First, it employs a highly expressive and sample-efficient normalizing flow policy. Second, it incorporates a metric for selecting task representations at test time, which effectively mitigates the context shift problem. Empirical results demonstrate that Meta-NF outperforms existing OMRL methods, with both components contributing to its strong performance.
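The abstract does not specify the architecture of the normalizing flow policy. As background, a minimal sketch of the core building block of such flows, a RealNVP-style affine coupling layer, is shown below. It conditions a scale-and-shift transform of half the dimensions on the other half, which keeps the mapping invertible with a cheap log-determinant; the class name, the linear "networks", and all hyperparameters here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

class AffineCoupling:
    """Illustrative RealNVP-style affine coupling layer (NumPy).

    Splits the input x into (x1, x2), then transforms x2 with a scale
    and shift computed from x1. Because x1 passes through unchanged,
    the transform is exactly invertible and the Jacobian log-determinant
    is just the sum of the log-scales. Real flows stack many such layers
    with neural networks in place of the linear maps below.
    """

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.d = dim // 2            # size of the untouched half
        k = dim - self.d             # size of the transformed half
        # Toy linear "networks" producing log-scale and shift from x1.
        self.Ws = 0.1 * rng.standard_normal((k, self.d))
        self.Wt = 0.1 * rng.standard_normal((k, self.d))

    def forward(self, x):
        x1, x2 = x[:self.d], x[self.d:]
        s = np.tanh(self.Ws @ x1)    # bounded log-scale for stability
        t = self.Wt @ x1             # shift
        y2 = x2 * np.exp(s) + t
        # Return transformed vector and log|det Jacobian| = sum(s).
        return np.concatenate([x1, y2]), s.sum()

    def inverse(self, y):
        y1, y2 = y[:self.d], y[self.d:]
        s = np.tanh(self.Ws @ y1)    # recomputable: y1 == x1
        t = self.Wt @ y1
        x2 = (y2 - t) * np.exp(-s)
        return np.concatenate([y1, x2])
```

In a flow policy, an action is sampled by drawing a Gaussian base sample and pushing it through the stacked layers; its exact log-probability follows from the base density plus the accumulated log-determinants, which is what makes the policy both expressive and tractable to train by maximum likelihood on offline data.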
