AAAI 2026 Main Conference

January 23, 2026

Singapore


Offline reinforcement learning (RL) learns policies from pre-collected datasets without interacting with the environment, but it suffers from the out-of-distribution (OOD) action problem. Recent methods adopt the generative adversarial paradigm to learn policies, but they often fail to resolve the conflict between fooling the discriminator and maximizing expected returns. In this paper, we propose a novel offline RL method named Distribution-Matching Generator-based Diffusion Policies (DMGDP). We first develop a distribution matching-based policy learning method, in which a diffusion model serves as the policy generator, to resolve the conflict between fooling the discriminator and maximizing expected returns. Furthermore, we design a policy confidence mechanism based on discriminator regularization that prevents the agent from taking OOD actions, enabling robust generative adversarial learning. We conducted extensive experiments on the D4RL benchmarks, and the results demonstrate that DMGDP outperforms state-of-the-art methods.
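The abstract alone does not specify DMGDP's training objectives, so the sketch below is only a minimal, hypothetical rendering of the generic idea it describes: a diffusion model as policy generator, trained to match the dataset's action distribution against a discriminator while also maximizing a critic's Q-values, with a discriminator-derived confidence weight down-weighting likely-OOD actions. It is not the authors' implementation; every module, the loss weight `eta`, and the short 5-step reverse process are assumptions for illustration.

```python
# Hypothetical sketch of adversarial distribution matching with a diffusion
# policy generator. Not the DMGDP implementation from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, ACTION_DIM, T = 17, 6, 5  # e.g. a D4RL locomotion task; T = diffusion steps

class Denoiser(nn.Module):
    """Predicts the noise added to an action, conditioned on state and step."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM + 1, 256), nn.ReLU(),
            nn.Linear(256, ACTION_DIM),
        )
    def forward(self, s, a_noisy, t):
        t_emb = t.float().unsqueeze(-1) / T  # simple scalar step embedding
        return self.net(torch.cat([s, a_noisy, t_emb], dim=-1))

def sample_action(denoiser, s, betas):
    """Reverse diffusion: start from Gaussian noise and iteratively denoise.
    Noise is reparameterized, so gradients flow back to the denoiser."""
    a = torch.randn(s.shape[0], ACTION_DIM)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    for t in reversed(range(T)):
        t_vec = torch.full((s.shape[0],), t)
        eps = denoiser(s, a, t_vec)
        a = (a - betas[t] / torch.sqrt(1 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        if t > 0:
            a = a + torch.sqrt(betas[t]) * torch.randn_like(a)
    return a.clamp(-1.0, 1.0)

# D(s, a): logit that the action is in-distribution for this state.
discriminator = nn.Sequential(
    nn.Linear(STATE_DIM + ACTION_DIM, 256), nn.ReLU(), nn.Linear(256, 1))
# Q(s, a): assumed trained separately (e.g. by TD learning); omitted here.
critic = nn.Sequential(
    nn.Linear(STATE_DIM + ACTION_DIM, 256), nn.ReLU(), nn.Linear(256, 1))
denoiser = Denoiser()
betas = torch.linspace(1e-4, 0.1, T)

opt_pi = torch.optim.Adam(denoiser.parameters(), lr=3e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=3e-4)

def train_step(s, a_data, eta=1.0):
    # --- discriminator update: dataset actions vs. generated actions ---
    a_gen = sample_action(denoiser, s, betas)
    d_real = discriminator(torch.cat([s, a_data], -1))
    d_fake = discriminator(torch.cat([s, a_gen.detach()], -1))
    loss_d = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # --- policy update: distribution matching + confidence-weighted returns ---
    a_gen = sample_action(denoiser, s, betas)
    logits = discriminator(torch.cat([s, a_gen], -1))
    match = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    conf = torch.sigmoid(logits).detach()          # confidence that a_gen is in-distribution
    q = critic(torch.cat([s, a_gen], -1))
    loss_pi = match - eta * (conf * q).mean()      # trust Q only where D is confident
    opt_pi.zero_grad(); loss_pi.backward(); opt_pi.step()
```

In this reading, weighting the Q-term by the discriminator's confidence is one plausible way to realize a "policy confidence mechanism": the return-maximization pressure is attenuated exactly where the discriminator flags the action as OOD, so the generator cannot exploit critic errors outside the data distribution.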


