
AAAI 2025

February 28, 2025

Philadelphia, United States


keywords:

online learning bandits

ml

In this paper, we investigate a variant of the classical stochastic Multi-armed Bandit (MAB) problem, where the payoff received by an agent (either cost or reward) is both delayed and directly corresponds to the magnitude of the delay. This setting faithfully models many real-world scenarios, such as the time it takes for a data packet to traverse a network given a choice of route (where delay serves as the agent’s cost), or a user's time spent on a web page given a choice of content (where delay serves as the agent’s reward). Our main contributions are tight upper and lower bounds for both the cost and reward settings. For the case that delays serve as costs, which we are the first to consider, we prove optimal regret that scales as $\sum_{i:\Delta_i > 0}\frac{\log T}{\Delta_i} + d^*$, where $T$ is the number of rounds, $\Delta_i$ are the sub-optimality gaps and $d^*$ is the minimal expected delay amongst arms. For the case that delays serve as rewards, we show optimal regret of $\sum_{i:\Delta_i > 0}\frac{\log T}{\Delta_i} + \bar{d}$, where $\bar{d}$ is the second maximal expected delay. These improve over the regret in the general delay-dependent payoff setting, which scales as $\sum_{i:\Delta_i > 0}\frac{\log T}{\Delta_i} + D$, where $D$ is the maximum possible delay. Our regret bounds highlight the difference between the cost and reward scenarios, showing that the improvement in the cost scenario is more significant than in the reward scenario. Finally, we accompany our theoretical results with an empirical evaluation.
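As a rough illustration of how the three bounds in the abstract differ, the sketch below evaluates their shared gap-dependent term and their distinct additive terms for a made-up instance. It is not the paper's algorithm or analysis: the function names, the example delays, and the value of $D$ are hypothetical, constants hidden by "scales as" are ignored, and any normalization of delays the paper may apply is not modeled. "Second maximal" is read here as the second-largest entry of the list, so ties count separately.

```python
import math


def gap_term(gaps, horizon):
    """Shared term: sum of log(T) / Delta_i over arms with Delta_i > 0.

    Constants are ignored, so only the scaling is meaningful.
    """
    return sum(math.log(horizon) / g for g in gaps if g > 0)


def additive_terms(expected_delays, max_delay):
    """Additive term of each bound for a given set of expected delays.

    cost    -> d*    : minimal expected delay among arms
    reward  -> d_bar : second maximal expected delay (assumed reading)
    general -> D     : maximum possible delay
    """
    if len(expected_delays) < 2:
        raise ValueError("need at least two arms")
    ordered = sorted(expected_delays)
    return {
        "cost": ordered[0],
        "reward": ordered[-2],
        "general": max_delay,
    }


# Hypothetical instance: three arms with these expected delays, and D = 20.
terms = additive_terms([2.0, 5.0, 9.0], max_delay=20.0)
print(terms)
```

Since the minimal expected delay can never exceed the second maximal one, and neither can exceed $D$, the ordering of these additive terms matches the abstract's remark that the cost setting gains the most over the general bound.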

