
Autonomous agents often need to choose between actions that are familiar and have previously yielded positive results (exploitation) and actions that seek new information and could uncover more effective options (exploration). We present an “observe or bet” task that separates “pure exploration” from “pure exploitation”: 61 five-to-seven-year-old children, 60 adults, and computational agents must either observe an outcome without receiving a reward or bet on an action without receiving immediate feedback, at varying probability levels. Their performance was compared against solutions to the partially observable Markov decision process (POMDP) and against meta-reinforcement-learning (meta-RL) models. Children and adults tended to choose observation more often than either algorithm class would suggest. Children also modulated their betting policy based on the probability structure and the amount of evidence, exhibiting “hedging behavior,” a strategy not evident in standard bandit tasks. The results provide a benchmark for reasoning about reward and information in humans and neural network models.
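To make the task structure concrete, here is a minimal sketch of an observe-or-bet episode. It is an illustration under stated assumptions, not the authors' implementation: the two-option setup, the payoff (one point for a correct bet, nothing otherwise), the names `run_episode` and `threshold_policy`, and the evidence-threshold rule are all hypothetical choices made for this example.

```python
import random


def run_episode(p_left, n_trials, policy, rng):
    """Simulate one episode of a two-option observe-or-bet task (illustrative).

    p_left: hidden probability that the outcome on a trial is "left".
    policy: function (left_seen, right_seen) -> "observe", "bet_left" or "bet_right".
    Returns the total reward, which the policy never sees during the episode.
    """
    seen = {"left": 0, "right": 0}
    reward = 0
    for _ in range(n_trials):
        outcome = "left" if rng.random() < p_left else "right"
        action = policy(seen["left"], seen["right"])
        if action == "observe":
            # Pure exploration: the outcome is revealed, but no reward is earned.
            seen[outcome] += 1
        elif action == "bet_" + outcome:
            # Pure exploitation: reward accrues, but no feedback reaches the policy.
            # Assumed payoff: +1 for a correct bet, 0 for an incorrect one.
            reward += 1
    return reward


def threshold_policy(k):
    """Hypothetical rule: observe until one option leads by k observations, then bet on it."""
    def policy(left_seen, right_seen):
        if abs(left_seen - right_seen) < k:
            return "observe"
        return "bet_left" if left_seen > right_seen else "bet_right"
    return policy


if __name__ == "__main__":
    rng = random.Random(0)
    for k in (1, 2, 4):
        scores = [run_episode(0.75, 20, threshold_policy(k), rng) for _ in range(2000)]
        print(f"threshold k={k}: mean reward {sum(scores) / len(scores):.2f}")
```

Because bets give no feedback, a policy in this sketch can only update its evidence counts on observe trials, which is what separates exploration from exploitation in the task described above.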
Authors:
Eunice Yiu: University of California, Berkeley; Kai J Sandbrink: University of Oxford; Alison Gopnik: University of California, Berkeley
