Reinforcement Learning from Human Feedback (RLHF) is a methodology that aligns agent behavior with human preferences by utilizing signals such as scalar rewards, comparative preferences, or physical demonstrations from human users. We introduce NEURO-LOOP, a framework for Reinforcement Learning from Neural Feedback that leverages passive Brain-Computer Interfaces (BCIs) to infer user assessments directly from brain activity. We present and release a novel dataset of functional near-infrared spectroscopy (fNIRS) recordings collected from 25 human participants observing agent behavior in three domains: a Pick-and-Place Robot task, Lunar Lander, and Flappy Bird. We train classifiers to predict levels of agent performance (optimal, sub-optimal, or poor) from windows of preprocessed fNIRS feature vectors, achieving an average F1 score of 67% for binary classification and 46% for multi-class models across conditions. We also train regressors to predict the degree of deviation between the agent's chosen action and a set of near-optimal actions, providing a continuous measure of performance. To evaluate cross-subject generalization, we use a leave-one-subject-out approach and demonstrate that fine-tuning a pre-trained model on a small sample of withheld, subject-specific data increases average F1 scores by 17% for binary classification and 41% for multi-class models. Our work demonstrates that mapping implicit neural feedback to agent performance is feasible, laying the foundation for integrating brain data into future RLHF systems.
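
To make the leave-one-subject-out evaluation protocol concrete, the following is a minimal sketch in Python using scikit-learn. The data layout (per-subject matrices of windowed fNIRS feature vectors with binary labels), the feature dimensionality, the classifier choice, and the size of the adaptation sample are all illustrative assumptions; the abstract does not specify the model architecture or the dataset's API, and synthetic random data stands in for the released recordings.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import f1_score
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data: for each of 25 subjects, a matrix of
# windowed fNIRS feature vectors (n_windows x n_features) and binary
# labels (e.g., optimal vs. poor agent performance). Shapes and names
# are assumptions, not the released dataset's actual format.
rng = np.random.default_rng(0)
subjects = {
    f"S{i:02d}": (rng.normal(size=(200, 64)), rng.integers(0, 2, 200))
    for i in range(25)
}

def loso_with_finetune(subjects, n_adapt=40):
    """Leave-one-subject-out: pre-train on all other subjects, then
    fine-tune on a small withheld sample from the held-out subject."""
    scores = []
    for held_out in subjects:
        # Pre-train on every subject except the held-out one.
        X_train = np.vstack([X for s, (X, _) in subjects.items() if s != held_out])
        y_train = np.concatenate([y for s, (_, y) in subjects.items() if s != held_out])
        X_ho, y_ho = subjects[held_out]

        scaler = StandardScaler().fit(X_train)
        clf = SGDClassifier(loss="log_loss", random_state=0)
        clf.fit(scaler.transform(X_train), y_train)

        # Fine-tune with a few incremental passes over a small
        # subject-specific adaptation sample.
        X_adapt, y_adapt = X_ho[:n_adapt], y_ho[:n_adapt]
        for _ in range(5):
            clf.partial_fit(scaler.transform(X_adapt), y_adapt)

        # Score on the held-out subject's remaining windows.
        X_test, y_test = X_ho[n_adapt:], y_ho[n_adapt:]
        scores.append(f1_score(y_test, clf.predict(scaler.transform(X_test))))
    return float(np.mean(scores))

print(f"mean held-out F1: {loso_with_finetune(subjects):.2f}")
```

With real recordings, comparing the fine-tuned score against the same loop with the `partial_fit` step removed would reproduce the kind of pre-training-versus-adaptation comparison behind the reported F1 improvements.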
