Keywords: explore-exploit, stochastic Hopfield network, Thompson sampling, decision under uncertainty, brain-inspired algorithm, reinforcement learning
TL;DR: We demonstrate that a brain-inspired stochastic Hopfield network can achieve efficient, human-like, uncertainty-aware exploration in bandit and MDP tasks.
Balancing exploration and exploitation in an uncertain environment is a central challenge in reinforcement learning, yet humans and animals demonstrate superior exploration efficiency in novel conditions. To understand how the brain’s neural circuitry controls exploration under uncertainty, we analyze a dynamical systems model of a biological neural network that governs explore-exploit decisions during foraging. Mathematically, this network, which we term the Brain Bandit Net (BBN), is a special type of stochastic continuous Hopfield network. We show through theory and simulation that BBN can perform posterior sampling of action values with a tunable bias towards or against uncertain options. We then demonstrate that, in multi-armed bandit (MAB) tasks, BBN generates probabilistic choice behavior with an uncertainty bias that resembles human and animal choice patterns. Beyond its high efficiency in MAB tasks, BBN can also be embedded within reinforcement learning algorithms to accelerate learning in Markov decision process (MDP) tasks. Altogether, our findings reveal a theoretical basis for efficient exploration in biological neural networks and propose a general, brain-inspired algorithmic architecture for efficient exploration in RL.
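To make the idea of uncertainty-aware choice via noisy recurrent dynamics concrete, below is a minimal, illustrative sketch (not the paper's actual BBN model): mutually inhibiting stochastic Hopfield-style units receive each arm's estimated value as input, with injected noise scaled by a crude count-based uncertainty, so more uncertain arms are sampled more broadly in a Thompson-sampling-like manner. The function and parameter names (`bbn_choose`, `inhibition`, `noise_gain`) and the specific form of the dynamics are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def bbn_choose(means, stds, n_steps=200, dt=0.05, inhibition=1.0, noise_gain=1.0):
    """Pick an arm by simulating mutually inhibiting stochastic Hopfield-style units.

    Illustrative sketch only: each unit gets its arm's estimated mean reward as
    input, and the injected noise is scaled by that arm's uncertainty, so more
    uncertain arms win more often (a Thompson-sampling-like bias). The unit with
    the highest activation after the dynamics run is the chosen arm.
    """
    k = len(means)
    W = -inhibition * (np.ones((k, k)) - np.eye(k))   # mutual inhibition, no self-coupling
    x = np.zeros(k)                                   # unit activations
    for _ in range(n_steps):
        r = np.tanh(x)                                # firing rates
        drift = -x + W @ r + means                    # leaky integration + recurrence + evidence
        noise = noise_gain * stds * rng.normal(size=k) * np.sqrt(dt)
        x = x + dt * drift + noise
    return int(np.argmax(x))

# Toy 3-armed bandit: running means plus a count-based stand-in for posterior std.
true_means = np.array([0.2, 0.5, 0.8])
counts = np.ones(3)
est_means = np.zeros(3)
for t in range(500):
    stds = 1.0 / np.sqrt(counts)
    a = bbn_choose(est_means, stds)
    reward = rng.normal(true_means[a], 0.5)
    counts[a] += 1
    est_means[a] += (reward - est_means[a]) / counts[a]

print("pull counts:", counts - 1, "estimated means:", est_means.round(2))
```

In this sketch the `noise_gain` knob plays the role of the tunable bias towards or against uncertain options: larger values make uncertainty dominate the choice, while zero reduces the dynamics to noisy greedy selection.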