Strategic Experimentation with Humped Bandits
Abstract: Models of learning and experimentation based on two-armed Poisson bandits have addressed several important aspects of strategic and motivational learning, but they are not suited to studying effects that accumulate over time.
We propose a new class of models of strategic experimentation that are almost as tractable as exponential models but incorporate realistic features such as the dependence of the expected rate of news arrival on the time elapsed since the start of an experiment. We show that, in these models, the experiment is stopped before news is realized whenever the rate of arrival of news reaches a critical level. This leads to longer experimentation times for experiments with possible breakthroughs than for equivalent experiments with failures. We show that, in experimentation models with multiple players, either no player stops before the first failure is observed, or all players stop simultaneously before the first failure is observed. We also demonstrate a crowding-out effect in models with profitable breakthroughs.