Multi-armed Bandits

Types of bandits

Sequential: at each time step, take a decision and observe the reward

Stationary: the reward distribution does not change

Nonstationary: the environment can change, but not as a consequence of actions
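The difference between the two settings can be sketched in a few lines. This is only an illustration, not a reference implementation: the class names, the Bernoulli (0/1) rewards, and the uniform `drift` parameter are all assumptions made for the example.

```python
import random


class StationaryBandit:
    """Bernoulli bandit whose arm success probabilities never change (assumed reward model)."""

    def __init__(self, probs, seed=0):
        self.probs = list(probs)
        self.rng = random.Random(seed)

    def pull(self, arm):
        # Reward is 1 with probability probs[arm], otherwise 0.
        return 1 if self.rng.random() < self.probs[arm] else 0


class NonstationaryBandit(StationaryBandit):
    """Same bandit, but every probability drifts a little on each step."""

    def __init__(self, probs, drift=0.01, seed=0):
        super().__init__(probs, seed)
        self.drift = drift  # hypothetical drift size, chosen for illustration

    def pull(self, arm):
        reward = super().pull(arm)
        # The drift is applied to all arms and does not depend on `arm`:
        # the environment changes on its own, not as a consequence of the action.
        self.probs = [
            min(1.0, max(0.0, p + self.rng.uniform(-self.drift, self.drift)))
            for p in self.probs
        ]
        return reward
```

In the nonstationary sketch, an arm that was best early on may stop being best later, so estimates built from old rewards can become stale.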

Stationary:

The objective is to maximize the cumulative reward over n time steps
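As a rough sketch of this objective, the loop below plays n sequential steps on a stationary bandit and returns the cumulative reward. The function `run`, the Bernoulli rewards, the arm probabilities, and the two toy policies are assumptions made for illustration. The "always pull the best arm" policy is not available in practice, since the probabilities are unknown to the learner.

```python
import random


def run(probs, choose_arm, n, seed=0):
    """Play n sequential steps and return the cumulative reward."""
    rng = random.Random(seed)
    total = 0
    for t in range(n):
        arm = choose_arm(t, rng)  # take a decision
        reward = 1 if rng.random() < probs[arm] else 0  # observe the reward
        total += reward
    return total


# Hypothetical success probabilities for three arms.
probs = [0.2, 0.5, 0.8]

# Policy 1: pick an arm uniformly at random at every step.
random_total = run(probs, lambda t, rng: rng.randrange(len(probs)), n=1000)

# Policy 2: always pull arm 2, the best arm under these assumed probabilities.
best_total = run(probs, lambda t, rng: 2, n=1000)

print(random_total, best_total)
```

A learning algorithm has to close the gap between these two totals using only the rewards it observes along the way.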