Sequentially: take a decision and observe the reward
Stationary: The reward distribution does not change
Nonstationary: Environment can change, but not as a consequence of actions
Objective: maximize the total reward over n time steps
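
A minimal sketch of this setting in Python, to make the definitions concrete. Everything named here is a hypothetical illustration and not part of the notes: the class GaussianBandit, the function run, the parameters drift and noise, and the choice of Gaussian rewards. With drift = 0 the reward distributions stay fixed (stationary). With drift > 0 the arm means move at every step regardless of which action was taken (nonstationary).

```python
import random


class GaussianBandit:
    """Hypothetical k-armed bandit: each arm pays a Gaussian reward around its mean."""

    def __init__(self, means, drift=0.0, noise=1.0, seed=0):
        self.means = list(means)
        self.drift = drift  # assumption: 0.0 = stationary, > 0.0 = nonstationary
        self.noise = noise  # standard deviation of the reward around the arm mean
        self.rng = random.Random(seed)

    def step(self, action):
        """Take one decision and return the observed reward."""
        reward = self.rng.gauss(self.means[action], self.noise)
        # The environment may change, but the change does not depend on `action`:
        # every arm mean takes the same kind of random step.
        if self.drift > 0.0:
            self.means = [m + self.rng.gauss(0.0, self.drift) for m in self.means]
        return reward


def run(env, policy, n):
    """Interact sequentially for n time steps and return the total reward."""
    total = 0.0
    for t in range(n):
        action = policy(t)         # take a decision
        total += env.step(action)  # observe the reward
    return total


if __name__ == "__main__":
    env = GaussianBandit(means=[0.0, 1.0, 0.5], drift=0.0)
    picker = random.Random(1)
    # A uniformly random policy, used only as a placeholder for a real strategy.
    print(run(env, lambda t: picker.randrange(3), n=1000))
```

The objective is then to choose the policy so that the value returned by run is as large as possible, in expectation, for the given n.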