TD Learning

TD learns directly from experience, without a model of the environment.

TD methods update estimates based in part on other learned estimates (bootstrapping).

TD prediction
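
The idea of updating one estimate from another learned estimate can be sketched as a tabular TD(0) update. This is a minimal sketch, not a definitive implementation: the function name `td0_update`, the `values` table, and the default step size `alpha` and discount `gamma` are assumptions made for illustration.

```python
from collections import defaultdict

def td0_update(values, state, reward, next_state, done, alpha=0.1, gamma=1.0):
    """Apply one tabular TD(0) update to values[state] and return the TD error."""
    # Bootstrapped target: observed reward plus the current estimate for the
    # next state. A terminal next state contributes no future value.
    target = reward + (0.0 if done else gamma * values[next_state])
    td_error = target - values[state]
    values[state] += alpha * td_error
    return td_error

# Hypothetical two-state example: B already has an estimate of 1.0.
values = defaultdict(float)
values["B"] = 1.0
td0_update(values, "A", reward=0.0, next_state="B", done=False, alpha=0.5)
print(values["A"])  # 0.5
```

The estimate for A moves halfway toward the target 0.0 + 1.0 * V(B), even though no final outcome has been observed yet.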

Advantages of TD prediction methods

TD can learn after every step, can learn from incomplete sequences, and works in continuing (non-terminating) environments. MC can do none of these: it must wait until the end of an episode, when the return is known.
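
The contrast can be made concrete with a small sketch. The helper names `mc_update` and `td_step`, the state labels, and the rewards are all hypothetical; the point is only that the MC update needs a finished episode to compute returns, while the TD update needs a single transition.

```python
def mc_update(values, episode, alpha=0.1, gamma=1.0):
    """Every-visit Monte Carlo update. `episode` must be a complete, terminated
    list of (state, reward) pairs, where reward follows the visit to state."""
    g = 0.0
    # Walk backwards so g accumulates the full return from each state.
    for state, reward in reversed(episode):
        g = reward + gamma * g
        old = values.get(state, 0.0)
        values[state] = old + alpha * (g - old)

def td_step(values, state, reward, next_state, alpha=0.1, gamma=1.0):
    """TD(0) update from one transition. Unseen states (including a terminal
    state that is never updated) are treated as having value 0."""
    target = reward + gamma * values.get(next_state, 0.0)
    old = values.get(state, 0.0)
    values[state] = old + alpha * (target - old)

# TD can learn from a sequence that is cut off before termination.
td_values = {}
partial = [("s0", 1.0, "s1"), ("s1", 0.0, "s2")]  # hypothetical, incomplete
for s, r, s_next in partial:
    td_step(td_values, s, r, s_next, alpha=0.5)

# MC has to wait for the episode to end before any return is available.
mc_values = {}
mc_update(mc_values, [("s0", 1.0), ("s1", 0.0), ("s2", 2.0)], alpha=0.5)
```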

Optimality of TD(0)

TD exploits the Markov property, so it is usually more efficient in Markov environments. MC does not exploit it, so it is usually more effective in non-Markov environments.
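
A small batch experiment helps show why. The sketch below uses the two-state A/B data set commonly used to illustrate this point (it appears in Sutton and Barto's textbook); the function names `batch_mc` and `batch_td0`, the step size, and the number of sweeps are assumptions chosen for illustration.

```python
# Eight episodes of (state, reward) pairs; each episode ends after its last pair.
episodes = [[("A", 0.0), ("B", 0.0)]] + [[("B", 1.0)]] * 6 + [[("B", 0.0)]]

def batch_mc(episodes, gamma=1.0):
    """Average the observed returns for each state (every-visit)."""
    returns = {}
    for episode in episodes:
        g = 0.0
        for state, reward in reversed(episode):
            g = reward + gamma * g
            returns.setdefault(state, []).append(g)
    return {s: sum(gs) / len(gs) for s, gs in returns.items()}

def batch_td0(episodes, alpha=0.05, gamma=1.0, sweeps=2000):
    """Repeat TD(0) over the fixed batch, applying summed increments once per sweep."""
    values = {s: 0.0 for ep in episodes for s, _ in ep}
    for _ in range(sweeps):
        increments = {s: 0.0 for s in values}
        for episode in episodes:
            for i, (state, reward) in enumerate(episode):
                # Value of the successor, or 0 if the episode terminates here.
                next_value = values[episode[i + 1][0]] if i + 1 < len(episode) else 0.0
                increments[state] += alpha * (reward + gamma * next_value - values[state])
        for s in values:
            values[s] += increments[s]
    return values

mc = batch_mc(episodes)
td = batch_td0(episodes)
print(mc)
print({s: round(v, 3) for s, v in td.items()})
```

Both methods agree that V(B) is 0.75, but they disagree on A. MC fits the observed returns, and the only visit to A was followed by a return of 0, so it gives V(A) = 0. Batch TD(0) converges to the value under the maximum-likelihood Markov model of the data, in which A always leads to B, so it gives V(A) = V(B) = 0.75.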

On-policy approach - Sarsa (state, action, reward, state, action)
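
The name comes from the five quantities used in each update. A minimal sketch of the update follows, assuming a tabular action-value table keyed by (state, action) and an epsilon-greedy behaviour policy; `epsilon_greedy`, `sarsa_update`, and the state and action labels are hypothetical names for illustration.

```python
import random
from collections import defaultdict

def epsilon_greedy(q, state, actions, epsilon=0.1):
    """Pick a random action with probability epsilon, else the greedy one."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])

def sarsa_update(q, s, a, r, s_next, a_next, done, alpha=0.1, gamma=1.0):
    """One Sarsa update from the tuple (s, a, r, s_next, a_next)."""
    # On-policy: the target uses the action the policy actually chose in
    # s_next, not the maximum over actions.
    target = r + (0.0 if done else gamma * q[(s_next, a_next)])
    q[(s, a)] += alpha * (target - q[(s, a)])

q = defaultdict(float)
q[("s1", "right")] = 2.0
a_next = epsilon_greedy(q, "s1", ["left", "right"], epsilon=0.0)
sarsa_update(q, "s0", "left", 1.0, "s1", a_next, done=False, alpha=0.5)
print(q[("s0", "left")])  # 1.5
```

Because the next action is drawn from the same policy that is being evaluated and improved, the learned values reflect that policy's own exploration.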