Untitled

find optimal policy

a policy that acts greedily on an action-value function will

does converge; it is a way to find the optimal epsilon-greedy policy

SARSA can find a good policy without the state-action values having converged

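If you apply a random behavior policy, Q-learning will, after enough steps, converge to the optimal Q-function, i.e. the solution of the Bellman optimality equation (provided every state-action pair keeps being visited). This would not happen with SARSA: being on-policy, it would instead learn the action values of the random policy itself.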
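A minimal sketch of this difference, assuming a hypothetical 4-state chain (states 0 to 3, state 3 terminal, reward 1 on entering it) that is not part of these notes. Both learners follow the same uniformly random behavior policy; the only difference is the bootstrap term. The names `step`, `run`, `ALPHA` and `GAMMA` are made up for illustration, and a constant step size is used for simplicity.

```python
import random

N_STATES = 4        # states 0..3; state 3 is terminal (assumed toy chain)
TERMINAL = 3
ACTIONS = (0, 1)    # 0 = left, 1 = right
GAMMA = 0.9         # discount factor (assumed)
ALPHA = 0.1         # constant step size (assumed)


def step(state, action):
    """Deterministic chain: reward 1 only when entering the terminal state."""
    next_state = max(state - 1, 0) if action == 0 else state + 1
    reward = 1.0 if next_state == TERMINAL else 0.0
    return next_state, reward


def run(algorithm, episodes=5000, seed=0):
    """Learn a Q-table while behaving uniformly at random."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # terminal row stays at 0
    for _ in range(episodes):
        state = 0
        action = rng.choice(ACTIONS)
        while state != TERMINAL:
            next_state, reward = step(state, action)
            next_action = rng.choice(ACTIONS)  # random behavior policy
            if algorithm == "q_learning":
                # Off-policy: bootstrap from the best next action.
                bootstrap = max(q[next_state])
            else:
                # SARSA (on-policy): bootstrap from the action actually taken next.
                bootstrap = q[next_state][next_action]
            q[state][action] += ALPHA * (reward + GAMMA * bootstrap - q[state][action])
            state, action = next_state, next_action
    return q


q_learning = run("q_learning")
sarsa = run("sarsa")
print("Q-learning, Q(0, right):", round(q_learning[0][1], 3))
print("SARSA,      Q(0, right):", round(sarsa[0][1], 3))
```

In this toy chain the optimal value of moving right from state 0 is GAMMA squared, 0.81. The Q-learning estimate should settle there even though the agent never acts greedily, while the SARSA estimate should stay clearly lower because it evaluates the random policy.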

Q-learning vs SARSA
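For reference, the two standard update rules side by side, written in the usual textbook notation (step size \alpha, discount \gamma; these symbols are assumptions, since the notes do not fix a notation). The only difference is the bootstrap term: Q-learning uses the maximum over next actions, SARSA uses the next action A_{t+1} actually chosen by the behavior policy.

```latex
% Q-learning (off-policy)
Q(S_t, A_t) \leftarrow Q(S_t, A_t)
  + \alpha \left[ R_{t+1} + \gamma \max_{a} Q(S_{t+1}, a) - Q(S_t, A_t) \right]

% SARSA (on-policy)
Q(S_t, A_t) \leftarrow Q(S_t, A_t)
  + \alpha \left[ R_{t+1} + \gamma \, Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t) \right]
```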
