02456week6 1 1 reinforcement learning
02456week6 1 2 reinforcement learning approaches
02456week6 2 1 AlphaGo policy and value networks
02456week6 2 2 AlphaGo steps 1 to 4
State
Action
Agent
Policy
Reward
Policy Gradients (PGs) vs. Deep Q-Learning (DQN)
Monte Carlo Tree Search
Deep Neural Network
Markov Decision Process (MDP)
Policy Network