1.Q-learning:Learning the optimal policy in MDP. 2.Q-learning work:learning the Q-values for each state-action pair 3.Value iteration:Q-learning iteravely updates the Q-value(Will be more clear in future I thought) 4.Q-table:store the Q-value for all state-action pair 5.Exploration:exploring the environment to find out infomation about it. 6.Exploitation:exploiting the information that is already known about(tip:epison greed)

When using Q-learning sometimes the algorithms with be very slow to converge



