Reinforcement Learning Part 9: Actor-Critic Methods
In the last part we introduced Policy Gradient (PG) methods. We saw how we can identify our redefined optimal policies by approximating the policy as a neural network and using MC Learning under the algorithm REINFORCE. We also spoke about some of the limitations of traditional PG methods. In this part, we will discuss how we can use PG methods with TD Learning, and how we can overcome some of these limitations by using various algorithms built on top of PG methods.
Actor-Critic (AC) methods are a type of PG methods that approximate both the policy as well as the value...
Actor-Critic (AC) methods are a type of PG methods that approximate both the policy as well as the value...





















