Hi there,
Thanks for sharing your repo; it's been a great help in exploring the field. I have a question I'm not sure of the answer to. In this implementation I believe you have implemented V(s), and therefore have a parameterised value function. I have read that a vanilla implementation of the REINFORCE algorithm does not parameterise any value function, and instead has only a single parameterised network that maps states to a distribution over actions. Am I wrong, or is this more of an actor-critic implementation, given the two networks? I've put a rough sketch of what I mean by vanilla REINFORCE below, in case that helps.
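
For reference, this is roughly my mental model of vanilla REINFORCE: a single policy network trained on Monte Carlo returns, with no learned V(s) anywhere. It's only a sketch under my own assumptions (the `env` is assumed to follow the classic Gym API, and the sizes/names are placeholders), not a claim about how your repo works:

```python
import torch
import torch.nn as nn

OBS_DIM, N_ACTIONS = 4, 2  # placeholder sizes, e.g. CartPole-like

# Single parameterised network: states -> action logits (no value head)
policy = nn.Sequential(
    nn.Linear(OBS_DIM, 64),
    nn.ReLU(),
    nn.Linear(64, N_ACTIONS),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def train_episode(env, gamma=0.99):
    """One REINFORCE update from a single full episode (no critic/baseline)."""
    log_probs, rewards = [], []
    obs, done = env.reset(), False  # assumes classic Gym reset/step signatures
    while not done:
        logits = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, done, _ = env.step(action.item())
        rewards.append(reward)

    # Monte Carlo returns G_t computed backwards over the trajectory
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)

    # Policy-gradient loss: -sum_t log pi(a_t|s_t) * G_t, no V(s) subtracted
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```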
Cheers,
Laurence