Hi there,
Thanks for sharing your repo; it's been a great help in exploring the field. I have a question I'm not sure of the answer to. In this implementation I believe you have implemented V(s), and therefore have a parameterised value function. I have read that a vanilla implementation of the REINFORCE algorithm does not parameterise any value function, and instead has only a single parameterised network that maps states to a distribution over actions. Am I wrong, or is this more of an actor-critic implementation, given the two networks? I've put a rough sketch of what I mean by vanilla REINFORCE below, in case that helps.
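
For reference, this is roughly my mental model of vanilla REINFORCE: a single policy network trained on Monte Carlo returns, with no learned V(s) anywhere. It's only a sketch under my own assumptions (the `env` is assumed to follow the classic Gym API, and the sizes/names are placeholders), not a claim about how your repo works:

```python
import torch
import torch.nn as nn

OBS_DIM, N_ACTIONS = 4, 2  # placeholder sizes, e.g. CartPole-like

# Single parameterised network: states -> action logits (no value head)
policy = nn.Sequential(
    nn.Linear(OBS_DIM, 64),
    nn.ReLU(),
    nn.Linear(64, N_ACTIONS),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def train_episode(env, gamma=0.99):
    """One REINFORCE update from a single full episode (no critic/baseline)."""
    log_probs, rewards = [], []
    obs, done = env.reset(), False  # assumes classic Gym reset/step signatures
    while not done:
        logits = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, done, _ = env.step(action.item())
        rewards.append(reward)

    # Monte Carlo returns G_t computed backwards over the trajectory
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)

    # Policy-gradient loss: -sum_t log pi(a_t|s_t) * G_t, no V(s) subtracted
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```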
Cheers,
Laurence