__sK__ _RL: learning of v-values_ #101

@mauricerad

Create a training set for the ML method from sJ:

  1. Sample trajectories from the current policy.
  2. Create a training set and learn new parameters of the ML method (sJ).
    The training tuples are (s, r), where r is the cumulative reward (paid only at the end) and s is any state visited during the policy episode (rollout).
  3. Go to 1.
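
The loop above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the sJ/sK implementation: the toy random-walk environment, the linear value model fitted by least squares, the epsilon-greedy policy update, and all function names (`rollout`, `build_training_set`, `fit_value`, `make_greedy_policy`, `train`) are hypothetical stand-ins.

```python
import numpy as np

def random_policy(s, rng):
    """Initial policy (assumption): step -1 or +1 uniformly at random."""
    return int(rng.choice([-1, 1]))

def rollout(policy, rng, n_steps=4):
    """Toy episode (assumption): a walk on the integers starting at 0.
    The reward is paid only at the end and equals the final position."""
    s = 0
    states = [s]
    for _ in range(n_steps):
        s = s + policy(s, rng)
        states.append(s)
    return states, float(s)

def build_training_set(policy, rng, n_episodes, n_steps=4):
    """Steps 1 and 2: sample rollouts and emit one (s, r) tuple per visited
    state, where r is the cumulative (end-of-episode) reward."""
    X, y = [], []
    for _ in range(n_episodes):
        states, r = rollout(policy, rng, n_steps)
        for s in states:
            X.append(s)
            y.append(r)
    return np.array(X, dtype=float), np.array(y, dtype=float)

def fit_value(X, y):
    """Step 2: learn v(s) ~ w[0] * s + w[1] by least squares
    (a stand-in for whatever ML method is actually used)."""
    A = np.column_stack([X, np.ones_like(X)])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def make_greedy_policy(w, eps=0.2):
    """Derive a new epsilon-greedy policy from the learned v-values."""
    def policy(s, rng):
        if rng.random() < eps:
            return int(rng.choice([-1, 1]))
        values = [w[0] * (s + a) + w[1] for a in (-1, 1)]
        return (-1, 1)[int(np.argmax(values))]
    return policy

def train(n_iters=3, n_episodes=200, seed=0):
    """Step 3: repeat sampling and fitting with the updated policy."""
    rng = np.random.default_rng(seed)
    policy = random_policy
    w = None
    for _ in range(n_iters):
        X, y = build_training_set(policy, rng, n_episodes)
        w = fit_value(X, y)
        policy = make_greedy_policy(w)
    return w
```

Calling `train()` returns the two parameters of the fitted linear value model. Every state in an episode receives the same target r, which matches the description above where the reward is paid only at the end of the rollout.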

step: 5
