What tolerance value should I use for sparse reward environments #3
RewardGuard asked this question in Q&A
Great question: this is a common gotcha with sparse reward setups.
I'm training an agent on a navigation task where rewards are very sparse: the agent only gets a reward signal when it reaches the goal, which happens maybe once every 200–300 steps. Most steps have reward=0 for most components.
When I set up RewardGuard with the default tolerance=5.0, it flags almost every window as critical even when training looks healthy. Is that expected? What tolerance should I use for sparse reward settings?
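RewardGuard's actual detection rule isn't described in this thread, so the following is only a sketch of one plausible mechanism. It assumes a hypothetical z-score-style check: a window is flagged when any step deviates from the window mean by more than `tolerance` standard deviations. The helpers `max_z_score` and `is_flagged` are hypothetical illustrations, not RewardGuard's API. The sketch shows how a single goal reward can dominate a mostly-zero window under such a rule.

```python
import math
import statistics


def max_z_score(window):
    """Return the largest |x - mean| / std over a window of per-step rewards.

    Hypothetical stand-in for a tolerance check; this is NOT RewardGuard's rule.
    """
    mean = statistics.fmean(window)
    std = statistics.pstdev(window)  # population std over the window
    if std == 0:
        return 0.0  # constant window: no step deviates from the mean
    return max(abs(x - mean) / std for x in window)


def is_flagged(window, tolerance=5.0):
    """Flag the window if any step exceeds `tolerance` standard deviations."""
    return max_z_score(window) > tolerance


# Sparse reward: 250 steps, the goal is reached once (reward 1.0), otherwise 0.
sparse = [0.0] * 249 + [1.0]

# Dense reward: a steady signal with small, bounded fluctuations.
dense = [1.0, 1.1, 0.9, 1.0] * 50

print(f"sparse: z={max_z_score(sparse):.2f}, flagged={is_flagged(sparse)}")
print(f"sparse closed form sqrt(n-1)={math.sqrt(len(sparse) - 1):.2f}")
print(f"dense:  z={max_z_score(dense):.2f}, flagged={is_flagged(dense)}")
```

Under this assumed rule, one nonzero reward in an otherwise all-zero window of n steps always scores exactly sqrt(n - 1), whatever its magnitude: about 15.8 for a 250-step window, well above 5.0. If RewardGuard's check behaves anything like this, any window long enough to contain a goal event would be flagged by construction, which would be consistent with the behavior described in the question.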