What tolerance value should I use for sparse reward environments #3

RewardGuard (Maintainer) · 2026-04-26T04:35:16Z

What tolerance value should I use for sparse reward environments?
I'm training an agent on a navigation task where rewards are very sparse — the agent only gets a reward signal when it reaches the goal, which happens maybe once every 200–300 steps. Most steps have reward=0 for most components.
When I set up RewardGuard with the default tolerance=5.0, it flags almost every window as critical even when training looks healthy. Is that expected? What tolerance should I use for sparse reward settings?

RewardGuard (Maintainer, Author) · 2026-04-26T04:36:22Z

Great question — this is a common gotcha with sparse reward setups.
The default tolerance=5.0 (±5 percentage points) is designed for dense reward environments where every step contributes signal. In sparse settings, the rolling window captures mostly zeros, which makes the percentage distribution noisy and unstable — so yes, false critical flags are expected.
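To make the noise concrete, here is a small standalone simulation. It does not use RewardGuard at all. It assumes a monitor of this kind compares each component's share of the summed reward in a window against the expected share, and the reward model (a 70/30 split with uniform noise on rewarded steps) is made up purely for illustration.

```python
import random

EXPECTED_TASK_SHARE = 70.0  # percent, mirroring expected={"task": 0.7, "safety": 0.3}


def task_share(window):
    """Task component's share (in %) of all reward in the window.

    Returns None when the window contains no reward at all.
    """
    task = sum(step["task"] for step in window)
    total = task + sum(step["safety"] for step in window)
    if total == 0:
        return None
    return 100.0 * task / total


def simulate(reward_prob, n_windows=400, window=100, seed=0):
    """Mean absolute deviation (percentage points) of the task share from 70%."""
    rng = random.Random(seed)
    deviations = []
    for _ in range(n_windows):
        steps = []
        for _ in range(window):
            if rng.random() < reward_prob:
                # Hypothetical reward model: a noisy 70/30 split per rewarded step.
                noise = rng.uniform(-0.3, 0.3)
                steps.append({"task": 0.7 + noise, "safety": 0.3 - noise})
            else:
                steps.append({"task": 0.0, "safety": 0.0})
        share = task_share(steps)
        if share is not None:  # skip windows with no signal
            deviations.append(abs(share - EXPECTED_TASK_SHARE))
    return sum(deviations) / len(deviations)


dense_dev = simulate(reward_prob=1.0)       # reward on every step
sparse_dev = simulate(reward_prob=1 / 250)  # roughly one reward per 250 steps
print(f"dense:  {dense_dev:.1f} pp")
print(f"sparse: {sparse_dev:.1f} pp")
```

In the sparse run most non-empty windows contain a single rewarded step, so the measured share is effectively one noisy sample rather than an average, and the deviation from 70% ends up much larger than in the dense run.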
Two things to adjust:

1. Increase tolerance and use a larger window:

```python
monitor = rg.Monitor(
    expected={"task": 0.7, "safety": 0.3},
    tolerance=15.0,  # more lenient for sparse rewards
    window=500,      # larger window to smooth out the zeros
)
```

2. Only call check() after episodes where a reward was actually received, not every step. That way the analysis reflects real signal, not noise from zero-reward steps.
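One way to gate the check is a small wrapper around whatever callable runs it. This is only a sketch: `RewardGatedCheck`, `on_step`, and `on_episode_end` are hypothetical names, not part of RewardGuard, and the example uses a stand-in callback instead of a real monitor. Passing `monitor.check` directly would assume check() takes no arguments, which this thread does not confirm.

```python
class RewardGatedCheck:
    """Run a check callback only after episodes that produced some reward.

    Hypothetical helper for illustration; not part of RewardGuard.
    """

    def __init__(self, run_check):
        self.run_check = run_check  # zero-argument callable (an assumption)
        self.reward_seen = False

    def on_step(self, reward_components):
        # reward_components: dict mapping component name -> reward for this step
        if any(value != 0 for value in reward_components.values()):
            self.reward_seen = True

    def on_episode_end(self):
        """Run the check if the episode had any reward; return whether it ran."""
        ran = False
        if self.reward_seen:
            self.run_check()
            ran = True
        self.reward_seen = False  # reset for the next episode
        return ran


# Stand-in for the real check so the sketch runs on its own.
calls = []
gate = RewardGatedCheck(lambda: calls.append("check"))

# Episode 1: the goal is never reached, so every step has zero reward.
for _ in range(250):
    gate.on_step({"task": 0.0, "safety": 0.0})
first_ran = gate.on_episode_end()

# Episode 2: the goal is reached on the last step.
for _ in range(249):
    gate.on_step({"task": 0.0, "safety": 0.0})
gate.on_step({"task": 0.7, "safety": 0.3})
second_ran = gate.on_episode_end()

print(first_ran, second_ran, calls)
```

The wrapper only decides when to run the check. How rewards are fed into the monitor itself is left out, since that depends on the library's own API.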
For very sparse tasks I'd recommend tolerance=10.0–20.0 and window=500+ as a starting point, then tighten as your agent starts reaching the goal more consistently.
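If you want the tightening to be systematic rather than manual, one option is to tie tolerance to a recent success rate. The linear schedule below is just an illustrative assumption, not something RewardGuard provides, and `suggested_tolerance` is a hypothetical helper. The 20.0 and 10.0 endpoints come from the range recommended above.

```python
def suggested_tolerance(success_rate, loose=20.0, tight=10.0):
    """Interpolate linearly from `loose` (never reaching the goal) to
    `tight` (always reaching it). `success_rate` is clamped to [0, 1]."""
    rate = min(max(success_rate, 0.0), 1.0)
    return loose - (loose - tight) * rate


for rate in (0.0, 0.25, 0.5, 1.0):
    print(rate, suggested_tolerance(rate))
```

How the new value gets applied (for example by constructing a fresh monitor) depends on the library and is not shown here.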
