Fix: correct unpacking order in play_one_step (truncated/info swapped) #216
Chapter: 18
Cells: Fixed Q-Value Targets, Double DQN, Dueling Double DQN
play_one_step() returns (next_state, reward, done, truncated, info), but the training loops unpacked it as (obs, reward, done, info, truncated), so truncated and info were swapped.
Fixed by using the correct order:
obs, reward, done, truncated, info = play_one_step(env, obs, epsilon)
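
To illustrate why the order matters, here is a minimal sketch. It uses a hypothetical stub in place of the notebook's play_one_step(): the stub ignores env and epsilon and always reports a truncated episode with an empty info dict. The helper names episode_over_swapped and episode_over_correct are also made up for this example. With the swapped order, the name truncated is bound to the info dict, and an empty dict is falsy, so a check such as `done or truncated` would miss the truncation.

```python
def play_one_step(env, state, epsilon):
    """Hypothetical stub, not the notebook's implementation.

    Returns (next_state, reward, done, truncated, info), the same
    order described in this PR, with truncated=True and an empty info dict.
    """
    return state + 1, 1.0, False, True, {}


def episode_over_swapped():
    # Buggy order: `info` receives the truncated flag and
    # `truncated` receives the (empty, hence falsy) info dict.
    obs, reward, done, info, truncated = play_one_step(None, 0, 0.1)
    return bool(done or truncated)


def episode_over_correct():
    # Fixed order: names match the positions in the returned tuple.
    obs, reward, done, truncated, info = play_one_step(None, 0, 0.1)
    return bool(done or truncated)


print("swapped order ends episode:", episode_over_swapped())
print("correct order ends episode:", episode_over_correct())
```

Python raises no error in the swapped case because both orders unpack five values; only the meaning of the names changes.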