
feat: add DQN/IDQN #331

Open
lmBored wants to merge 3 commits into alex-petrenko:master from lmBored:DQN

Conversation

@lmBored (Contributor) commented Jan 29, 2026

In my project, Sample Factory was used in editable mode as a standalone folder, which is why all changes appear in a single commit in this PR (I merged the whole folder instead of cherry-picking).

Please also note that some minor reformatting in cfg.py is included. These changes were applied automatically by my formatter, but they improve consistency with the surrounding code, so I kept them.

Implementation:

  • Double DQN
  • Prioritized Experience Replay
  • Independent q_head for multi-agent
  • For the async setting, I calculate the number of updates that need to be made by subtracting the full update debt (num_updates * steps_per_update), then cap the actual work accordingly
  • Disable sample actions
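The two mechanics above that are most specific to this PR (the Double DQN target and the async update-debt cap) can be sketched roughly as follows. This is a minimal NumPy sketch of the math only; the actual implementation presumably uses PyTorch, and all names here (`double_dqn_targets`, `updates_to_run`, `steps_per_update`, etc.) are illustrative, not the PR's real identifiers.

```python
import numpy as np

def double_dqn_targets(q_online_next, q_target_next, rewards, dones, gamma=0.99):
    """Double DQN: the online net selects the next action, the target net scores it.

    q_online_next, q_target_next: (batch, num_actions) Q-values for next states.
    rewards, dones: (batch,) arrays; dones is 1.0 at episode boundaries.
    """
    next_actions = q_online_next.argmax(axis=1)
    next_q = q_target_next[np.arange(len(q_target_next)), next_actions]
    return rewards + gamma * (1.0 - dones) * next_q

def updates_to_run(env_steps_collected, updates_done, steps_per_update, max_updates_per_batch):
    """Cap per-batch learner work in the async setting.

    Updates 'owed' so far (env steps divided by steps_per_update) minus updates
    already done, clipped to max_updates_per_batch (the dqn_max_updates_per_batch knob).
    """
    debt = env_steps_collected // steps_per_update - updates_done
    return max(0, min(debt, max_updates_per_batch))
```

Under a scheme like this, a small cap keeps the sampler from waiting on the learner (higher FPS), while a larger cap lets the learner catch up on its debt at the cost of throughput.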

Performance (Tested on Doom environments - single agent):

  • dqn_max_updates_per_batch=1: 12.5% faster than APPO
  • dqn_max_updates_per_batch=4: Same speed as APPO
  • Converges slower than APPO

Limitations:

  • Requires tuning
  • Performance is generally bottlenecked by dqn_max_updates_per_batch

Example train command:

python -m sf_examples.vizdoom.train_vizdoom \
    --env=doom_battle \
    --algo=DQN \
    --use_rnn=False \
    --train_for_env_steps=5000

Let me know what you think about this approach!

@alex-petrenko (Owner)

Hi @lmBored ! The PR looks really good!

I think at this point I won't be able to commit to maintaining and testing more code (as you can see by the failing tests, I barely have cycles to take care of the existing codebase :) )

I highly encourage you to start a fork and use this PR and similar issues to advertise your implementation! I have some people on discord who were asking about offline methods.

I wonder what your results were like with DQN on Doom? I remember all these years ago I got much better results with A2C/PPO compared to DQN and it was one of the reasons for focusing purely on online algorithms here. Do you have any training curves or behavior videos? 🎮

@lmBored (Contributor, Author) commented Feb 11, 2026

Hi @alex-petrenko, thank you for your response and for looking at the PR. I understand your point; I've also just realised how old Sample Factory is now :)

Fitting an off-policy method into an asynchronous framework is quite troublesome, but being able to implement offline methods or model-based methods in the same fashion as Sample Factory would be really interesting. Making a fork is a good idea, though; I'll consider that after I'm done with my current project.

This is a run (in a scenario where the agent runs forward while jumping and avoiding lava pits) with PPO, and this is with DQN. Here the FPS of DQN is lower than PPO's because I'm doing 4 updates per batch. At 2 updates it performs on par with PPO, and at 1 update DQN performs better than PPO.
