
feat: add DQN/IDQN #331

Open
lmBored wants to merge 3 commits into alex-petrenko:master from lmBored:DQN

Conversation

@lmBored (Contributor) commented Jan 29, 2026

In my project, Sample Factory was used in editable mode as a standalone folder, which is why all changes appear in a single commit in this PR (I merged the whole folder instead of cherry-picking).

Please also note that some minor reformatting in cfg.py is included. These changes were applied automatically by my formatter, but they improve consistency with the surrounding code, so I kept them.

Implementation:

  • Double DQN
  • Prioritized Experience Replay
  • Independent q_head for multi-agent
  • For the async setting, I calculate the number of updates that need to be made by subtracting the full update debt (num_updates * steps_per_update), then cap the actual work accordingly
  • Disable sample actions
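The two mechanics above that are most specific to this PR (the Double DQN target and the async update-debt cap) can be sketched roughly as follows. This is a minimal NumPy sketch of the math only; the actual implementation presumably uses PyTorch, and all names here (`double_dqn_targets`, `updates_to_run`, `steps_per_update`, etc.) are illustrative, not the PR's real identifiers.

```python
import numpy as np

def double_dqn_targets(q_online_next, q_target_next, rewards, dones, gamma=0.99):
    """Double DQN: the online net selects the next action, the target net scores it.

    q_online_next, q_target_next: (batch, num_actions) Q-values for next states.
    rewards, dones: (batch,) arrays; dones is 1.0 at episode boundaries.
    """
    next_actions = q_online_next.argmax(axis=1)
    next_q = q_target_next[np.arange(len(q_target_next)), next_actions]
    return rewards + gamma * (1.0 - dones) * next_q

def updates_to_run(env_steps_collected, updates_done, steps_per_update, max_updates_per_batch):
    """Cap per-batch learner work in the async setting.

    Updates 'owed' so far (env steps divided by steps_per_update) minus updates
    already done, clipped to max_updates_per_batch (the dqn_max_updates_per_batch knob).
    """
    debt = env_steps_collected // steps_per_update - updates_done
    return max(0, min(debt, max_updates_per_batch))
```

Under a scheme like this, a small cap keeps the sampler from waiting on the learner (higher FPS), while a larger cap lets the learner catch up on its debt at the cost of throughput.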

Performance (Tested on Doom environments - single agent):

  • dqn_max_updates_per_batch=1: 12.5% faster than APPO
  • dqn_max_updates_per_batch=4: Same speed as APPO
  • Converges slower than APPO

Limitations:

  • Requires tuning
  • Performance is generally bottlenecked by dqn_max_updates_per_batch

Example train command:

python -m sf_examples.vizdoom.train_vizdoom \
    --env=doom_battle \
    --algo=DQN \
    --use_rnn=False \
    --train_for_env_steps=5000

Let me know what you think about this approach!

@alex-petrenko (Owner)

Hi @lmBored ! The PR looks really good!

I think at this point I won't be able to commit to maintaining and testing more code (as you can see by the failing tests, I barely have cycles to take care of the existing codebase :) )

I highly encourage you to start a fork and use this PR and similar issues to advertise your implementation! I have some people on discord who were asking about offline methods.

I wonder what your results were like with DQN on Doom? I remember all these years ago I got much better results with A2C/PPO compared to DQN and it was one of the reasons for focusing purely on online algorithms here. Do you have any training curves or behavior videos? 🎮

@lmBored (Contributor, Author) commented Feb 11, 2026

Hi @alex-petrenko, thank you for your response and for looking at the PR. I understand your point; I've also just realised how old Sample Factory is now :)

Fitting an off-policy method into an asynchronous framework is quite troublesome, but being able to implement offline methods or model-based methods in the same fashion as Sample Factory would be really interesting. Making a fork is a good idea, though; I'll consider that after I'm done with my current project.

This is a run (in a scenario where the agent runs forward while jumping and avoiding lava pits) with PPO, and this is with DQN. Here the FPS of DQN is lower than PPO's because I'm doing 4 updates per batch. At 2 updates it performs on par with PPO, and at 1 update DQN performs better than PPO.
