Hi @lmBored ! The PR looks really good! I think at this point I won't be able to commit to maintaining and testing more code (as you can see by the failing tests, I barely have cycles to take care of the existing codebase :) ) I highly encourage you to start a fork and use this PR and similar issues to advertise your implementation! I have some people on discord who were asking about offline methods. I wonder what your results were like with DQN on Doom? I remember all these years ago I got much better results with A2C/PPO compared to DQN and it was one of the reasons for focusing purely on online algorithms here. Do you have any training curves or behavior videos? 🎮
Hi @alex-petrenko, thank you for your response and for looking at the PR. I understand your point; I've also just realised how old Sample Factory is now :) Fitting an off-policy method into an asynchronous framework is quite troublesome, but being able to implement offline methods or model-based methods in the same fashion as Sample Factory would be really interesting. Making a fork is a good idea though; I'll consider that after I'm done with my current project. This is a run (in a scenario where the agent runs forward while jumping and avoiding lava pits) with PPO, and this is with DQN. The FPS of DQN is worse than PPO's here because I'm doing 4 updates per batch. At 2 updates it performs on par with PPO, and at 1 update DQN performs better than PPO.
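To illustrate the updates-per-batch tradeoff mentioned above, here is a minimal, hypothetical sketch (not the PR's actual code; the function name `dqn_learn_step` and its signature are my own assumptions) of how an off-policy learner might apply several gradient steps to one sampled batch. Each extra step costs wall-clock time, lowering FPS, but extracts more learning from every environment transition:

```python
import torch
import torch.nn as nn

def dqn_learn_step(q_net, target_net, optimizer, batch,
                   gamma=0.99, max_updates_per_batch=4):
    """Run several gradient updates on a single replayed batch.

    More updates per batch -> better sample efficiency but lower
    throughput, which matches the FPS differences described above.
    """
    obs, actions, rewards, next_obs, dones = batch
    for _ in range(max_updates_per_batch):
        with torch.no_grad():
            # Standard DQN target: bootstrap from the target network's max Q
            next_q = target_net(next_obs).max(dim=1).values
            targets = rewards + gamma * (1.0 - dones) * next_q
        # Q-values of the actions actually taken
        q_values = q_net(obs).gather(1, actions.unsqueeze(1)).squeeze(1)
        loss = nn.functional.smooth_l1_loss(q_values, targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return loss.item()
```

This is only a sketch of the general technique; Sample Factory's actual learner loop is structured around its asynchronous batcher.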
In my project, Sample Factory was used in editable mode as a standalone folder, which is why all changes appear in a single commit in this PR (I merged the whole folder instead of cherry-picking).

Also please note that some minor reformatting in `cfg.py` is included. These changes were applied automatically by my formatter, but they improve consistency with the surrounding code, so I kept them.

Implementation:
Performance (tested on Doom environments, single agent):

- `dqn_max_updates_per_batch=1`: 12.5% faster than APPO
- `dqn_max_updates_per_batch=4`: same speed as APPO
- Converges slower than APPO

Limitations:

- `dqn_max_updates_per_batch`

Example train command:
```
python -m sf_examples.vizdoom.train_vizdoom \
    --env=doom_battle \
    --algo=DQN \
    --use_rnn=False \
    --train_for_env_steps=5000
```

Let me know what you think about this approach!
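On the point raised earlier about fitting an off-policy method into an asynchronous framework: the usual way to decouple asynchronously collected rollouts from the learner is a replay buffer. A minimal, dependency-free sketch of that pattern (not the PR's actual implementation; the class and method names are my own):

```python
import random
from collections import deque

class ReplayBuffer:
    """Bounded FIFO store of transitions.

    Collection (rollout workers appending) is decoupled from
    consumption (the learner sampling), which is what makes
    off-policy training possible on asynchronously produced data.
    """

    def __init__(self, capacity: int):
        # Oldest transitions are evicted automatically once full.
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size: int):
        # Uniform sampling breaks temporal correlation within a batch.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

In a real asynchronous setup the buffer would additionally need thread- or process-safe access; this sketch only shows the core decoupling idea.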