Low Seaquest avg score compared to A3C

Looking at a handful of A3C implementations and their results on Seaquest, they appear to score around 50K:
- https://gym.openai.com/evaluations/eval_pjjgc9POQJK4IuVw8nXlBw (ConvNet)
- https://gym.openai.com/evaluations/eval_uxYSMnhuTpCNLoPZ7DkxKQ (from https://github.com/dgriff777/rl_a3c_pytorch and with LSTM)

PAAC, however, plateaus around 2K in our tests (similar to the result in your paper). Visual inspection of the policy shows that the submarine never resurfaces. This is a common difficulty in the game, but A3C appears to overcome it (possibly because of a modification in OpenAI Gym, whose Atari setup differs somewhat from ALE).

We've tried several exploration strategies (ε-greedy, Boltzmann, Bayesian dropout), with no improvement so far.
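For clarity, here is a minimal sketch of what the first two strategies mean in this context. It assumes they are applied to the policy head's per-action scores (logits); the function names are hypothetical and this is not the PAAC code itself. Bayesian dropout is left out because it depends on the network definition.

```python
import math
import random


def epsilon_greedy(action_values, epsilon, rng=random):
    """With probability epsilon pick a uniformly random action, otherwise the argmax."""
    if rng.random() < epsilon:
        return rng.randrange(len(action_values))
    return max(range(len(action_values)), key=lambda a: action_values[a])


def boltzmann_probs(logits, temperature=1.0):
    """Softmax over logits / temperature; the max is subtracted for numerical stability."""
    scaled = [x / temperature for x in logits]
    highest = max(scaled)
    exps = [math.exp(s - highest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def boltzmann_sample(logits, temperature=1.0, rng=random):
    """Sample an action index from the temperature-scaled softmax distribution."""
    probs = boltzmann_probs(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]
```

A higher `temperature` flattens the distribution (more exploration), while a lower one approaches greedy selection.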

Do you see any particular reason why PAAC would underperform in this case? An LSTM might help, but judging from the two OpenAI Gym results above, it should not be critical for Seaquest.
