A handful of A3C implementations report scores of around 50K on Seaquest:
PAAC, however, plateaus at around 2K in our tests (similar to the result in your paper). Visual inspection of the policy shows that the submarine never resurfaces. While this is a common difficulty of the game, A3C appears able to overcome it (possibly because of a modification in OpenAI Gym, whose Atari setup differs somewhat from ALE).
We've tried various exploration strategies (ε-greedy, Boltzmann, Bayesian dropout), with no improvement so far.
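To make clear what is meant by the first two strategies, here is a minimal sketch of ε-greedy and Boltzmann action selection on top of a policy's outputs. It is illustrative only and not taken from the PAAC code base: the function names, the `epsilon` and `temperature` parameters, and the plain-list inputs are all assumptions.

```python
import math
import random


def epsilon_greedy(probs, epsilon, rng=random):
    """Hypothetical helper: with probability epsilon pick a uniformly
    random action, otherwise pick the action with the highest probability."""
    if rng.random() < epsilon:
        return rng.randrange(len(probs))
    return max(range(len(probs)), key=lambda a: probs[a])


def boltzmann(logits, temperature, rng=random):
    """Hypothetical helper: sample an action from softmax(logits / temperature).
    A higher temperature flattens the distribution (more exploration)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]
```

With `epsilon=0` the first function reduces to greedy selection, and as `temperature` approaches zero the second one does as well.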
Do you see any particular reason PAAC would underperform in this case? An LSTM might help, but judging from the two OpenAI Gym pointers above, it does not seem to be critical for Seaquest.