DQN slower on GPU than CPU (tested on Breakout)

Dear all,

I’m new to both PyTorch and RL in general.

Recently I started re-implementing some of the best-known works.
I followed Denny Britz’s tutorial, but used PyTorch to make it more interesting.

I found that my implementation runs much faster on the CPU than on the GPU, which is strange.
I’m not sure, but I suspect this is caused by having to copy data from GPU memory back to the CPU many times in order to sample actions.
You can find my code in the git repo: https://github.com/andompesta/MLTutorials/tree/master/RL/DeepQLearning
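One way to test that intuition is to make action selection pull only a single integer back from the GPU per environment step. The sketch below assumes the epsilon-greedy policy DQN normally uses and a Q-network `q_net` that maps a batch of states to Q-values; the helper name `epsilon_greedy_action` is hypothetical and not taken from the linked repo.

```python
import random

import torch
import torch.nn as nn


def epsilon_greedy_action(q_net: nn.Module, state: torch.Tensor,
                          epsilon: float, n_actions: int) -> int:
    """Epsilon-greedy action selection that moves at most one integer to the CPU.

    Assumes `state` already lives on the same device as `q_net`.
    """
    # Explore: draw the coin flip and the random action on the CPU with the
    # standard library, so no GPU work or device-to-host copy happens at all.
    if random.random() < epsilon:
        return random.randrange(n_actions)
    # Exploit: run the network and the argmax on the network's device.
    with torch.no_grad():
        q_values = q_net(state.unsqueeze(0))  # shape (1, n_actions)
        best = q_values.argmax(dim=1)         # still on the network's device
    return int(best.item())                   # the only device-to-host transfer
```

Building a full action-probability vector on the GPU and handing it to `np.random.choice` would force that whole vector across to the CPU every step; deciding "explore or exploit" on the CPU side first avoids the copy entirely when exploring, and reduces it to one scalar otherwise.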

Is my intuition right? Do you have any suggestions? More importantly, I found that PyTorch has no equivalent of np.random.choice. Has anyone implemented it in torch?
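A rough stand-in for `np.random.choice` can be built on `torch.multinomial`, which samples indices in proportion to a tensor of non-negative weights. The sketch below covers only the 1-D case with the `size`, `replace` and `p` arguments; the function name `random_choice` is hypothetical.

```python
from typing import Optional

import torch


def random_choice(values: torch.Tensor, size: int,
                  p: Optional[torch.Tensor] = None,
                  replace: bool = True) -> torch.Tensor:
    """Rough analogue of np.random.choice(values, size, replace, p) for a 1-D tensor."""
    if p is None:
        # np.random.choice defaults to a uniform distribution over the entries.
        p = torch.ones(values.shape[0], device=values.device)
    # torch.multinomial draws indices with probability proportional to p;
    # the weights do not have to sum to 1.
    idx = torch.multinomial(p, size, replacement=replace)
    return values[idx]


actions = torch.arange(4)
probs = torch.tensor([0.1, 0.2, 0.3, 0.4])
sampled = random_choice(actions, 10, p=probs)  # ten actions drawn with these weights
```

Because `torch.multinomial` also accepts a 2-D tensor and samples independently from each row, the same call can draw one action per state for a whole batch without leaving the device.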

Thanks in advance.


It’s possible that:

  • your model is very small
  • your GPU is not the fastest model

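Either of these could explain the slowness. With a small network, the fixed cost of launching GPU kernels and copying data between host and device can outweigh any speedup from the GPU’s parallelism.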
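To see which factor dominates on a given machine, one can time forward passes on each device. This is a rough benchmark sketch under stated assumptions: the network shape, the batch sizes and the helper names `time_forward` and `make_small_net` are made up for illustration, and the numbers it prints depend entirely on the hardware.

```python
import time

import torch
import torch.nn as nn


def time_forward(model: nn.Module, batch: torch.Tensor, n_iters: int = 50) -> float:
    """Average wall-clock seconds per forward pass on the device `batch` lives on."""
    is_cuda = batch.device.type == "cuda"
    with torch.no_grad():
        model(batch)  # warm-up: the first call pays one-off initialisation costs
        if is_cuda:
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(n_iters):
            model(batch)
        if is_cuda:
            # CUDA kernels run asynchronously; wait for them before stopping the clock.
            torch.cuda.synchronize()
    return (time.perf_counter() - start) / n_iters


def make_small_net() -> nn.Module:
    # Hypothetical, deliberately small Q-network: one flattened 84x84 frame in, 4 Q-values out.
    return nn.Sequential(nn.Linear(84 * 84, 64), nn.ReLU(), nn.Linear(64, 4))


for batch_size in (1, 256):
    x = torch.randn(batch_size, 84 * 84)
    cpu_s = time_forward(make_small_net(), x)
    line = f"batch={batch_size:4d}  cpu={cpu_s * 1e3:.3f} ms"
    if torch.cuda.is_available():
        gpu_s = time_forward(make_small_net().cuda(), x.cuda())
        line += f"  gpu={gpu_s * 1e3:.3f} ms"
    print(line)
```

The `torch.cuda.synchronize()` calls matter: without them the GPU timing would mostly measure kernel launches rather than execution. Comparing batch size 1 (acting in the environment) with a larger batch (a replay-buffer training step) can show at which point the GPU starts to pay off for a particular model.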