The training speed becomes slower as the replay memory of transitions grows

My DQN program runs slower as more transitions are stored in the replay buffer.

  1. It is not the sampling speed from the replay buffer that degrades; it is the training step itself, which involves intensive tensor calculations, that becomes slower (see the timing sketch after this list).
  2. For instance, even when the thread uses only 10% of system memory and 50% of the 32 GB is still unused, the training time increases by around 50%.
  3. The problem exists on both CPU and GPU platforms.
  4. The replay buffer stores only NumPy arrays.
  5. The training time will stop increasing once the replay buffer is full and starts to discard old transitions.
Has anyone run into this problem before? I couldn't find any solution on Google. Thank you!
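As a minimal, self-contained sketch (not from the original post), this is how the sampling and training phases can be timed separately to support point 1. The network, buffer contents, and batch size are arbitrary stand-ins, not the poster's actual setup.

```python
import time
import numpy as np
import torch
import torch.nn as nn

rng = np.random.default_rng(0)
# Stand-in replay buffer of NumPy transitions (placeholder size and shape).
buffer = [rng.standard_normal(4).astype(np.float32) for _ in range(50_000)]

net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(net.parameters())

sample_time = train_time = 0.0
for _ in range(200):
    t0 = time.perf_counter()
    idx = rng.integers(len(buffer), size=32)
    batch = torch.from_numpy(np.stack([buffer[i] for i in idx]))  # sampling phase
    t1 = time.perf_counter()
    loss = net(batch).pow(2).mean()                               # training phase
    opt.zero_grad()
    loss.backward()
    opt.step()
    t2 = time.perf_counter()
    sample_time += t1 - t0
    train_time += t2 - t1

print(f"sampling: {sample_time:.3f}s  training: {train_time:.3f}s")
```

In the setup described above, the sampling total stays roughly flat while the training total grows as the buffer fills.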

Hi Jemmie

Sorry, but without having a look at the code it is hard to tell anything. Is the code multi-threaded / multi-processed? Perhaps you could try a replay buffer where transitions are stored contiguously; I've found that this can help in the past (see our ReplayBuffer in torchrl, which works with contiguous memory allocations through LazyMemmapStorage and LazyTensorStorage).
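A minimal sketch of that suggestion, assuming a recent torchrl / tensordict install; the field names, shapes, and capacity are placeholders, not taken from the original post.

```python
import torch
from tensordict import TensorDict
from torchrl.data import ReplayBuffer, LazyTensorStorage  # or LazyMemmapStorage

capacity = 100_000  # hypothetical maximum number of transitions

# LazyTensorStorage pre-allocates one contiguous tensor per field on the first
# write, so the buffer does not grow allocation-by-allocation as it fills.
buffer = ReplayBuffer(storage=LazyTensorStorage(capacity))

# A batch of dummy DQN-style transitions (placeholder keys and shapes).
transitions = TensorDict(
    {
        "observation": torch.randn(512, 4),
        "action": torch.randint(0, 2, (512, 1)),
        "reward": torch.randn(512, 1),
        "next_observation": torch.randn(512, 4),
        "done": torch.zeros(512, 1, dtype=torch.bool),
    },
    batch_size=[512],
)

buffer.extend(transitions)   # write contiguously into the storage
batch = buffer.sample(256)   # sample a training batch of 256 transitions
```

LazyMemmapStorage is used the same way but backs the storage with memory-mapped files on disk, which keeps RAM usage flat for very large buffers.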

Oh, I hadn’t noticed this. That would have saved me some time! Is there also an equivalent library in C++?

Not yet, feel free to submit an issue with the description of what you’d like to see and we’ll give it a shot if we can!
