My DQN program runs slower as more transitions are stored in the replay buffer.
- The slowdown is not in sampling from the replay buffer; it is the training step itself (the part with intensive tensor computation) that becomes slower.
- For instance, even when the process uses only about 10% of system memory and roughly 50% of the 32 GB is still free, training time increases by around 50%.
- The problem exists on both CPU and GPU platforms.
- The replay buffer stores only NumPy arrays.
- Training time stops increasing once the replay buffer is full and starts discarding old transitions.
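
For context, the buffer is essentially a fixed-capacity circular store of preallocated NumPy arrays with uniform sampling, roughly like the sketch below (class and parameter names such as `obs_dim` are placeholders, not my actual code):

```python
import numpy as np

class ReplayBuffer:
    """Minimal circular replay buffer backed by preallocated NumPy arrays."""

    def __init__(self, capacity, obs_dim):
        self.capacity = capacity
        # All storage is allocated up front, so memory use is constant.
        self.obs = np.zeros((capacity, obs_dim), dtype=np.float32)
        self.next_obs = np.zeros((capacity, obs_dim), dtype=np.float32)
        self.actions = np.zeros(capacity, dtype=np.int64)
        self.rewards = np.zeros(capacity, dtype=np.float32)
        self.dones = np.zeros(capacity, dtype=np.bool_)
        self.pos = 0
        self.full = False

    def push(self, obs, action, reward, next_obs, done):
        # Write at the current position; wrap around when full,
        # overwriting the oldest transition.
        self.obs[self.pos] = obs
        self.actions[self.pos] = action
        self.rewards[self.pos] = reward
        self.next_obs[self.pos] = next_obs
        self.dones[self.pos] = done
        self.pos = (self.pos + 1) % self.capacity
        if self.pos == 0:
            self.full = True

    def __len__(self):
        return self.capacity if self.full else self.pos

    def sample(self, batch_size):
        # Uniform random minibatch over the transitions stored so far.
        idx = np.random.randint(0, len(self), size=batch_size)
        return (self.obs[idx], self.actions[idx], self.rewards[idx],
                self.next_obs[idx], self.dones[idx])
```

With this layout, sampling cost should be independent of how many transitions are stored, which matches my observation that sampling speed is not the problem.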
Has anyone run into this problem before? I couldn't find a solution on Google. Thank you!