CUDA out of memory during DQN training

I am training my own DQN agent for a game I wrote in Python. The training loop works fine, but after about 150 episodes CUDA runs out of memory in the optimize_model function:

File "trainingAgent.py", line 99, in <module>
    optimize_model()
  File "trainingAgent.py", line 67, in optimize_model
    next_state_values[non_final_mask] = target_net(non_final_next_states).max(1)[0].detach()
  File "C:…\Local\Programs\Python\Python37\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "C:…\DQN.py", line 28, in forward
    x = F.relu(self.bn3(self.conv3(x)))
  File "C:…\Python\Python37\lib\site-packages\torch\nn\functional.py", line 862, in relu
    result = torch.relu(input)
RuntimeError: CUDA out of memory. Tried to allocate 18.13 MiB (GPU 0; 8.00 GiB total capacity; 4.81 GiB already allocated; 17.54 MiB free; 1.43 GiB cached)

So yeah, I have 8 GB of GPU memory (it's a gaming GPU). I'm not sure if this is normal, but I guess not. Is it because my input screen matrix is too large (300x300)? I don't subtract the last screen from the current screen like in the example DQN algorithm (though I don't know if that lowers the input size?); I just use the current screen as the input for the network. Or is it some kind of memory leak? What could I do to solve the issue? I would like to be able to run at least 2,000 episodes, and possibly 10,000 if needed.
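For reference, here is roughly the difference between what I do now and the tutorial's approach (get_screen() below is just a stand-in for my real frame capture; as far as I can tell both variants give a tensor of the same shape, so the subtraction alone would not shrink the input):

import torch

def get_screen():
    # stand-in for my real frame grab: a fake 300x300 RGB frame
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    return torch.rand(1, 3, 300, 300, device=device)

# what I do now: feed the raw current frame to the network
state = get_screen()

# what the CartPole DQN tutorial does: feed the change between frames
last_screen = get_screen()
current_screen = get_screen()
state = current_screen - last_screen  # same shape as a single frame, so not smaller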

I think gradient accumulation is the source of this problem, seeing that the training worked for a number of episodes before you got the error.

Refer to this PyTorch documentation, which can help if that is the case.
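The usual pattern that causes this looks something like the sketch below (made-up names, not your code): accumulating the loss tensor itself instead of its value, which keeps every step's autograd graph, and its GPU buffers, alive.

import torch

running_loss = 0.0
for step in range(1000):
    # pretend this is the loss coming out of your Q-network update
    loss = (torch.rand(8, requires_grad=True) ** 2).mean()

    # bad: running_loss becomes a tensor that drags the graph of every
    # step along with it, so GPU memory grows each iteration
    # running_loss = running_loss + loss

    # good: keep only the Python number, so each step's graph can be freed
    running_loss += loss.item()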

Thanks for the answer. I'm looking into it, but I don't accumulate anything in my training loop besides the frame count of the current game.

My guess at the moment is the replay memory, which pushes two states of my game at every frame; since each of my states is 300x300, the memory blows up:

memory.push(state, action, next_state, reward)

Edit: indeed, I commented this out, along with the optimizer function. With just the optimizer commented out I still ran out of memory, but once I also commented out the memory push, my GPU memory stayed free for 1,000 games. I guess I shall try training on the difference between the current and last screen, like in the CartPole training.
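Another option I might try instead of dropping the push entirely: keep the replay buffer on the CPU and only move sampled batches back to the GPU. A rough sketch (not my real ReplayMemory class, and assuming state and next_state come off the GPU as CUDA tensors):

import random
from collections import deque
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
memory = deque(maxlen=10000)  # minimal stand-in for the tutorial's ReplayMemory

def push(state, action, next_state, reward):
    # move the big 300x300 tensors to the CPU before storing them
    memory.append((state.cpu(), action,
                   None if next_state is None else next_state.cpu(),
                   reward))

def sample_states(batch_size):
    batch = random.sample(memory, batch_size)
    # only the sampled batch goes back to the GPU
    return torch.cat([s for s, _, _, _ in batch]).to(device)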

Hi @Milky, I recently wrote a library called pytorch_memlab; you can probably use its MemReporter to inspect which kinds of tensors are eating up your GPU memory.
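Basic usage looks roughly like this (see the README for details; the Linear layer is just a placeholder for your own model):

import torch
from pytorch_memlab import MemReporter

model = torch.nn.Linear(1024, 1024)
if torch.cuda.is_available():
    model = model.cuda()

reporter = MemReporter(model)  # passing the model groups its parameters by name
reporter.report()              # prints the tensors currently holding CPU/GPU memory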