Hello.
I have a scenario where, in each of several runs, I create a new model, train it for a number of epochs in a loop, and then test the trained model after those epochs. I run multiple runs to gather averaged statistics, and I would like to deallocate all unnecessary memory between runs so the timing is “fair”. I have managed to deallocate the memory between runs (except the memory marked as pretraining memory), but the memory snapshot shows two allocations that persist across runs. Please see the image.
One of these seems to be memory allocated for backprop: when I comment out loss_train.backward(), it isn’t allocated. The other seems to be memory allocated for the forward pass of the model: when I replace output = model.forward() with a random tensor, it isn’t allocated.
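For reference, each run is structured roughly like this. The tiny linear model and random data below are only placeholders to show where the forward and backward calls sit, not my actual code:

import torch

def one_run(num_epochs=5, device="cuda"):
    # Placeholder model and data, just to illustrate the structure of a run
    model = torch.nn.Linear(128, 10).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    criterion = torch.nn.CrossEntropyLoss()
    x = torch.randn(64, 128, device=device)
    y = torch.randint(0, 10, (64,), device=device)

    for epoch in range(num_epochs):
        optimizer.zero_grad(set_to_none=True)
        output = model(x)              # forward pass -- one persistent block appears here
        loss_train = criterion(output, y)
        loss_train.backward()          # backward pass -- the other persistent block appears here
        optimizer.step()

    # Test the trained model after all epochs (this is the part I am timing)
    model.eval()
    with torch.no_grad():
        model(x)
    return model, optimizer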
To deallocate the memory, I have tried the following:
import gc
import torch

optimizer.zero_grad(set_to_none=True)
model.zero_grad(set_to_none=True)
torch.cuda.synchronize()

for param in model.parameters():
    if param.grad is not None:
        del param.grad  # delete gradients explicitly

del model       # drop the last Python references
del optimizer
gc.collect()
torch.cuda.empty_cache()
Using del plus clearing the cache frees most memory between runs, except for these two blocks. Can anyone suggest a way to force this memory to be deallocated between runs? I can post the memory snapshot information for the blocks if that is helpful. Thanks very much.
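In case it helps, this is roughly how I am capturing the snapshots around each run. It uses the underscored torch.cuda.memory APIs, so the exact arguments may differ by PyTorch version, and num_runs and one_run are just the illustrative names from the sketch above:

import torch

torch.cuda.memory._record_memory_history(max_entries=100000)  # start recording allocation history

num_runs = 3  # illustrative
for run in range(num_runs):
    model, optimizer = one_run()
    # ... cleanup code from above (del model, del optimizer, gc.collect(), empty_cache()) ...
    torch.cuda.memory._dump_snapshot(f"snapshot_run_{run}.pickle")  # view at pytorch.org/memory_viz

torch.cuda.memory._record_memory_history(enabled=None)  # stop recording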