I am writing a function that attempts to find an upper bound on the possible model size. It does this in a loop: each iteration "tries" to append a new layer to a ModuleList, constructs a model from that list, and then attempts a single forward pass.
In the exception handler, I delete the ModuleList and the model, and attempt to delete the output of the forward pass. Since the forward pass failed, however, that output of course does not exist. The issue is that whatever intermediate tensors were allocated on the way to the failure (mid-forward-pass) persist and eat up memory. How do I free them? Is there a way to kill ALL GPU tensors at this point?
Example Code (not real):
import gc

import torch
import torch.nn as nn

device = torch.device("cuda")
input_batch = torch.randn(32, 10, device=device)  # dummy input for the trial pass

modules = []
for i in range(1000):
    try:
        # .to(device) must apply to the layer itself; list.append() returns None
        modules.append(nn.Linear(10, 10).to(device))
        model = nn.Sequential(*modules)
        out = model(input_batch)
        del out
        del model
    except RuntimeError:  # CUDA OOM is raised as a RuntimeError
        try:
            del out  # does not exist if the forward pass itself failed
        except NameError:
            pass
        del model
        del modules
        gc.collect()
        torch.cuda.empty_cache()
        print(torch.cuda.memory_allocated(0))  # this will show that memory is still full!
        break
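
For reference, here is a rough sketch of one direction I have been considering, on the assumption that it is the live traceback inside the except block that keeps the failed forward pass's frames (and hence its intermediate tensors) reachable. The try_forward helper and the input shape are placeholders of my own, and torch.no_grad() is there only because the trial pass does not need an autograd graph; I am not sure this is the right approach, hence the question.

import gc

import torch
import torch.nn as nn

def try_forward(model, batch):
    # Hypothetical helper: run the trial pass without recording autograd
    # state, so no activations are retained for a backward pass.
    with torch.no_grad():
        return model(batch)

device = torch.device("cuda")
input_batch = torch.randn(32, 10, device=device)  # placeholder input

modules = []
for i in range(1000):
    try:
        modules.append(nn.Linear(10, 10).to(device))
        model = nn.Sequential(*modules)
        out = try_forward(model, input_batch)
        del out, model
    except RuntimeError:
        # Do nothing here: while this handler runs, the exception's
        # traceback still references the failing frames and their tensors.
        break

# Outside the handler the exception (and its traceback) has been cleared,
# so the frames of the failed forward pass are now garbage.
model = None  # drop any reference left over from the failing iteration
gc.collect()
torch.cuda.empty_cache()
print(torch.cuda.memory_allocated(0))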