I was about to ask a question but I found my issue. Maybe it will help others.
I was on Google Colab and found that I could train my model several times, but on the 3rd or 4th run I'd hit an out-of-memory error. Calling `torch.cuda.empty_cache()` between runs did not help. All I could do was restart my kernel.
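If you run into something similar, it can help to watch the allocator counters between runs: when "allocated" stays high after training finishes, some object is still holding GPU tensors. A minimal sketch (`report_gpu_memory` is just a name I made up):

```python
import torch

def report_gpu_memory(tag=""):
    # memory_allocated: bytes held by live tensors.
    # memory_reserved: bytes held by PyTorch's caching allocator (includes cached blocks).
    allocated = torch.cuda.memory_allocated() / 1024**2
    reserved = torch.cuda.memory_reserved() / 1024**2
    print(f"{tag}: allocated={allocated:.1f} MiB, reserved={reserved:.1f} MiB")
```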
I had a setup of this sort:

```python
class Fitter:
    def __init__(self, model):
        self.model = model
        self.optimizer = torch.optim.Adam(self.model.parameters())  # init optimizer here
```
The point is that I was carrying the model over between runs but creating a new optimizer each time (in my case, by making new instances of Fitter). And the (Adam) optimizer state actually took up more memory than my model!
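That is expected for Adam, by the way: it keeps two extra tensors (`exp_avg` and `exp_avg_sq`) for every trainable parameter, so its state is roughly twice the size of the model's parameters. A quick sketch to estimate that for your own model (`estimate_adam_state_mb` is a hypothetical helper, not part of my Fitter):

```python
def estimate_adam_state_mb(model):
    # Adam stores exp_avg and exp_avg_sq per parameter: ~2x the parameter bytes.
    param_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
    return 2 * param_bytes / 1024**2
```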
So to fix it I tried some things.

This did not work:

```python
def wipe_memory(self):  # DOES NOT WORK
    self.optimizer = None
    torch.cuda.empty_cache()
```
Neither did this:
```python
import gc

def wipe_memory(self):  # DOES NOT WORK
    del self.optimizer
    self.optimizer = None
    gc.collect()
    torch.cuda.empty_cache()
```
This did work!

```python
def wipe_memory(self):  # DOES WORK
    self._optimizer_to(torch.device('cpu'))
    del self.optimizer
    gc.collect()
    torch.cuda.empty_cache()

def _optimizer_to(self, device):
    for param in self.optimizer.state.values():
        # Not sure there are any global tensors in the state dict
        if isinstance(param, torch.Tensor):
            param.data = param.data.to(device)
            if param._grad is not None:
                param._grad.data = param._grad.data.to(device)
        elif isinstance(param, dict):
            for subparam in param.values():
                if isinstance(subparam, torch.Tensor):
                    subparam.data = subparam.data.to(device)
                    if subparam._grad is not None:
                        subparam._grad.data = subparam._grad.data.to(device)
```

My guess as to why this works where plain `del` did not: reassigning each tensor's `.data` swaps its storage to the CPU in place, so the GPU memory gets freed even if something else (notebook output, a traceback) still holds a reference to those tensors.
I got that `optimizer_to` function from here.
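For completeness, here is roughly how I call it between runs (`make_model` and `fit` stand in for your own code; they are not defined above):

```python
for run in range(4):
    fitter = Fitter(make_model().cuda())
    fitter.fit()          # train as usual
    fitter.wipe_memory()  # move optimizer state to CPU and free the GPU memory
```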