How can we release GPU memory cache?

I was about to ask a question, but I found the issue myself, so maybe this will help others.

I was on Google Colab and found that I could train my model a few times, but on the 3rd or 4th run I'd hit a CUDA out-of-memory error. Calling torch.cuda.empty_cache() between runs did not help. All I could do was restart my kernel.
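In case it helps with debugging: you can watch what is actually being held on the GPU between runs with a little helper like this (print_gpu_memory is just a name I made up here):

import torch

def print_gpu_memory(tag=""):
    # memory occupied by live tensors vs. memory reserved by PyTorch's caching allocator
    allocated = torch.cuda.memory_allocated() / 1024 ** 2
    reserved = torch.cuda.memory_reserved() / 1024 ** 2
    print(f"{tag} allocated: {allocated:.0f} MiB, reserved: {reserved:.0f} MiB")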

I had a setup of the sort:

class Fitter:
    def __init__(self, model):
        self.model = model
        # init optimizer here; Adam in my case
        self.optimizer = torch.optim.Adam(self.model.parameters())

The point is that I was carrying the model over between runs but creating a new optimizer each time (in my case, by making new instances of Fitter). And the (Adam) optimizer state actually took up more memory on the GPU than the model itself!
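That makes sense in hindsight: Adam keeps two extra tensors (exp_avg and exp_avg_sq) for every parameter, so its state is roughly twice the size of the model's parameters. A rough way to compare the two, assuming the Fitter setup above (helper names are just my own):

import torch

def optimizer_state_mb(optimizer):
    # sum the sizes of all tensors held in the optimizer state (exp_avg, exp_avg_sq, ...)
    total = 0
    for state in optimizer.state.values():
        for value in state.values():
            if isinstance(value, torch.Tensor):
                total += value.numel() * value.element_size()
    return total / 1024 ** 2

def model_param_mb(model):
    return sum(p.numel() * p.element_size() for p in model.parameters()) / 1024 ** 2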

So I tried a few things to fix it.
This did not work:

def wipe_memory(self): # DOES NOT WORK
    self.optimizer = None
    torch.cuda.empty_cache()

Neither did this:

def wipe_memory(self): # DOES NOT WORK
    del self.optimizer
    self.optimizer = None
    gc.collect()
    torch.cuda.empty_cache()

This did work!

def wipe_memory(self): # DOES WORK
    self._optimizer_to(torch.device('cpu'))
    del self.optimizer
    gc.collect()
    torch.cuda.empty_cache()

def _optimizer_to(self, device):
    # move every tensor held in the optimizer state onto the given device
    for param in self.optimizer.state.values():
        # Not sure there are any global tensors in the state dict
        if isinstance(param, torch.Tensor):
            param.data = param.data.to(device)
            if param._grad is not None:
                param._grad.data = param._grad.data.to(device)
        elif isinstance(param, dict):
            for subparam in param.values():
                if isinstance(subparam, torch.Tensor):
                    subparam.data = subparam.data.to(device)
                    if subparam._grad is not None:
                        subparam._grad.data = subparam._grad.data.to(device)

I got that _optimizer_to function from here
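For completeness, the way I use it between runs now looks roughly like this (just a sketch, Fitter is the class from above):

fitter = Fitter(model)
# ... first training run ...
fitter.wipe_memory()        # moves the Adam state off the GPU and frees it

fitter = Fitter(model)      # fresh optimizer for the next run
# ... next training run ...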
