Why does Google Colab suddenly run out of memory?

@ptrblck I apologize in advance. I know I should not mention anyone specific and should wait for the community to answer, but I am in a tight spot. It would mean a lot if you could help out.
So I have been trying to extract the features of some pictures for a research case study. When I load images from my dataset_loader_ object and call the model.forward() method on them, I get the error below after 34 iterations. The same thing happens when using the CPU instead of the GPU (in that case Google Colab just crashes).

CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 11.17 GiB total capacity; 10.62 GiB already allocated; 832.00 KiB free; 10.66 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I know this indicates that I have run out of CUDA memory. Here is the model and the forward declaration:

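import torch
import torchvision.models as tv  # assumed alias; the post does not show its imports

class vgg16(torch.nn.Module):
    def __init__(self, pretrained=True):
        super(vgg16, self).__init__()
        vgg_pretrained_features = tv.vgg16(pretrained=pretrained).features
        self.layerss = torch.nn.Sequential()

        # Keep the first 30 modules of the pretrained feature extractor
        for x in range(30):
            self.layerss.add_module(str(x), vgg_pretrained_features[x])

    def forward(self, x):
        output = []  # assumed: the posted snippet returned `output` without defining it
        for i, layer in enumerate(self.layerss):
            x = layer(x)  # assumed: the posted snippet never applied the layer
            # Per-image max and min of this layer's output (still attached to the graph)
            max_ = [img_.max() for chanel in x for img_ in chanel]
            min_ = [img_.min() for chanel in x for img_ in chanel]
            output.append([max_, min_])
        return output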

But the out-of-memory error disappears when I replace:

        max_=[img_.max() for chanel in x for img_ in chanel]
        min_=[img_.min() for chanel in x for img_ in chanel]

with this:

        max_=[img_.max().detach().numpy() for chanel in x for img_ in chanel]
        min_=[img_.min().detach().numpy() for chanel in x for img_ in chanel]

But why is this happening? Is a tensor object really so large that 34 containers of tensors can use up 10 GiB in a matter of seconds? Am I using the model wrong? It’s worth mentioning that the output of the forward function is a list of lists containing tensors.

I used this function to monitor the CUDA memory:
import humanize
import psutil
import GPUtil

def mem_report():
  # Print free host RAM, then free/total memory and utilization for each visible GPU
  print("CPU RAM Free: " + humanize.naturalsize(psutil.virtual_memory().available))
  GPUs = GPUtil.getGPUs()
  for i, gpu in enumerate(GPUs):
    print('GPU {:d} ... Mem Free: {:.0f}MB / {:.0f}MB | Utilization {:3.0f}%'.format(i, gpu.memoryFree, gpu.memoryTotal, gpu.memoryUtil*100))
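The error message above distinguishes "allocated" from "reserved" memory, and PyTorch exposes both counters directly. As a possible complement to the GPUtil report, here is a small sketch; the function name torch_mem_report and its dictionary keys are made up for illustration, and it simply returns zeros when no GPU is visible:

```python
import torch

def torch_mem_report(device_index=0):
    """Return PyTorch's allocated/reserved CUDA memory in MiB (zeros without a GPU)."""
    if not torch.cuda.is_available():
        return {"allocated_mib": 0.0, "reserved_mib": 0.0}
    return {
        # Memory currently occupied by live tensors
        "allocated_mib": torch.cuda.memory_allocated(device_index) / 2**20,
        # Memory held by PyTorch's caching allocator (allocated + cached)
        "reserved_mib": torch.cuda.memory_reserved(device_index) / 2**20,
    }

report = torch_mem_report()
print(report)
```

Calling it once per loop iteration should make a steadily growing "allocated" figure easy to spot.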

As mentioned, everything looks fine until the for loop. Here is the loop code:

dataset_ = tid2008_dataloader(path, image_name_dir, first_time=True)
for batch_idx, data in enumerate(dataset_):
  data = data.to(device=device)
  features.append(model.forward(data))
  del data

It’s worth mentioning that the images are 384 × 512 × 3 in size.
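For a rough sense of scale, the arithmetic below estimates how much memory the layer outputs of one such image take if every one of them is kept alive. It is only a back-of-the-envelope sketch under assumptions that are not stated in the thread: batch size 1, float32 activations, and in-place ReLU as in torchvision's VGG16 (so only convolution and max-pool outputs need new memory). The names VGG16_CFG and activation_elements are made up for this example.

```python
# Rough activation-memory estimate for the first 30 modules of VGG16.features.
# Assumptions: batch size 1, float32, in-place ReLU (no extra memory for ReLU).
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512]  # the final max-pool (index 30) is not used

def activation_elements(height, width, cfg=VGG16_CFG):
    """Count the output elements of every conv / max-pool layer for one image."""
    total = 0
    channels = 3
    for item in cfg:
        if item == "M":
            height, width = height // 2, width // 2  # 2x2 max-pool, stride 2
        else:
            channels = item  # 3x3 conv with padding 1 keeps height and width
        total += channels * height * width
    return total

elements = activation_elements(384, 512)
mib_per_image = elements * 4 / 2**20        # 4 bytes per float32 value
gib_for_34 = 34 * mib_per_image / 1024

print(f"{mib_per_image:.0f} MiB per image, {gib_for_34:.2f} GiB after 34 images")
```

Under these assumptions the estimate is about 225 MiB per image, or roughly 7.5 GiB after 34 images, which is the same order of magnitude as the 10.62 GiB reported in the error. The individual tensors are not huge; it is keeping all of them for every image that adds up.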

In your code you are appending the output of the forward method to features, which stores not only the output tensor but also the entire computation graph attached to it. Since you are iterating over the entire dataset_, your memory usage grows in each iteration until you run out of memory.
I don’t know what you are doing with features, but the common approach is to train the model on mini-batches, which frees the computation graph and its intermediate tensors after each step, instead of accumulating the outputs of the entire dataset and potentially calling backward() once per epoch.
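A minimal sketch of the effect described above, using a tiny torch.nn.Linear as a hypothetical stand-in for the pretrained network (the names model, kept_with_graph and kept_detached are illustrative and not from the original code):

```python
import torch

# Stand-in model: its weights require gradients, as pretrained weights do by default.
model = torch.nn.Linear(4, 2)

kept_with_graph = []
kept_detached = []
for _ in range(3):
    batch = torch.randn(1, 4)
    out = model(batch)
    kept_with_graph.append(out)         # keeps out.grad_fn and its saved tensors alive
    kept_detached.append(out.detach())  # keeps only the values, no graph

print(all(t.grad_fn is not None for t in kept_with_graph))
print(all(t.grad_fn is None for t in kept_detached))
```

Every tensor in kept_with_graph still references the graph that produced it, so the intermediate tensors of that forward pass cannot be freed; the detached copies carry no such reference.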


I am trying to extract the outputs of the intermediate layers of an already pretrained network (in this case VGG) as features for some input pictures. (There is no training or evaluation involved in the process.) Later on I will do some analysis on these features. I just need the float data of each layer’s output for a specific picture. So how can I get rid of the computation graph? Would tensor.item() do the trick? I have tested this, but it’s really slow!

tensor.item() would only work on scalar tensors, so use tensor.detach() instead.
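To illustrate the difference, here is a short sketch with made-up tensors. item() returns a plain Python number and only accepts a single-element tensor, while detach() works for any shape and returns a tensor that no longer tracks gradients. On a GPU, each item() call also has to copy a value back to the host, which is probably why calling it in a loop feels slow.

```python
import torch

scalar = torch.tensor(3.5, requires_grad=True) * 2  # single-element tensor
vector = torch.ones(3, requires_grad=True) * 2      # three elements

value = scalar.item()  # plain Python float

try:
    vector.item()      # not a single-element tensor, so this raises
    item_failed = False
except (RuntimeError, ValueError):
    item_failed = True

detached = vector.detach()                 # same values, no graph attached
as_numpy = vector.detach().cpu().numpy()   # .cpu() is needed if the tensor lives on a GPU

print(value, item_failed, detached.requires_grad, as_numpy)
```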

Got it. Just one more question, and this one is not related to the topic: what data types and operations are allowed when PyTorch is working with CUDA? I couldn’t find anything specific in the PyTorch documentation. I ask because these instructions and data types have to be loaded onto the GPU. For example, can I use if statements or external functions (which of course would already have to be on the GPU)? Or can I use sets, dictionaries, tuples, NumPy arrays, … in my model when I later use .to('cuda') to load the model onto the GPU? (Or have I got it all wrong? If you could share a link connected to this topic, I would really appreciate it.)

You should be able to use any Python object in your code, which works on the CPU, as long as the actual tensor is pushed to the device.
CUDA kernels are executed on the tensor data directly, so no Python dicts etc. are passed to the kernel.
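As a small illustration of this point, the hypothetical module below (the class name Branchy and its settings dict are invented for the example) mixes a plain dict and an if statement with tensor operations. Only the parameters and the input tensor are moved to the device; the dict and the control flow stay in ordinary Python:

```python
import torch

class Branchy(torch.nn.Module):
    """Hypothetical module mixing ordinary Python objects with tensors."""
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(4, 2)               # parameters: moved by .to(device)
        self.settings = {"scale": 2.0, "clip": True}  # plain dict: stays in Python

    def forward(self, x):
        y = self.fc(x) * self.settings["scale"]
        if self.settings["clip"]:                     # ordinary Python control flow
            y = y.clamp(min=0.0)
        return y

# Falls back to the CPU when no GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = Branchy().to(device)
x = torch.randn(3, 4, device=device)
y = model(x)
print(y.device, next(model.parameters()).device)
```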

Oh, so correct me if I am wrong: unless a tensor is passed to the GPU, it literally does nothing, and when we call model.to("cuda") we are just passing the tensors containing the weights of the neural network layers to the GPU? Got it. Thank you very much. I really appreciate the time you and the whole community spend helping out newbies like me. Have a beautiful day!


Hello again. Sorry, I was just wondering: instead of using .detach() or .item(), can’t I just use:

with torch.no_grad():
        max_=[img_.max() for chanel in x for img_ in chanel]
        min_=[img_.min() for chanel in x for img_ in chanel]

instead of:
    max_=[img_.max().detach().numpy() for chanel in x for img_ in chanel]
    min_=[img_.min().detach().numpy() for chanel in x for img_ in chanel]

Because according to the documentation:
Context-manager that disables gradient calculation.
Disabling gradient calculation is useful for inference, when you are sure that you will not call Tensor.backward(). It will reduce memory consumption for computations that would otherwise have requires_grad=True.
I don’t know whether it removes the entire computation graph or not (which was the problem causing the increase in memory usage).
I have tested this out and apparently it takes less memory, but I am just not sure.

I guess your first code snippet is missing the with torch.no_grad() guard or with torch.inference_mode()? If so, then yes, you wouldn’t need to detach() the tensors, as no computation graph would be created. However, detach() would then also be a no-op, so you could leave it in if you are already using it.
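A quick way to check this, sketched with a tiny torch.nn.Linear as a hypothetical stand-in for the real model: inside the guard the output has no grad_fn, and calling detach() on it changes nothing.

```python
import torch

model = torch.nn.Linear(4, 2)  # stand-in model; its weights require gradients
batch = torch.randn(1, 4)

with torch.no_grad():
    out = model(batch)         # no computation graph is recorded here
    detached = out.detach()    # effectively a no-op inside the guard

out_with_graph = model(batch)  # outside the guard the graph is recorded again

print(out.grad_fn, out.requires_grad)
print(out_with_graph.grad_fn is not None)
```

For feature extraction, the whole loop over the dataset can sit inside one such guard, so nothing appended to a list holds on to a graph.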


Yes, that was an editing mistake. I have changed it. Thank you. I will use with torch.no_grad().