GPU memory isn't released

I’m trying to send my CNN model to the GPU device, but each time I run model = I get the error “RuntimeError: CUDA error: out of memory”.

I tried to use

import torch

but that did not work. I’ve restarted the kernel, but that didn’t solve the problem. I checked the free/used memory and it looks full. I also tried to clear the memory with torch.cuda.empty_cache(), which did not help either. The image below shows the free/used memory.

I have no idea why this error pops up even though I haven’t sent or trained any model on the GPU.
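For anyone landing here with the same symptom: torch.cuda.empty_cache() only returns memory that this process's PyTorch caching allocator is holding but not using. It cannot free memory held by other processes or by live tensors. Here is a rough sketch for telling those cases apart. The helper names (summarize_memory, gpu_memory_report) are made up for illustration; the torch.cuda calls are the standard ones. Note that if the device is truly full, creating the CUDA context for the query may itself fail with the same error.

```python
def summarize_memory(free_bytes, total_bytes, allocated_bytes, reserved_bytes):
    """Split device memory usage into rough categories, in MiB.

    Hypothetical helper: the category names are illustrative only.
    """
    mib = 1024 ** 2
    used_bytes = total_bytes - free_bytes
    # Reserved but not allocated: this is what empty_cache() can give back.
    cached_bytes = reserved_bytes - allocated_bytes
    # Used on the device but not held by this process's allocator:
    # CUDA context overhead, other processes, other users.
    outside_bytes = max(used_bytes - reserved_bytes, 0)
    return {
        "total_mib": total_bytes / mib,
        "used_mib": used_bytes / mib,
        "allocated_mib": allocated_bytes / mib,
        "cached_mib": cached_bytes / mib,
        "outside_this_process_mib": outside_bytes / mib,
    }


def gpu_memory_report(device_index=0):
    """Return the summary for one GPU, or None if CUDA is unavailable."""
    import torch  # imported here so summarize_memory works without PyTorch

    if not torch.cuda.is_available():
        return None
    free_bytes, total_bytes = torch.cuda.mem_get_info(device_index)
    return summarize_memory(
        free_bytes,
        total_bytes,
        torch.cuda.memory_allocated(device_index),
        torch.cuda.memory_reserved(device_index),
    )
```

If outside_this_process_mib accounts for most of the used memory in a fresh kernel, nothing you call from your own notebook will release it, which points at the machine or platform rather than your code.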

  • Is someone else using the same machine to run GPU-intensive tasks?
  • Are you running another GPU-intensive task (e.g.: a game, or 3D-rendering)?
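One way to answer both questions is to ask nvidia-smi which compute processes currently hold GPU memory. The sketch below assumes nvidia-smi is on the PATH; the function names are hypothetical. On a hosted platform or inside a container, processes belonging to other users may not be visible, so an empty list does not prove the GPU is idle.

```python
import shutil
import subprocess


def parse_compute_apps(csv_text):
    """Parse 'pid, process_name, used_memory' rows into a list of dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        parts = line.split(",", 1)
        if len(parts) != 2 or "," not in parts[1]:
            continue  # skip blank or malformed lines
        # Split the memory column off from the right, so a comma inside
        # the process name does not break the parse.
        name, used = parts[1].rsplit(",", 1)
        rows.append({
            "pid": parts[0].strip(),
            "name": name.strip(),
            "used_memory": used.strip(),
        })
    return rows


def list_gpu_processes():
    """Return the processes nvidia-smi reports, or [] if it cannot be run."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-compute-apps=pid,process_name,used_memory",
            "--format=csv,noheader",
        ],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return parse_compute_apps(result.stdout)


if __name__ == "__main__":
    for proc in list_gpu_processes():
        print(proc["pid"], proc["name"], proc["used_memory"])
```

A PID that is not your current kernel (for example a previous notebook session that never exited) is the usual suspect when memory looks full before you have loaded anything.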

Try the following steps to figure out where the problem is:

  • Use the model with the most basic training loop. If that solves the problem, check your own training loop for any accumulation or (maybe make sure you’re not calling for each epoch)
  • If the error persists, try a different model to find out whether the error is in your model. (This has happened to me before and was due to some linear layers.)

Simple training loop: (no autograd or scalers)

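# model, loss_fn, optimizer and train_dataloader are assumed to be defined already.
def train_loop(dataloader, model, loss_fn, optimizer):
    # optimizer is accepted but unused: this loop only runs the forward pass
    # and the loss, so memory growth here cannot come from the update step.
    size = len(dataloader.dataset)
    for batch, (X, y) in enumerate(dataloader):
        # Forward pass and loss computation
        pred = model(X)
        loss = loss_fn(pred, y)
        # Report progress every 100 batches; .item() returns a Python float,
        # so the logging itself does not keep any tensor alive.
        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")

epochs = 10
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)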

No, it’s an online platform and I’m not running anything else.

Thanks for the reply. The problem is with sending the model to the GPU; I haven’t even reached the training step.

Can you share the model?

Thanks. It was a problem with the platform, and they’re working on a fix.
