Debugging CUDA out of memory (16GB GPU, 14+ GB reserved)

I’m pretty new and want to learn how to debug GPU memory allocation.

My setup:

  • Paperspace machine with an A4000 16 GB GPU
  • single notebook running
  • playing with DINOv2, just using the embedding part with pre-trained weights
  • inspecting the model shows ~427M params, so even in float32 that should be around 1.7 GB
  • loading 280×280 images that I want to embed: 100 images × 280×280×3 in float32 should be under 100 MB

I’m still getting RuntimeError: CUDA out of memory. Tried to allocate 48.00 MiB (GPU 0; 15.73 GiB total capacity; 13.80 GiB already allocated; 23.12 MiB free; 14.15 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
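As an aside, the max_split_size_mb suggestion from the error message can be tried by setting PYTORCH_CUDA_ALLOC_CONF before the first CUDA allocation. This only mitigates fragmentation and may not fix the root cause; the value 128 below is just an illustrative choice:

```python
import os

# Must be set before the allocator initializes (safest: before importing torch).
# 128 MiB is an arbitrary example value; tune it for your workload.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

# import torch  # the caching allocator reads the variable when it starts up
```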

Here is code to reproduce:

import torch
from torchvision import transforms
from PIL import Image

m = torch.hub.load('facebookresearch/dinov2', 'dinov2_vitb14_reg')
m = m.cuda()

pil2tensor = transforms.ToTensor()
files = [] # 100 file paths to png files
x = torch.stack([pil2tensor(Image.open(f).resize((14 * 20, 14 * 20))) for f in files])
assert x.shape == (100, 3, 280, 280)

bs = 10 # tried many different batch sizes
x_embs = []
for i in range(0, len(files), bs):
    batch = x[i:i+bs]
    batch = batch.cuda()
    x_emb = m(batch)
    x_embs.append(x_emb)

It usually fails on the x_emb = m(batch) line, i.e. during model inference. I tried different batch sizes, and I tried calling torch.cuda.empty_cache() everywhere, but nothing helps.
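One way to see where the reserved memory goes is to print allocator stats around each forward pass. A minimal sketch (the fmt_mib helper and the tag strings are my own names, not a PyTorch API):

```python
import torch

def fmt_mib(num_bytes: int) -> str:
    """Format a byte count as MiB, matching the units in the OOM message."""
    return f"{num_bytes / 2**20:.2f} MiB"

def report(tag: str) -> None:
    # allocated = memory held by live tensors;
    # reserved  = memory the caching allocator has claimed from the driver.
    if torch.cuda.is_available():
        print(tag,
              "allocated:", fmt_mib(torch.cuda.memory_allocated()),
              "reserved:", fmt_mib(torch.cuda.memory_reserved()))

# Usage idea: report("before"); x_emb = m(batch); report("after")
# For a full per-pool breakdown, print(torch.cuda.memory_summary()).
```

If "allocated" grows with every batch, something (e.g. the autograd graph or a list of GPU tensors) is keeping references alive.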

It works fine on CPU.

Any advice on how to figure out why so much memory is “reserved”?

Besides the parameters and inputs, intermediate forward activations can allocate a lot of memory, as explained e.g. in this post.
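The usual remedy is to run inference under torch.inference_mode() (or torch.no_grad()) with the model in eval mode, so no autograd graph is kept, and to move each result off the GPU right away. A minimal sketch using a small stand-in model rather than the actual DINOv2 weights:

```python
import torch
import torch.nn as nn

# Lightweight stand-in for the embedding model; same batching pattern as above.
model = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(3, 768)).eval()
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

x = torch.randn(20, 3, 280, 280)  # pretend dataset of 20 images
bs = 10
embs = []
with torch.inference_mode():          # no autograd graph -> far less GPU memory
    for i in range(0, len(x), bs):
        batch = x[i:i + bs].to(device)
        emb = model(batch)
        embs.append(emb.cpu())        # don't keep GPU tensors referenced
embs = torch.cat(embs)
```

Without inference_mode, each m(batch) call stores all intermediate activations for backward, and appending the GPU outputs to a list keeps them (and their graphs) alive across iterations.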
