When training a model, it seems that the optimizer occupies some GPU memory which it never releases. Let me explain this with an example:
import torchvision.models as models
import torch
from torch import optim, nn
model = models.resnet18(pretrained=True).cuda()
optimizer = optim.Adam(model.parameters())
criterion = nn.CrossEntropyLoss()
img = torch.rand((1, 3, 224, 224)).cuda()
label = torch.randint(0, 1000, (1,)).cuda()
# GPU Memory usage up to here: 849 MB
optimizer.zero_grad()
res = model(img)
loss = criterion(res, label)
# GPU Memory usage up to here: 855 MB
loss.backward()
# GPU Memory usage up to here: 917 MB
optimizer.step()
# GPU Memory usage up to here: 1021 MB
del loss, res, img, label
torch.cuda.empty_cache()
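(For finer-grained numbers than the overall GPU memory usage, a small helper like the one below, just a sketch, reports only the tensor memory PyTorch's allocator has actually handed out, separate from what it keeps cached:)

def report(tag):
    # memory_allocated: bytes currently held by live tensors
    # memory_reserved: total bytes the caching allocator has claimed from the GPU
    alloc = torch.cuda.memory_allocated() / 1024**2
    reserved = torch.cuda.memory_reserved() / 1024**2
    print(f"{tag}: allocated {alloc:.0f} MB, reserved {reserved:.0f} MB")

# e.g. call report("after step") right after optimizer.step()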
If I just initialize the model, I get 849 MB of GPU memory usage. Running a forward pass with a single image and then calling torch.cuda.empty_cache() increases the usage to 855 MB, fair enough.
Running the backward pass and then torch.cuda.empty_cache() increases the memory usage to 917 MB, which makes sense, as the gradients have now been filled in.
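As a rough sanity check (a sketch, assuming float32 parameters), the gradients of ResNet-18, which has about 11.7M parameters, should account for roughly 45 MB on their own, which is in the same ballpark as the ~62 MB jump once allocator overhead is included:

# one float32 .grad tensor per parameter -> roughly 45 MB for ResNet-18
grad_bytes = sum(p.grad.numel() * p.grad.element_size()
                 for p in model.parameters() if p.grad is not None)
print(f"gradient memory: {grad_bytes / 1024**2:.0f} MB")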
Now, running optimizer.step() and then torch.cuda.empty_cache() further increases the memory usage significantly, to 1021 MB. Could anyone tell me why this is the case?
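My guess (unconfirmed, hence the question) is that Adam only allocates its per-parameter state, the exp_avg and exp_avg_sq buffers, on the first step(), i.e. two extra float32 copies of every parameter, roughly 2 x 45 MB, which is close to the observed ~104 MB jump. A sketch to inspect what ended up in optimizer.state after the step:

# optimizer.state maps each parameter to its state dict
# (for Adam: 'step', 'exp_avg', 'exp_avg_sq'); sum the CUDA tensors it holds
state_bytes = sum(t.numel() * t.element_size()
                  for per_param_state in optimizer.state.values()
                  for t in per_param_state.values()
                  if torch.is_tensor(t) and t.is_cuda)
print(f"optimizer state on GPU: {state_bytes / 1024**2:.0f} MB")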
I sometimes run into CUDA OOM errors when alternating between training and validation.
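For context, the alternating loop looks roughly like the sketch below (train_loader, val_loader and num_epochs are placeholders, not my actual code); the validation pass runs under torch.no_grad() so that no autograd graph is kept:

# simplified sketch of the train/validate alternation (loaders are placeholders)
for epoch in range(num_epochs):
    model.train()
    for img, label in train_loader:
        img, label = img.cuda(), label.cuda()
        optimizer.zero_grad()
        loss = criterion(model(img), label)
        loss.backward()
        optimizer.step()

    model.eval()
    with torch.no_grad():  # no graph is built, so activations are freed right away
        for img, label in val_loader:
            val_loss = criterion(model(img.cuda()), label.cuda())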
Cheers