When training a model, it seems that the optimizer occupies some GPU memory which it never releases. Let me explain with an example:
import torchvision.models as models
import torch
from torch import optim, nn

model = models.resnet18(pretrained=True).cuda()
optimizer = optim.Adam(model.parameters())
criterion = nn.CrossEntropyLoss()

img = torch.rand((1, 3, 224, 224)).cuda()
label = torch.randint(0, 1000, (1,)).cuda()
# GPU Memory usage up to here: 849 MB

optimizer.zero_grad()
res = model(img)
loss = criterion(res, label)
# GPU Memory usage up to here: 855 MB

loss.backward()
# GPU Memory usage up to here: 917 MB

optimizer.step()
# GPU Memory usage up to here: 1021 MB

del loss, res, img, label
torch.cuda.empty_cache()
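For reference, here is one way to read comparable numbers from inside the script (these won't match nvidia-smi exactly, since nvidia-smi also counts the CUDA context and the caching allocator's reserved-but-unused blocks):

print(torch.cuda.memory_allocated() / 1024**2, "MB allocated by tensors")
print(torch.cuda.memory_reserved() / 1024**2, "MB reserved by the caching allocator")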
If I just initialize the model, I get 849 MB of GPU memory usage. Running a forward pass with a single image and then torch.cuda.empty_cache() increases the usage to 855 MB, fair enough.
Running the backward pass and then torch.cuda.empty_cache() increases the memory usage to 917 MB, which makes sense, as the gradient tensors are now filled.
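As a back-of-the-envelope sanity check (my own reasoning, not from any docs), that increase should be roughly the total parameter size, since each .grad tensor mirrors its parameter; the measured 62 MB jump being somewhat larger is, I assume, allocator block rounding:

# Expected gradient memory: one grad tensor per parameter.
param_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
print(f"expected gradient memory: {param_bytes / 1024**2:.1f} MB")  # ~44.6 MB for resnet18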
Running optimizer.step() and then torch.cuda.empty_cache() increases the memory usage further, to 1021 MB. Could anyone tell me why this is the case?
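My only guess is that the optimizer allocates per-parameter state on the first step. Inspecting optimizer.state seems to point that way; for Adam I see exp_avg and exp_avg_sq tensors, each the size of its parameter:

# Sum up all tensors held in the optimizer's state dict after the first step().
state_bytes = sum(
    v.numel() * v.element_size()
    for state in optimizer.state.values()
    for v in state.values()
    if torch.is_tensor(v)
)
print(f"optimizer state: {state_bytes / 1024**2:.1f} MB")  # ~89 MB: two extra tensors per parameter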
Because of this, I sometimes run into CUDA OOM errors when alternating between training and validation.
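For context, the alternating pattern I mean is roughly this sketch (train_loader and val_loader are placeholders, not my actual code); even with torch.no_grad() in the validation part, the optimizer state from training stays allocated on the GPU:

model.train()
for img, label in train_loader:            # placeholder loader
    optimizer.zero_grad()
    loss = criterion(model(img.cuda()), label.cuda())
    loss.backward()
    optimizer.step()                       # optimizer state remains on the GPU afterwards

model.eval()
with torch.no_grad():                      # avoids storing activations for backward
    for img, label in val_loader:          # placeholder loader
        res = model(img.cuda())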