Hello! I can't figure out how to clear GPU memory or which objects are stored there. A code sample is below; I added comments showing the memory usage of my 2 GPUs after every line of code. As you can see, deleting the objects and calling torch.cuda.empty_cache()
works during the training part (though not perfectly, because about 0.5 GB more is still in use than before), but it fails during the evaluation part of my training loop.
My main questions:
- Why do I see 1.2 + 0.6 GB in use after the training part, versus 0.7 + 0.0 GB before training?
- Why does empty_cache() free nothing at all after the eval part?
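For anyone who wants to reproduce the numbers from inside PyTorch, here is a minimal sketch of how per-GPU usage could be logged. It assumes the distinction matters between memory held by live tensors (`torch.cuda.memory_allocated`) and memory held by the caching allocator (`torch.cuda.memory_reserved`); `empty_cache()` can only release the part that is reserved but no longer allocated. The helper names `format_usage` and `gpu_report` are hypothetical, not part of PyTorch, and external tools such as nvidia-smi will report more than either figure because they also count the CUDA context.

```python
def format_usage(device_index, allocated_bytes, reserved_bytes):
    """Format one GPU's usage in GB (hypothetical helper, not a PyTorch API)."""
    gb = 1024 ** 3
    return (f"cuda:{device_index} allocated={allocated_bytes / gb:.2f} GB "
            f"reserved={reserved_bytes / gb:.2f} GB")


def gpu_report():
    """Return one line per visible GPU; an empty list if no GPU is visible."""
    import torch  # imported here so format_usage stays usable without PyTorch

    return [
        format_usage(i, torch.cuda.memory_allocated(i), torch.cuda.memory_reserved(i))
        for i in range(torch.cuda.device_count())
    ]
```

Calling `print("\n".join(gpu_report()))` after each step would show whether a drop (or the lack of one) comes from tensors that are still alive or from cached blocks.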
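import torch
import torchvision

# `loaders` (a dict of DataLoaders) and `args` are defined elsewhere and not shown here.
model = torchvision.models.segmentation.fcn_resnet50(pretrained=False, progress=False, num_classes=12)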
model = torch.nn.DataParallel(model)
### GPU USAGE: 0.0 and 0.0 gb
model.to('cuda:0')
### GPU USAGE 0.7 and 0.0 gb
criterion = torch.nn.CrossEntropyLoss()
### GPU USAGE 0.7 and 0.0 gb
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)
### GPU USAGE 0.7 and 0.0 gb
for epoch in [0]:
    torch.cuda.empty_cache()
    ### GPU USAGE 0.7 and 0.0 gb
    model.train()
    ### GPU USAGE 0.7 and 0.0 gb
    if 1 == 1:
        img, mask = next(iter(loaders['train']))
        ### GPU USAGE 0.7 and 0.0 gb
        img, mask = img.to('cuda:0'), mask.to('cuda:0')
        ### GPU USAGE 0.8 and 0.0 gb
        predicted_mask = model(img)['out']
        ### GPU USAGE 4.8 and 4.6 gb
        loss = criterion(predicted_mask, mask.long())
        ### GPU USAGE 4.8 and 4.6 gb
        optimizer.zero_grad()
        ### GPU USAGE 4.8 and 4.6 gb
        loss.backward()
        ### GPU USAGE 5.4 and 5.0 gb
        optimizer.step()
        ### GPU USAGE 5.4 and 5.0 gb
        del img, mask, predicted_mask
        ### GPU USAGE 5.4 and 5.0 gb
        torch.cuda.empty_cache()
        ### GPU USAGE 1.2 and 0.6 gb
    # start validation part
    model.eval()
    ### GPU USAGE 1.2 and 0.6 gb
    if 1 == 1:
        img, mask = next(iter(loaders['train']))
        ### GPU USAGE 1.2 and 0.6 gb
        img, mask = img.to(args['device']), mask.to(args['device'])
        ### GPU USAGE 1.2 and 0.6 gb
        predicted_mask = model(img)['out']
        ### GPU USAGE 5.1 and 4.6 gb
        loss = criterion(predicted_mask, mask.long())
        ### GPU USAGE 5.2 and 4.6 gb
        del img, mask, predicted_mask
        ### GPU USAGE 5.2 and 4.6 gb
        torch.cuda.empty_cache()
        ### GPU USAGE 5.2 and 4.6 gb
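One possible explanation for the second question, which I have not confirmed on this setup: in the eval part `loss` is never deleted and `backward()` is never called, so `loss` may still hold the autograd graph together with the saved activations, and `empty_cache()` cannot release memory that live tensors still reference. The sketch below only illustrates that mechanism on the CPU; the `Linear` model and the random tensors are hypothetical stand-ins for the segmentation model and the real batch.

```python
import torch

# Hypothetical stand-ins for the real model and batch (CPU only).
model = torch.nn.Linear(8, 4)
criterion = torch.nn.CrossEntropyLoss()
x = torch.randn(16, 8)
target = torch.randint(0, 4, (16,))

model.eval()  # eval() changes layer behaviour, it does not disable autograd

# Forward pass as in the post: autograd records a graph and `loss` keeps it alive.
loss_with_graph = criterion(model(x), target)

# Same forward pass under no_grad: no graph is recorded, nothing is saved for backward.
with torch.no_grad():
    loss_no_graph = criterion(model(x), target)

print(loss_with_graph.grad_fn is not None)  # graph is attached
print(loss_no_graph.grad_fn is None)        # no graph attached

# .item() yields a plain Python float that holds no reference to any tensor.
val_loss = loss_no_graph.item()
```

If this is indeed the cause, wrapping the validation forward pass in `torch.no_grad()`, or adding `loss` to the `del` statement before `empty_cache()`, would be the things to try.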