I encountered a CUDA out-of-memory error when running the code attached below. The error is raised at this line:

Vrestored = model_restoration(Vinput_)

The error occurs specifically during the validation phase, which runs inside the training loop. To address it, I tried deleting some variables in the training part and clearing the CUDA memory cache, but that did not resolve the problem. I also tried reducing the batch size, invoking the garbage collector, and wrapping the forward pass in torch.no_grad(), but none of these helped.
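For reference, this is the generic cleanup pattern I mean by "deleting variables, garbage collector, and clearing the cache" (a minimal sketch, not taken verbatim from my script; `free_cached_memory` is just an illustrative helper name):

```python
import gc
import torch

def free_cached_memory():
    """Run Python GC, then release cached CUDA blocks back to the driver.

    Note: this cannot free tensors that are still referenced somewhere;
    the last reference must be dropped (del / out of scope) first.
    """
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()

# Usage: drop the references, then reclaim.
x = torch.zeros(256, 256)
del x
free_cached_memory()
```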
for epoch in range(start_epoch, opt.OPTIM.NUM_EPOCHS + 1):
    epoch_start_time = time.time()
    epoch_loss = 0
    train_id = 1

    model_restoration.train()
    for i, data in enumerate(tqdm(train_loader), 0):
        # zero_grad
        for param in model_restoration.parameters():
            param.grad = None

        with torch.no_grad():
            target = data[0].to('cuda')
            input_ = data[1].to('cuda')

        if epoch > 5:
            target, input_ = mixup.aug(target, input_)

        with torch.no_grad():
            restored = model_restoration(input_)

        # Compute loss at each stage
        loss = np.sum([criterion(torch.clamp(restored[j], 0, 1), target) for j in range(len(restored))])
        loss.requires_grad = True
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()

        del target, input_, restored
        torch.cuda.empty_cache()

        #### Evaluation ####
        if i % eval_now == 0 and i > 0 and (epoch in [1, 25, 45] or epoch > 60):
            model_restoration.eval()
            psnr_val_rgb = []
            for ii, data_val in enumerate((val_loader), 0):
                Vtarget = data_val[0].to('cuda')
                Vinput_ = data_val[1].to('cuda')

                with torch.no_grad():
                    Vrestored = model_restoration(Vinput_)
                Vrestored = Vrestored[0]

                for res, tar in zip(Vrestored, Vtarget):
                    psnr_val_rgb.append(utils.torchPSNR(res, tar))

                del Vtarget, Vinput_, Vrestored
                torch.cuda.empty_cache()

            psnr_val_rgb = torch.stack(psnr_val_rgb).mean().item()
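One detail I noticed while debugging: the per-image PSNR values appended to psnr_val_rgb stay on the GPU until torch.stack at the end of validation. They are tiny scalars, so they are probably not the root cause, but moving each score to the CPU as a Python float immediately removes even that accumulation. A minimal sketch of that alternative (mean_metric_cpu is a hypothetical helper name, not from my script):

```python
import torch

def mean_metric_cpu(scores):
    """Average per-image metric values without keeping them on the GPU.

    `scores` may contain 0-dim tensors (possibly on CUDA) or plain floats;
    each is converted to a Python float right away, so no GPU tensors
    accumulate across the validation loop.
    """
    vals = [float(s) for s in scores]
    return sum(vals) / len(vals)

# In the loop, append floats instead of CUDA tensors:
#     psnr_val_rgb.append(utils.torchPSNR(res, tar).item())
# and average at the end with:
print(mean_metric_cpu([torch.tensor(30.0), torch.tensor(32.0)]))  # 31.0
```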