CUDA out of memory after 12 steps

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.36       Driver Version: 440.36       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  On   | 00000000:83:00.0 Off |                  N/A |
| 21%   34C    P8    10W / 250W |      1MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
0
gpu03
/home/cwzhou/.conda/envs/pytorch/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:118: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
warnings.warn("Detected call of lr_scheduler.step() before optimizer.step(). "
/home/cwzhou/.conda/envs/pytorch/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:143: UserWarning: The epoch parameter in scheduler.step() was not necessary and is being deprecated where possible. Please use scheduler.step() to step the scheduler. During the deprecation, if epoch is different from None, the closed form is used instead of the new chainable form, where available. Please open an issue if you are unable to replicate your use case: https://github.com/pytorch/pytorch/issues/new/choose.
warnings.warn(EPOCH_DEPRECATION_WARNING, UserWarning)
/home/cwzhou/.conda/envs/pytorch/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:395: UserWarning: To get the last learning rate computed by the scheduler, please use get_last_lr().
warnings.warn("To get the last learning rate computed by the scheduler, "
/home/cwzhou/.conda/envs/pytorch/lib/python3.8/site-packages/torch/nn/functional.py:1558: UserWarning: nn.functional.tanh is deprecated. Use torch.tanh instead.
warnings.warn("nn.functional.tanh is deprecated. Use torch.tanh instead.")
[epoch 0 step 0 loss 2588.6765]
[epoch 0 step 1 loss 2246.1255]
[epoch 0 step 2 loss 1857.5001]
[epoch 0 step 3 loss 1608.0941]
[epoch 0 step 4 loss 1188.3115]
[epoch 0 step 5 loss 1005.5057]
[epoch 0 step 6 loss 950.6890]
[epoch 0 step 7 loss 861.3840]
[epoch 0 step 8 loss 711.6000]
[epoch 0 step 9 loss 639.1685]
[epoch 0 step 10 loss 689.5997]
[epoch 0 step 11 loss 668.2221]
[epoch 0 step 12 loss 672.7594]
Traceback (most recent call last):
  File "main.py", line 146, in <module>
    y_val = model(x_test)
  File "/home/cwzhou/.conda/envs/pytorch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/cwzhou/20200729_H2B_tocaiwei/PReNet.py", line 96, in forward
    x = F.relu(self.res_conv4(x)+resx)
RuntimeError: CUDA out of memory. Tried to allocate 114.00 MiB (GPU 0; 10.92 GiB total capacity; 10.33 GiB already allocated; 59.06 MiB free; 10.34 GiB reserved in total by PyTorch)

What should I do? I'm confused.

A common issue is storing the whole computation graph in each iteration.
Make sure you are not storing any tensors in e.g. a list or any other container, which might be still attached to the computation graph, as this will increase the memory usage in each iteration.
E.g. if you want to store the loss, use losses.append(loss.item()) instead of directly appending the loss.
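
For example (a minimal sketch, assuming a plain Python list `losses` and a scalar `loss` tensor):

```python
losses = []

# Bad: the appended tensor keeps its whole computation graph alive
losses.append(loss)

# Good: .item() returns a plain Python float, so the graph can be freed
losses.append(loss.item())
```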

You are right. It seems I didn't release the memory. How can I release it?
Here is my training code:
```python

for i in range(args.epochs):
    scheduler.step(i)

    for j,(data,label) in enumerate(train_dl,0):
        model.train()
        model.zero_grad()
        optimizer.zero_grad()
        if args.use_gpu:
            data = data.cuda()
            label = label.cuda()
            
        pred,_ = model(data)
        loss =1e3*criterion(pred,label)
        
        loss.backward()
        optimizer.step()
#        optimizer.zero_grad()
        
        # training curve plot
        print("[epoch %d step %d loss %.4f]"%(i,j,loss.item()))
        if j%10==0:
            writer.add_scalar('loss', loss.item(),j)
    model.eval()
    x_val,_ = model(x_test)
    val_loss = criterion(x_val ,y_test)
    writer.add_scalar('val_loss', val_loss,i)
    if i%50==0:
        # if y_test.dim() == 4 and y_test.size(1)==1:
        #     y_test = torch.cat((y_test,y_test,y_test),dim=1)
        #     x_val = torch.cat((x_val,x_val,x_val),dim=1)
        #     x_test = torch.cat((x_test,x_test,x_test),dim=1)
    
        clean_grid = torchvision.utils.make_grid(y_test,nrow=16)
        writer.add_image('clean image',clean_grid,dataformats='CHW')
        dirty_grid = torchvision.utils.make_grid(x_test,nrow=16)
        writer.add_image('dirty image',dirty_grid,dataformats='CHW')
        debackground_grid = torchvision.utils.make_grid(x_val,nrow=16)
        writer.add_image('debackground image',debackground_grid,dataformats='CHW')
    print("[epoch %d val_loss %.4f]"%(i,val_loss))
        
    if i%10==0:
        path = os.path.join(os.getcwd(),'model','deback_epoch%d.pth'%(i+1))
        torch.save(model.state_dict(),path)

```

After I reduced the depth of the model, here is the report:
[epoch 0 step 0 loss 2560.6475]
[epoch 0 step 1 loss 2552.1030]
[epoch 0 step 2 loss 2326.0481]
[epoch 0 step 3 loss 2074.4417]
[epoch 0 step 4 loss 1908.8021]
[epoch 0 step 5 loss 1693.8888]
[epoch 0 step 6 loss 1524.7013]
[epoch 0 step 7 loss 1367.1302]
[epoch 0 step 8 loss 1221.5389]
[epoch 0 step 9 loss 1098.3346]
[epoch 0 step 10 loss 927.1423]
[epoch 0 step 11 loss 826.2545]
[epoch 0 step 12 loss 796.7938]
[epoch 0 val_loss 650.3557]
[epoch 1 step 0 loss 632.1362]
[epoch 1 step 1 loss 594.0154]
[epoch 1 step 2 loss 552.5127]
[epoch 1 step 3 loss 544.2441]
[epoch 1 step 4 loss 562.6210]
[epoch 1 step 5 loss 585.8839]
[epoch 1 step 6 loss 566.8837]
[epoch 1 step 9 loss 519.0481]
[epoch 1 step 10 loss 503.7430]
[epoch 1 step 11 loss 480.8072]
[epoch 1 step 12 loss 470.9222]
[epoch 1 val_loss 945.2809]
[epoch 2 step 0 loss 450.9913]
[epoch 2 step 1 loss 405.8258]
[epoch 2 step 2 loss 399.1376]
[epoch 2 step 3 loss 404.3622]
[epoch 2 step 4 loss 408.4886]
[epoch 2 step 5 loss 387.0332]
[epoch 2 step 6 loss 388.2880]
[epoch 2 step 7 loss 349.8983]
[epoch 2 step 8 loss 364.2084]
[epoch 2 step 9 loss 388.2591]
[epoch 2 step 10 loss 343.5857]
[epoch 2 step 11 loss 356.0178]
[epoch 2 step 12 loss 359.7800]
Traceback (most recent call last):
  File "main.py", line 158, in <module>
    y_val = model(x_test)
  File "/home/cwzhou/.conda/envs/pytorch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/cwzhou/20200729_H2B_tocaiwei/networks.py", line 210, in forward
    x = F.relu(self.res_conv3(x) + resx)
  File "/home/cwzhou/.conda/envs/pytorch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/cwzhou/.conda/envs/pytorch/lib/python3.8/site-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/home/cwzhou/.conda/envs/pytorch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/cwzhou/.conda/envs/pytorch/lib/python3.8/site-packages/torch/nn/modules/activation.py", line 94, in forward
    return F.relu(input, inplace=self.inplace)
  File "/home/cwzhou/.conda/envs/pytorch/lib/python3.8/site-packages/torch/nn/functional.py", line 1063, in relu
    result = torch.relu(input)
RuntimeError: CUDA out of memory. Tried to allocate 114.00 MiB (GPU 0; 10.92 GiB total capacity; 10.37 GiB already allocated; 3.06 MiB free; 10.39 GiB reserved in total by PyTorch)

Your code snippet doesn’t show any part which might store the computation graph.
Maybe this line of code could be problematic (I haven’t tested it):

writer.add_scalar('val_loss', val_loss,i)

Could you use writer.add_scalar('val_loss', val_loss.item(), i) and rerun the code?
Also, did you verify that the GPU memory is increasing in each iteration via nvidia-smi?
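
For reference, here is a minimal sketch of both checks; torch.cuda.memory_allocated() is just one way to watch the allocated memory from inside the script, nvidia-smi works as well:

```python
# Log only the Python float, not the tensor that is still attached to the graph
writer.add_scalar('val_loss', val_loss.item(), i)

# Optional: print the memory PyTorch has allocated to see if it grows each epoch
print(f"allocated: {torch.cuda.memory_allocated() / 1024**2:.1f} MiB")
```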

Thanks for your patience.
I added with torch.no_grad() before the model evaluation and all the problems are solved.
Thank you and your colleagues very much for your contributions to the PyTorch community.


Good to hear it’s solved now. For the sake of completeness: wrapping the validation loop in a no_grad() block is the better approach anyway :wink:
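
A minimal sketch of that pattern, reusing the names from the training loop above (model, x_test, y_test, criterion, and writer are assumed to be defined as in the original snippet):

```python
model.eval()
with torch.no_grad():  # no graph is built, so activations can be freed right away
    x_val, _ = model(x_test)
    val_loss = criterion(x_val, y_test)
writer.add_scalar('val_loss', val_loss.item(), i)
```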

Today I used torchvision.utils.make_grid to generate image grids and writer.add_image to add them to TensorBoard, and I ran into a similar out-of-memory issue after 27 steps.

The problem was resolved by wrapping that code in with torch.no_grad(). Thank you.
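
For completeness, a minimal sketch of that image-logging path under no_grad(), reusing the tensors and writer from the loop above (the tag names are only illustrative):

```python
with torch.no_grad():
    x_val, _ = model(x_test)

# make_grid and add_image only read the tensors, so no autograd graph is needed
clean_grid = torchvision.utils.make_grid(y_test, nrow=16)
writer.add_image('clean image', clean_grid, dataformats='CHW')
debackground_grid = torchvision.utils.make_grid(x_val, nrow=16)
writer.add_image('debackground image', debackground_grid, dataformats='CHW')
```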
