RuntimeError: cuda runtime error (2) : out of memory at ……THCStorage.cu:58

Hxb110 · July 19, 2021, 11:43am

Hello Everyone,

RuntimeError: cuda runtime error (2) : out of memory at c:\users\administrator\downloads\new-builder\win-wheel\pytorch\aten\src\thc\generic/THCStorage.cu:58

I got this error while training my network, which is modified from RCAN. It happens after the first epoch, that is, after my call to

t.train()

is over, and execution then moves on to

t.test()

The entire main script is as follows:

import torch
import utility
import data
import model
import loss
from option import args
from trainer import Trainer

if __name__ == '__main__':
    torch.backends.cudnn.enabled = False
    torch.manual_seed(args.seed)
    checkpoint = utility.checkpoint(args)

    if checkpoint.ok:
        loader = data.Data(args)
        model = model.Model(args, checkpoint)
        loss = loss.Loss(args, checkpoint) if not args.test_only else None
        t = Trainer(args, loader, model, loss, checkpoint)
        while not t.terminate():
            t.train()
            t.test()

        checkpoint.done()

The following is the result:

C:\Users\HP\Anaconda3\envs\pytorch\python.exe D:/hxb/RCAN-master/RCAN_TrainCode/code/main.py
Preparing seperated binary files
Preparing seperated binary files
Making model…
Preparing loss function:
1.000 * L1
[Epoch 1] Learning rate: 1.00e-4
[1600/16000] [L1: 17.5433] 88.1+5.9s
[3200/16000] [L1: 14.3299] 87.8+0.4s
[4800/16000] [L1: 12.7960] 88.0+0.4s
[6400/16000] [L1: 11.8648] 88.0+0.3s
[8000/16000] [L1: 11.2393] 88.0+0.3s
[9600/16000] [L1: 10.8215] 88.0+0.4s
[11200/16000] [L1: 10.4444] 88.0+0.3s
[12800/16000] [L1: 10.1431] 87.8+0.4s
[14400/16000] [L1: 9.9283] 87.8+0.4s
[16000/16000] [L1: 9.6954] 87.9+0.4s
Evaluation:
0%| | 0/5 [00:00<?, ?it/s]THCudaCheck FAIL file=c:\users\administrator\downloads\new-builder\win-wheel\pytorch\aten\src\thc\generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
  File "D:/hxb/RCAN-master/RCAN_TrainCode/code/main.py", line 23, in <module>
    t.test()
  File "D:\hxb\RCAN-master\RCAN_TrainCode\code\trainer.py", line 93, in test
    sr = self.model(lr, idx_scale)
  File "C:\Users\HP\Anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\hxb\RCAN-master\RCAN_TrainCode\code\model\__init__.py", line 54, in forward
    return self.model(x)
  File "C:\Users\HP\Anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\hxb\RCAN-master\RCAN_TrainCode\code\model\mynetwork.py", line 149, in forward
    x = self.tail(res)
  File "C:\Users\HP\Anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\HP\Anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\container.py", line 91, in forward
    input = module(input)
  File "C:\Users\HP\Anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\HP\Anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\conv.py", line 301, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuda runtime error (2) : out of memory at c:\users\administrator\downloads\new-builder\win-wheel\pytorch\aten\src\thc\generic/THCStorage.cu:58

So the problem appears to be in

t.test()

but I don't know why.

Can anyone help me? The error points to THCStorage.cu:58. Thanks for reading my amateur English.

avalon1511 · July 19, 2021, 4:33pm

This just means that your GPU ran out of memory. You might need to assign more than one GPU to run this program.

Hxb110 · July 22, 2021, 9:00am

That should be a solution, but I have no way to verify it.
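
For anyone hitting the same error: since the failure occurs in t.test() rather than t.train(), another possibility worth ruling out is that the evaluation forward pass is still recording the autograd graph, which keeps intermediate activations in GPU memory. This is an assumption, not something confirmed in this thread. Below is a minimal sketch of evaluation under torch.no_grad(). The small network and the evaluate helper are hypothetical stand-ins for illustration, not the RCAN code.

```python
import torch
import torch.nn as nn

# Use the GPU when one is available; the sketch also runs on CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in for the super-resolution network (NOT the RCAN model).
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 3, kernel_size=3, padding=1),
).to(device)


def evaluate(net, lr_batch):
    """Forward pass for testing, with autograd recording disabled."""
    net.eval()  # switch layers such as dropout/batch norm to inference mode
    with torch.no_grad():  # no graph is built, so activations can be freed early
        return net(lr_batch)


# Hypothetical low-resolution input batch.
lr = torch.randn(1, 3, 16, 16, device=device)

sr = evaluate(model, lr)
print(sr.requires_grad)  # False: no autograd graph is attached to the output

# For comparison, the same forward pass outside no_grad() tracks gradients.
train_out = model(lr)
print(train_out.requires_grad)  # True
```

In trainer.py this would correspond to wrapping the test loop around sr = self.model(lr, idx_scale) in the same context manager. torch.no_grad() requires PyTorch 0.4 or later. If memory still runs out with gradients disabled, the test images may simply be too large for the GPU, in which case more memory or smaller inputs would be needed.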