RuntimeError: cuda runtime error (2) : out of memory

Does the error mean the loss is too big? I don't understand why it runs out of memory.

I have changed the code to put the generator inside the model and to call backward() only once. Could this combination cause the problem?

THCudaCheck FAIL file=/b/wheel/pytorch-src/torch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
Traceback (most recent call last):
  File "main.py", line 241, in
    main()
  File "main.py", line 238, in main
    train.train(model, optim, criterion, trainData, validData, testData, opt)
  File "/home/zeng/code/tb-seq2seq/train.py", line 263, in train
    loss.backward()
  File "/home/zeng/envs/pytorch_0.1.12_py27/local/lib/python2.7/site-packages/torch/autograd/variable.py", line 146, in backward
    self.execution_engine.run_backward((self,), (gradient,), retain_variables)
  File "/home/zeng/envs/pytorch_0.1.12_py27/local/lib/python2.7/site-packages/torch/nn/functions/thnn/auto.py", line 46, in backward
    grad_input = grad_output.new().resize_as_(input).zero_()
RuntimeError: cuda runtime error (2) : out of memory at /b/wheel/pytorch-src/torch/lib/THC/generic/THCStorage.cu:66

https://github.com/pytorch/pytorch/issues/958 seems to be the same error, isn't it? Have you solved it by now, and how?

Hi,
The error means that during the backward pass, when PyTorch tries to allocate memory to store the gradients and perform the computations, there is not enough GPU memory left. You should try reducing the batch size.
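If your training loop builds batches with a DataLoader, lowering the batch size is usually a one-line change. Here is a minimal sketch in current PyTorch (the toy model, dataset, and sizes are stand-ins, not your code; the 0.1.12 release used in this thread would also need Variable wrappers):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins for the real model and data; the point is only the batch_size knob.
model = nn.Linear(100, 10).cuda()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

data = TensorDataset(torch.randn(1024, 100), torch.randint(0, 10, (1024,)))
# Halving batch_size roughly halves the memory used by the activations
# that autograd keeps around for the backward pass.
loader = DataLoader(data, batch_size=16, shuffle=True)  # e.g. was 32

for inputs, targets in loader:
    inputs, targets = inputs.cuda(), targets.cuda()
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()   # smaller batches -> smaller buffers allocated here
    optimizer.step()
```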


Hi, isn't the reason a large number of parameters? I have run into the same issue.

Hi,

Yes, having more parameters makes your model more memory-hungry.
If you still want to use the same network, one solution is to reduce the batch size; that way the intermediate computations will be smaller and will use less memory.
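As a rough illustration of that point (hypothetical layer sizes, and it uses the torch.cuda memory-stat helpers from current PyTorch, which did not exist in 0.1.12): the parameter memory stays fixed, while the activations saved for backward grow with the batch size.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 4096)).cuda()

for batch_size in (8, 64, 512):
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    x = torch.randn(batch_size, 4096, device="cuda", requires_grad=True)
    model(x).sum().backward()          # backward needs the activations saved in the forward pass
    peak_mib = torch.cuda.max_memory_allocated() / 1024 ** 2
    print(f"batch_size={batch_size:4d}  peak allocated ~{peak_mib:.0f} MiB")
```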

Got it. Thank you very much!