Same code meets out-of-memory problems in PyTorch 0.4

The problem occurs during validation: the input image is 1024 × 2048, with 4 images on 4 GTX 1080 Ti GPUs (one image per GPU). I get this error:
THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
  File "train_cityscape.py", line 229, in <module>
    train(cfg)
  File "train_cityscape.py", line 141, in train
    outputs = model(images_val)
  File "/home/lxt/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/lxt/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 114, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/lxt/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 124, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/lxt/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 65, in parallel_apply
    raise output
  File "/home/lxt/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 41, in _worker
    output = module(*input, **kwargs)
  File "/home/lxt/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/lxt/project/seg-pytorch/model/denseaspp121.py", line 114, in forward
    feature = self.features(_input)
  File "/home/lxt/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/lxt/anaconda3/lib/python3.6/site-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/home/lxt/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/lxt/anaconda3/lib/python3.6/site-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/home/lxt/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/lxt/project/seg-pytorch/model/denseaspp121.py", line 237, in forward
    new_features = super(_DenseLayer, self).forward(x)
  File "/home/lxt/anaconda3/lib/python3.6/site-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/home/lxt/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/lxt/anaconda3/lib/python3.6/site-packages/torch/nn/modules/batchnorm.py", line 49, in forward
    self.training or not self.track_running_stats, self.momentum, self.eps)
  File "/home/lxt/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 1194, in batch_norm
    training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/aten/src/THC/generic/THCStorage.cu:58

However, in PyTorch 0.3.1, the same code runs fine.

It is very confusing: I found that a GTX 1080 Ti (11 GB memory) cannot handle even a single image of size 3 × 1024 × 2048 with an FCN-like CNN model based on DenseNet in PyTorch 0.4.0. However, PyTorch 0.3.1 can handle it.

Have you wrapped your validation code in a torch.no_grad() block?
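For reference, a minimal sketch of what that looks like; the model, loader, and metric names here are placeholders, not the poster's actual code:

```python
import torch

def validate(model, val_loader, device="cuda"):
    model.eval()  # put BatchNorm/Dropout layers into eval mode
    with torch.no_grad():  # autograd records no graph, so intermediate
                           # activations are freed instead of kept for backward
        for images_val, labels_val in val_loader:
            images_val = images_val.to(device)
            outputs = model(images_val)
            # ... compute your validation metrics here ...
```

Without the context manager, every forward pass during validation stores activations for a backward pass that never happens, which is why memory runs out on large inputs.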


Thank you very much! I have only been using PyTorch 0.4 for a day. The problem was solved by that context manager. So does every tensor require gradient computation by default?

The default settings weren't changed; only the volatile flag was deprecated.
In its place, some context managers were introduced.
Have a look at the Migration Guide for more examples.
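A short sketch of the change, assuming the old code used the volatile flag the way the 0.3.x docs described:

```python
import torch

# PyTorch 0.3.x style (deprecated in 0.4):
#   images_val = Variable(images_val, volatile=True)
#   outputs = model(images_val)

# PyTorch 0.4 style: use the context manager instead.
x = torch.randn(1, 3, 4, 4)   # plain tensors do NOT require grad by default
print(x.requires_grad)        # False

with torch.no_grad():
    y = x * 2                 # no autograd graph is recorded inside this block
print(y.requires_grad)        # False
```

So the default is still requires_grad=False for newly created tensors; the OOM came from model parameters requiring grad, which makes every forward pass build a graph unless it runs under torch.no_grad().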