The problem occurs during validation. The input images are 1024 x 2048, and I feed 4 images to 4 GTX 1080 Ti GPUs (one image per GPU). I get this error:
THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
  File "train_cityscape.py", line 229, in <module>
    train(cfg)
  File "train_cityscape.py", line 141, in train
    outputs = model(images_val)
  File "/home/lxt/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/lxt/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 114, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/lxt/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 124, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/lxt/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 65, in parallel_apply
    raise output
  File "/home/lxt/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 41, in _worker
    output = module(*input, **kwargs)
  File "/home/lxt/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/lxt/project/seg-pytorch/model/denseaspp121.py", line 114, in forward
    feature = self.features(_input)
  File "/home/lxt/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/lxt/anaconda3/lib/python3.6/site-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/home/lxt/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/lxt/anaconda3/lib/python3.6/site-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/home/lxt/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/lxt/project/seg-pytorch/model/denseaspp121.py", line 237, in forward
    new_features = super(_DenseLayer, self).forward(x)
  File "/home/lxt/anaconda3/lib/python3.6/site-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/home/lxt/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/lxt/anaconda3/lib/python3.6/site-packages/torch/nn/modules/batchnorm.py", line 49, in forward
    self.training or not self.track_running_stats, self.momentum, self.eps)
  File "/home/lxt/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 1194, in batch_norm
    training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/aten/src/THC/generic/THCStorage.cu:58
However, with PyTorch 0.3.1 the same code runs without this error.
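One thing that might be relevant: as far as I understand, PyTorch 0.4 no longer honors `volatile=True` on input variables, so a validation forward pass keeps all intermediate activations for backward unless it runs inside `torch.no_grad()`. Below is a minimal sketch of what such a validation step could look like. This is only an assumption about the cause, not a confirmed diagnosis; the small `nn.Sequential` model, the `validate` helper, the 19 output channels, and the tiny input size are hypothetical stand-ins for my DenseASPP network and the 1024 x 2048 images.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the real segmentation network (DenseASPP in my code).
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.Conv2d(8, 19, kernel_size=1),  # 19 output channels is a placeholder
)


def validate(model, images_val):
    """Run one forward pass without recording the autograd graph."""
    was_training = model.training
    model.eval()  # BatchNorm uses its running statistics
    with torch.no_grad():  # activations are not kept for a backward pass
        outputs = model(images_val)
    model.train(was_training)  # restore the previous mode
    return outputs


# Small placeholder batch so the sketch also runs on CPU.
images_val = torch.randn(1, 3, 64, 128)
outputs = validate(model, images_val)
```

In the real script the same `with torch.no_grad():` block would wrap the `outputs = model(images_val)` call from the traceback; whether that is enough to fit one 1024 x 2048 image per GPU is something I have not verified.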