BatchNorm problem (multi-GPU)

File "/home/xiangtai/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 468, in __call__
result = self.forward(*input, **kwargs)
File "/home/xiangtai/project/pytorch-segmentation/models/psp/adaptive_res101.py", line 136, in forward
feat1 = F.upsample(self.conv1(self.pool1(conv5)), (h, w), mode='bilinear', align_corners=True)
File "/home/xiangtai/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 468, in __call__
result = self.forward(*input, **kwargs)
File "/home/xiangtai/anaconda3/lib/python3.6/site-packages/torch/nn/modules/container.py", line 91, in forward
input = module(input)
File "/home/xiangtai/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 468, in __call__
result = self.forward(*input, **kwargs)
File "/home/xiangtai/anaconda3/lib/python3.6/site-packages/torch/nn/modules/batchnorm.py", line 65, in forward
exponential_average_factor, self.eps)
File "/home/xiangtai/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 1226, in batch_norm
raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size))
ValueError: Expected more than 1 value per channel when training, got input size [1, 512, 1, 1]

I was working on a segmentation task (PSPNet), and I found that when I use only one image per GPU, I hit this error. I don't know why this happens; BatchNorm should support one image per GPU.
However, when I use two images per GPU, it works.
Then I found that this global pooling causes the problem:
self.pool1 = nn.AdaptiveAvgPool2d((1, 1)). With one image per GPU, the pooled 1 x 1 output leaves only a single value per channel, and normal BN requires more than one value per channel in training mode.
Can anyone help solve this problem?
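The failure mode above can be reproduced without the full PSPNet. A minimal sketch (plain `nn.BatchNorm2d`, shapes matching the error message) showing that one value per channel fails in training mode, while either a batch of two or eval mode works:

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(512)

# Training mode + a single value per channel ([1, 512, 1, 1], as
# AdaptiveAvgPool2d((1, 1)) produces with batch size 1): the batch
# statistics are degenerate, so PyTorch raises the ValueError above.
x = torch.randn(1, 512, 1, 1)
try:
    bn(x)
    raised = False
except ValueError:
    raised = True
print(raised)  # True: "Expected more than 1 value per channel when training"

# Two images per GPU -> two values per channel, so BN can compute stats.
y = bn(torch.randn(2, 512, 1, 1))

# In eval mode BN uses its running statistics, so a single sample is fine.
bn.eval()
z = bn(x)
```

So during training the practical fix is a per-GPU batch size of at least 2 (or a synchronized/cross-GPU BN variant); for inference, switching to eval mode is enough.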

Are you running evaluation and forgot to switch your model to evaluation mode with model.eval()?

Because for a 1 x 1 per-channel tensor, each channel's single value equals its own batch mean, so the normalized BN output would always be zero (and the batch variance is zero), which is why training-mode BN rejects it.
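To see this concretely, a tiny sketch: with exactly one value per channel, centering by the channel mean zeroes the tensor out.

```python
import torch

# One image, 1x1 spatial: exactly one value per channel.
x = torch.randn(1, 3, 1, 1)

# Per-channel batch mean over the batch and spatial dimensions.
mean = x.mean(dim=(0, 2, 3), keepdim=True)

# Each channel's single value IS its mean, so the centered tensor is
# exactly zero -- normalizing it would give a constant-zero output.
print(torch.allclose(x - mean, torch.zeros_like(x)))  # True
```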