I have a model that works perfectly when there are multiple inputs. However, if there is only one datapoint as input, I get the above error. Does anyone have an idea of what’s going on here?
Most likely you have an nn.BatchNorm layer somewhere in your model, which expects more than 1 value to calculate the running mean and std of the current batch.
In case you want to validate your data, call model.eval() before feeding the data, as this will change the behavior of the BatchNorm layer to use the running estimates instead of calculating them for the current batch.
If you want to train your model and can’t use a bigger batch size, you could switch e.g. to InstanceNorm.
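To illustrate, here is a minimal standalone sketch (a bare BatchNorm1d layer with an assumed feature size of 100, not taken from any particular model) showing the failure in training mode and the fix via eval():

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(100)
x = torch.randn(1, 100)  # a batch containing a single sample

# Training mode: batch statistics cannot be computed from one value per channel
bn.train()
try:
    bn(x)
except ValueError as e:
    print(e)  # Expected more than 1 value per channel when training, ...

# Eval mode: the running estimates are used instead, so a single sample works
bn.eval()
out = bn(x)
print(out.shape)  # torch.Size([1, 100])
```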
You are right - adding model.eval() is the key.
Consider the following network snippet:

    def __init__(self):
        ...
        self.fc_hidden_0 = nn.Linear(586, 100)
        self.bn_hidden_0 = nn.BatchNorm1d(100)

    def forward(self, X):
        ...
        out = F.leaky_relu(self.bn_hidden_0(self.fc_hidden_0(out)))
Batch size is 32 (I tried 20 as well) and sometimes the error is thrown:

    ValueError: Expected more than 1 value per channel when training, got input size [1, 100]

I was trying to understand @ptrblck’s response that BatchNorm could be the problem. It expects 100 values per sample, but could the error occur because, instead of a full 32 x 100 batch, the train loader returns a single instance in the last data segment?
Note that when I restart the training (followed by validation and training again), the error is gone. Also note that model.train() and model.eval() are placed before the start of training and validation, respectively.
The error is most likely thrown during training, if the current batch only contains a single sample.
As you’ve explained, this might happen if the length of your Dataset is not divisible by the batch size and the remainder happens to be 1.
You could set drop_last=True in your DataLoader and run your code again.
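As a quick sketch (the dataset length of 65 is an arbitrary example chosen so that batch size 32 leaves a remainder of 1):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 65 samples with batch_size=32 would leave a final batch of a single sample
dataset = TensorDataset(torch.randn(65, 586))

loader = DataLoader(dataset, batch_size=32, drop_last=True)
batch_sizes = [batch[0].size(0) for batch in loader]
print(batch_sizes)  # [32, 32] - the incomplete last batch is dropped
```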
I met the same problem. I’m using the Deeplabv3 from the model zoo, and I only want to train on one image, so my batch size is 1 and my input image size is 512x1024. I got exactly the same error.
Traceback (most recent call last):
File "/home/mengdietao/.pycharm_helpers/pydev/pydevd.py", line 1758, in <module>
main()
File "/home/mengdietao/.pycharm_helpers/pydev/pydevd.py", line 1752, in main
globals = debugger.run(setup['file'], None, None, is_module)
File "/home/mengdietao/.pycharm_helpers/pydev/pydevd.py", line 1147, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/home/mengdietao/.pycharm_helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/home/mengdietao/tangshan/tangshan_data/image_segmentation/docs/data_check.py", line 356, in <module>
train(CONFIG, True)
File "/home/mengdietao/tangshan/tangshan_data/image_segmentation/docs/data_check.py", line 244, in train
pred = model(data)
File "/home/mengdietao/.conda/envs/mengdietao/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/mengdietao/.conda/envs/mengdietao/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/mengdietao/.conda/envs/mengdietao/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/mengdietao/.conda/envs/mengdietao/lib/python3.7/site-packages/torchvision/models/segmentation/_utils.py", line 22, in forward
x = self.classifier(x)
File "/home/mengdietao/.conda/envs/mengdietao/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/mengdietao/.conda/envs/mengdietao/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
input = module(input)
File "/home/mengdietao/.conda/envs/mengdietao/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/mengdietao/.conda/envs/mengdietao/lib/python3.7/site-packages/torchvision/models/segmentation/deeplabv3.py", line 91, in forward
res.append(conv(x))
File "/home/mengdietao/.conda/envs/mengdietao/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/mengdietao/.conda/envs/mengdietao/lib/python3.7/site-packages/torchvision/models/segmentation/deeplabv3.py", line 60, in forward
x = super(ASPPPooling, self).forward(x)
File "/home/mengdietao/.conda/envs/mengdietao/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
input = module(input)
File "/home/mengdietao/.conda/envs/mengdietao/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/mengdietao/.conda/envs/mengdietao/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 81, in forward
exponential_average_factor, self.eps)
File "/home/mengdietao/.conda/envs/mengdietao/lib/python3.7/site-packages/torch/nn/functional.py", line 1652, in batch_norm
raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size))
ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 256, 1, 1])
@ptrblck But why do I meet the same error when I use F.interpolate with batch size 1?
F.interpolate should not throw this error, as its underlying operations are independent of the batch size.
Could you check your model for batchnorm layers?
If you don’t find anything suspicious, could you post a code snippet to reproduce this error?
    x5 = F.interpolate(self.avg_pool(x), size=x.size()[2:], mode='bilinear', align_corners=True)

That’s a BN layer in avg_pool. Thanks very much!
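For reference, the failing pattern can be reproduced with a minimal sketch of such a pooling branch (the layers and the channel count of 256 are illustrative, loosely following the [1, 256, 1, 1] shape from the traceback, not the exact torchvision code):

```python
import torch
import torch.nn as nn

# Global average pooling reduces the spatial size to 1x1, so the following
# BatchNorm2d sees exactly one value per channel when the batch size is 1.
avg_pool = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),
    nn.Conv2d(256, 256, kernel_size=1, bias=False),
    nn.BatchNorm2d(256),
)

x = torch.randn(1, 256, 32, 32)
avg_pool.train()
try:
    avg_pool(x)
    message = None
except ValueError as e:
    message = str(e)
print(message)  # Expected more than 1 value per channel when training, ...
```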
Thanks, the model.eval() function is useful.
I want to train the model with batch_size = 1, so I added drop_last = True to my training DataLoader. However, the error is still there.
And the error doesn’t happen in the backbone. It occurs in the DeepLabV3 ASPP at:

    x5 = F.interpolate(self.avg_pool(x), size=x.size()[2:], mode='bilinear', align_corners=True)
For a batch size of 1 you don’t need to use drop_last=True, as there won’t be a smaller batch at the end of the epoch.
The error should be raised by batchnorm layers, which cannot calculate the batch statistics using a single sample.
You should therefore either increase the batch size or call eval() on the batchnorm layers.
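A possible sketch of the second option, switching only the batchnorm layers to eval mode while the rest of the model keeps training (the small Sequential model here is just a stand-in):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
)

model.train()
# Put only the batchnorm layers into eval mode so they use running estimates
for module in model.modules():
    if isinstance(module, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
        module.eval()

out = model(torch.randn(1, 3, 8, 8))  # a single sample now works
print(out.shape)  # torch.Size([1, 16, 8, 8])
```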
Hi @ptrblck, I use nn.SyncBatchNorm with a batch size of 1 for each of the 2 GPUs, i.e. the total batch size is 2. But I still get the same error. I use PyTorch 1.5.
Each device would still need more than a single value to calculate the local estimates. SyncBatchNorm will only synchronize the calculations between the devices.
Thank you very much!
Hi @ptrblck, one strange thing: when I use PyTorch 1.3.0 with nn.SyncBatchNorm and a batch size of 1 for each of the 8 GPUs (i.e. a total batch size of 8), I don’t get this error.
Could you check the shape of the input activation to this particular batchnorm layer, please?
When the shape of the input to nn.SyncBatchNorm is (1, 128, 32, 32) for each GPU, it is OK to use SyncBatchNorm if I have more than a single GPU.
But when the input shape is (1, 128, 1, 1), i.e. a 1x1 spatial resolution, each device would still need more than a single value to calculate the local estimates, as mentioned by @ptrblck.
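The shape dependence can be checked on a single device with a plain BatchNorm2d (SyncBatchNorm itself needs a distributed setup, so this is only an analogous sketch):

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(128)
bn.train()

# 32x32 spatial resolution: 32*32 = 1024 values per channel, so this works
out = bn(torch.randn(1, 128, 32, 32))

# 1x1 spatial resolution: a single value per channel -> ValueError
try:
    bn(torch.randn(1, 128, 1, 1))
    failed = False
except ValueError:
    failed = True
print(out.shape, failed)  # torch.Size([1, 128, 32, 32]) True
```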
This behavior is on PyTorch 1.5. On PyTorch 1.3, nn.SyncBatchNorm always works regardless of the spatial resolution of the input.
Is that a bug in PyTorch 1.3 or 1.5? @ptrblck
Thanks in advance.
I’m not sure if 1.3 had a bug, but this functionality should have been added approx. a month ago in this PR, which would enable it in the nightly binaries and the next release. Could you install the nightlies and check if it’s working now?
Sorry for the late reply.
I’m stuck on an issue when installing PyTorch from source.