I have a model that works perfectly when there are multiple inputs. However, if there is only one datapoint as input, I get the above error. Does anyone have an idea of what’s going on here?
Most likely you have an nn.BatchNorm layer somewhere in your model, which expects more than 1 value to calculate the running mean and std of the current batch.
In case you want to validate your data, call model.eval() before feeding the data, as this will change the behavior of the BatchNorm layer to use the running estimates instead of calculating them for the current batch.
If you want to train your model and can’t use a bigger batch size, you could switch e.g. to InstanceNorm.
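To illustrate, here is a minimal standalone sketch (a bare BatchNorm1d layer with an assumed feature size of 100, not taken from any particular model) showing the failure in training mode and the fix via eval():

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(100)
x = torch.randn(1, 100)  # a batch containing a single sample

# Training mode: batch statistics cannot be computed from one value per channel
bn.train()
try:
    bn(x)
except ValueError as e:
    print(e)  # Expected more than 1 value per channel when training, ...

# Eval mode: the running estimates are used instead, so a single sample works
bn.eval()
out = bn(x)
print(out.shape)  # torch.Size([1, 100])
```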
You are right - adding model.eval() is the key.
Consider the following network snippet:

    def __init__(self):
        ...
        self.fc_hidden_0 = nn.Linear(586, 100)
        self.bn_hidden_0 = nn.BatchNorm1d(100)

    def forward(self, X):
        ...
        out = F.leaky_relu(self.bn_hidden_0(self.fc_hidden_0(out)))
Batch size is 32 (I tried 20 as well) and sometimes the error is thrown:

    ValueError: Expected more than 1 value per channel when training, got input size [1, 100]

I was trying to understand @ptrblck’s response that BatchNorm could be the problem. It expects 100 values per sample, but could the error occur because, instead of a full 32 x 100 batch, the train loader returns a single instance in the last data segment?
Note that when I restart the training (followed by validation and training again), the error is gone. Also note that model.train() and model.eval() are placed before the start of training and validation, respectively.
The error is most likely thrown during training, if the current batch only contains a single sample.
As you’ve explained, this might happen if the length of your Dataset is not divisible by the batch size and the remainder happens to be 1.
You could set drop_last=True in your DataLoader and run your code again.
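As a quick sketch (the dataset length of 65 is an arbitrary example chosen so that batch size 32 leaves a remainder of 1):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 65 samples with batch_size=32 would leave a final batch of a single sample
dataset = TensorDataset(torch.randn(65, 586))

loader = DataLoader(dataset, batch_size=32, drop_last=True)
batch_sizes = [batch[0].size(0) for batch in loader]
print(batch_sizes)  # [32, 32] - the incomplete last batch is dropped
```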
I met the same problem. I’m using the Deeplabv3 from the model zoo, and I only want to train on one image, so my batch size is 1 and my input image size is 512x1024. I got exactly the same error.
Traceback (most recent call last):
File "/home/mengdietao/.pycharm_helpers/pydev/pydevd.py", line 1758, in <module>
main()
File "/home/mengdietao/.pycharm_helpers/pydev/pydevd.py", line 1752, in main
globals = debugger.run(setup['file'], None, None, is_module)
File "/home/mengdietao/.pycharm_helpers/pydev/pydevd.py", line 1147, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/home/mengdietao/.pycharm_helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/home/mengdietao/tangshan/tangshan_data/image_segmentation/docs/data_check.py", line 356, in <module>
train(CONFIG, True)
File "/home/mengdietao/tangshan/tangshan_data/image_segmentation/docs/data_check.py", line 244, in train
pred = model(data)
File "/home/mengdietao/.conda/envs/mengdietao/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/mengdietao/.conda/envs/mengdietao/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/mengdietao/.conda/envs/mengdietao/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/mengdietao/.conda/envs/mengdietao/lib/python3.7/site-packages/torchvision/models/segmentation/_utils.py", line 22, in forward
x = self.classifier(x)
File "/home/mengdietao/.conda/envs/mengdietao/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/mengdietao/.conda/envs/mengdietao/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
input = module(input)
File "/home/mengdietao/.conda/envs/mengdietao/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/mengdietao/.conda/envs/mengdietao/lib/python3.7/site-packages/torchvision/models/segmentation/deeplabv3.py", line 91, in forward
res.append(conv(x))
File "/home/mengdietao/.conda/envs/mengdietao/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/mengdietao/.conda/envs/mengdietao/lib/python3.7/site-packages/torchvision/models/segmentation/deeplabv3.py", line 60, in forward
x = super(ASPPPooling, self).forward(x)
File "/home/mengdietao/.conda/envs/mengdietao/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
input = module(input)
File "/home/mengdietao/.conda/envs/mengdietao/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/mengdietao/.conda/envs/mengdietao/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 81, in forward
exponential_average_factor, self.eps)
File "/home/mengdietao/.conda/envs/mengdietao/lib/python3.7/site-packages/torch/nn/functional.py", line 1652, in batch_norm
raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size))
ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 256, 1, 1])
@ptrblck But why do I meet the same error when I use F.interpolate with batch size 1?
F.interpolate should not throw this error, as its underlying operations are independent of the batch size.
Could you check your model for batchnorm layers?
If you don’t find anything suspicious, could you post a code snippet to reproduce this error?
    x5 = F.interpolate(self.avg_pool(x), size=x.size()[2:], mode='bilinear', align_corners=True)

That’s a BN layer in avg_pool. Thanks very much!
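For reference, the failing pattern can be reproduced with a minimal sketch of such a pooling branch (the layers and the channel count of 256 are illustrative, loosely following the [1, 256, 1, 1] shape from the traceback, not the exact torchvision code):

```python
import torch
import torch.nn as nn

# Global average pooling reduces the spatial size to 1x1, so the following
# BatchNorm2d sees exactly one value per channel when the batch size is 1.
avg_pool = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),
    nn.Conv2d(256, 256, kernel_size=1, bias=False),
    nn.BatchNorm2d(256),
)

x = torch.randn(1, 256, 32, 32)
avg_pool.train()
try:
    avg_pool(x)
    message = None
except ValueError as e:
    message = str(e)
print(message)  # Expected more than 1 value per channel when training, ...
```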
Thanks, the model.eval() function is useful.
I want to train the model with batch_size = 1, so I added drop_last = True to my training DataLoader. However, the error is still there.
And the error doesn’t happen in the backbone. It occurs in the DeepLabV3 ASPP at:

    x5 = F.interpolate(self.avg_pool(x), size=x.size()[2:], mode='bilinear', align_corners=True)
For a batch size of 1 you don’t need to use drop_last=True, as there won’t be a smaller batch at the end of the epoch.
The error should be raised by batchnorm layers, which cannot calculate the batch statistics using a single sample.
You should therefore either increase the batch size or call eval() on the batchnorm layers.
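A possible sketch of the second option, switching only the batchnorm layers to eval mode while the rest of the model keeps training (the small Sequential model here is just a stand-in):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
)

model.train()
# Put only the batchnorm layers into eval mode so they use running estimates
for module in model.modules():
    if isinstance(module, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
        module.eval()

out = model(torch.randn(1, 3, 8, 8))  # a single sample now works
print(out.shape)  # torch.Size([1, 16, 8, 8])
```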
Hi @ptrblck, I use nn.SyncBatchNorm with a batch size of 1 for each of the 2 GPUs, i.e. the total batch size is 2. But I still get the same error. I use PyTorch 1.5.
Each device would still need more than a single value to calculate the local estimates. SyncBatchNorm will only synchronize the calculations between the devices.
Thank you very much!
Hi @ptrblck, one strange thing: when I use PyTorch 1.3.0 with nn.SyncBatchNorm and a batch size of 1 for each of the 8 GPUs (i.e. a total batch size of 8), I don’t get this error.
Could you check the shape of the input activation to this particular batchnorm layer, please?
When the shape of the input to nn.SyncBatchNorm is (1, 128, 32, 32) for each GPU, it is OK to use SyncBatchNorm if I have more than a single GPU.
But when the input shape is (1, 128, 1, 1), i.e. a 1x1 spatial resolution, each device would still need more than a single value to calculate the local estimates, as mentioned by @ptrblck.
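The shape dependence can be checked on a single device with a plain BatchNorm2d (SyncBatchNorm itself needs a distributed setup, so this is only an analogous sketch):

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(128)
bn.train()

# 32x32 spatial resolution: 32*32 = 1024 values per channel, so this works
out = bn(torch.randn(1, 128, 32, 32))

# 1x1 spatial resolution: a single value per channel -> ValueError
try:
    bn(torch.randn(1, 128, 1, 1))
    failed = False
except ValueError:
    failed = True
print(out.shape, failed)  # torch.Size([1, 128, 32, 32]) True
```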
This behavior is on PyTorch 1.5. On PyTorch 1.3, nn.SyncBatchNorm always works regardless of the spatial resolution of the input.
Is that a bug in PyTorch 1.3 or 1.5? @ptrblck
Thanks in advance.
I’m not sure if 1.3 had a bug, but this functionality should have been added approx. a month ago in this PR, which would enable it in the nightly binaries and the next release. Could you install the nightlies and check if it’s working now?
Sorry for the late reply.
I’m stuck on an issue when installing PyTorch from source.