What should the input to DeepLabV3 be in training mode?

I am trying to train a deeplabv3_resnet50 model on a custom dataset, but I get the error ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 256, 1, 1]) during the forward pass. The following minimal example reproduces the error:

import torch
import torchvision

model = torchvision.models.segmentation.deeplabv3_resnet50(weights="DEFAULT")
model.train()

batch_size = 1
nbr_of_channels = 3
img_height, img_width = (500, 500)
input = torch.rand((batch_size, nbr_of_channels, img_height, img_width))
model(input)

I do not understand this at all. What on earth is meant by got input size torch.Size([1, 256, 1, 1])?

The stack trace points to a BatchNorm layer. In training mode, BatchNorm normalizes its input with the mean and variance computed over the batch and spatial dimensions, i.e. over all values belonging to each channel. The activation reaching this layer has shape [1, 256, 1, 1]: one sample, 256 channels, and a single spatial pixel, so there is exactly one value per channel and the statistics cannot be computed — which is exactly what the error message says. The 1×1 spatial size comes from the global-average-pooling branch of DeepLabV3's ASPP head, which pools the feature map down to a single pixel regardless of the input resolution, so using a larger image will not help.
Increase the batch size to at least 2 and it will work.
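You can see the same behavior in isolation with a bare BatchNorm2d layer — a minimal sketch showing that a [1, C, 1, 1] input fails in training mode while a batch of 2 succeeds:

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(256)
bn.train()  # training mode: batch statistics are computed from the input

# One sample with a single spatial pixel -> only 1 value per channel,
# so the mean/variance computation raises the same ValueError
try:
    bn(torch.rand(1, 256, 1, 1))
except ValueError as e:
    print(e)  # Expected more than 1 value per channel when training, ...

# Two samples -> 2 values per channel, statistics are well-defined
out = bn(torch.rand(2, 256, 1, 1))
print(out.shape)  # torch.Size([2, 256, 1, 1])
```

Applied to your example, changing batch_size = 1 to batch_size = 2 (or anything larger) makes the forward pass go through.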