Expected more than 1 value per channel when training

Double post from here.
As already described, batchnorm layers in training mode need more than a single value per channel, since the batch statistics (mean and variance) cannot be calculated from one value. Calling model.eval() was suggested as a workaround. It might work if you are using a pretrained model, since batchnorm layers then normalize with the already trained running stats instead of computing statistics from the current batch.
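Here is a minimal sketch of the behavior, assuming a standalone nn.BatchNorm1d layer with 4 features (the layer size and the variable names are made up for illustration, they are not from your model):

```python
import torch
import torch.nn as nn

# Hypothetical standalone layer; in a real model this would be one of its batchnorm layers.
bn = nn.BatchNorm1d(4)
single = torch.randn(1, 4)  # batch size 1 -> only one value per channel

# Training mode: batch stats cannot be computed from a single value per channel.
bn.train()
try:
    bn(single)
    train_error = None
except ValueError as e:
    train_error = str(e)
print(train_error)

# Eval mode: the running stats are used instead, so a single sample works.
bn.eval()
with torch.no_grad():
    eval_out = bn(single)
print(eval_out.shape)

# Training mode with at least two samples works as well.
bn.train()
batch_out = bn(torch.randn(2, 4))
print(batch_out.shape)
```

Note that the running stats of a freshly initialized layer are just the defaults (mean 0, variance 1), so eval() only gives meaningful outputs once the stats were trained. If the error shows up during training because the last batch of an epoch contains a single sample, passing drop_last=True to the DataLoader might be another option.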