Torchvision 0.3: deeplab models fail with batch size of 1 during training

Hi, I recently installed torchvision 0.3 from source on Linux.

The code below fails in training mode but works in eval mode. It gives the error:
“Expected more than 1 value per channel when training, got input size torch.Size([1, 256, 1, 1])”

import torch
import torchvision.models.segmentation as seg_model

# a single sample; the resnet backbones expect 3 input channels
# (a 1-channel input would fail earlier with a shape mismatch in conv1)
img = torch.zeros((1, 3, 300, 300))
m1 = seg_model.deeplabv3_resnet50(num_classes=2, pretrained=False)
m2 = seg_model.deeplabv3_resnet101(num_classes=2, pretrained=False)

o = m1.cuda()(img.cuda())  # error in training mode
o = m2.cuda()(img.cuda())  # also errors

This doesn’t happen in eval mode; it only happens in train mode when the 0th dim is less than 2. To train this, should I just keep the minimum batch size at 2, or does the model expect something else, e.g. some other information along with the input tensor image? I am using the code from https://github.com/pytorch/vision/blob/v0.3.0/references/segmentation/train.py as a reference.
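For reference, these are the two cases that do work for me (a minimal sketch continuing the code above):

m1.eval()
o = m1.cuda()(img.cuda())  # works: eval mode uses the running statistics

m1.train()
img2 = torch.zeros((2, 3, 300, 300))  # batch size of 2
o = m1.cuda()(img2.cuda())  # works in train mode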

Most likely you’ll see this error in a batchnorm layer, which cannot calculate the batch statistics from a single value per channel.
It’s similar to Inception3 from torchvision, which also expects at least two samples per batch during training.
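The failure is easy to reproduce with a standalone batchnorm layer; here is a minimal sketch using the shape from your error message:

import torch
import torch.nn as nn

bn = nn.BatchNorm2d(256)
x = torch.zeros(1, 256, 1, 1)  # a single value per channel

bn.train()
# bn(x) would raise: Expected more than 1 value per channel when training, ...

bn.eval()
out = bn(x)  # works: eval mode normalizes with the running estimates

bn.train()
x2 = torch.zeros(2, 256, 1, 1)  # two values per channel
out = bn(x2)  # works in training mode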

Would it be possible to provide more than a single sample or are you running out of memory?

Yes, that is correct. The traceback ends in the functional batchnorm implementation. I will just set the minimum batch size to 2 in that case. Thank you!

In my case, I am running out of memory with a batch size of 2. Is it possible to somehow bypass the limitation that BN layers expect at least two samples?

You could try to use torch.utils.checkpoint to trade compute for memory.
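Roughly along these lines (a generic sketch of torch.utils.checkpoint on a toy stack, not the deeplab model itself):

import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# toy stack standing in for an expensive backbone
model = nn.Sequential(*[nn.Conv2d(16, 16, 3, padding=1) for _ in range(8)])

x = torch.randn(2, 16, 64, 64, requires_grad=True)  # input must require grad
# split into 2 segments; activations inside each segment are
# recomputed during backward instead of being stored
out = checkpoint_sequential(model, 2, x)
out.sum().backward()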
However, even if it’s working with a batch size of 2, the running estimates might be quite off due to the small batch size. You might need to tune the momentum or switch to another normalization layer, e.g. InstanceNorm.
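If you go the normalization route, one option is to swap the batchnorm layers after building the model. Here is a rough sketch using GroupNorm instead (InstanceNorm would also need more than one spatial value in training mode, which the 1x1 activations in the ASPP pooling branch don’t have); the helper name is made up:

import torch.nn as nn

def replace_bn_with_gn(module, num_groups=32):
    # hypothetical helper: recursively replace BatchNorm2d with GroupNorm,
    # which does not depend on the batch size
    for name, child in module.named_children():
        if isinstance(child, nn.BatchNorm2d):
            # child.num_features must be divisible by num_groups
            setattr(module, name, nn.GroupNorm(num_groups, child.num_features))
        else:
            replace_bn_with_gn(child, num_groups)

replace_bn_with_gn(m1)  # the channel counts in these models (64, 256, ...) divide by 32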