I have a four-GPU setup (24 GB each) where I am trying to train a DeepLabV3Plus model using the segmentation_models_pytorch library.
I am facing these errors:
ValueError: Caught ValueError in replica 0 on device 0.
ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 256, 1, 1])
The batch size I am using is 8.
Can you please help me resolve this issue?
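The error comes from batch normalization: in training mode a BatchNorm layer needs more than one value per channel to compute batch statistics, and a [1, 256, 1, 1] activation has exactly one value per channel. A minimal sketch reproducing it:

```python
import torch
import torch.nn as nn

# BatchNorm needs more than one value per channel to compute batch
# statistics in training mode. A [1, 256, 1, 1] activation has exactly
# one value per channel, so the forward pass raises a ValueError.
bn = nn.BatchNorm2d(256)
bn.train()

x = torch.randn(1, 256, 1, 1)
try:
    bn(x)
    error_msg = None
except ValueError as e:
    error_msg = str(e)

print(error_msg)
```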
It looks like you are passing a grayscale image (single channel) when the model expects multiple channels.
Are you adjusting for this when creating the model, e.g. via:

model = smp.Unet(
    in_channels=1,  # model input channels (1 for gray-scale images, 3 for RGB, etc.)
)
Actually, I am using all RGB images.
It works with the default settings. The library offers a pool of encoders to choose from, and I am experimenting with different ones. Training works for some encoders but fails for others.
Thank you for replying so quickly!
Really appreciate it.
Right, I misread the original error. It looks like your encoder might be downsampling the input too much. Could you check whether downsampling is a setting that is available to you, or work around the issue by increasing the input size so that the spatial dimensions are not reduced to 1x1?
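To illustrate the downsampling effect: many classification backbones have an overall output stride of 32, so a small input collapses to 1x1 spatially while a larger input does not. A sketch with a hypothetical toy encoder built from five stride-2 convolutions:

```python
import torch
import torch.nn as nn

# Hypothetical toy encoder: five stride-2 convolutions, i.e. an overall
# output stride of 32, similar to many classification backbones.
encoder = nn.Sequential(*[
    nn.Conv2d(3 if i == 0 else 16, 16, kernel_size=3, stride=2, padding=1)
    for i in range(5)
])

small = encoder(torch.randn(2, 3, 32, 32))    # 32 / 32 -> 1x1 spatial
large = encoder(torch.randn(2, 3, 256, 256))  # 256 / 32 -> 8x8 spatial
print(small.shape, large.shape)
```

With a 32x32 input the encoder output is 1x1, which is exactly the shape that breaks BatchNorm; increasing the resolution avoids it.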
These are all the available settings. Currently I am using the defaults for everything except the
Right, so you might look into upsampling or increasing the resolution of the input image as possible solutions.
Okay. Thank you for the tips. I am gonna try these out, and will update this thread!
Thank you so much!
@eqy already explained why the error might be raised; however, it still doesn’t fit your description:
I have a four gpu setup (24 GB each) where I am trying to train a DeepLabV3Plus
got input size torch.Size([1, 256, 1, 1])
I don’t know if you are using data parallel (I would assume so), which should yield a batch size of 2 on each of the 4 GPUs, assuming the global batch size is 8. If the local batch size is set to 8, then of course each GPU should get 8 samples, while the error indicates a single sample.
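For context, nn.DataParallel scatters the input along dim 0, so a global batch of 8 on 4 GPUs yields 4 replicas with 2 samples each. The split can be sketched on CPU with torch.chunk:

```python
import torch

# nn.DataParallel scatters the input along dim 0, so a global batch of 8
# on 4 GPUs yields 4 replicas with 2 samples each (sketched with chunk).
batch = torch.randn(8, 3, 224, 224)
per_gpu = torch.chunk(batch, chunks=4, dim=0)
print([t.shape[0] for t in per_gpu])  # [2, 2, 2, 2]
```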
Yes, I am using
Which would mean that each of the four GPUs should process 2 samples for a global batch size of 8. Could you add print statements to the forward method and post the shape of the input as well as of all activation tensors? I guess you might either be using an invalid reshaping operation in the forward, or your batch size is not 8.
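As an alternative to editing the library's forward method, forward hooks can log every submodule's output shape without touching the source. A minimal sketch on a toy model (the model here is illustrative, not the actual DeepLabV3Plus):

```python
import torch
import torch.nn as nn

# Toy stand-in model; the real one would be the DeepLabV3Plus instance.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
)

# Register a forward hook on every submodule to record its output shape.
shapes = []
for name, module in model.named_modules():
    if name:  # skip the top-level container itself
        module.register_forward_hook(
            lambda m, inp, out, n=name: shapes.append((n, tuple(out.shape)))
        )

model(torch.randn(2, 3, 32, 32))
print(shapes)
```

Running this on the real model with a real batch would show exactly where the batch dimension (or the spatial dimensions) collapse.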
I had the same issue, which was solved by passing drop_last=True to the DataLoader. The issue occurs when the last batch accidentally holds a single sample.
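This is easy to demonstrate: a dataset whose length is not divisible by the batch size leaves a short final batch, and drop_last=True discards it. For example, 33 samples with batch_size=8 leave a final batch of a single sample:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 33 samples with batch_size=8 leave a final batch of a single sample,
# which is exactly what trips BatchNorm; drop_last=True discards it.
dataset = TensorDataset(torch.randn(33, 3, 8, 8))

keep = [len(b[0]) for b in DataLoader(dataset, batch_size=8)]
drop = [len(b[0]) for b in DataLoader(dataset, batch_size=8, drop_last=True)]
print(keep)  # [8, 8, 8, 8, 1]
print(drop)  # [8, 8, 8, 8]
```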