Only batches of spatial targets supported (non-empty 3D tensors) but got targets of size: : [1, 1, 256, 256]

If you are trying to predict 3 classes in the segmentation output, your segmentation masks are supposed to contain the values [0, 1, 2], so this sounds correct.
The 255 values are most likely coming from the image format you are loading.
Are your segmentation masks currently color-encoded, i.e. does the color “red” refer to class0, “blue” to class1, etc.?
If so, then note that these color-encoded masks will have 3 channels (RGB) and will use the standard uint8 value range.
Red would thus be [255, 0, 0], blue [0, 0, 255], etc., and you would need to map these colors to the class labels first. This post gives an example of how this mapping could be applied.
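
In case it helps, here is a minimal sketch of such a color-to-class mapping using NumPy. It is not the code from the linked post: the `COLOR_TO_CLASS` dictionary, the green entry for the third class, and the helper name `rgb_mask_to_class_indices` are assumptions for illustration, so adapt them to the colors your masks actually use.

```python
import numpy as np

# Hypothetical mapping; replace it with the colors used in your own masks.
COLOR_TO_CLASS = {
    (255, 0, 0): 0,  # red   -> class 0
    (0, 0, 255): 1,  # blue  -> class 1
    (0, 255, 0): 2,  # green -> class 2 (assumed, not from the original post)
}


def rgb_mask_to_class_indices(mask_rgb):
    """Convert an [H, W, 3] color mask into an [H, W] int64 class-index mask."""
    mask_rgb = np.asarray(mask_rgb)
    if mask_rgb.ndim != 3 or mask_rgb.shape[-1] != 3:
        raise ValueError(f"expected an [H, W, 3] mask, got shape {mask_rgb.shape}")

    # Start with -1 everywhere so unmapped colors can be detected afterwards.
    target = np.full(mask_rgb.shape[:2], -1, dtype=np.int64)
    for color, class_idx in COLOR_TO_CLASS.items():
        # A pixel matches only if all three channels equal the color.
        matches = np.all(mask_rgb == np.array(color, dtype=mask_rgb.dtype), axis=-1)
        target[matches] = class_idx

    if (target == -1).any():
        raise ValueError("mask contains colors missing from COLOR_TO_CLASS")
    return target


# Tiny 2x2 example mask: red, blue / green, red.
mask_rgb = np.array(
    [[[255, 0, 0], [0, 0, 255]],
     [[0, 255, 0], [255, 0, 0]]],
    dtype=np.uint8,
)
target = rgb_mask_to_class_indices(mask_rgb)
print(target.shape, target.dtype)
print(target)
```

The result has shape [H, W] and contains only class indices. After `torch.from_numpy(target)` and batching, the target would have the shape [N, H, W] without a channel dimension, which is what `nn.CrossEntropyLoss` expects; a target of size [1, 1, 256, 256] as in the error message would need its channel dimension removed, e.g. via `target.squeeze(1)`.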
