Multi masks segmentation U-Net

simongeek · August 11, 2020, 11:06am

Hi,

I would like to train a neural network for landmark detection, with the landmarks represented as heatmaps, using a U-Net architecture.
I have 20 points, and I have generated a binary mask for each point (20 classes + background = 21 classes).

For example, my input is 800x800x1 and I would like to have my output as 800x800x21.

Does anybody know how to deal with this?

ptrblck · August 13, 2020, 8:23am

Assuming your input has the dimensions [height, width, channels], you would have to permute it and add a batch dimension, since PyTorch modules expect image tensors as [batch_size, channels, height, width].

The same applies to your target tensor, i.e. it should have the shape [batch_size, 21, 800, 800].
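
As a rough sketch of the reshaping described above, assuming a single sample stored as [height, width, channels] (the tensor names and the random data are only placeholders, not taken from your code):

```python
import torch

# Hypothetical single sample in [height, width, channels] layout
image_hwc = torch.rand(800, 800, 1)
mask_hwc = torch.randint(0, 2, (800, 800, 21)).float()

# Move channels to the front, then add a batch dimension at dim 0
image = image_hwc.permute(2, 0, 1).unsqueeze(0)   # [1, 1, 800, 800]
target = mask_hwc.permute(2, 0, 1).unsqueeze(0)   # [1, 21, 800, 800]

print(image.shape)
print(target.shape)
```

If you load the data through a DataLoader, the batch dimension is added for you, so only the permute would be needed in the Dataset.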

Based on your description it seems you are working on a multi-label segmentation, i.e. each pixel position could belong to zero, one, or multiple classes.
If that’s correct, your model output should have the same shape as the target and contain logits (no non-linearity at the end). For the criterion you could use nn.BCEWithLogitsLoss.
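
A minimal sketch of that setup could look like the following. The single conv layer is only a stand-in for your U-Net (any module that maps [N, 1, H, W] to [N, 21, H, W] would do), and the small random batch is an assumption to keep the example fast:

```python
import torch
import torch.nn as nn

# Stand-in for a U-Net: maps [N, 1, H, W] to [N, 21, H, W] and returns
# raw logits (no sigmoid or softmax at the end).
model = nn.Conv2d(in_channels=1, out_channels=21, kernel_size=3, padding=1)

criterion = nn.BCEWithLogitsLoss()

# Small random batch for illustration (64x64 instead of 800x800)
image = torch.rand(2, 1, 64, 64)
target = torch.randint(0, 2, (2, 21, 64, 64)).float()  # float targets are required

logits = model(image)             # same shape as target
loss = criterion(logits, target)  # sigmoid is applied inside the criterion
loss.backward()

# For predictions, apply sigmoid to get independent per-channel probabilities
probs = torch.sigmoid(logits)
```

Since each channel is treated independently, a pixel can be active in several masks at once, which is the difference from nn.CrossEntropyLoss with class indices.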

Let me know if that works for you.