How to pass semantic segmentation labels to FCN?

I have the following data:

Ground truth image
Segmentation mask image
A .npy file that contains the pixel-wise labels for the ground truth

I want to use this data to train an FCN from scratch. The structure of the FCN is as follows:

Conv2D
Dropout
BN
Activation

This block is repeated three times to finish the model.
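
A minimal sketch of one such block, assuming hypothetical channel sizes and dropout probability (none of these values are from the original post):

import torch.nn as nn

# Conv2D -> Dropout -> BN -> Activation, as described above
block = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.Dropout2d(p=0.1),
    nn.BatchNorm2d(16),
    nn.ReLU(inplace=True),
)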

How do I pass the .npy data to an FCN so that I can train it from scratch to generate segmentation masks?

You should be able to load the numpy array via np.load, transform it to a tensor via torch.from_numpy(array), and pass the tensor to the PyTorch model.
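
For example, a minimal sketch, assuming the .npy file stores integer class indices (the file name and shapes are placeholders):

import numpy as np
import torch

labels = np.load("labels.npy")            # hypothetical file, e.g. shape (H, W)
target = torch.from_numpy(labels).long()  # class indices should be torch.long
target = target.unsqueeze(0)              # add a batch dimension -> (1, H, W)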

Thanks, this worked! However, my classification accuracy is really bad. My network is as follows:

import torch.nn as nn

class label_net_3c(nn.Module):

    def __init__(self):
        super(label_net_3c, self).__init__()

        # 1x1 convolutions, so every pixel is classified independently
        self.conv1 = nn.Conv2d(3, 3, kernel_size=1)
        self.bn1 = nn.BatchNorm2d(3)
        self.act1 = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(3, 2, kernel_size=1)
        self.bn2 = nn.BatchNorm2d(2)
        self.act2 = nn.ReLU(inplace=True)
        # final layer maps to 34 class scores per pixel
        self.conv3 = nn.Conv2d(2, 34, kernel_size=1)
        self.act3 = nn.Softmax(dim=1)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.act1(x)
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.act2(x)
        x = self.conv3(x)
        x = self.act3(x)
        return x

It's a pixel-wise classifier, so I pass batches of N pixels such that the input is N x 3 x 1 x 1 for a given RGB image. Any tips on how to improve the accuracy?
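
For context, a minimal sketch of how such an N x 3 x 1 x 1 pixel batch could be built from an image (the tensor names and sizes are assumptions):

import torch

image = torch.rand(3, 256, 256)                 # hypothetical RGB image, (C, H, W)
pixels = image.permute(1, 2, 0).reshape(-1, 3)  # one row per pixel -> (N, 3)
batch = pixels.unsqueeze(-1).unsqueeze(-1)      # -> (N, 3, 1, 1)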

I guess you are using nn.CrossEntropyLoss as the loss function for your segmentation use case.
If so, remove the nn.Softmax layer, as nn.CrossEntropyLoss expects raw logits, not probabilities.
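
For example, a minimal sketch of the suggested setup (the toy model, shapes, and class count are placeholders, not the original architecture):

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(16, 34, kernel_size=1),  # last layer outputs raw logits, no softmax
)
criterion = nn.CrossEntropyLoss()

x = torch.rand(8, 3, 1, 1)                # hypothetical pixel batch
target = torch.randint(0, 34, (8, 1, 1))  # class indices, not one-hot

logits = model(x)                 # (8, 34, 1, 1)
loss = criterion(logits, target)  # loss computed directly on the logits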

Gotcha, I assume nn.CrossEntropyLoss has a built-in softmax operation. I did what you suggested, but my accuracy is still really bad.

Yes, nn.CrossEntropyLoss combines F.log_softmax and nn.NLLLoss internally.
Could you try to overfit a small dataset (e.g. just 10 samples) by playing around with some hyper-parameters? If this still doesn’t work there might be another issue in the code which I missed.
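
A minimal sketch of such an overfitting check (the sample count, model, and hyper-parameters are arbitrary assumptions):

import torch
import torch.nn as nn

# tiny fixed dataset of 10 pixel samples with random labels (placeholders)
x = torch.rand(10, 3, 1, 1)
y = torch.randint(0, 34, (10,))

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(16, 34, kernel_size=1),
)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(500):
    optimizer.zero_grad()
    logits = model(x).squeeze(-1).squeeze(-1)  # (10, 34)
    loss = criterion(logits, y)
    loss.backward()
    optimizer.step()

print(loss.item())  # should approach zero if the training code is correct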

Hi, I think my features were just too weak, since I was trying to get some classification results with only a few samples. I am now trying to plot the intermediate features.

I have a tensor of size (1, 32, 256, 256). Can I plot a t-SNE for this? I just want to visualize the data, not compare it to the ground truth.
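
In case it helps, one possible sketch, treating each pixel's 32-dimensional feature vector as one t-SNE sample and subsampling for speed (all names and sizes here are placeholders):

import torch
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

features = torch.rand(1, 32, 256, 256)  # placeholder for the intermediate activation
pixels = features.squeeze(0).permute(1, 2, 0).reshape(-1, 32).numpy()  # (65536, 32)

idx = torch.randperm(pixels.shape[0])[:2000].numpy()  # t-SNE on all pixels is slow
embedded = TSNE(n_components=2).fit_transform(pixels[idx])  # (2000, 2)

plt.scatter(embedded[:, 0], embedded[:, 1], s=2)
plt.show()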

I figured out the problem. I am doing:

m = pixel_classifier()
pred = m(train_batch)
print(pred.grad_fn)
# <AddmmBackward0 object at 0x7f865771f590>

# take the predicted class labels
_, pred = torch.max(pred, dim=1)
print(pred.grad_fn)
# None

Seems like torch.max() is breaking the graph, so the loss can't backpropagate and the optimizer isn't updating the parameters.

Is there a workaround for torch.max()? I need it to get the labels for the predictions.

torch.argmax (i.e. the second return value from torch.max) is not differentiable, as the gradients would be zero almost everywhere.
You can use it to calculate the accuracy, but not to train the model.
For a multi-class classification, use nn.CrossEntropyLoss and pass the logits to it.
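
A minimal sketch of that split between training and metric computation (the toy model and shapes are placeholders):

import torch
import torch.nn as nn

model = nn.Conv2d(3, 34, kernel_size=1)   # toy stand-in for the real network
criterion = nn.CrossEntropyLoss()

x = torch.rand(8, 3, 1, 1)
target = torch.randint(0, 34, (8, 1, 1))

logits = model(x)
loss = criterion(logits, target)  # the loss is computed on the raw logits
loss.backward()                   # gradients flow through the logits

pred = torch.argmax(logits, dim=1)          # indices only, no grad_fn
accuracy = (pred == target).float().mean()  # use argmax just for metrics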

Hi ptrblck,

I need a bit of help on the semantic segmentation task as well.
So I followed your instructions in this thread and did the following:

  1. Removed the nn.Softmax layer from the architecture
  2. Last layer of my architecture is nn.Conv2d(in_channels, out_channels=4, …) since I have 4 classes
  3. Passed the logits to nn.CrossEntropyLoss (i.e. criterion(logits, groundtruth)) and backpropagated the loss
    My groundtruth is of size (batch_size, 1, 256, 256) but the logits are of size (batch_size, 4, 256, 256),
    so how can I convert the logits to a predicted segmentation map of size (batch_size, 1, 256, 256) for visualization purposes? I need the predicted classes to appear in one slice (see the sketch below).
    Many Thanks!
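
For reference, a minimal sketch of that conversion (tensor names and batch size are placeholders): taking the argmax over the class dimension with keepdim=True gives the per-pixel class map in the expected shape:

import torch

logits = torch.rand(2, 4, 256, 256)                  # placeholder model output
seg_map = torch.argmax(logits, dim=1, keepdim=True)  # (2, 1, 256, 256) class indices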

Never mind, I have resolved it. Thanks for the above discussion. :))