Semantic Segmentation Model Problem

I am working on semantic segmentation on the Pascal VOC 2012 dataset and my model is not working.
Please help.

My model looks like this.

(I set the final decoder output dimension to 1 because the task is binary, so the label map has a single channel.)

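import torch
import torch.nn as nn
from torch.nn import Conv2d, ConvTranspose2d, MaxPool2d, LeakyReLU, Sigmoid, Tanh

class Net(nn.Module):
  def __init__(self):
        super(Net, self).__init__()
        self.is_x_tensor4 = False
        self.batch_size, self.img_w, self.img_h=10,128,128
        self.input_shape = (self.batch_size, 3, self.img_w, self.img_h)
        #number of filters for each convolution layer in the encoder
        self.n_convfilter = [96, 128, 256, 256, 256, 256]
        #the dimension of the fully connected layer
        self.n_fc_filters = [1024]
        #number of filters for each 2d convolution layer in the decoder
        self.n_deconvfilter = [256, 256, 256, 128, 96, 1]
        #HERE THE PROBLEM IS idx1,idx2,idx3
        self.encoder = encoder(self.input_shape, self.n_convfilter)
        #n_class=1 is assumed from the single-channel label map described above
        self.decoder = decoder(self.n_deconvfilter, 1)
  def forward(self,x):
    output,c1,c2,c3,c4,c5= self.encoder(x)
    return self.decoder(output,c1,c2,c3,c4,c5)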

I have used an encoder-decoder model, and my code is as follows:

class encoder(nn.Module):
  def __init__(self,input_shape,n_convfilter):
        super(encoder, self).__init__()
        print("\ninitializing \"encoder\"")
        self.conv1a = Conv2d(input_shape[1], n_convfilter[0], 7, padding=3,stride=1)#
        self.conv1b = Conv2d(n_convfilter[0], n_convfilter[0], 3, padding=1,stride=1)
        self.conv2a = Conv2d(n_convfilter[0], n_convfilter[1], 3, padding=1,stride=1)
        self.conv2b = Conv2d(n_convfilter[1], n_convfilter[1], 3, padding=1,stride=1)
        self.conv2c = Conv2d(n_convfilter[0], n_convfilter[1], 1)
        self.conv3a = Conv2d(n_convfilter[1], n_convfilter[2], 3, padding=1,stride=1)
        self.conv3b = Conv2d(n_convfilter[2], n_convfilter[2], 3, padding=1,stride=1)
        self.conv3c = Conv2d(n_convfilter[1], n_convfilter[2], 1)
        self.conv4a = Conv2d(n_convfilter[2], n_convfilter[3], 3, padding=1,stride=1)
        self.conv4b = Conv2d(n_convfilter[3], n_convfilter[3], 3, padding=1,stride=1)
        self.conv5a = Conv2d(n_convfilter[3], n_convfilter[4], 3, padding=1,stride=1)
        self.conv5b = Conv2d(n_convfilter[4], n_convfilter[4], 3, padding=1,stride=1)
        self.conv5c = Conv2d(n_convfilter[3], n_convfilter[4], 1)
        self.conv6a = Conv2d(n_convfilter[4], n_convfilter[5], 3, padding=1,stride=1)
        self.conv6b = Conv2d(n_convfilter[5], n_convfilter[5], 3, padding=1,stride=1)
        self.conv7a = Conv2d(n_convfilter[5], n_convfilter[5], 3, padding=1,stride=1)
        self.conv7b = Conv2d(n_convfilter[5], n_convfilter[5], 3, padding=1,stride=1)
        #self.conv7c = Conv2d(n_convfilter[3], n_convfilter[4], 1)
        ########################### n_convfilter[5]=256###################all
        #pooling layer
        self.pool1 = MaxPool2d(kernel_size= 2,stride=2)#,return_indices=True)
        self.pool2 = MaxPool2d(kernel_size=1,stride=2)#,return_indices=True)
        #nonlinearities of the network
        self.leaky_relu = LeakyReLU(negative_slope= 0.01)
        self.sigmoid = Sigmoid()
        self.tanh = Tanh()
        self.conv8a = Conv2d(n_convfilter[5], 1024, 1, padding=0,stride=1)
        #self.fc7 = Linear(1*1*256, 1024) 
  def forward(self, x):
        #x is the input and the size of x is (batch_size, channels, heights, widths).
        conv1a = self.conv1a(x)
        rect1a = self.leaky_relu(conv1a)
        conv1b = self.conv1b(rect1a)
        rect1 = self.leaky_relu(conv1b)
        pool1,idx1 = self.pool1(rect1),0
        conv2a = self.conv2a(pool1)
        rect2a = self.leaky_relu(conv2a)
        conv2b = self.conv2b(rect2a)
        rect2 = self.leaky_relu(conv2b)
        conv2c = self.conv2c(pool1)
        res2 = conv2c + rect2
        pool2,idx2 = self.pool2(res2),0
        conv3a = self.conv3a(pool2)
        rect3a = self.leaky_relu(conv3a)
        conv3b = self.conv3b(rect3a)
        rect3 = self.leaky_relu(conv3b)
        conv3c = self.conv3c(pool2)
        res3 = conv3c + rect3
        pool3,idx3 = self.pool2(res3),0
        conv4a = self.conv4a(pool3)
        rect4a = self.leaky_relu(conv4a)
        conv4b = self.conv4b(rect4a)
        rect4 = self.leaky_relu(conv4b)
        pool4,idx4 = self.pool2(rect4),0
        conv5a = self.conv5a(pool4)
        rect5a = self.leaky_relu(conv5a)
        conv5b = self.conv5b(rect5a)
        rect5 = self.leaky_relu(conv5b)
        conv5c = self.conv5c(pool4)
        res5 = conv5c + rect5
        pool5,idx5 = self.pool2(res5),0
        conv6a = self.conv6a(pool5)
        rect6a = self.leaky_relu(conv6a)
        conv6b = self.conv6b(rect6a)
        rect6 = self.leaky_relu(conv6b)
        res6 = pool5 + rect6
        pool6,idx6 = self.pool2(res6),0
        conv7a = self.conv7a(pool6)
        rect7a = self.leaky_relu(conv7a)
        conv7b = self.conv7b(rect7a)
        rect7 = self.leaky_relu(conv7b)
        res7 = pool6 + rect7
        pool7,idx7 = self.pool2(res7),0
        #pool9 = pool8.view(pool8.size(0), -1)
        fc7 = self.conv8a(pool7)
        rect7 = self.leaky_relu(fc7)
        return rect7,pool1,pool2,pool3,pool4,pool5


class decoder(nn.Module):
    def __init__(self, n_deconvfilter,n_class):
        print("\ninitializing \"decoder\"")
        super(decoder, self).__init__()
        #2d conv10
        self.conv10 = ConvTranspose2d(1024, n_deconvfilter[0], 3,stride=2, output_padding=1)#n_deconvfilter[0](we have to replace it with 256)
        #self.conv7b = ConvTranspose2d(n_deconvfilter[1], 256, 3, padding=1)#n_deconvfilter[0](we have to replace it with 256)

        #2d conv11
        self.conv11 = ConvTranspose2d(n_deconvfilter[0], n_deconvfilter[1], 3, padding=1,stride=2,output_padding=1)#((4-1)*2+3-2*1)
        #self.conv8b = ConvTranspose2d(n_deconvfilter[2], n_deconvfilter[2], 3, padding=1)
        #2d conv12
        self.conv12 = ConvTranspose2d(n_deconvfilter[1], n_deconvfilter[2], 3, padding=1,stride=2,output_padding=1)
        #self.conv9b = ConvTranspose2d(n_deconvfilter[3], n_deconvfilter[3], 3, padding=1)
        #self.conv9c = ConvTranspose2d(n_deconvfilter[2], n_deconvfilter[3], 1)
        #2d conv13
        self.conv13 = ConvTranspose2d(n_deconvfilter[2], n_deconvfilter[3], 3, padding=1,stride=2,output_padding=1)
        #self.conv10b = ConvTranspose2d(n_deconvfilter[4], n_deconvfilter[4], 3, padding=1)
        #self.conv10c = ConvTranspose2d(n_deconvfilter[4], n_deconvfilter[4], 3, padding=1)
        #2d conv14
        self.conv14 = ConvTranspose2d(n_deconvfilter[3], n_deconvfilter[4], 4, padding=1,stride=2)
        self.conv15 = ConvTranspose2d(n_deconvfilter[4], n_deconvfilter[5], 4, padding=1,stride=2)
        self.leaky_relu = LeakyReLU(negative_slope= 0.01)
    def forward(self, rect7,c1,c2,c3,c4,c5):
        #rect7=rect7.view([rect7.shape[0],1024,1 , 1])#idx3
        #unpool7 = self.unpool2d(rect7)###HERE FACING PROBLEM
        conv10 = self.conv10(rect7)
        rect10 = self.leaky_relu(conv10)
        #unpool8 = self.unpool2d(res7)#Here is the problem
        conv11 = self.conv11(rect10)
        rect11 = self.leaky_relu(conv11)
        conv12 = self.conv12(rect11)
        rect12 = self.leaky_relu(conv12)
        conv13 = self.conv13(rect12)
        rect13 = self.leaky_relu(conv13)
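        conv14 = self.conv14(rect13)
        rect14 = self.leaky_relu(conv14)
        conv15 = self.conv15(rect14)
        #the post mentions Softmax2d here; over a single channel that always gives 1,
        #so a sigmoid is used instead (it matches the BCELoss used below)
        soft = torch.sigmoid(conv15)
        return soft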
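
As a sanity check on the layer parameters above, the spatial sizes can be worked out by hand. The sketch below uses only the standard output-size formulas for pooling and transposed convolution (dilation 1) and assumes a 128x128 input, as in `Net`. The helper names `pool_out` and `deconv_out` are made up for this illustration.

```python
def pool_out(size, kernel, stride, padding=0):
    """Output size of MaxPool2d / Conv2d along one spatial axis (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size, kernel, stride, padding=0, output_padding=0):
    """Output size of ConvTranspose2d along one spatial axis (dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


# Encoder: the padded 7x7 and 3x3 convolutions keep the spatial size,
# so only the seven pooling steps change it.
size = 128
encoder_sizes = [size]
size = pool_out(size, kernel=2, stride=2)        # pool1
encoder_sizes.append(size)
for _ in range(6):                               # pool2 is applied six times
    size = pool_out(size, kernel=1, stride=2)
    encoder_sizes.append(size)

# Decoder: (kernel, stride, padding, output_padding) for conv10 .. conv15
deconvs = [(3, 2, 0, 1), (3, 2, 1, 1), (3, 2, 1, 1),
           (3, 2, 1, 1), (4, 2, 1, 0), (4, 2, 1, 0)]
decoder_sizes = [size]
for kernel, stride, padding, output_padding in deconvs:
    size = deconv_out(size, kernel, stride, padding, output_padding)
    decoder_sizes.append(size)

print(encoder_sizes)   # [128, 64, 32, 16, 8, 4, 2, 1]
print(decoder_sizes)   # [1, 4, 8, 16, 32, 64, 128]
```

So the shapes are consistent: the encoder squeezes each image down to a single 1x1 position with 1024 channels, and the decoder has to rebuild the whole 128x128 mask from that. Note also that `MaxPool2d(kernel_size=1, stride=2)` does not take a maximum over a window; it just keeps every second pixel. Both points may be worth revisiting, although the thread itself does not discuss them.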

My images are of the following size (10 is the batch size):



import datetime
import numpy as np

n_epochs = 5
valid_loss_min = np.inf # track change in validation loss
train_loss = 0.0
valid_loss = 0.0
for epoch in range(1, n_epochs+1):
    # losses accumulated during the previous epoch
    print("ALL ABOUT LOSS(training)--------",(train_loss/len(images)),"----------\n")
    print("ALL ABOUT LOSS(valid)--------",(valid_loss/len(valid_target)),"----------\n")
    train_loss = 0.0
    valid_loss = 0.0
    for i in range(len(train_img)):
        # (loading of data and tar for batch i was omitted in the post)
        # clear the gradients of all optimized variables
        optimizer.zero_grad()
        # forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)
        loss = criterion(output, tar)
        # backward pass, then a single optimization step (parameter update)
        loss.backward()
        optimizer.step()
        # update training loss
        train_loss += loss.item()*data.size(0)
        print("about me 1 ",train_loss)
    for i in range(len(valid_img)):
        # (loading of data and tar for batch i was omitted in the post)
        loss = torch.mean(criterion(model(data), tar))
        valid_loss += loss.item()*data.size(0)

And I am using the following optimizer and loss:

import torch.optim as optim
criterion =  nn.BCELoss()
optimizer = optim.Adam(model.parameters(), lr=1e-5,eps=1e-08)

I think the problem is in the loss function.
My images are binary in nature.

So I create the label map as:

tensor([[0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 1., 1., 1.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 1., 1., 1.],
        [0., 0., 0.,  ..., 0., 0., 0.]])

I guess the problem is with the last layer.
Using Softmax2d, all my outputs become 1 (in every cell).
I do not understand what I am missing.
Please help.

Use Cross Entropy Loss instead.

@shivam2298 Using CrossEntropyLoss, it throws an error:

RuntimeError: multi-target not supported at /pytorch/aten/src/THCUNN/generic/

Please revisit the decoder code.

Well, that's because you must have one-hot encoded it. Just use a tensor holding the class label of every pixel for each sample, with shape (batch_size, H, W). Have a look at the documentation of CrossEntropyLoss; it will make this much clearer. Your prediction output should have 2 channels.

@shivam2298 My target image contains only 1 (white) and 0 (black), so I created a label map where the background is 0 and the area of interest is 1. Hence my output dimension is 1 (please correct me if I am going the wrong way).
Also, I did not fully understand how to use one-hot vectors in this case (I have used one-hot encoding for multi-class problems, where the dimension equals the number of classes).
Please tell me how I can use one-hot encoding to represent the target in this case.

What are the dimensions of the prediction and the target that you are passing to the loss function?

Here 10 is the batch size.

Prediction (output by the model):

It is just the flattened decoder output: 128 * 128 * 1

For a segmentation task, you will be applying softmax over each pixel. So your prediction should be (batch_size, channel, H, W) and the target should be (batch_size, H, W). In your case channel will be 2.
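
The shapes described above can be tried out in isolation. This is a minimal sketch with random stand-in tensors (the names `logits`, `target` and `float_mask` are made up for illustration and are not the poster's data); it assumes the decoder's last layer is changed to emit 2 channels and that no softmax is applied before the loss, since `nn.CrossEntropyLoss` applies log-softmax internally.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch_size, n_classes, height, width = 10, 2, 128, 128

# Stand-in for the raw decoder output (logits), shape (N, C, H, W).
logits = torch.randn(batch_size, n_classes, height, width)

# Stand-in for the 0/1 label map: class indices, dtype long, shape (N, H, W).
# There is no channel dimension and no one-hot encoding.
target = torch.randint(0, n_classes, (batch_size, height, width))

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, target)   # scalar, averaged over all pixels
print(loss.item())

# If the label map is stored as a float mask of shape (N, 1, H, W),
# it can be converted to the expected form like this.
float_mask = target.unsqueeze(1).float()
converted = float_mask.squeeze(1).long()
```

The prediction and target deliberately differ in shape: the channel axis of the prediction is what the class index in the target selects from, so (10, 2, 128, 128) against (10, 128, 128) is not a size mismatch.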

@shivam2298 I do not fully understand what you are saying.
Will it not lead to a size mismatch?
The prediction is
10 * 2 * 128 * 128

and the target is
10 * 128 * 128

If the point is that we need to convert the target to one-hot, please explain how I can convert it to 2 channels with only 1 and 0 as labels.

I don't think it will cause any problem. Are you getting any error on doing so?

@shivam2298 Hey, it worked. Thank you.
Please explain what mistake I was making (I have not fully understood it). I also have to convert the prediction back to an image.
And my training loss is decreasing too slowly.

criterion =  nn.CrossEntropyLoss()
# specify optimizer
optimizer = optim.Adam(model.parameters(), lr=1e-5,eps=1e-08)

Any help you can provide with this would be appreciated.

Well, there was no harm in using BCELoss either; you just had to read what inputs it expects. To get prediction labels, you can use torch.argmax. If you think your issue is solved, please mark the answer as the solution.
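
To make the two routes concrete, here is a small sketch with random stand-in tensors (all variable names are made up for illustration). It first shows why a softmax over a single channel yields 1 everywhere, then the single-channel route with a sigmoid and `nn.BCELoss`, and finally `torch.argmax` on a 2-channel output to get a label map back.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Softmax normalizes across the channel axis; with one channel there is
# nothing to normalize against, so every value becomes exactly 1.
one_channel_logits = torch.randn(10, 1, 128, 128)
softmaxed = torch.softmax(one_channel_logits, dim=1)
print(bool(torch.all(softmaxed == 1.0)))   # True

# Single-channel route: sigmoid output, float target of the same shape.
probs = torch.sigmoid(one_channel_logits)
float_target = torch.randint(0, 2, (10, 1, 128, 128)).float()
bce_loss = nn.BCELoss()(probs, float_target)
binary_pred = (probs > 0.5).long()         # threshold to a 0/1 mask

# Two-channel route: pick the most likely class per pixel.
two_channel_logits = torch.randn(10, 2, 128, 128)
label_pred = torch.argmax(two_channel_logits, dim=1)   # shape (10, 128, 128)
```

`nn.BCEWithLogitsLoss` is an alternative for the single-channel route that takes the raw logits and applies the sigmoid internally.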

One-hot encoding is required to compute the loss, but that is taken care of by the PyTorch loss function itself. So I would say your understanding was right, but you need to change the data representation to the one the function expects :).

@shivam2298 Thanks for replying.
This is the rule I have been following:
For semantic segmentation, if you have only one label, you can use a label map with a 1 at each pixel where the object of interest is and 0s everywhere else. If you have more than one label of interest, you should use one-hot encoding to generate your label map.
(Please point out any important information that I am missing.)

So please explain what is actually happening in the last layer of the decoder when we use cross-entropy loss.
My understanding is that in the multi-class case each channel holds one class: the first channel has ones at every pixel that belongs to class 1, the second channel has ones at every pixel that belongs to class 2, and so on.

Even if there are more than 2 labels, you don't need one-hot encoding, as it is taken care of by the framework.

@shivam2298 OK. Thank you very much.

@shivam2298 Can you please help me? My semantic segmentation on the Pascal VOC dataset is not working. I have changed the model to FCN8, but it did not work.
I have trained my model, but the loss decreases only very slightly.
I am providing the link to the Google Colab file and also to the model checkpoint.
Please help, and also suggest what I am missing.
Colab link:

Model checkpoint:

@ptrblck if possible please help.

The learning rate seems to be very low with 1e-7. Was the model not training at all with higher learning rates?

@ptrblck Thanks for replying. I have tried a learning rate of 0.0001; the loss decreased for up to 4 epochs, but then it almost stopped decreasing (only very small decrements).

Then I tried 0.001 and also 0.000001, but got no good results. I have tried different eps values and different lr values, but each time the loss first decreases significantly, for example from 0.4012343 to 0.3512435, and then after a few epochs (2 to 4) it decreases only very slowly.
Please also tell me whether my approach to semantic segmentation is right (basically, whether what I have coded is right or I am doing something wrong).

What about trying a learning-rate scheduler (lr_scheduler)?
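
One way to follow this suggestion is `torch.optim.lr_scheduler.ReduceLROnPlateau`, which lowers the learning rate when a monitored value stops improving. The sketch below is only an illustration: the one-layer `model` is a hypothetical stand-in for the segmentation network, the validation losses are invented numbers that plateau, and the `factor` and `patience` values are arbitrary choices, not recommendations from the thread.

```python
import torch.nn as nn
import torch.optim as optim

# Hypothetical stand-in for the real segmentation model.
model = nn.Conv2d(3, 2, kernel_size=3, padding=1)
optimizer = optim.Adam(model.parameters(), lr=1e-3)

# Cut the learning rate by 10x once the validation loss has not improved
# for more than `patience` consecutive epochs.
scheduler = optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1, patience=2
)

# Invented validation losses that plateau after the second epoch.
fake_valid_losses = [0.40, 0.35, 0.35, 0.35, 0.35, 0.35]
lrs = []
for valid_loss in fake_valid_losses:
    # ... one epoch of training and validation would run here ...
    scheduler.step(valid_loss)          # pass the monitored metric
    lrs.append(optimizer.param_groups[0]["lr"])

print(lrs)
```

In a real run, `scheduler.step(...)` would be called once per epoch with the measured validation loss, after the validation loop.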
