If I change the structure of the model, should I change the loss function?

I’m trying to implement an FCN. So far I have modified the last two layers of resnet34 and used nn.NLLLoss, and the result is good for me. Now I want to use mobilenet instead, but after I modified the model to use mobilenet and kept nn.NLLLoss, the result is really bad. Should I change the loss function?

If you have a classification problem with multiple classes, you should return the log_softmax of the logits from your model and use NLLLoss. The architecture itself does not determine the loss function; the type of classification problem does.
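
For a quick sanity check (a minimal sketch with placeholder shapes, written against a recent PyTorch release), log_softmax followed by NLLLoss gives the same result as passing the raw logits to CrossEntropyLoss:

import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.randn(4, 10)           # batch of 4 samples, 10 classes (placeholders)
target = torch.randint(0, 10, (4,))   # class indices

loss_a = nn.NLLLoss()(F.log_softmax(logits, dim=1), target)  # log_softmax + NLLLoss
loss_b = nn.CrossEntropyLoss()(logits, target)               # raw logits + CrossEntropyLoss
print(torch.allclose(loss_a, loss_b))  # True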

Sorry, I’m still confused about that part. So far my code for it is as follows:

net = fcn(num_classes)
net.cuda()
criterion = nn.NLLLoss2d()
basic_optim = torch.optim.SGD(net.parameters(), lr=1e-2, weight_decay=1e-4)
optimizer = ScheduledOptim(basic_optim)

May I know what “return the log_softmax and use NLLLoss” means?

Thank you so much.

The code looks fine. Which PyTorch version are you using? As far as I know, NLLLoss2d has been merged into NLLLoss.
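
In recent versions nn.NLLLoss handles the spatial case directly, i.e. a (batch, classes, height, width) log-probability input with a (batch, height, width) target of class indices. A rough sketch with placeholder shapes:

import torch
import torch.nn as nn
import torch.nn.functional as F

criterion = nn.NLLLoss()
log_probs = F.log_softmax(torch.randn(2, 5, 8, 8), dim=1)  # (b, n_classes, h, w)
target = torch.randint(0, 5, (2, 8, 8))                    # (b, h, w) class indices
loss = criterion(log_probs, target)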

Your model could return:

def forward(self, x):
    ...
    return F.log_softmax(x, dim=1)  # log-probabilities over the class dimension

Or you could apply the function to the model output before calling the criterion.
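
Applied outside the model, it would look roughly like this (a sketch; images, target, and criterion = nn.NLLLoss() are placeholders):

output = net(images)                   # raw scores from the model
output = F.log_softmax(output, dim=1)  # log-probabilities over the class dimension
loss = criterion(output, target)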

I’m using PyTorch 0.3.0, and I think I’ve already applied log_softmax during training.
My model is as follows:

import torch.nn as nn

pretrained_net = resnetnet.resnet34(pretrained=False)
num_classes = len(classes)

class fcn(nn.Module):
    def __init__(self, num_classes):
        super(fcn, self).__init__()
        # backbone stages taken from resnet34
        self.stage1 = nn.Sequential(*list(pretrained_net.children())[:-4])
        self.stage2 = list(pretrained_net.children())[-4]
        self.stage3 = list(pretrained_net.children())[-3]

        # 1x1 convolutions mapping each stage's features to class scores
        self.scores1 = nn.Conv2d(512, num_classes, 1)
        self.scores2 = nn.Conv2d(256, num_classes, 1)
        self.scores3 = nn.Conv2d(128, num_classes, 1)

        # transposed convolutions for upsampling the score maps
        self.upsample_8x = nn.ConvTranspose2d(num_classes, num_classes, 16, 8, 4, bias=False)
        self.upsample_4x = nn.ConvTranspose2d(num_classes, num_classes, 4, 2, 1, bias=False)
        self.upsample_2x = nn.ConvTranspose2d(num_classes, num_classes, 4, 2, 1, bias=False)

    def forward(self, x):
        x = self.stage1(x)
        s1 = x  # 1/8

        x = self.stage2(x)
        s2 = x  # 1/16

        x = self.stage3(x)
        s3 = x  # 1/32

        s3 = self.scores1(s3)
        s3 = self.upsample_2x(s3)
        s2 = self.scores2(s2)
        s2 = s2 + s3

        s1 = self.scores3(s1)
        s2 = self.upsample_4x(s2)
        s = s1 + s2
        s = self.upsample_8x(s)
        return s

And here’s my training part:

for e in range(50):
    if e > 0 and e % 50 == 0:
        optimizer.set_learning_rate(optimizer.learning_rate * 0.1)
    train_loss = 0

    net = net.train()

    for data in train_data:
        im = Variable(data[0]).cuda()
        label = Variable(data[1]).cuda()
        # forward
        out = net(im)
        out = F.log_softmax(out, dim=1)  # (b, n, h, w)
        loss = criterion(out, label)
        # backward
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        train_loss += loss.data[0]

        label_pred = out.max(dim=1)[1].data.cpu().numpy()
        label_true = label.data.cpu().numpy()

    net = net.eval()
    eval_loss = 0

    for data in valid_data:
        im = Variable(data[0].cuda(), volatile=True)
        label = Variable(data[1].cuda(), volatile=True)
        # forward
        out = net(im)
        out = F.log_softmax(out, dim=1)
        loss = criterion(out, label)
        eval_loss += loss.data[0]

        label_pred = out.max(dim=1)[1].data.cpu().numpy()
        label_true = label.data.cpu().numpy()

So the problem is: if I choose resnet34 and modify the last two layers to do the upsampling, it works well, but if I choose mobilenet and modify its last two layers in the same way, the result is really bad. I’m not sure which part I should adjust.

OK, I see. I assume you are using transfer learning on the pretrained net?
If so, could you try lowering the learning rate, or setting a lower learning rate for the pretrained layers while keeping the higher learning rate for the new layers? You can use the per-parameter options of the optimizer.
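
A rough sketch of what I mean, using the module names from your fcn above (the learning rates here are just example values, not a recommendation):

import itertools
import torch

# smaller learning rate for the pretrained backbone stages,
# larger learning rate for the newly added score/upsampling layers
backbone_params = itertools.chain(net.stage1.parameters(),
                                  net.stage2.parameters(),
                                  net.stage3.parameters())
new_params = itertools.chain(net.scores1.parameters(),
                             net.scores2.parameters(),
                             net.scores3.parameters(),
                             net.upsample_2x.parameters(),
                             net.upsample_4x.parameters(),
                             net.upsample_8x.parameters())

optimizer = torch.optim.SGD([
    {'params': backbone_params, 'lr': 1e-3},
    {'params': new_params, 'lr': 1e-2},
], weight_decay=1e-4)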