If I change the structure of the model, should I change the loss function?

I’m trying to implement an FCN. So far I have modified the last two layers of resnet34 and used nn.NLLLoss, and the result is good for me. Now I want to use mobilenet instead, but after I modified the model to use mobilenet and kept nn.NLLLoss, the result is really bad. Should I change the loss function?

If you have a classification problem with multiple classes, you should return the log_softmax of the logits from your model and use NLLLoss. The architecture itself does not determine the loss function; the type of classification problem does.
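
For a quick sanity check (a minimal sketch with placeholder shapes, written against a recent PyTorch release), log_softmax followed by NLLLoss gives the same result as passing the raw logits to CrossEntropyLoss:

import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.randn(4, 10)           # batch of 4 samples, 10 classes (placeholders)
target = torch.randint(0, 10, (4,))   # class indices

loss_a = nn.NLLLoss()(F.log_softmax(logits, dim=1), target)  # log_softmax + NLLLoss
loss_b = nn.CrossEntropyLoss()(logits, target)               # raw logits + CrossEntropyLoss
print(torch.allclose(loss_a, loss_b))  # True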

Sorry, I’m still confused about that part. So far my code for it is as follows:

net = fcn(num_classes)
net.cuda()
criterion = nn.NLLLoss2d()
basic_optim = torch.optim.SGD(net.parameters(), lr=1e-2, weight_decay=1e-4)
optimizer = ScheduledOptim(basic_optim)

May I know what “return the log_softmax and use NLLLoss” means?

Thank you so much.

The code looks fine. Which PyTorch version are you using? As far as I know, NLLLoss2d has been merged into NLLLoss.
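
In recent versions nn.NLLLoss handles the spatial case directly, i.e. a (batch, classes, height, width) log-probability input with a (batch, height, width) target of class indices. A rough sketch with placeholder shapes:

import torch
import torch.nn as nn
import torch.nn.functional as F

criterion = nn.NLLLoss()
log_probs = F.log_softmax(torch.randn(2, 5, 8, 8), dim=1)  # (b, n_classes, h, w)
target = torch.randint(0, 5, (2, 8, 8))                    # (b, h, w) class indices
loss = criterion(log_probs, target)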

Your model could return:

def forward(self, x):
    ...
    return F.log_softmax(x, dim=1)  # log-probabilities over the class dimension

Or you could apply the function to the model output before calling the criterion.
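
Applied outside the model, it would look roughly like this (a sketch; images, target, and criterion = nn.NLLLoss() are placeholders):

output = net(images)                   # raw scores from the model
output = F.log_softmax(output, dim=1)  # log-probabilities over the class dimension
loss = criterion(output, target)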

I’m using PyTorch 0.3.0, and I think I’ve already applied log_softmax during training.
My model is as follows:

import torch.nn as nn

pretrained_net = resnetnet.resnet34(pretrained=False)
num_classes = len(classes)

class fcn(nn.Module):
    def __init__(self, num_classes):
        super(fcn, self).__init__()
        # backbone stages taken from resnet34
        self.stage1 = nn.Sequential(*list(pretrained_net.children())[:-4])
        self.stage2 = list(pretrained_net.children())[-4]
        self.stage3 = list(pretrained_net.children())[-3]

        # 1x1 convolutions mapping each stage's features to class scores
        self.scores1 = nn.Conv2d(512, num_classes, 1)
        self.scores2 = nn.Conv2d(256, num_classes, 1)
        self.scores3 = nn.Conv2d(128, num_classes, 1)

        # transposed convolutions for upsampling the score maps
        self.upsample_8x = nn.ConvTranspose2d(num_classes, num_classes, 16, 8, 4, bias=False)
        self.upsample_4x = nn.ConvTranspose2d(num_classes, num_classes, 4, 2, 1, bias=False)
        self.upsample_2x = nn.ConvTranspose2d(num_classes, num_classes, 4, 2, 1, bias=False)

    def forward(self, x):
        x = self.stage1(x)
        s1 = x  # 1/8

        x = self.stage2(x)
        s2 = x  # 1/16

        x = self.stage3(x)
        s3 = x  # 1/32

        s3 = self.scores1(s3)
        s3 = self.upsample_2x(s3)
        s2 = self.scores2(s2)
        s2 = s2 + s3

        s1 = self.scores3(s1)
        s2 = self.upsample_4x(s2)
        s = s1 + s2
        s = self.upsample_8x(s)
        return s

And here’s my training part:

for e in range(50):
    if e > 0 and e % 50 == 0:
        optimizer.set_learning_rate(optimizer.learning_rate * 0.1)
    train_loss = 0

    net = net.train()

    for data in train_data:
        im = Variable(data[0]).cuda()
        label = Variable(data[1]).cuda()
        # forward
        out = net(im)
        out = F.log_softmax(out, dim=1)  # (b, n, h, w)
        loss = criterion(out, label)
        # backward
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        train_loss += loss.data[0]

        label_pred = out.max(dim=1)[1].data.cpu().numpy()
        label_true = label.data.cpu().numpy()

    net = net.eval()
    eval_loss = 0

    for data in valid_data:
        im = Variable(data[0].cuda(), volatile=True)
        label = Variable(data[1].cuda(), volatile=True)
        # forward
        out = net(im)
        out = F.log_softmax(out, dim=1)
        loss = criterion(out, label)
        eval_loss += loss.data[0]

        label_pred = out.max(dim=1)[1].data.cpu().numpy()
        label_true = label.data.cpu().numpy()

So the problem is: if I choose resnet34 and modify the last two layers to do the upsampling, it works well, but if I choose mobilenet and modify its last two layers in the same way, the result is really bad. I’m not sure which part I should adjust.

OK, I see. I assume you are using transfer learning on the pretrained net?
If so, could you try lowering the learning rate, or setting a lower learning rate for the pretrained layers while keeping the higher learning rate for the new layers? You can use the per-parameter options of the optimizer.
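
A rough sketch of what I mean, using the module names from your fcn above (the learning rates here are just example values, not a recommendation):

import itertools
import torch

# smaller learning rate for the pretrained backbone stages,
# larger learning rate for the newly added score/upsampling layers
backbone_params = itertools.chain(net.stage1.parameters(),
                                  net.stage2.parameters(),
                                  net.stage3.parameters())
new_params = itertools.chain(net.scores1.parameters(),
                             net.scores2.parameters(),
                             net.scores3.parameters(),
                             net.upsample_2x.parameters(),
                             net.upsample_4x.parameters(),
                             net.upsample_8x.parameters())

optimizer = torch.optim.SGD([
    {'params': backbone_params, 'lr': 1e-3},
    {'params': new_params, 'lr': 1e-2},
], weight_decay=1e-4)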