I’m trying to implement FCN. So far I’ve modified the last two layers of ResNet-34 and used nn.NLLLoss, and the result looks good to me. Now I want to implement the same thing with MobileNet, but after changing the model structure to MobileNet while still using nn.NLLLoss, the result is really bad. Should I change the loss function?
If you have a classification problem with multiple classes, you should return the log_softmax of the logits from your model and use NLLLoss. The architecture itself does not determine the loss function; your classification problem does.
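A minimal sketch of that pairing, assuming a recent PyTorch release (with 0.3 you would additionally wrap the tensors in Variable) and made-up shapes:

import torch
import torch.nn as nn
import torch.nn.functional as F

# Fake logits for a batch of 4 samples and 10 classes
logits = torch.randn(4, 10)
target = torch.randint(0, 10, (4,))

# NLLLoss expects log-probabilities, so apply log_softmax first
criterion = nn.NLLLoss()
loss = criterion(F.log_softmax(logits, dim=1), target)

# Equivalent shortcut: CrossEntropyLoss applies log_softmax + NLLLoss internally
loss_ce = nn.CrossEntropyLoss()(logits, target)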
Sorry, I’m still confused about that part. So far my code for that part is as follows:
net = fcn(num_classes)
net.cuda()
criterion = nn.NLLLoss2d()
basic_optim = torch.optim.SGD(net.parameters(), lr=1e-2, weight_decay=1e-4)
optimizer = ScheduledOptim(basic_optim)
May I know what “return the log_softmax and use NLLLoss” means? Thank you so much.
The code looks fine. Which PyTorch version are you using? As far as I know, NLLLoss2d was merged into NLLLoss.
Your model could return:
def forward(self, x):
    ...
    return F.log_softmax(x, dim=1)
Or you could apply the function before calling criterion.
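For reference, a minimal sketch of that second option with segmentation-shaped tensors, assuming a PyTorch version where plain NLLLoss accepts spatial inputs (the shapes and names here are made up):

import torch
import torch.nn as nn
import torch.nn.functional as F

num_classes = 5
# Fake segmentation scores (batch, classes, height, width) and integer label map
scores = torch.randn(2, num_classes, 32, 32)
labels = torch.randint(0, num_classes, (2, 32, 32))

criterion = nn.NLLLoss()  # handles the spatial dimensions like the old NLLLoss2d
loss = criterion(F.log_softmax(scores, dim=1), labels)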
I’m using PyTorch 0.3.0, and I think I’ve already used log_softmax during training.
My model is as follows:
pretrained_net = resnetnet.resnet34(pretrained=False)
num_classes = len(classes)
class fcn(nn.Module):
    def __init__(self, num_classes):
        super(fcn, self).__init__()
        # split the pretrained backbone into three stages
        self.stage1 = nn.Sequential(*list(pretrained_net.children())[:-4])
        self.stage2 = list(pretrained_net.children())[-4]
        self.stage3 = list(pretrained_net.children())[-3]
        # 1x1 convolutions to map each stage to per-class scores
        self.scores1 = nn.Conv2d(512, num_classes, 1)
        self.scores2 = nn.Conv2d(256, num_classes, 1)
        self.scores3 = nn.Conv2d(128, num_classes, 1)
        self.upsample_8x = nn.ConvTranspose2d(num_classes, num_classes, 16, 8, 4, bias=False)
        self.upsample_4x = nn.ConvTranspose2d(num_classes, num_classes, 4, 2, 1, bias=False)
        self.upsample_2x = nn.ConvTranspose2d(num_classes, num_classes, 4, 2, 1, bias=False)

    def forward(self, x):
        x = self.stage1(x)
        s1 = x  # 1/8
        x = self.stage2(x)
        s2 = x  # 1/16
        x = self.stage3(x)
        s3 = x  # 1/32

        s3 = self.scores1(s3)
        s3 = self.upsample_2x(s3)  # 1/32 -> 1/16
        s2 = self.scores2(s2)
        s2 = s2 + s3
        s1 = self.scores3(s1)
        s2 = self.upsample_4x(s2)  # 1/16 -> 1/8
        s = s1 + s2
        s = self.upsample_8x(s)    # 1/8 -> full resolution
        return s
And here’s my training part:
for e in range(50):
    if e > 0 and e % 50 == 0:
        optimizer.set_learning_rate(optimizer.learning_rate * 0.1)
    train_loss = 0

    net = net.train()
    for data in train_data:
        im = Variable(data[0]).cuda()
        label = Variable(data[1]).cuda()
        # forward
        out = net(im)
        out = F.log_softmax(out, dim=1)  # (b, n, h, w)
        loss = criterion(out, label)
        # backward
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        train_loss += loss.data[0]

        label_pred = out.max(dim=1)[1].data.cpu().numpy()
        label_true = label.data.cpu().numpy()

    net = net.eval()
    eval_loss = 0
    for data in valid_data:
        im = Variable(data[0].cuda(), volatile=True)
        label = Variable(data[1].cuda(), volatile=True)
        # forward
        out = net(im)
        out = F.log_softmax(out, dim=1)
        loss = criterion(out, label)
        eval_loss += loss.data[0]

        label_pred = out.max(dim=1)[1].data.cpu().numpy()
        label_true = label.data.cpu().numpy()
So the problem is: if I choose ResNet-34 and modify the last two layers to do the upsampling, it works well, but if I choose MobileNet and modify the last two layers to do the upsampling, the result is really bad. I’m not sure which part I should adjust.
OK, I see. I assume you are using transfer learning with the pretrained net?
If so, could you try lowering the learning rate, or setting a lower learning rate for the pretrained backbone while keeping the higher learning rate for the new layers? You can use the per-parameter options of the optimizer.
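A minimal sketch of such per-parameter groups, reusing the stage/score/upsample attribute names from the model above (adjust them to whatever your MobileNet variant exposes):

# Lower LR for the pretrained backbone, higher LR for the newly added layers
backbone_params = (list(net.stage1.parameters())
                   + list(net.stage2.parameters())
                   + list(net.stage3.parameters()))
new_params = (list(net.scores1.parameters()) + list(net.scores2.parameters())
              + list(net.scores3.parameters()) + list(net.upsample_2x.parameters())
              + list(net.upsample_4x.parameters()) + list(net.upsample_8x.parameters()))

optimizer = torch.optim.SGD([
    {'params': backbone_params, 'lr': 1e-3},  # pretrained layers: smaller steps
    {'params': new_params, 'lr': 1e-2},       # new layers: larger steps
], momentum=0.9, weight_decay=1e-4)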