Weights never update while training

Hi guys,
I’m new here and have only been learning PyTorch for 3 days. I tried to implement a small convnet demo, but it doesn’t work: the loss printed during training never changes. I also found that when I run loss.backward(), the gradients are always 0. Here is my demo code, is there something wrong with it?

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.autograd import Variable


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, 3, padding=1)
        self.conv2 = nn.Conv2d(10, 1, 1)

    def forward(self, _x):
        _x = F.relu(self.conv1(_x))
        _x = F.softmax(self.conv2(_x))
        return _x


class myLoss(nn.Module):
    def __init__(self):
        super(myLoss, self).__init__()

    def forward(self, predict, target):
        predict = predict.view(predict.size()[0], -1)
        target = target.view(target.size()[0], -1)
        return (predict - target).mean()


x = torch.FloatTensor(torch.rand(10, 1, 100, 100))
y = torch.FloatTensor(torch.rand(10, 1, 100, 100))

x = Variable(x)
y = Variable(y)


net = Net()
opt = optim.Adam(net.parameters(), lr=0.1)
cr = myLoss()

for epoch in xrange(1000):

    output = net(x)
    loss = cr(output, y)
    opt.zero_grad()
    # print loss
    loss.backward()
    # for f in net.parameters():
    #     print(f.grad)
    opt.step()

In this code, the loss never changes and f.grad is always 0.

My mistake… I’ve already found which part of my code is wrong.

So which part is wrong? I can’t find it in your code.

I have the same problem. How did you fix it?

Same here. What did you end up doing?

It was my misuse of the softmax function; I just changed it to a sigmoid. I also think it is advisable to print the output of every layer to see where the mistake happens, e.g. the initial weights, the relu, etc. (see the sketch below).

@wuhoo @zhangyuygss @cgraupe
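
A minimal sketch of what I mean by that, reusing the Net from the first post (the print statements and F.sigmoid are just one way to do it, any per-layer check works):

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, 3, padding=1)
        self.conv2 = nn.Conv2d(10, 1, 1)

    def forward(self, _x):
        _x = F.relu(self.conv1(_x))
        print(_x.mean(), _x.std())        # inspect the activations after each layer
        _x = F.sigmoid(self.conv2(_x))    # sigmoid instead of softmax for a single output channel
        print(_x.min(), _x.max())         # should vary in (0, 1), not be constant
        return _x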

Hi @Tao_jiang, thanks for the reply! I just changed the softmax to a sigmoid too, and it works!
But do you know why that is? I heard that softmax + cross entropy is a good combination, so why does it fail in this case?

@wuhoo In your case, the softmax activation needs more than one output channel to calculate the probabilities properly.

See this short code snippet:

a = torch.FloatTensor(1, 1, 2, 2).normal_()
print a
>> (0 ,0 ,.,.) =
>>  0.7764 -0.8651
>>  0.4080  0.9525
>> [torch.FloatTensor of size 1x1x2x2]

F.softmax(a)
>> Variable containing:
>> (0 ,0 ,.,.) =
>>  1  1
>>  1  1
>> [torch.FloatTensor of size 1x1x2x2]

F.sigmoid(a)
>> Variable containing:
>> (0 ,0 ,.,.) =
>>  0.6849  0.2963
>>  0.6006  0.7216
>> [torch.FloatTensor of size 1x1x2x2]

As you can see, the softmax calculates the probabilities over dim 1, so with a single channel every value is normalized by itself and the output is constantly 1 (which is why your gradients are zero).

Try this example with 2 channels in dim 1 and it should work as expected:

a = torch.FloatTensor(1, 2, 2, 2).normal_()
F.softmax(a)
>> Variable containing:
>> (0 ,0 ,.,.) = 
>> 0.5941  0.2936
>> 0.3891  0.3968

>> (0 ,1 ,.,.) = 
>> 0.4059  0.7064
>> 0.6109  0.6032
>> [torch.FloatTensor of size 1x2x2x2]
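
In case it helps anyone reading this later, here is a small sketch of the same point on a more recent PyTorch version, where the softmax dimension is passed explicitly (torch.randn and the dim argument assumed available): with a single channel every value is divided by its own exponential, so the output is all ones and no useful gradient flows through the activation.

import torch
import torch.nn.functional as F

a = torch.randn(1, 1, 2, 2)             # one channel: each pixel competes only with itself
print(F.softmax(a, dim=1))              # exp(x) / exp(x) == 1 everywhere

b = torch.randn(1, 2, 2, 2)             # two channels: values along dim 1 compete
print(F.softmax(b, dim=1))              # per-pixel probabilities over the channels
print(F.softmax(b, dim=1).sum(dim=1))   # sums to 1 along dim 1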