Model weights not being updated

Hey again,

I’m currently developing a cross-framework machine learning tool that has to support multiple ML frameworks, so I’m doing things a little differently from the regular PyTorch workflow.

My model inherits from nn.Module and has the usual init and forward methods. However, to fit the framework, I had to add an update method that calls forward, computes the loss, calls loss.backward(), and calls optimizer.step().

This is the current update bit from my code:

def update(self, input=None, output=None):

    # input_source, input_target, tags, input_editor=None, input_client=None

    input_source = input['source']
    input_target = input['target']
    input_editor = input['editor']
    input_client = input['client']
    tags = output['tags']

    inputs = [input_source, input_target]
    if input_editor is not None:
        inputs += [input_editor]
    if input_client is not None:
        inputs += [input_client]

    # Wrap tensors in Variables (old autograd API), moving them to the GPU if needed
    if self.cuda:
        inputs = [Variable(inpt.cuda()) for inpt in inputs]
        tags = Variable(tags.cuda())
    else:
        inputs = [Variable(inpt) for inpt in inputs]
        tags = Variable(tags)

    # Standard training step: clear old gradients, forward pass, loss, backward pass, update
    self.optimizer.zero_grad()
    out = self.forward(*inputs)
    loss = self.loss_fn(out, tags)
    loss.backward()
    self.optimizer.step()

    return loss

Everything is working fine, EXCEPT the actual update of the weights. The update method is called from a train_loop function via model.update(**data). I’m printing the loss for every epoch and the values are exactly the same (even though the data changes).

Does anyone have a clue what might be wrong?

Try checking whether model.parameters() gives you all the parameters you are expecting. Maybe you have defined some of the parameters as plain Variables instead of nn.Parameter, so the logic behind parameters() can’t see your weights.
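For example, something like this (an illustrative toy module, not your code) would hide a weight from the optimizer:

import torch
import torch.nn as nn

class Example(nn.Module):
    def __init__(self):
        super(Example, self).__init__()
        # Registered: shows up in model.parameters() and gets optimized
        self.weight = nn.Parameter(torch.randn(3, 3))
        # NOT registered: a plain tensor attribute is invisible to parameters(),
        # so the optimizer never updates it
        self.hidden_weight = torch.randn(3, 3)

model = Example()
print([name for name, _ in model.named_parameters()])  # only ['weight']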

The custom modules my model uses are only for shape manipulation; everything else is native PyTorch, so I guess there’s no need for nn.Parameter…

Could you try checking whether list(model.parameters()) prints all the parameters of your model?
Also, verify that you are not setting requires_grad=False on all the parameters of your network, as that would prevent backprop through the network.
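A quick sketch of both checks (assuming your model object is called model):

# Sanity checks on the parameters the optimizer will actually see
params = list(model.parameters())
print(len(params))                            # should match the layers you expect
print(all(p.requires_grad for p in params))   # should be True, otherwise backprop is skipped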

Every parameter seems to be in the parameter list…

The problem seems to be that the loss.backward() and optimizer.step() calls are not updating the weights.

While debugging I did:

a = list(self.parameters())[0]
loss.backward()
self.optimizer.step()
b = list(self.parameters())[0]
torch.equal(a.data, b.data)

And it returned True, so the parameter is not being updated at all. I no longer have a clue what might be wrong with my code. Here’s a small reproducible example:

https://gist.github.com/miguelvr/251903423fed3b64d320b53a847aaa75

The example that you showed has a problem.
When you do list(model.parameters())[0], you are only taking a reference to the tensor.
Try adding a clone() instead:

a = list(self.parameters())[0].clone()
loss.backward()
self.optimizer.step()
b = list(self.parameters())[0].clone()
torch.equal(a.data, b.data)
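To be thorough, you can also compare every parameter instead of just the first one (a sketch reusing the same self.parameters() / self.optimizer names as above):

before = [p.clone() for p in self.parameters()]
loss.backward()
self.optimizer.step()
changed = [not torch.equal(b.data, a.data)
           for b, a in zip(before, self.parameters())]
print(changed)  # any False entry is a parameter that did not move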

Thanks for the correction. However, it still returns True, so the weights are definitely not being updated…

So… I tried removing the last linear layer with the softmax and changed BCELoss to MSELoss, and now the weights are being updated!

I’m guessing this is a problem with BCELoss and not with my code… Could you check this out @smth @apaszke?

Did you remove your custom modules too? I guess you must have repacked the data into new Variables somewhere, thus breaking the history graph. You might want to check whether list(model.parameters())[0].grad is None. If it is, you must be breaking the graph somewhere (you should never need to take the .data, do some op, and wrap the result back in a Variable).
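A small self-contained illustration of how rewrapping .data cuts the history (toy layers, not the model from this thread):

import torch
import torch.nn as nn
from torch.autograd import Variable

lin1, lin2 = nn.Linear(4, 4), nn.Linear(4, 1)
x = Variable(torch.randn(2, 4))

# Broken: wrapping .data in a fresh Variable detaches the history,
# so lin1 never receives gradients
h = lin1(x)
h = Variable(h.data)
out = lin2(h)
out.sum().backward()
print(lin1.weight.grad)  # None -> the graph was cut before lin1

# Correct: keep the original Variable so gradients flow all the way back
out = lin2(lin1(x))
out.sum().backward()
print(lin1.weight.grad)  # an actual gradient tensor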


It is not None, it is an all-zero matrix!! And that explains why the weights are not being updated…

But I checked again and this only happens because I use BCELoss AND a softmax activation for the output! With a sigmoid activation, the weights are updated…

Am I doing something wrong? I’m pretty much recreating some other code I had in Keras, where I used a categorical cross-entropy loss (with one-hot encodings) and a final softmax activation. In PyTorch I’m not doing one-hot encodings and I’m using BCELoss.
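From what I understand, the usual PyTorch counterpart of that Keras setup is class-index targets (no one-hot) fed, together with raw logits, to nn.CrossEntropyLoss, which applies log_softmax internally. A minimal sketch (toy shapes and the plain tensor API, not my actual model):

import torch
import torch.nn as nn

num_classes, batch = 5, 8
logits = torch.randn(batch, num_classes, requires_grad=True)  # raw outputs, no softmax layer
targets = torch.randint(0, num_classes, (batch,))             # class indices, no one-hot

criterion = nn.CrossEntropyLoss()  # = log_softmax + NLLLoss under the hood
loss = criterion(logits, targets)
loss.backward()
print(logits.grad.abs().sum())     # non-zero: gradients flow normally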


Hey, I have the exact same error!

  1. My model doesn’t seem to be training.
  2. Comparing a = list(model.parameters())[0].clone() before and b = list(model.parameters())[0].clone() after the calls to loss.backward() and optimizer.step(), a == b returns False.
  3. Printing list(model.parameters())[0].grad shows a matrix of all 0’s.

I’ve been struggling with this problem for 3 days now. Please help me out. The original question can be found in the thread “LSTM Model not training”.

@smth @apaszke @fmassa please take a look at this. I can share the complete code if I am missing any details here.

Hi all. I have a similar problem. My bug is probably that I used the wrong combination of softmax and loss function, so the gradient values are super small.

For me:
  1. My model doesn’t seem to be training.
  2. Comparing a = list(model.parameters())[0].clone() before and b = list(model.parameters())[0].clone() after the calls to loss.backward() and optimizer.step(), a == b returns False.
  3. Printing list(model.parameters())[0].grad shows a matrix of all super small numbers, on the order of 10^-8.


Hello There!

What do you mean by “the wrong combination of Softmax and Loss function”? I don’t understand why the gradients are super small in your case.

Hi, I am also facing the same problem, but in my case list(model.parameters())[0].grad is None. How can I find the mistake? Any suggestions?

Thanks

In my case, it was because the softmax produced values that were exactly zero, so the NLL loss was unchanging and the weights were not being updated.
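A toy illustration of that effect (made-up numbers, not the original model): with extreme logits the softmax output underflows to exactly zero, so taking its log gives -inf, while log_softmax stays well behaved and is what nn.NLLLoss expects:

import torch
import torch.nn as nn

logits = torch.tensor([[100.0, -100.0]], requires_grad=True)
target = torch.tensor([1])

probs = nn.functional.softmax(logits, dim=1)
print(probs)             # tensor([[1., 0.]]) -- the small class underflows to 0
print(torch.log(probs))  # tensor([[0., -inf]]) -- training stalls from here on

# log_softmax avoids the underflow, and nn.NLLLoss expects log-probabilities
log_probs = nn.functional.log_softmax(logits, dim=1)
loss = nn.NLLLoss()(log_probs, target)
loss.backward()
print(logits.grad)       # finite, well-behaved gradients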


I am also dealing with the same problem. I get p.grad is None for all the parameters in the network. How do I check if the computation graph is broken somewhere?

@Gkv @RGaonkar I have the same problem here.
My weights are never updated, list(model.parameters())[0].grad is None, and my network’s output tensor has no grad_fn.
It turns out I had to return the result of the call directly instead of assigning it to a variable first. I don’t know whether that is really the cause, but it worked for me.
Buggy source code:

class Model(nn.Module):
    def __init__(self, size_in, size_out):
        super(Model, self).__init__()
        self.ReLU1 = nn.ReLU()
        self.conv1 = nn.Conv2d(in_channels=size_in, out_channels=size_out,
                               kernel_size=(3, 3),
                               stride=(1, 1),
                               padding=(1, 1),
                               dilation=1,
                               groups=1,
                               bias=False)

    def forward(self, x):
        x = self.ReLU1(x)
        x = self.conv1(x)
        return x

You guys could try this:

class Model(nn.Module):
    def __init__(self, size_in, size_out):
        super(Model, self).__init__()
        self.ReLU1 = nn.ReLU()
        self.conv1 = nn.Conv2d(in_channels=size_in, out_channels=size_out,
                               kernel_size=(3, 3),
                               stride=(1, 1),
                               padding=(1, 1),
                               dilation=1,
                               groups=1,
                               bias=False)

    def forward(self, x):
        x = self.ReLU1(x)
        return self.conv1(x)
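To verify that the graph is intact, you can check that the output of forward() carries a grad_fn; the first place where it becomes None is where the graph is cut. A generic sketch (toy layer, not the model above):

import torch
import torch.nn as nn

model = nn.Linear(4, 2)
x = torch.randn(3, 4)

out = model(x)
print(out.grad_fn)            # e.g. <AddmmBackward0 ...> -> history is being recorded

detached = model(x).detach()  # .detach(), .data or rewrapping in a new tensor
print(detached.grad_fn)       # None -> anything downstream of this is cut off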

What do you mean by the following?
“verify that you are not setting requires_grad=False on all the parameters of your network, as that would prevent backprop through the network”


Ignore this message. The problem was that my network was too shallow! Sorry about the noise! It converged after I added a couple of conv layers.

I am having a very similar issue. Initially, I had 3 conv+relu+maxpool blocks followed by 2 linear layers with relu and a final sigmoid for binary classification. It would not learn: all the parameters ended up very close to 0 and the output was essentially random. I reduced the network to conv/relu + linear/relu + linear/sigmoid => same problem.


I faced a similar problem: I was applying a ReLU on the last layer, then a softmax, and then cross entropy. Make sure the last layer does not contain any activation function if you are feeding it into a softmax / cross-entropy loss.
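In other words, the classifier head should end in a bare Linear layer when the loss already includes the softmax. A minimal sketch (hypothetical toy sizes):

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(16, 32),
    nn.ReLU(),            # activations on hidden layers are fine
    nn.Linear(32, 4),     # last layer: raw logits, no ReLU/Softmax after it
)
criterion = nn.CrossEntropyLoss()  # applies log_softmax internally

x, y = torch.randn(8, 16), torch.randint(0, 4, (8,))
loss = criterion(model(x), y)
loss.backward()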
