Gradients exist but weights not updating

So, I have a deep convolutional network with an LSTM layer, and after the LSTM layer it splits off to compute two different functions (using two different linear layers) whose results are then added together to form the final network output.

When I compute the loss of the network so that I can have it compute the gradients and update the weights, I have it do a few operations and then have it compute the loss between the derived value and the calculated target value.

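def update(self, output, target):
    # target is calculated outside this function
    # ... operations on output ...
    self.optimizer.zero_grad()  # clear gradients left over from the previous step
    loss(output, target).backward()  # loss is the loss function defined elsewhere
    self.optimizer.step()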

The network has a nonzero loss (sometimes very small, sometimes several orders of magnitude larger). For example, a few of the losses:

tensor(1.00000e-04 *
   5.7420)
tensor(2.7190)
tensor(0.9684)

It also has gradients as calculated here:

for param in self.parameters():
    print(param.grad.data.sum())

Which outputs:

tensor(1.00000e-03 *
   1.9996)
tensor(1.00000e-03 *
   2.6101)
tensor(1.00000e-02 *
   -1.3879)
tensor(1.00000e-03 *
   -4.5834)
tensor(1.00000e-02 *
   2.1762)
tensor(1.00000e-03 *
   3.6246)
tensor(1.00000e-03 *
   6.6234)
tensor(1.00000e-02 *
   2.9373)
tensor(1.00000e-02 *
   1.2680)
tensor(1.00000e-03 *
   1.8791)
tensor(1.00000e-02 *
   1.7322)
tensor(1.00000e-02 *
   1.7322)
tensor(0.)
tensor(0.)
tensor(1.00000e-03 *
   -6.7885)
tensor(1.00000e-02 *
   9.7793)

And:

tensor(2.4620)
tensor(0.9544)
tensor(-26.2465)
tensor(0.2280)
tensor(-219.2602)
tensor(-2.7870)
tensor(-50.8203)
tensor(3.2548)
tensor(19.6163)
tensor(-18.6029)
tensor(3.8564)
tensor(3.8564)
tensor(0.)
tensor(0.)
tensor(0.8040)
tensor(-0.1157)

But when I compare the weights before and after running the optimizer, the comparison says they are equal to each other.

Code to see if weights change:

before = list(neuralnet.parameters())
neuralnet.update()
after = list(neuralnet.parameters())
for i in range(len(before)):
    print(torch.equal(before[i].data, after[i].data))

The above returns True for each iteration.

Hi,

When you get the parameters of your net, the tensors are not cloned. So in your case, before and after contain the same tensors, and when the optimizer updates the weights in place, both of your lists see the update. You can try changing one weight by hand; the two lists will still compare equal.
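To make the aliasing concrete, here is a minimal sketch. It assumes a small stand-in model (a single torch.nn.Linear) and a made-up loss rather than the network from the question; only the way before and after are built matches the original code.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 2)  # hypothetical stand-in for the real network
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

before = list(model.parameters())  # references, not copies

loss = model(torch.randn(8, 4)).pow(2).mean()  # made-up loss for illustration
loss.backward()
optimizer.step()  # updates the parameters in place

after = list(model.parameters())

# Both lists hold the very same Parameter objects, so comparing them
# is trivially True even though the optimizer did change the values.
same_objects = all(b is a for b, a in zip(before, after))
looks_equal = torch.equal(before[0].data, after[0].data)
print(same_objects, looks_equal)
```

So a True from torch.equal here says nothing about whether the update happened.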

So, how can we solve the problem?

You could call .clone() on each parameter when you build the before list, so that it holds a copy of the values instead of a reference to the live tensors.
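A minimal sketch of that check, again assuming a small stand-in model (torch.nn.Linear) and a made-up loss instead of the real network:

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 2)  # hypothetical stand-in for the real network
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Snapshot the current values: detach from the graph, then copy.
before = [p.detach().clone() for p in model.parameters()]

optimizer.zero_grad()
loss = model(torch.randn(8, 4)).pow(2).mean()  # made-up loss for illustration
loss.backward()
optimizer.step()

# Compare each snapshot against the live parameter after the step.
changed = [
    not torch.equal(b, p.detach())
    for b, p in zip(before, model.parameters())
]
print(changed)
```

Only the before snapshot needs to be cloned; the live parameters can be compared directly after the step.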

Were you able to solve this issue? I am having the same problem.

Hi everyone, I have the same problem, and my code is as follows:

for batch_idx, (data, label) in enumerate(data_loader_new):
    data, label = data.to(device), label.to(device)
    optimizer.zero_grad()
    output = model(data)
    Before = list(model.parameters())[0].clone()
    loss = loss_fn(pred=output, target=label)
    loss.backward(retain_graph=True)
    grad = torch.autograd.grad(outputs=loss, inputs=data) # I need it for some computations
    optimizer.step()
    After = list(model.parameters())[0].clone()
    print(torch.equal(Before.data, After.data))

It returns True.
I appreciate any help from your side.

My problem has just been solved.
I mention it here in case it helps others.
I passed model.parameters() to the optimizer, as in optimizer = torch.optim.SGD(model.parameters(), lr=x, momentum=y), and now it returns False (before that, I had not written the model name correctly).
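One way to reproduce this kind of mistake is sketched below. It assumes the optimizer was built from the parameters of a different model instance, which is only a guess at what "not mentioning the model name correctly" meant; other_model and the helper functions are hypothetical names.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 2)        # the model that is actually trained
other_model = torch.nn.Linear(4, 2)  # hypothetical: a different instance


def optimizer_tracks(net, opt):
    # True if every parameter of net is registered in the optimizer.
    tracked = {id(p) for group in opt.param_groups for p in group["params"]}
    return all(id(p) in tracked for p in net.parameters())


def weights_changed(opt):
    # Run one training step on model and report whether its weight moved.
    before = model.weight.detach().clone()
    model.zero_grad()
    loss = model(torch.randn(8, 4)).pow(2).mean()  # made-up loss
    loss.backward()
    opt.step()
    return not torch.equal(before, model.weight.detach())


wrong = torch.optim.SGD(other_model.parameters(), lr=0.1, momentum=0.9)
right = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

tracks_wrong = optimizer_tracks(model, wrong)
tracks_right = optimizer_tracks(model, right)
changed_wrong = weights_changed(wrong)
changed_right = weights_changed(right)
print(tracks_wrong, changed_wrong, tracks_right, changed_right)
```

Checking optimizer.param_groups like this is a quick way to confirm the optimizer holds the parameters you think it does.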

I also ran into this problem. My mistake was that optimizer.zero_grad() was run after loss.backward(), so optimizer.step() was applied to cleared gradients… Hope this helps someone making the same mistake.
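A small sketch comparing the two orderings, assuming a stand-in model (torch.nn.Linear), plain SGD without momentum or weight decay, and a made-up loss:

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 2)  # hypothetical stand-in for the real network
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(8, 4)


def train_step(zero_grad_after_backward):
    # Run one step and report whether the weight changed.
    before = model.weight.detach().clone()
    if not zero_grad_after_backward:
        optimizer.zero_grad()  # correct: clear old gradients first
    loss = model(x).pow(2).mean()  # made-up loss for illustration
    loss.backward()
    if zero_grad_after_backward:
        optimizer.zero_grad()  # mistake: wipes the gradients just computed
    optimizer.step()
    return not torch.equal(before, model.weight.detach())


changed_wrong_order = train_step(zero_grad_after_backward=True)
changed_right_order = train_step(zero_grad_after_backward=False)
print(changed_wrong_order, changed_right_order)
```

The usual order is zero_grad(), forward pass, backward(), then step().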