Gradients exist but weights not updating

WR01 · June 29, 2018, 3:53pm

So, I have a deep convolutional network with an LSTM layer, and after the LSTM layer it splits off to compute two different functions (using two different linear layers) whose results are then added together to form the final network output.

When I compute the loss of the network so that I can have it compute the gradients and update the weights, I have it do a few operations and then have it compute the loss between the derived value and the calculated target value.

def update(self, output, target):
    # target output is calculated outside the function
    # operations on output
    loss(output, target).backward()
    self.optimizer.step()

The network has a nonzero loss, sometimes very small and sometimes several orders of magnitude larger. For example, a few of the losses:

tensor(1.00000e-04 *
   5.7420)
tensor(2.7190)
tensor(0.9684)

It also has gradients as calculated here:

for param in self.parameters():
    print(param.grad.data.sum())

Which outputs:

tensor(1.00000e-03 *
   1.9996)
tensor(1.00000e-03 *
   2.6101)
tensor(1.00000e-02 *
   -1.3879)
tensor(1.00000e-03 *
   -4.5834)
tensor(1.00000e-02 *
   2.1762)
tensor(1.00000e-03 *
   3.6246)
tensor(1.00000e-03 *
   6.6234)
tensor(1.00000e-02 *
   2.9373)
tensor(1.00000e-02 *
   1.2680)
tensor(1.00000e-03 *
   1.8791)
tensor(1.00000e-02 *
   1.7322)
tensor(1.00000e-02 *
   1.7322)
tensor(0.)
tensor(0.)
tensor(1.00000e-03 *
   -6.7885)
tensor(1.00000e-02 *
   9.7793)

And:

tensor(2.4620)
tensor(0.9544)
tensor(-26.2465)
tensor(0.2280)
tensor(-219.2602)
tensor(-2.7870)
tensor(-50.8203)
tensor(3.2548)
tensor(19.6163)
tensor(-18.6029)
tensor(3.8564)
tensor(3.8564)
tensor(0.)
tensor(0.)
tensor(0.8040)
tensor(-0.1157)

But when I compare the weights before and after running the optimizer, they come out equal to each other.

Code to see if weights change:

before = list(neuralnet.parameters())
neuralnet.update()
after = list(neuralnet.parameters())
for i in range(len(before)):
    print(torch.equal(before[i].data, after[i].data))

The above returns True for each iteration.

albanD · June 29, 2018, 3:55pm

Hi,

When you get the parameters of your net, it does not clone the tensors. So in your case, before and after contain the same tensor objects. When the optimizer updates the weights in place, the change shows up in both of your lists. You can try changing one weight by hand: the two lists will still compare equal.

Shivam_Chandhok · March 30, 2019, 6:57am

So, how can we solve the problem?

ptrblck · March 30, 2019, 10:52am

You could call .clone() on each parameter so that a deep copy will be used.
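
For anyone landing here later, a minimal sketch of the difference between an aliased snapshot and a cloned one. The small nn.Linear model, the random data, and the SGD settings are stand-ins chosen for illustration, not the original poster's network:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the real network.
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Aliased snapshot: these are the very tensors the optimizer updates in place.
before_alias = list(model.parameters())
# Independent snapshot: detached copies that do not follow later updates.
before_copy = [p.detach().clone() for p in model.parameters()]

# One training step on random data.
x = torch.randn(8, 4)
y = torch.randn(8, 2)
optimizer.zero_grad()
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
optimizer.step()

after = list(model.parameters())
alias_equal = [torch.equal(b, a) for b, a in zip(before_alias, after)]
copy_equal = [torch.equal(b, a) for b, a in zip(before_copy, after)]

print("aliased snapshot equal to current weights:", alias_equal)
print("cloned snapshot equal to current weights: ", copy_equal)
```

The aliased list always compares equal because it holds the same objects as the model, so only the cloned snapshot can show whether the step changed anything.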

Rajiv_Teja · May 18, 2019, 11:43am

Were you able to solve this issue? I am having the same problem.

Nazila-H · December 21, 2020, 2:00pm

Hi everyone, I have the same problem, and my code is as follows:

for batch_idx, (data, label) in enumerate(data_loader_new):
    data, label = data.to(device), label.to(device)
    optimizer.zero_grad()
    output = model(data)
    Before = list(model.parameters())[0].clone()
    loss = loss_fn(pred=output, target=label)
    loss.backward(retain_graph=True)
    grad = torch.autograd.grad(outputs=loss, inputs=data) # I need it for some computations
    optimizer.step()
    After = list(model.parameters())[0].clone()
    print(torch.equal(Before.data, After.data))

It returns True.
I appreciate any help from your side.

Nazila-H · December 22, 2020, 9:57am

My problem has just been solved.
I mention it here in case it helps others.
I now pass model.parameters() to the optimizer, as in optimizer = torch.optim.SGD(model.parameters(), lr=x, momentum=y), and the check returns False (before that, I had not referenced the model name correctly).
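
A hedged sketch of this failure mode, assuming the cause is an optimizer built from a different module's parameters. The names model and other_model, the helper step_changes_weights, and the hyperparameters are hypothetical:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Linear(4, 2)        # the network that is actually trained
other_model = nn.Linear(4, 2)  # hypothetical module passed to the optimizer by mistake


def optimizer_tracks(optimizer, module):
    """Return True if every parameter of `module` is registered in `optimizer`."""
    tracked = {id(p) for group in optimizer.param_groups for p in group["params"]}
    return all(id(p) in tracked for p in module.parameters())


def step_changes_weights(optimizer):
    """Run one training step on `model` and report whether its weights moved."""
    before = [p.detach().clone() for p in model.parameters()]
    x, y = torch.randn(8, 4), torch.randn(8, 2)
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
    return any(not torch.equal(b, p) for b, p in zip(before, model.parameters()))


wrong_opt = torch.optim.SGD(other_model.parameters(), lr=0.1, momentum=0.9)
right_opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

print("wrong optimizer tracks model:", optimizer_tracks(wrong_opt, model))
print("wrong optimizer moved weights:", step_changes_weights(wrong_opt))
print("right optimizer tracks model:", optimizer_tracks(right_opt, model))
print("right optimizer moved weights:", step_changes_weights(right_opt))
```

A check like optimizer_tracks can be run once after constructing the optimizer. Gradients still appear on the model in the wrong case, which matches the "gradients exist but weights not updating" symptom.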

Livent_Liang · January 12, 2021, 3:16am

I also ran into this problem. My mistake was that optimizer.zero_grad() was called after loss.backward(), so optimizer.step() was applying zeroed gradients… Hope this helps someone making the same mistake.
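
A small sketch of the ordering issue, using a toy nn.Linear model and plain SGD as assumptions (the helper run_step is hypothetical):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x, y = torch.randn(8, 4), torch.randn(8, 2)


def run_step(zero_grad_after_backward):
    """Run one SGD step and report whether any weight changed."""
    model = nn.Linear(4, 2)  # toy stand-in for the real network
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    before = [p.detach().clone() for p in model.parameters()]

    if not zero_grad_after_backward:
        optimizer.zero_grad()  # usual place: clear stale gradients first

    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()

    if zero_grad_after_backward:
        optimizer.zero_grad()  # mistake: discards the gradients just computed

    optimizer.step()
    return any(not torch.equal(b, p) for b, p in zip(before, model.parameters()))


changed_wrong_order = run_step(zero_grad_after_backward=True)
changed_right_order = run_step(zero_grad_after_backward=False)
print("zero_grad after backward, weights changed:", changed_wrong_order)
print("zero_grad before backward, weights changed:", changed_right_order)
```

The order that works is zero_grad(), forward pass, backward(), then step().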

Shubham_Dhayarkar · February 12, 2021, 5:00am

I have the same problem with my network pipeline: the weights are not getting updated, and this happens every time.