# Optimizer not updating the weights/parameters

I am using Adam together with LBFGS. The loss doesn’t change from epoch to epoch when I call `optimizer.step()` with a closure function. If I use only Adam with `optimizer.step()`, the loss converges (albeit slowly, which is why I decided to try LBFGS). Can you tell me where my code is wrong?

``````
optimizer1 = torch.optim.Adam(net.parameters(), lr=0.0001)
optimizer2 = torch.optim.LBFGS(net.parameters(), lr=0.001)

## Training
iterations = 10
loss_array = np.zeros((iterations))

for epoch in range(iterations):
    def closure():
        # # Data driven/boundary loss
        # net_bc_out1 = net(pt_x_bc1)
        # net_bc_out2 = net(pt_x_bc2)
        # mse_u1 = mse_cost_function(net_bc_out1, pt_u_bc)
        # mse_u2 = mse_cost_function(net_bc_out2, pt_u_bc)
        # mse_u = mse_u1 + mse_u2

        ## Physics informed loss
        all_zeros = np.zeros((500, 1))
        f_out = f(pt_x_collocation, net)  # output of f(x,t)
        mse_pinn = mse_cost_function(f_out, pt_all_zeros)

        ## Training data loss
        u_train = net(pt_x_collocation)
        mse_training = mse_cost_function(u_train, pt_u_true)

        # Combining the loss functions
        loss = mse_pinn + mse_training
        loss_array[epoch] = loss
        loss.backward()
        return loss

    if epoch < 5000:
        optimizer1.step(closure)
    else:
        optimizer2.step(closure)

    print(epoch, "Training Loss:", loss.data)
``````
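For reference, the canonical LBFGS closure pattern zeroes the gradients inside the closure, since LBFGS may evaluate it several times per `step()`. A minimal sketch with a hypothetical stand-in model (the `net`, data, and learning rate below are placeholders, not the code from this thread):

``````python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in model and data for illustration only.
net = nn.Sequential(nn.Linear(1, 10), nn.Tanh(), nn.Linear(10, 1))
optimizer = torch.optim.LBFGS(net.parameters(), lr=0.1)
x, target = torch.randn(16, 1), torch.zeros(16, 1)

def closure():
    # Zero the grads inside the closure: LBFGS calls it repeatedly per step.
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(net(x), target)
    loss.backward()
    return loss

loss0 = closure().item()
for _ in range(5):
    optimizer.step(closure)
loss1 = closure().item()
assert loss1 < loss0  # loss decreases when the graph is intact
``````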

This is the output:

0 Training Loss: tensor(0.4883)
1 Training Loss: tensor(0.4883)
2 Training Loss: tensor(0.4883)
3 Training Loss: tensor(0.4883)
4 Training Loss: tensor(0.4883)
5 Training Loss: tensor(0.4883)
6 Training Loss: tensor(0.4883)
7 Training Loss: tensor(0.4883)
8 Training Loss: tensor(0.4883)
9 Training Loss: tensor(0.4883)

Thanks

You are detaching the output from the computation graph by rewrapping it into a deprecated `Variable` here:

``````pt_u_true = Variable(torch.from_numpy(u_true).float(), requires_grad=False).to(device)
``````

However, based on this code snippet it seems you are also using numpy arrays, which won’t be attached to the computation graph by Autograd in the first place.
If you need to use other libraries such as numpy, you would need to write custom `autograd.Function`s and implement the `backward` method manually.
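To illustrate the point: round-tripping a tensor through NumPy (or rewrapping it) produces a tensor with no `grad_fn`, so gradients cannot flow back through it. A small sketch, assuming a dummy `u_true` array in place of the one from the thread:

``````python
import torch
import numpy as np

u_true = np.random.rand(500, 1)  # dummy stand-in for the thread's target array

# Modern replacement for the deprecated Variable wrapper: a plain tensor.
# Targets don't need gradients, so requires_grad stays at its default (False).
pt_u_true = torch.from_numpy(u_true).float()

# Round-tripping a tensor through numpy detaches it from the graph:
w = torch.randn(1, requires_grad=True)
out = w * 3.0
out_np = out.detach().numpy()         # .numpy() requires a detached tensor
rewrapped = torch.from_numpy(out_np)  # no grad_fn -> gradients won't reach w
assert out.grad_fn is not None
assert rewrapped.grad_fn is None
``````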

That doesn’t seem right: with the graph detached, no training should be happening at all.

The training does take place when I use Adam and just call `optimizer.step()`. Here’s the code:

``````
## Training
iterations = 10
loss_array = np.zeros((iterations))

for epoch in range(iterations):
    ## Physics informed loss
    all_zeros = np.zeros((500, 1))
    f_out = f(pt_x_collocation, net)  # output of f(x,t)
    mse_pinn = mse_cost_function(f_out, pt_all_zeros)

    ## Training data loss
    u_train = net(pt_x_collocation)
    mse_training = mse_cost_function(u_train, pt_u_true)

    # Combining the loss functions
    loss = mse_pinn + mse_training
    loss_array[epoch] = loss
    loss.backward()

    optimizer1.step()

    print(epoch, "Training Loss:", loss.data)
``````

The output for just 10 epochs is:
0 Training Loss: tensor(0.7643)
1 Training Loss: tensor(0.7007)
2 Training Loss: tensor(0.6511)
3 Training Loss: tensor(0.6144)
4 Training Loss: tensor(0.5886)
5 Training Loss: tensor(0.5705)
6 Training Loss: tensor(0.5555)
7 Training Loss: tensor(0.5401)
8 Training Loss: tensor(0.5223)
9 Training Loss: tensor(0.5022)

Could you check the gradients of the parameters after the `backward()` call?
I would assume they are all set to zero, since `optimizer.zero_grad()` would be resetting them.
Based on my previous points, namely a) detaching the graph explicitly and b) using numpy, I don’t see where the computation graph in your code snippet should be coming from.
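The suggested check can be sketched like this, using a hypothetical stand-in model for `net` (if the loss were detached from the graph, every `param.grad` would be `None`):

``````python
import torch
import torch.nn as nn

# Hypothetical tiny net standing in for `net` from the thread.
net = nn.Sequential(nn.Linear(1, 10), nn.Tanh(), nn.Linear(10, 1))
x = torch.randn(8, 1)
target = torch.zeros(8, 1)

loss = nn.functional.mse_loss(net(x), target)
loss.backward()

# A detached loss leaves param.grad as None; an attached one populates it.
grads = {name: p.grad for name, p in net.named_parameters()}
assert all(g is not None for g in grads.values())
for name, g in grads.items():
    print(f"{name}: grad norm = {g.norm().item():.3e}")
``````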

Hi, I have printed the gradients of the first hidden layer to show that they are indeed being calculated when I use `optimizer.step()`. Here’s the code snippet, and below it the output for the first few epochs:

``````
## Training
iterations = 10
loss_array = np.zeros((iterations))

for epoch in range(iterations):
    ## Physics informed loss
    all_zeros = np.zeros((500, 1))
    f_out = f(pt_x_collocation, net)  # output of f(x,t)
    mse_pinn = mse_cost_function(f_out, pt_all_zeros)

    ## Training data loss
    u_train = net(pt_x_collocation)
    mse_training = mse_cost_function(u_train, pt_u_true)

    # Combining the loss functions
    loss = mse_pinn + mse_training
    loss_array[epoch] = loss
    loss.backward()

    # for name, param in net.named_parameters():
    optimizer1.step()

    print(epoch, "Training Loss:", loss.data)
``````

The output is:
Hidden layer 1 weights gradient: tensor([[ 4.5100e-07],
[ 8.3037e-08],
[-3.1868e-09],
[-1.2420e-08],
[ 1.9485e-07],
[ 1.8525e-08],
[-3.7920e-07],
[-3.4814e-07],
[-1.6464e-07],
[ 1.1097e-07]])
0 Training Loss: tensor(0.1058)
Hidden layer 1 weights gradient: tensor([[-5.2541e-07],
[ 1.9976e-06],
[ 1.1666e-07],
[ 1.9102e-07],
[-7.6507e-07],
[-5.7566e-08],
[ 2.1010e-06],
[ 5.4820e-07],
[ 7.9201e-07],
[ 8.8915e-07]])
1 Training Loss: tensor(0.1038)
Hidden layer 1 weights gradient: tensor([[-4.4913e-08],
[ 4.0377e-07],
[ 5.9259e-08],
[ 6.0829e-08],
[-2.1468e-07],
[-1.2919e-08],
[ 3.1558e-07],
[ 7.9497e-08],
[ 2.5290e-07],
[ 3.2178e-07]])
2 Training Loss: tensor(0.0976)
Hidden layer 1 weights gradient: tensor([[ 4.2063e-07],
[-1.0188e-06],
[-4.8912e-08],
[-4.5130e-08],
[ 2.5593e-07],
[-5.5157e-09],
[-1.3609e-06],
[-3.6893e-07],
[-2.9356e-07],
[-2.5360e-07]])
3 Training Loss: tensor(0.0938)

Thanks for the update!
I missed this part of the code:

``````f_out = f(pt_x_collocation, net) # output of f(x,t)
mse_pinn = mse_cost_function(f_out, pt_all_zeros)
...
loss = mse_pinn + mse_training
loss.backward()
``````

which is not detaching the computation graph and will thus create gradients.
`mse_training` will still be detached and won’t influence the gradient calculation.

EDIT: I also misread the second part, and I don’t know why you are setting the `requires_grad` attribute of the targets to `True`.
In any case, that’s not necessary if you don’t need to update the targets, so remove the `Variable` usage.
Once this is done, check the gradients again and verify that they are calculated, then compare the parameters before and after the `step()` call; you should see the update.
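The before/after comparison can be sketched as follows, again with a hypothetical stand-in model and optimizer (not the thread’s actual `net`):

``````python
import torch
import torch.nn as nn

# Hypothetical stand-in model for the check suggested above.
net = nn.Sequential(nn.Linear(1, 10), nn.Tanh(), nn.Linear(10, 1))
optimizer = torch.optim.Adam(net.parameters(), lr=1e-2)
x = torch.randn(8, 1)

# Snapshot the parameters before the update.
before = [p.detach().clone() for p in net.parameters()]

loss = nn.functional.mse_loss(net(x), torch.zeros(8, 1))
optimizer.zero_grad()
loss.backward()
optimizer.step()

# If the graph is intact, at least one parameter tensor must have moved.
changed = any(
    not torch.equal(b, p.detach()) for b, p in zip(before, net.parameters())
)
assert changed
``````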