PyTorch optimizer.step() function doesn't update weights

I am new to PyTorch and I am trying to use SGD to fit a statistical model. The problem is that optimizer.step() doesn't seem to do anything: I print the parameters after each epoch, and the weights do not change.

Here is my code:

shape_parameters_estimated = torch.zeros([batch, stat_model.variance.size], dtype=torch.float32, device=device, requires_grad=True)

learning_rate = 0.1
optimizer = torch.optim.Adagrad([shape_parameters_estimated], lr=learning_rate)
max_iter = 20
for iteration in range(max_iter):
    theta_estimated = shape_parameters_estimated

    fit_image, fit_landmarks = stat_model(theta_estimated)

    loss_lm = lm_loss(target_landmark, fit_landmarks)
    print('theta=', theta_estimated[0])  # I'm printing the parameter here, but it doesn't change
    loss_value = loss_lm
    optimizer.zero_grad()
    loss_value.backward()
    optimizer.step()

Please, can anyone help?

Your code works fine after replacing the undefined classes:

shape_parameters_estimated = torch.zeros(1, 1, dtype=torch.float32, requires_grad=True)
stat_model = nn.Linear(1, 1)

learning_rate = 0.1
optimizer = torch.optim.Adagrad([shape_parameters_estimated], lr=learning_rate)
max_iter = 20
criterion = nn.MSELoss()

for iteration in range(max_iter):
    theta_estimated = shape_parameters_estimated
    out = stat_model(theta_estimated)

    loss = criterion(out, torch.rand_like(out))
    print('theta=', theta_estimated[0])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Output:

theta= tensor([0.], grad_fn=<SelectBackward0>)
theta= tensor([0.1000], grad_fn=<SelectBackward0>)
theta= tensor([0.1397], grad_fn=<SelectBackward0>)
theta= tensor([0.1831], grad_fn=<SelectBackward0>)
theta= tensor([0.2331], grad_fn=<SelectBackward0>)
theta= tensor([0.2467], grad_fn=<SelectBackward0>)
theta= tensor([0.2960], grad_fn=<SelectBackward0>)
theta= tensor([0.3146], grad_fn=<SelectBackward0>)
theta= tensor([0.3339], grad_fn=<SelectBackward0>)
theta= tensor([0.3832], grad_fn=<SelectBackward0>)
theta= tensor([0.4118], grad_fn=<SelectBackward0>)
theta= tensor([0.4494], grad_fn=<SelectBackward0>)
theta= tensor([0.4717], grad_fn=<SelectBackward0>)
theta= tensor([0.4924], grad_fn=<SelectBackward0>)
theta= tensor([0.5206], grad_fn=<SelectBackward0>)
theta= tensor([0.5551], grad_fn=<SelectBackward0>)
theta= tensor([0.5686], grad_fn=<SelectBackward0>)
theta= tensor([0.5911], grad_fn=<SelectBackward0>)
theta= tensor([0.6077], grad_fn=<SelectBackward0>)
theta= tensor([0.6128], grad_fn=<SelectBackward0>)

Maybe you are detaching the computation graph in one of your modules.
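For example, a made-up module just to illustrate what an accidental detach inside a forward could look like (not your code):

import torch
import torch.nn as nn

class LeakyModel(nn.Module):
    def forward(self, x):
        y = x * 2
        # .detach(), .numpy(), .item(), int(), ... all cut the graph here
        return y.detach()

x = torch.randn(3, requires_grad=True)
out = LeakyModel()(x).sum()
print(out.grad_fn)  # None -> backward() can no longer reach x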

If you want to get the loss value as a plain number (detached from the graph), use .item() rather than the .data attribute, and still call backward() on the loss tensor itself:

loss_value = loss_lm.item()
loss_lm.backward()

Other issues are clearly explained by @ptrblck

Thanks @ptrblck
I checked the output of my statistical model to see whether I am detaching the computation graph, but it doesn't seem to be detached.

Here is the code and the output of stat_model as well as the parameter:

fit_image, fit_landmarks = stat_model(theta_estimated)
print('fit landmark =', fit_landmarks)
print('theta=', theta_estimated[0])

fit landmark = tensor([[[ -93.7965, 41.2901, -457.3236, -118.0355, 23.2566, -394.9610,
-129.1178, -22.9549, -413.0228, -60.4286],
[ -14.2109, -392.8283, -112.7418, 32.2946, -766.2040, -65.1696,
31.8340, -795.4185, -35.9762, -2.9846],
[-788.4980, -108.9594, -9.7898, -800.8992, -49.7171, 31.8733,
-808.2354, -93.7749, 49.2322, -800.7413]],

    [[ -86.7400,   28.3823, -492.1542,  -93.9346,   29.3518, -440.1688,
      -111.6750,    2.4048, -448.3925,  -44.6627],
     [ -23.6792, -451.9865, -117.1925,   11.8526, -759.7421,  -62.1217,
        39.6679, -760.5637,  -32.7417,   18.6626],
     [-752.7242,  -92.1307,  -17.7508, -781.5572,  -45.2557,   43.6841,
      -768.3282, -102.0547,   38.3073, -769.6452]],

    [[ -78.7095,   17.3260, -457.3948, -111.3670,   22.8568, -394.5397,
      -122.8169,  -18.9796, -403.8733,  -52.9702],
     [ -10.9037, -390.0055, -114.4793,   26.6313, -784.2676,  -56.4531,
        25.9526, -799.3102,  -33.1011,    2.2568],
     [-785.2794, -100.1249,  -14.8437, -806.3973,  -41.8734,   26.0318,
      -805.1998,  -95.0777,   42.2077, -806.0328]]], device='cuda:0',
   grad_fn=<ReshapeAliasBackward0>).

theta= tensor([-1.6324, -3.0613, 1.1880, -1.6660, -0.3259, -4.0532, 2.8580, -2.3427,
2.1786, 2.7516, -0.2029, -5.1932, -1.2288, -2.5457, 0.9505, -1.3430,
0.4178, -1.9190, 1.5987, 1.4101, -1.3779, -2.0695, -3.3447, 2.0617,
-3.6703, 2.3111, -2.0738, -7.8953, 1.4093, 0.3684, -0.1465, -5.3603,
3.9112, -1.7550, 3.0453, -3.8051, -2.8779, -0.9223, -2.2068, 0.7696,
-3.6210, -0.1781, 1.4597, -0.9524, 0.5832, -3.4065, -3.0134, -2.4941],
       device='cuda:0', grad_fn=<SelectBackward0>)

You can see grad_fn=<ReshapeAliasBackward0> for the output used in the loss and grad_fn=<SelectBackward0> for the parameter.

What else could be detached?

theta_estimated just references shape_parameters_estimated, which was initialized with torch.zeros. In your current output theta_estimated shows values that are different from zero, so I guess it was already updated?

Thanks again for your prompt reply @ptrblck.
The current value of theta_estimated is the initial value. I didn't initialize it with zeros in this run, I initialized it with that random tensor, so it is not updating. Assigning shape_parameters_estimated to theta_estimated shouldn't be a problem, right?

In your posted code snippet you’ve initialized it as:

shape_parameters_estimated = torch.zeros([batch, stat_model.variance.size], dtype=torch.float32, device=device, requires_grad=True)

Could you post a minimal, executable code snippet to reproduce the issue, please?

Thanks @ptrblck

The problem is that the statistical model requires local libraries, so it's not possible to share those libraries online.
But here is more detailed code:

device = torch.device('cuda')
torch.manual_seed(0)

print('FITTING DEMO … (does not include pose or camera parameters)')
print('============')
batch = 3

image_width = 224
image_height = 224
number_of_channels = 3
voxel_size = 2.0

view1_alpha = 0
view1_beta = -90

view2_alpha = -90
view2_beta = -90

number_of_channels = number_of_channels

transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.ToTensor(),
])

femur_ids = []
with open('data/femur_landmark_id.csv', mode='r') as inp:
    reader = csv.reader(inp, delimiter=',')
    for rows in reader:
        femur_ids.append(int(float(rows[1])))

tibia_ids = []
with open('data/tibia_landmark_id.csv', mode='r') as inp:
    reader = csv.reader(inp, delimiter=',')
    for rows in reader:
        tibia_ids.append(int(float(rows[1])))

carm = MobileCArm(source_to_detector_distance=40,
                  immersion_depth=0,
                  free_space=0,
                  sensor_height=200,
                  sensor_width=100)

dmfcModel = StatismoFileReaderDmfc.readDmfcModel("data/models/kneeDmfc.h5")

decoder = DecoderDmfc(dmfcModel,
                      carm,
                      first_view_alpha=view1_alpha,
                      first_view_beta=view1_beta,
                      second_view_alpha=view2_alpha,
                      second_view_beta=view2_beta,
                      number_of_channels=number_of_channels,
                      transform=transform,
                      voxel_size=voxel_size,
                      landmark_indices=[femur_ids, []]
                      ).to(device=device)

# ground-truth parameters used to generate the target
tparam = []
for i in range(batch):
    alpha = np.random.normal(np.zeros(1), 2.0, dmfcModel.feature_classes.variance.size)
    tparam.append(alpha)

shape_parameters = torch.tensor(tparam, dtype=torch.float32, device=device)  # torch.randn([batch, dmfcModel.feature_classes.variance.size], dtype=torch.float32, device=device)

theta = shape_parameters
target_image, target_landmarks = decoder(theta)

# initial guess for the parameters to be optimized
param = []
for i in range(batch):
    alpha = np.random.normal(np.zeros(1), 1.5 + 1.0 / (i + 1), dmfcModel.feature_classes.variance.size)
    param.append(alpha)

shape_parameters_estimated = torch.tensor(param, dtype=torch.float32, device=device, requires_grad=True)

# setup losses we use
loss = nn.MSELoss()
prior_loss = GaussianPrior()
lm_loss = LandmarkLoss()

# setup (visual) reporting
report_interval = 1
fig, axs = pyplot.subplots(batch, 3)

print('LANDMARK FITTING … (not optimizing for color or light)')
print('================')
learning_rate = 1
optimizer = torch.optim.Adagrad([shape_parameters_estimated],
                                lr=learning_rate)
max_iter = 5
for iteration in range(max_iter):
    theta_estimated = shape_parameters_estimated

    fit_image, fit_landmarks = decoder(theta_estimated)

    loss_lm = loss(target_landmarks, fit_landmarks)
    print('fit land', fit_landmarks)
    print('theta=', theta_estimated[0])

    loss_value = loss_lm

    loss_value.backward()
    optimizer.step()
    optimizer.zero_grad()

In case you are using other libraries and are “leaving” PyTorch, note that this will detach the computation graph as well (e.g. if you are using numpy). In these cases you would have to implement a custom autograd.Function with the manual definition of the backward pass.
Also check if you are rewrapping any tensors via x = torch.tensor(x, requires_grad=True) as this will also detach the tensor.
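As a quick illustration of the rewrapping point (toy tensors, not the actual model):

import torch

x = torch.randn(3, requires_grad=True)
y = x * 2
print(y.grad_fn)           # <MulBackward0 ...> -> still attached to x

y_np = y.detach().numpy()  # leaving PyTorch: autograd cannot see this step
z = torch.tensor(y_np, requires_grad=True)
print(z.grad_fn)           # None -> z is a brand-new leaf tensor

z.sum().backward()
print(x.grad)              # None: the gradient never reaches x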


Thanks again @ptrblck
Yes, I am using NumPy.
Here is an example of what I am doing to create tensors from lists and NumPy arrays:
drr = Variable(drr, requires_grad=theta.requires_grad)
landmark_batch = Variable(torch.tensor(landmark_batch, dtype=torch.float32, device=theta.device), requires_grad=theta.requires_grad)

Would this detach drr and landmark_batch? If yes, how can I write a custom autograd.Function to create tensors from NumPy arrays and lists?

Yes, both operations are rewrapping the tensors and are thus detaching them (besides Variables being deprecated since PyTorch 0.4. :wink: ).
Check this tutorial on how to write custom autograd.Functions.

Thanks so much @ptrblck

I have written a minimal, executable code snippet to reproduce the issue.

Here is the code; you will see that the parameter is not updating. You can run it directly, without needing any additional libraries.

import numpy as np
import torch
import torch.nn as nn

from torch.autograd import Variable


class Torch_stat_model(nn.Module):

    def __init__(self, initial_landmarks):  # landmark size (batch_size nx3)
        super(Torch_stat_model, self).__init__()
        self.initial_land = initial_landmarks

    def forward(self, shape_params):  # shape-param size (batch_size nx1)
        land_numpy = (shape_params * self.initial_land).cpu().detach().numpy()
        return land_numpy


class Decoder_stat_model(nn.Module):

    def __init__(self, initial_landmarks, scale=1.0):
        super(Decoder_stat_model, self).__init__()
        self.scale = scale
        self.model = Torch_stat_model(initial_landmarks)

    def forward(self, theta):
        land_detached = self.model(theta)
        land_tensor = Variable(torch.from_numpy(land_detached).to(dtype=torch.float32).to(device=theta.device),
                               requires_grad=theta.requires_grad)
        return land_tensor * self.scale


device = torch.device('cuda')
torch.manual_seed(0)

batch = 3

initial_landmark = torch.rand((batch, 48, 3)).to(device=device)

decoder = Decoder_stat_model(initial_landmark).to(device=device)

tparam = []
for i in range(batch):
    alpha = np.random.normal(np.zeros(1), 2.0, 48)
    tparam.append(alpha)

shape_parameters = torch.tensor(tparam, dtype=torch.float32,
                                device=device)

theta = torch.unsqueeze(shape_parameters, 2)
print(theta.shape)
target_landmarks = decoder(theta)

param = []
for i in range(batch):
    alpha = np.random.normal(np.zeros(1), 1.5 + 1.0 / (i + 1), 48)
    param.append(alpha)

shape_parameters_estimated = torch.zeros([batch, 48, 1], dtype=torch.float32, device=device, requires_grad=True)

loss = nn.MSELoss()

report_interval = 1
learning_rate = 1
optimizer = torch.optim.Adagrad([shape_parameters_estimated],
                                lr=learning_rate)
max_iter = 5
for iteration in range(max_iter):
    theta_estimated = shape_parameters_estimated

    fit_landmarks = decoder(theta_estimated)
    print(theta_estimated[0][0])

    loss_lm = loss(target_landmarks, fit_landmarks)

    loss_value = loss_lm

    loss_value.backward()
    optimizer.step()
    optimizer.zero_grad()

    # visualization
    if iteration % report_interval == 0:
        print(f'epoch {iteration + 1}: loss={loss_value:.8f}')

You mentioned writing a custom autograd.Function; how would I include that in the Decoder_stat_model class to attach the numpy array to the computation graph?

@ptrblck, it is important to note that the operation (shape_params * self.initial_land).cpu().detach().numpy() in the Torch_stat_model class is just there to illustrate that I have to deal with numpy arrays and lists. I know that in this minimal code we could bypass it, but it shows the type of operation that I have in my real code: because of the local libraries I use, I have to deal with numpy arrays and lists.

Thanks again. Any help will be appreciated.

This use case fits exactly into the custom autograd.Function usage, so you would have to write the backward method manually as described in the tutorial.
Autograd cannot track the numpy operations (or, generally, any operations from other libraries), which is why the manual backward is needed.
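
For reference, here is one possible way to wrap the NumPy computation from the minimal example above in a custom autograd.Function so that gradients reach shape_parameters_estimated. This is a hand-written sketch, not code from the tutorial: the NumpyScale name and the backward formula are assumptions based on the example, where out = shape_params * initial_land, so the gradient w.r.t. shape_params is grad_output * initial_land summed over the broadcast dimension.

import numpy as np
import torch
import torch.nn as nn

class NumpyScale(torch.autograd.Function):
    """Wraps the NumPy multiplication so autograd can backpropagate through it."""

    @staticmethod
    def forward(ctx, shape_params, initial_land):
        ctx.save_for_backward(initial_land)
        # leave PyTorch: this stands in for any external/NumPy library call
        land_numpy = shape_params.detach().cpu().numpy() * initial_land.detach().cpu().numpy()
        return torch.from_numpy(land_numpy).to(dtype=torch.float32, device=shape_params.device)

    @staticmethod
    def backward(ctx, grad_output):
        initial_land, = ctx.saved_tensors
        # out = shape_params * initial_land, so d(out)/d(shape_params) = initial_land;
        # sum over the broadcast last dim to match shape_params' (batch, 48, 1) shape
        grad_shape = (grad_output * initial_land).sum(dim=-1, keepdim=True)
        return grad_shape, None  # no gradient needed for initial_land

class Decoder_stat_model(nn.Module):
    def __init__(self, initial_landmarks, scale=1.0):
        super().__init__()
        self.initial_land = initial_landmarks
        self.scale = scale

    def forward(self, theta):
        # the output now carries a NumpyScaleBackward grad_fn, i.e. it stays in the graph
        return NumpyScale.apply(theta, self.initial_land) * self.scale

With this version of Decoder_stat_model dropped into the minimal snippet, shape_parameters_estimated.grad is populated after loss_value.backward(), so optimizer.step() actually changes the printed theta values.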