Loss not updating in PyTorch?

I’m trying to create a collaborative filtering model to generate recommendations for patient/psychologist pairs. Here’s my model so far:

class Recommender(nn.Module):
  def __init__(self, patients, psychologists):
    super().__init__()
    self.patient_params = nn.ParameterDict({patient.name: nn.Parameter(torch.randn(10, requires_grad=True, dtype=torch.float)) for patient in patients})
    self.psych_params = nn.ParameterDict({psych.name: nn.Parameter(torch.randn(10, requires_grad=True, dtype=torch.float)) for psych in psychologists})
  def forward(self, matches):
    output = []
    for psych_match in matches:
      patient = psych_match[0]
      psych = psych_match[1]
      patient_params = self.patient_params[patient.name]
      psych_params = self.psych_params[psych.name]
      output.append(patient_params @ psych_params.T)
    return torch.tensor(output, requires_grad=True)

I’m able to make predictions with this model, and run through the training loop, but my loss and weights don’t update.

model = Recommender(patients, psychologists)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(params=model.parameters(), lr=0.01)
for epoch in range(50):
  model.train()
  y_preds = model(X)
  loss = loss_fn(y_preds, y)
  optimizer.zero_grad()
  loss.backward() 
  optimizer.step()
  if epoch % 10 == 0:
    print(f"Epoch: {epoch} | Loss: {loss}")

And the output is:

Epoch: 0 | Loss: 20.708234786987305
Epoch: 10 | Loss: 20.708234786987305
Epoch: 20 | Loss: 20.708234786987305
Epoch: 30 | Loss: 20.708234786987305
Epoch: 40 | Loss: 20.708234786987305

For simplicity’s sake, I’ve made the y values numerical, so I’d assume that MSELoss would work fine here.

Any idea what might be going wrong or how to fix it?

If you look at the documentation for torch.tensor:
https://pytorch.org/docs/stable/generated/torch.tensor.html

it says that torch.tensor copies the data and does not preserve autograd history, so no gradient can flow back through it. In the last step of your forward function you wrap the output in torch.tensor, which I think is what destroys the computation graph.

I also don’t think it’s necessary to convert to a tensor with requires_grad=True in that last step, since the output already requires grad because it comes from nn.Parameter.
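To make this concrete, here is a small standalone check (a minimal sketch, not your model; p and q just stand in for one patient/psychologist pair of parameter vectors):

import torch
import torch.nn as nn

p = nn.Parameter(torch.randn(10))
q = nn.Parameter(torch.randn(10))

score = p @ q           # stays connected to p and q
print(score.grad_fn)    # prints a backward function (e.g. <DotBackward0 ...>)

wrapped = torch.tensor([score], requires_grad=True)
print(wrapped.grad_fn)  # None - a brand new leaf tensor, detached from p and q

Calling backward on a loss built from wrapped would only populate wrapped.grad, never p.grad or q.grad, which is why your parameters never move.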


As @ConvolutionalAtom noted, your model isn’t maintaining the computation graph all the way through the forward pass: the torch.tensor call in the last line creates a new tensor that is detached from your parameters.

You could try something like:

  def forward(self, matches):
    # Accumulate predictions in a tensor instead of rebuilding one at the end,
    # so the result stays connected to the parameters
    output = torch.empty(0, 1)
    for psych_match in matches:
      patient = psych_match[0]
      psych = psych_match[1]
      patient_params = self.patient_params[patient.name]
      psych_params = self.psych_params[psych.name]
      # torch.cat preserves autograd history, unlike torch.tensor
      output = torch.cat([output, (patient_params @ psych_params.T).view(1, -1)])
    return output
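If you prefer to keep the list-building structure of your original forward, another option (a sketch along the same lines) is to append the per-match scores to a list and stack them at the end, which also preserves the graph:

  def forward(self, matches):
    output = []
    for psych_match in matches:
      patient = psych_match[0]
      psych = psych_match[1]
      patient_params = self.patient_params[patient.name]
      psych_params = self.psych_params[psych.name]
      output.append(patient_params @ psych_params.T)
    # torch.stack combines the per-match scalars into one tensor without
    # detaching them, unlike torch.tensor
    return torch.stack(output)

Note that this returns a 1-D tensor while the torch.cat version above returns shape (N, 1), so make sure your y targets match whichever shape you use before passing them to MSELoss.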

Perfect, now it’s working as expected. Thanks!
