nn.CrossEntropyLoss - optimizer not updating weights

Hello,

I have a very simple prediction network that predicts the labels for a set of pixels. I am calculating the loss as follows:

import torch.nn as nn
import torch.optim as optim

m = pixel_classifier()
optimizer = optim.Adam(m.parameters(), lr=1e-2, betas=(0.9, 0.999))
criterion = nn.CrossEntropyLoss()

I have run

for name, params in m.named_parameters():
    if params.requires_grad:
        print(name)

to check whether the model parameters have gradients enabled, and all parameters are printed.

The training is as follows:

m.train()
predictions = m(input)
loss = criterion(predictions, true_label)
loss.backward()

It is at this stage I get the following error:

element 0 of tensors does not require grad and does not have a grad_fn

Not sure what is happening here. Can anyone help?

Check this: you can inspect input.requires_grad, and if it is not True, that is the problem.

Note: with input.requires_grad=True, all tensors computed from input will have their requires_grad attribute set to True. It spreads like covid-19.

This way you can fool a pretrained network into changing its prediction by modifying the input features.
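
A minimal sketch of that propagation, with a throwaway model standing in for a pretrained network (the model, sizes, and step size below are made up purely for illustration):

import torch
import torch.nn as nn

# Toy stand-in for a pretrained classifier; the weights here are random.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10)).eval()

x = torch.randn(1, 3, 8, 8, requires_grad=True)  # the input now tracks gradients
out = model(x)
print(out.grad_fn)  # not None: every tensor computed from x carries a grad_fn

out[0, 3].backward()              # gradient of one class score w.r.t. the input
x_adv = x + 1e-2 * x.grad.sign()  # nudging the input this way can change the prediction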

You are most likely detaching the computation graph in your model somewhere so could you post the model definition, please?

@blackbirdbarber you do not need to set input.requires_grad = True, and it will not fix this issue.


I actually found the issue; it's related to the comment of yours I responded to in another post.

Here’s the problem:

m = pixel_classifier()
pred = m(train_batch)
print(pred.grad_fn)
# <AddmmBackward0 object at 0x7f865771f590>

_, pred = torch.max(pred, dim=1)  # the returned indices are integer tensors, detached from the graph
print(pred.grad_fn)
# None

My loss function is nn.CrossEntropyLoss, but my ground truth is just an NxN matrix of labels, not logits. That's why I was using the second value returned by torch.max() to build my own NxN matrix of predicted labels to pass to the loss function. Unfortunately, this is what causes the issue.

This makes sense. As explained in the other post, nn.CrossEntropyLoss expects raw logits from the model, so don't use torch.(arg)max on the model outputs; just pass the output of the last layer (without any non-linearity) to the loss function.
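
As a minimal sketch of that usage (the sizes below are illustrative; num_pixels stands in for the N*N pixels):

import torch
import torch.nn as nn

num_pixels, num_classes = 16, 5
criterion = nn.CrossEntropyLoss()

logits = torch.randn(num_pixels, num_classes, requires_grad=True)  # raw model output, no argmax/softmax
target = torch.randint(0, num_classes, (num_pixels,))              # class indices as a LongTensor

loss = criterion(logits, target)
loss.backward()            # works: the graph is intact
print(logits.grad.shape)   # torch.Size([16, 5])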

Sorry, I am confused.

The output of the last layer has shape (N*N, num_classes), but the given ground truth has shape (N*N,). Moreover, the ground truth consists of class labels, i.e. the matrix is populated with the labels of the classes, not logits.

So if I pass the output of the last layer to the loss function together with my ground truth, I get an error.

Is that what you meant?

nn.CrossEntropyLoss expects a model output containing logits in the shape [batch_size, nb_classes, *] and a target containing class indices in the range [0, nb_classes-1] in the shape [batch_size, *], as described in the docs.
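
For the per-pixel case, a quick shape check (batch_size, nb_classes, and the spatial size N below are assumed values, just to make the shapes concrete):

import torch
import torch.nn as nn

batch_size, nb_classes, N = 2, 5, 4
criterion = nn.CrossEntropyLoss()

output = torch.randn(batch_size, nb_classes, N, N, requires_grad=True)  # [batch_size, nb_classes, *] logits
target = torch.randint(0, nb_classes, (batch_size, N, N))               # [batch_size, *] class indices

loss = criterion(output, target)
loss.backward()
print(loss.item())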