Optimising model.parameters and custom learnable parameter together using torch.optim gives non-leaf tensor error

Framework: PyTorch

I am trying to optimise a custom nn.Parameter (a temperature) used in the softmax calculation, together with the model parameters, using a single Adam optimiser during training. Doing so gives the following error:

ValueError: can't optimize a non-leaf Tensor

Here is my custom loss function:

class CrossEntropyLoss2d(torch.nn.Module):
    def __init__(self, weight=None):
        super().__init__()  # must run before parameters or submodules are assigned
        self.temperature = torch.nn.Parameter(torch.ones(1, requires_grad=True, device=device))
        self.loss = torch.nn.NLLLoss(weight)

    def forward(self, outputs, targets):
        T_logits = self.temp_scale(outputs)
        return self.loss(torch.nn.functional.log_softmax(T_logits, dim=1), targets)

    def temp_scale(self, logits):
        # Expand the single temperature value to the (C, H, W) shape of one sample
        temp = self.temperature.unsqueeze(1).expand(logits.size(1), logits.size(2), logits.size(3))
        return logits / temp

Here is the part of training code:

criterion = CrossEntropyLoss2d(weight)
params = list(model.parameters()) +list(criterion.temperature)
optimizer = Adam(params, 5e-4, (0.9, 0.999),  eps=1e-08, weight_decay=1e-4)


File "train_my_net_city.py", line 270, in train
optimizer = Adam(params, 5e-4, (0.9, 0.999),  eps=1e-08, weight_decay=1e-4)
File "/home/saquib/anaconda3/lib/python3.8/site-packages/torch/optim/adam.py", line 48, in __init__
super(Adam, self).__init__(params, defaults)
File "/home/saquib/anaconda3/lib/python3.8/site-packages/torch/optim/optimizer.py", line 54, in __init__
File "/home/saquib/anaconda3/lib/python3.8/site-packages/torch/optim/optimizer.py", line 257, in add_param_group
raise ValueError("can't optimize a non-leaf Tensor")
ValueError: can't optimize a non-leaf Tensor

Checking is_leaf on the variable gives True.


The error arises due to the criterion.temperature parameter and not due to model.parameters.
Kindly help!

It seems that list(criterion.temperature) creates a non-leaf tensor with UnbindBackward as its grad_fn. Use [criterion.temperature] instead for now (I’m unsure if this is expected or not).
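
As a minimal sketch of that workaround (the Linear layer below is only a hypothetical stand-in for the real model, and the hyperparameters are copied from the question), wrapping the parameter in a list literal lets Adam accept it, while list(...) on the parameter is rejected:

```python
import torch

# Hypothetical stand-in for the real model, for illustration only.
model = torch.nn.Linear(4, 2)
temperature = torch.nn.Parameter(torch.ones(1))

# A list literal keeps the original leaf parameter object.
params = list(model.parameters()) + [temperature]
optimizer = torch.optim.Adam(
    params, lr=5e-4, betas=(0.9, 0.999), eps=1e-8, weight_decay=1e-4
)

# list(temperature) iterates over the tensor and yields non-leaf elements,
# which the optimizer refuses to accept.
try:
    torch.optim.Adam(list(model.parameters()) + list(temperature), lr=5e-4)
    rejected = False
except ValueError as err:
    rejected = True
    print(err)
```

With this setup the first optimizer holds the same temperature object that the loss module would use, so its updates should be visible in the forward pass.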

list(tensor) iterates over the first dimension of the tensor and collects the resulting slices into a list.

>>> list(torch.randn(3))
[tensor(-1.0266), tensor(1.7809), tensor(0.0680)]

Note how there are three scalar tensors here that have been “computed” from the original 3-element tensor. To autograd, this iteration, which splits the tensor along the first axis, is a computation, so for a tensor that requires grad the resulting elements are non-leaf tensors.

[tensor], by contrast, puts the tensor (the original object) into a new list, so the original object itself is in the list:

>>> [torch.randn(3)]
[tensor([ 0.9664, -0.7144,  1.3371])]
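
The difference can be checked directly on a parameter. This is a small sketch, where the name temperature is just an illustrative stand-in for criterion.temperature:

```python
import torch

# Illustrative stand-in for criterion.temperature.
temperature = torch.nn.Parameter(torch.ones(1))

iterated = list(temperature)  # elements produced by iterating over the tensor
wrapped = [temperature]       # the original object placed in a new list

# The iterated element carries a grad_fn, so autograd treats it as non-leaf.
print(iterated[0].is_leaf, iterated[0].grad_fn)

# The wrapped element is the very same leaf parameter.
print(wrapped[0].is_leaf, wrapped[0] is temperature)
```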

Best regards


This makes total sense, thanks!
I didn’t see the issue as I was using a single value in my test :sweat_smile:

Dear @tom and @ptrblck, thank you so much for your replies. It was still showing an error, but I managed to get it to work by doing the following:

  params = list(model.parameters())