I am trying to add the trainable weights of my custom loss function (which is a weighted sum of 2 loss functions) to the optimizer. The weights are initialized as
w = nn.Parameter(torch.Tensor().cuda())
And I’ve successfully registered the weights in the optimizer’s param group, but I keep getting the ‘can’t optimize a non-leaf Tensor’ error. It would be fine if I just initialized the weights as plain tensors
w = torch.Tensor().cuda()
but in that case, even though the weights have been added to the optimizer’s param group, they do not update over epochs.
How can I solve this problem?
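For reference, a minimal way to reproduce that error message (a CPU-only sketch, not necessarily identical to my code; `.to(copy=True)` stands in for `.cuda()` so it runs without a GPU):

```python
import torch
import torch.nn as nn

# Moving a Parameter AFTER wrapping it produces a copy that is the
# output of an autograd op, i.e. a non-leaf tensor:
w = nn.Parameter(torch.zeros(2)).to("cpu", copy=True)
print(w.is_leaf)  # False

try:
    torch.optim.SGD([w], lr=0.1)
except ValueError as e:
    print(e)  # can't optimize a non-leaf Tensor

# A Parameter created directly on the target device stays a leaf:
w_ok = nn.Parameter(torch.zeros(2))  # pass device="cuda" here if needed
print(w_ok.is_leaf)  # True
torch.optim.SGD([w_ok], lr=0.1)  # accepted
```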
Try to do
w_gpu = w.cuda()
then use w_gpu
Thanks for the reply! I’ve just tried it, but it didn’t work out. It still reports the weights as non-leaf tensors.
Sorry, I’m assuming you are setting this inside an nn.Module, is that the case?
Yeah, I did set this inside an nn.Module, but I also manually passed the weight variables to the optimiser
params = list(model.parameters()) + [loss_function.w1, loss_function.w2]
optimiser = optim.SGD(params)
The manual passing is necessary since the weight(s) would not be added to model.parameters() even though I declared them as nn.Parameter() within the nn.Module.
Let me know, what are w1 and w2?
Think of it this way: if you have a loss which is an nn.Module, and inside it you do
w_gpu = w.cuda()
then you only have to pass w to the optimizer, not w_gpu. w is still a leaf tensor if defined that way, while w_gpu is just a copy of w (a non-leaf tensor whose gradient simply flows back to w), so it should not be given to the optimizer.
The same effect should be achieved if you allocate the whole module:
instance = MyLoss().cuda()
In fact, you should code it this way, since calling .cuda() on individual tensors is really meant for allocating different parts of a submodel on different GPUs; it’s not intended to be used like this.
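A minimal sketch of what such a module could look like (the name MyLoss and the two component losses are illustrative, not taken from the actual code):

```python
import torch
import torch.nn as nn

class MyLoss(nn.Module):
    """Weighted sum of two losses; the weights are learnable Parameters."""
    def __init__(self):
        super().__init__()
        # Registered as Parameters, so instance.parameters() finds them
        # and MyLoss().cuda() moves them in place (they stay leaf tensors).
        self.w1 = nn.Parameter(torch.ones(1))
        self.w2 = nn.Parameter(torch.ones(1))

    def forward(self, out, target):
        loss1 = nn.functional.mse_loss(out, target)
        loss2 = nn.functional.l1_loss(out, target)
        return self.w1 * loss1 + self.w2 * loss2

criterion = MyLoss()  # or MyLoss().cuda() to move the Parameters to GPU
print([name for name, _ in criterion.named_parameters()])  # ['w1', 'w2']
```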
It would be useful if you pasted the essential part of your code instead.
Sorry for the late reply. I’ve found out that the bug was caused by me not carrying out any operations on the created weight (a mispositioned line of code), and I’ve fixed it. Thank you very much!
If you have two different loss functions, finish the forward passes for both of them separately, and then finally you can do
(loss1 + loss2).backward()
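A short sketch of that pattern (the model, data, and the two component losses here are illustrative):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 1)
x, y = torch.randn(8, 4), torch.randn(8, 1)

out = model(x)
loss1 = nn.functional.mse_loss(out, y)  # first loss, forward pass finished
loss2 = nn.functional.l1_loss(out, y)   # second loss, forward pass finished

# One backward on the sum accumulates gradients from both losses
# into every parameter's .grad in a single pass:
(loss1 + loss2).backward()
print(model.weight.grad.shape)  # torch.Size([1, 4])
```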