Passing 'model.parameters() + other_params' to the optimizer


I am using nn.AdaptiveLogSoftmaxWithLoss. The way I am building my model, the loss is outside of my nn.Module.

How can I pass the weights included in this loss so that they appear in my model.parameters() and model.modules()? Or at least, how can I join the parameters/modules of my model with the ones in the loss function?



model.parameters() and model.modules() are both generators. First, you could materialize the parameters with list(model.parameters()), and then append the loss module's weights to that list before passing it to the optimizer.

But model.modules() yields submodules recursively, so merging the module hierarchies themselves is trickier.
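As a minimal sketch of the list/chain approach (model and loss shapes here are made up for illustration):

```python
import itertools

import torch
import torch.nn as nn

# Hypothetical small model, with the AdaptiveLogSoftmaxWithLoss kept outside it,
# as in the question above.
model = nn.Linear(16, 32)
criterion = nn.AdaptiveLogSoftmaxWithLoss(32, 100, cutoffs=[10, 50])

# Option 1: chain the two parameter generators into one iterable.
optimizer = torch.optim.SGD(
    itertools.chain(model.parameters(), criterion.parameters()), lr=0.01
)

# Option 2: materialize both generators as lists and concatenate them.
param_list = list(model.parameters()) + list(criterion.parameters())
```

Either form gives the optimizer every trainable weight from both modules.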

(Alex Veuthey) #3

This answer is pretty much all you need!

In the SGD example of that answer, you would only need to replace model.base with your model and model.classifier with your loss module. If you wrote your loss module properly (with registered nn.Parameters, not just plain tensors), it should work.
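Adapted to this thread, that SGD example might look like the sketch below (the model and loss instances are placeholders):

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 32)  # stand-in for your model
loss_module = nn.AdaptiveLogSoftmaxWithLoss(32, 100, cutoffs=[10, 50])

# One optimizer over two parameter groups: the model at the base lr,
# the loss module's weights with their own lr override.
optimizer = torch.optim.SGD([
    {'params': model.parameters()},
    {'params': loss_module.parameters(), 'lr': 1e-3},
], lr=1e-2, momentum=0.9)
```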


Yes, you are both right. Since the losses in PyTorch are nn.Modules, it’s no problem: they have parameters() and modules() methods. But what if I wanted to add an extra tensor to the optimizer (with requires_grad set to True)?


Firstly, the extra tensor should be trainable (with requires_grad=True) and included in the computational graph.
Secondly, you could add the extra tensor into optimizer.params_group like this.


Can you give a code example please? I can’t find that optimizer.params_group in the link that you mentioned.


I meant you could add the extra tensor into the optimizer via optimizer.add_param_group,
or initialize your optimizer as follows:

    optimizer = optim.SGD([
                {'params': model.base.parameters()},
                {'params': model.classifier.parameters(), 'lr': 1e-3}
            ], lr=1e-2, momentum=0.9)
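For the add_param_group route with a raw tensor, a minimal sketch (model and tensor shapes are hypothetical):

```python
import torch

model = torch.nn.Linear(4, 4)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

# A trainable tensor that lives outside any module.
extra = torch.randn(10, requires_grad=True)

# Register it with the optimizer after construction; it gets its own
# parameter group and inherits the optimizer's defaults (lr, momentum, ...).
optimizer.add_param_group({'params': [extra]})
```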

but a tensor does not have .parameters(), what do you mean?

(Alex Veuthey) #9

I don’t think you actually can do that with Tensors, which is the whole point of torch.nn.Parameter.

My understanding is that Parameter was added specifically so that a tensor assigned to a module is registered on it and returned by model.parameters(), which plain Tensors are not.

What is your use case for Tensors which are not part of the model?
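To illustrate the registration point, a small sketch (module and attribute names are made up):

```python
import torch
import torch.nn as nn

class WithExtra(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)
        # Registered: an nn.Parameter attribute shows up in .parameters().
        self.scale = nn.Parameter(torch.ones(1))
        # Not registered: a plain tensor attribute, even with
        # requires_grad=True, is invisible to .parameters().
        self.offset = torch.zeros(1, requires_grad=True)

m = WithExtra()
names = [n for n, _ in m.named_parameters()]
# names contains 'linear.weight', 'linear.bias', and 'scale' — but not 'offset'.
```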


yeah, Parameter works, thanks :)
