Passing 'model.parameters() + other_parms' to optimizer

I am using nn.AdaptiveLogSoftmaxWithLoss. The way I am building my model, the loss is outside of my nn.Module.

How can I pass the weights included in this loss so that they appear in my model.parameters() and model.modules()? Or at least, how can I join the parameters/modules of my model with the ones in the loss function?


model.parameters() and model.modules() both return generators. You can first turn them into lists with list(model.parameters()) and list(model.modules()), then append the weights of the loss and the loss module to those lists.

Note that model.modules() walks the submodules recursively, so merging the module lists is a bit less straightforward than merging the parameters.
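
A minimal sketch of the parameter-joining idea above, assuming a toy nn.Linear as the model (the layer sizes, cutoffs, and variable names are made up for illustration). Because both calls return generators, they are chained into one list before being handed to the optimizer:

```python
import itertools

import torch
import torch.nn as nn

# Hypothetical toy model; the loss lives outside of it, as in the question.
model = nn.Linear(16, 32)
criterion = nn.AdaptiveLogSoftmaxWithLoss(in_features=32, n_classes=100, cutoffs=[10, 50])

# Both calls return generators, so chain them into a single list of parameters.
all_params = list(itertools.chain(model.parameters(), criterion.parameters()))
optimizer = torch.optim.SGD(all_params, lr=1e-2)

# One training step to check that the weights inside the loss are updated too.
x = torch.randn(8, 16)
target = torch.randint(0, 100, (8,))
before = criterion.head.weight.detach().clone()

optimizer.zero_grad()
result = criterion(model(x), target)  # named tuple with .output and .loss
result.loss.backward()
optimizer.step()

loss_weights_changed = not torch.equal(before, criterion.head.weight)
```

Passing model.parameters() alone would leave the head and tail weights of the loss untrained, which is the situation the question describes.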


This answer is pretty much all you need!

In the SGD example of the answer, you would only need to replace model.base with your_model_name and model.classifier with your_loss_name. If you wrote your loss module properly (with registered nn.Parameters and not just tensors), it should work.

Yes, you are both right. Since the losses in PyTorch are nn.Module subclasses, it’s no problem: they have parameters() and modules() methods. But what if I wanted to add an extra tensor to the optimizer (with requires_grad set to True)?

Firstly, the extra tensor should be trainable (with requires_grad=True) and included in the computational graph.
Secondly, you could add the extra tensor into optimizer.params_group like this.

Can you give a code example please? I can’t find that optimizer.params_group in the link that you mentioned.

I meant you could add the extra tensor to the optimizer with optimizer.add_param_group,
or initialize your optimizer as follows:

optim.SGD([
                {'params': model.base.parameters()},
                {'params': model.classifier.parameters(), 'lr': 1e-3}
            ], lr=1e-2, momentum=0.9)
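
For the add_param_group route, here is a small sketch, assuming a toy nn.Linear model and a hypothetical standalone tensor called extra_scale. The 'params' entry takes an iterable of tensors, so a single tensor is simply wrapped in a list rather than calling .parameters() on it:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 1)  # hypothetical stand-in for the real model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)

# A standalone leaf tensor that should be trained alongside the model.
extra_scale = torch.ones(1, requires_grad=True)

# Register it as its own group; options not given here (such as momentum)
# fall back to the defaults the optimizer was created with.
optimizer.add_param_group({'params': [extra_scale], 'lr': 1e-3})

# The tensor has to take part in the computation to receive a gradient.
x = torch.randn(8, 4)
loss = (extra_scale * model(x)).pow(2).mean()

optimizer.zero_grad()
loss.backward()
optimizer.step()

num_groups = len(optimizer.param_groups)
```

The same tensor could instead be listed in a {'params': [extra_scale]} dict when the optimizer is constructed.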

But a tensor does not have .parameters(), what do you mean?

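As far as I know, an optimizer will accept a plain leaf tensor with requires_grad=True if you wrap it in a list, but such a tensor is never registered with a module, so it will not show up in model.parameters(). That is the whole point of torch.nn.Parameter.

My understanding is that Parameter exists so that a module can tell its learnable weights apart from ordinary tensors (cached state, for example), which are not registered as parameters.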

What is your use case for Tensors which are not part of the model?


Yeah, Parameter works, thanks :)
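
For reference, a minimal sketch of the difference that makes Parameter work here, using a hypothetical module named Scaled. An nn.Parameter assigned as an attribute is registered and returned by parameters(), while a plain tensor attribute is not, even with requires_grad=True:

```python
import torch
import torch.nn as nn


class Scaled(nn.Module):  # hypothetical module, for illustration only
    def __init__(self):
        super().__init__()
        # Registered: returned by parameters(), so the optimizer will see it.
        self.scale = nn.Parameter(torch.ones(1))
        # Not registered: a plain tensor attribute is skipped by parameters().
        self.offset = torch.zeros(1, requires_grad=True)

    def forward(self, x):
        return self.scale * x + self.offset


module = Scaled()
param_names = [name for name, _ in module.named_parameters()]
```

Here param_names contains only 'scale', so an optimizer built from module.parameters() would never update offset.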



So in my case I have pre-trained a model, and in the main training loop I apply a few functions (tensor operations) on top of the pre-trained model’s output. These new functions have some parameters that I need to update at each iteration.

Do you recommend making an nn.Module subclass out of these new functions and parameters, where I can define the new parameters using nn.Parameter?

The problem is, I will not be using any of PyTorch’s nn layers in this new model (the model on top of the pre-trained model). So, is it even necessary, or the best practice, to add additional parameters like this? What do you guys suggest? Please let me know if I am being unclear.

Would appreciate your help, Thanks!

I think even if you don’t use PyTorch layers you should use Parameters for your learnable weights/params. This will probably make it easier for you.

Using a subclass of nn.Module ensures that parameter registration and gradient computation work out of the box, and you can then use your functions (or let’s call it a module, then) like any other PyTorch layer.
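
A rough sketch of that suggestion, under stated assumptions: the pre-trained model is stood in for by a toy nn.Linear and is kept frozen, and LearnableAffine, gain, and shift are hypothetical names for the custom tensor operations and their learnable values. No built-in nn layers are used inside the new module, only nn.Parameter:

```python
import torch
import torch.nn as nn


class LearnableAffine(nn.Module):
    """Hypothetical custom function with its own learnable values."""

    def __init__(self, num_features):
        super().__init__()
        self.gain = nn.Parameter(torch.ones(num_features))
        self.shift = nn.Parameter(torch.zeros(num_features))

    def forward(self, x):
        # Plain tensor operations; autograd tracks them through the Parameters.
        return torch.tanh(x * self.gain + self.shift)


pretrained = nn.Linear(16, 8)  # stand-in for the pre-trained model (assumption)
for p in pretrained.parameters():
    p.requires_grad_(False)  # assumed frozen; drop this loop to fine-tune it as well

head = LearnableAffine(8)
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)

x = torch.randn(4, 16)
target = torch.randn(4, 8)

loss = nn.functional.mse_loss(head(pretrained(x)), target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Only gain and shift are handed to the optimizer here, so each iteration updates just the new values on top of the pre-trained output.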
