I am using nn.AdaptiveLogSoftmaxWithLoss. The way I am building my model, the loss is outside of my nn.Module.
How can I pass the weights included in this loss so that they appear in my model.parameters() and model.modules()? Or, at least, how can I join the parameters/modules of my model with the ones in the loss function?
model.parameters() and model.modules() are both generators. For the parameters, you could materialize the list with list(model.parameters()) and then append the weights of the loss module to it.
But model.modules() yields submodules recursively, so joining the two module iterators is a bit less straightforward.
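A minimal sketch of joining the two parameter generators with itertools.chain and handing the result to an optimizer; the model and loss here (a small nn.Linear feeding an nn.AdaptiveLogSoftmaxWithLoss) are placeholders for your own:

```python
import itertools

import torch
import torch.nn as nn

model = nn.Linear(10, 64)  # placeholder for your model
criterion = nn.AdaptiveLogSoftmaxWithLoss(
    in_features=64, n_classes=100, cutoffs=[10, 50]
)

# Chain the two generators; the optimizer accepts any iterable of Parameters,
# so both the model's and the loss's weights get updated.
all_params = itertools.chain(model.parameters(), criterion.parameters())
optimizer = torch.optim.SGD(all_params, lr=0.01)

# The same idea works for modules if you just need to iterate over both:
all_modules = itertools.chain(model.modules(), criterion.modules())
```

This avoids touching model.parameters() itself: you only combine the iterators at the point where the optimizer needs them.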
In the SGD example from that answer, you would only need to replace model.base with your_model_name and model.classifier with your_loss_name. If you wrote your loss module properly (with registered nn.Parameters, not just plain tensors), it should work.
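For concreteness, the per-parameter-group pattern being referred to would look roughly like this; your_model and your_loss are placeholder modules standing in for your actual model and loss:

```python
import torch
import torch.nn as nn

your_model = nn.Linear(10, 64)  # placeholder for your model
your_loss = nn.AdaptiveLogSoftmaxWithLoss(
    in_features=64, n_classes=100, cutoffs=[10, 50]
)

# Two parameter groups, optionally with different hyperparameters,
# e.g. a smaller learning rate for the loss module's weights.
optimizer = torch.optim.SGD(
    [
        {"params": your_model.parameters()},
        {"params": your_loss.parameters(), "lr": 1e-3},
    ],
    lr=1e-2,  # default lr for groups that don't override it
)
```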
Yes, you are both right. Since the losses in PyTorch are nn.Modules, there is no problem: they have parameters() and modules() methods. But what if I wanted to add an extra tensor to the optimizer (with requires_grad set to True)?
First, the extra tensor should be trainable (created with requires_grad=True) and included in the computational graph.
Second, you can add the extra tensor to the optimizer's param_groups, e.g. via optimizer.add_param_group.
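A small sketch of that second step, using a placeholder nn.Linear as the model:

```python
import torch

model = torch.nn.Linear(4, 2)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Extra trainable tensor, created as a leaf with requires_grad=True.
extra = torch.randn(3, requires_grad=True)

# Register it with the optimizer as a new parameter group.
optimizer.add_param_group({"params": [extra]})

# It now takes part in a training step, as long as it appears in the graph:
loss = model(torch.randn(1, 4)).sum() + extra.sum()
loss.backward()
optimizer.step()
```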
So in my case I have pretrained a model, and after pre-training, in the main training loop, I apply a few functions (tensor operations) on top of the pretrained model's output. Applying these new functions involves some parameters that I need to update at each iteration.
Do you recommend making an nn.Module subclass out of these new functions and parameters, where I can define the new parameters using nn.Parameter?
The problem is that I will not be using any of PyTorch's nn layers in this new model (the model on top of the pretrained model). So is it even necessary, or best practice, to add additional parameters like this? What do you suggest? Please let me know if I am being unclear.
I think that even if you don't use PyTorch layers, you should use nn.Parameter for your learnable weights/params. This will probably make things easier for you.
Subclassing nn.Module ensures that all the backend work for computing gradients works out of the box, and you can use your functions (or let's call it a module, then) like any other module of PyTorch layers.
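A minimal sketch of such a module built from plain tensor operations, with no nn layers inside; the class name HeadOnTop and the scale/shift parameters are hypothetical stand-ins for whatever your functions actually need:

```python
import torch
import torch.nn as nn

class HeadOnTop(nn.Module):
    """Wraps plain tensor operations with learnable parameters."""

    def __init__(self, dim):
        super().__init__()
        # nn.Parameter registers the tensors on the module, so they show up
        # in .parameters() / .state_dict() and get updated by the optimizer.
        self.scale = nn.Parameter(torch.ones(dim))
        self.shift = nn.Parameter(torch.zeros(dim))

    def forward(self, pretrained_out):
        # Any differentiable tensor ops work here; no nn layers required.
        return pretrained_out * self.scale + self.shift

head = HeadOnTop(8)
optimizer = torch.optim.SGD(head.parameters(), lr=0.1)
```

You would then call head(pretrained_model(x)) in the training loop, and autograd handles the gradients for scale and shift like for any other module.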