I know we can use “optimizer = optim.Adam(model1.parameters())” to optimize a model, but how can I optimize multiple models with one optimizer?
optim.Adam(list(model1.parameters()) + list(model2.parameters()))
Could I put model1 and model2 in an nn.ModuleList and give its parameters() generator to the optimizer?
Yes, you can do that.
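A minimal sketch of that approach, using hypothetical toy Linear models just for illustration:

import torch.nn as nn
import torch.optim as optim

model1 = nn.Linear(10, 5)  # placeholder models
model2 = nn.Linear(5, 1)

# Wrapping both models in an nn.ModuleList exposes a single
# parameters() generator that yields the parameters of both.
models = nn.ModuleList([model1, model2])
optimizer = optim.Adam(models.parameters(), lr=1e-3)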
When I construct an optimizer like that, the torch.nn.utils.clip_grad_norm_ function no longer works. Is there any way to use both together?
Seems to be this issue. Let’s keep the discussion in the other thread to keep this one clean.
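For reference, a minimal sketch of one way to keep both working, assuming the problem is that a combined parameter generator gets consumed before clipping: keep the parameters in a plain list and reuse it for both the optimizer and clip_grad_norm_ (the Linear models are placeholders).

import torch
import torch.nn as nn
import torch.optim as optim

model1 = nn.Linear(10, 5)  # placeholder models
model2 = nn.Linear(5, 1)

# A list (unlike a generator) can be iterated more than once.
params = list(model1.parameters()) + list(model2.parameters())
optimizer = optim.Adam(params, lr=1e-3)

x = torch.randn(8, 10)
loss = model2(model1(x)).pow(2).mean()

optimizer.zero_grad()
loss.backward()
torch.nn.utils.clip_grad_norm_(params, max_norm=1.0)  # clip before stepping
optimizer.step()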
Hi @ptrblck and @smth, I just wanted to make sure that the method of concatenating the parameter lists from two models still works even if there is a highly non-linear transformation between them.
The case I am thinking about is that model 1 produces some outputs y_1. Then a non-linear function f is applied, so that the inputs to model 2 are x_2 = f(y_1).
Or is there a way to force the optimizer to account for the gradient of that function in between?
Thanks for your time!
The optimizer will apply its updating scheme to the passed parameters, so it should work if you pass the parameters of both models to it.
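A minimal sketch, assuming tanh as the non-linear f and placeholder Linear models: as long as f is differentiable, autograd applies the chain rule through it during backward(), so no extra handling is needed on the optimizer side.

import torch
import torch.nn as nn
import torch.optim as optim

model1 = nn.Linear(10, 5)  # placeholder models
model2 = nn.Linear(5, 1)

optimizer = optim.Adam(
    list(model1.parameters()) + list(model2.parameters()), lr=1e-3
)

x = torch.randn(8, 10)
target = torch.randn(8, 1)

y1 = model1(x)
x2 = torch.tanh(y1)          # the non-linear function f in between
out = model2(x2)
loss = nn.functional.mse_loss(out, target)

optimizer.zero_grad()
loss.backward()              # autograd backpropagates through f into model1
optimizer.step()             # updates the parameters of both models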
You can use itertools.chain:
from itertools import chain
import torch.optim as optim

optimizer = optim.Adam(chain(net1.parameters(), net2.parameters()))
Is it possible to use a different learning rate for each of the models with the same optimizer?
You can use different learning rates and other hyperparameters as well.
Take a look at the official example:
optim.SGD([
    {'params': model.base.parameters()},
    {'params': model.classifier.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)
See the official docs for reference.
I believe all PyTorch optimizers provide the same interface, so this should work with Adam as well.
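For example, a minimal sketch of the same per-parameter-group pattern with Adam and two separate placeholder models, each with its own learning rate:

import torch.nn as nn
import torch.optim as optim

model1 = nn.Linear(10, 5)  # placeholder models
model2 = nn.Linear(5, 1)

# One parameter group per model, each with its own learning rate.
optimizer = optim.Adam([
    {'params': model1.parameters(), 'lr': 1e-3},
    {'params': model2.parameters(), 'lr': 1e-4},
])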