How to optimize multiple models' parameters in one optimizer

I know we can use “optimizer = optim.Adam(model1.parameters())” to optimize a single model, but how can I optimize multiple models with one optimizer?

optim.Adam(list(model1.parameters()) + list(model2.parameters()))
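
For context, here is a minimal end-to-end sketch of that approach (model1, model2, the shapes, and the loss are made-up placeholders, not from the thread):

import torch
import torch.nn as nn
import torch.optim as optim

# two arbitrary models whose parameters should be trained together
model1 = nn.Linear(10, 20)
model2 = nn.Linear(20, 1)

# one optimizer over the concatenated parameter lists
optimizer = optim.Adam(list(model1.parameters()) + list(model2.parameters()), lr=1e-3)

x = torch.randn(8, 10)
target = torch.randn(8, 1)

optimizer.zero_grad()
loss = nn.functional.mse_loss(model2(model1(x)), target)
loss.backward()   # gradients are accumulated in both models
optimizer.step()  # a single step updates the parameters of both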

Could I put model1 and model2 in an nn.ModuleList and pass its parameters() generator to the optimizer?

Yes you can do that.
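
A minimal sketch of the nn.ModuleList variant (model1 and model2 are placeholder modules):

import torch.nn as nn
import torch.optim as optim

model1 = nn.Linear(10, 20)
model2 = nn.Linear(20, 1)

# ModuleList registers both models, so a single parameters() generator covers them
models = nn.ModuleList([model1, model2])
optimizer = optim.Adam(models.parameters(), lr=1e-3)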


When I construct an optimizer like that, the torch.nn.utils.clip_grad_norm_ function is not available anymore. Is there any way to make both of them work?

Seems to be this issue. Let’s keep the discussion in the other thread to keep this one clean. :wink:

Hi @ptrblck and @smth, I just wanted to make sure that the method of concatenating the lists of parameters from two models would work even if there is some highly non-linear transformation between them?
The case I am thinking about is that model 1 creates some outputs y_1, then a non-linear function f is applied, such that the inputs to model 2 are x_2 = f(y_1).
Or is there a way of forcing the optimizer to multiply by the gradient of that function in between?

Thanks for your time!

The optimizer will apply its updating scheme to the passed parameters, so it should work if you pass the parameters of both models to it.
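
As long as f is built from differentiable operations, autograd applies the chain rule through it during backward(), so the parameters of both models receive the correct gradients and the optimizer only has to update them. A minimal sketch (the models, shapes, and the choice of f = tanh are placeholders):

import torch
import torch.nn as nn
import torch.optim as optim

model1 = nn.Linear(10, 20)
model2 = nn.Linear(20, 1)
optimizer = optim.Adam(list(model1.parameters()) + list(model2.parameters()), lr=1e-3)

x = torch.randn(8, 10)
target = torch.randn(8, 1)

optimizer.zero_grad()
y1 = model1(x)
x2 = torch.tanh(y1)                        # non-linear transformation between the models
loss = nn.functional.mse_loss(model2(x2), target)
loss.backward()                            # autograd backpropagates through f automatically
optimizer.step()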


You can use itertools.chain

from itertools import chain
chain(net1.parameters(), net2.parameters())
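
For completeness, a minimal sketch of passing the chained iterator to an optimizer (net1 and net2 are placeholder modules):

from itertools import chain

import torch.nn as nn
import torch.optim as optim

net1 = nn.Linear(10, 20)
net2 = nn.Linear(20, 1)

# chain lazily concatenates both parameter generators; the optimizer
# stores the parameters in its param groups at construction time
optimizer = optim.Adam(chain(net1.parameters(), net2.parameters()), lr=1e-3)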

Is it possible to use a different learning rate for each of the models with the same optimizer?

You can use different learning rates and other hyperparameters as well.
Take a look at the official example:

optim.SGD([
    {'params': model.base.parameters()},
    {'params': model.classifier.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)

Reference to official docs.
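
Applied to the two-model case in this thread, a minimal sketch (model1 and model2 are placeholder modules):

import torch.nn as nn
import torch.optim as optim

model1 = nn.Linear(10, 20)
model2 = nn.Linear(20, 1)

optimizer = optim.SGD([
    {'params': model1.parameters()},              # uses the default lr of 1e-2
    {'params': model2.parameters(), 'lr': 1e-3},  # overrides the default lr for this group
], lr=1e-2, momentum=0.9)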


I believe all PyTorch optimizers provide the same parameter-group interface, so this should work with Adam as well.
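
For example, the same per-group setup with Adam, reusing the placeholder models from the sketch above:

optimizer = optim.Adam([
    {'params': model1.parameters()},               # uses the default lr of 1e-3
    {'params': model2.parameters(), 'lr': 1e-4},   # overrides the default lr
], lr=1e-3)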