How to optimize multiple models' parameters in one optimizer

I know we can use “optimizer = optim.Adam(model1.parameters())” to optimize a single model, but how can I optimize multiple models with one optimizer?

optim.Adam(list(model1.parameters()) + list(model2.parameters()))
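
For context, here is a minimal end-to-end sketch of that approach (model1, model2, the shapes, and the loss are made-up placeholders, not from the thread):

import torch
import torch.nn as nn
import torch.optim as optim

# two arbitrary models whose parameters should be trained together
model1 = nn.Linear(10, 20)
model2 = nn.Linear(20, 1)

# one optimizer over the concatenated parameter lists
optimizer = optim.Adam(list(model1.parameters()) + list(model2.parameters()), lr=1e-3)

x = torch.randn(8, 10)
target = torch.randn(8, 1)

optimizer.zero_grad()
loss = nn.functional.mse_loss(model2(model1(x)), target)
loss.backward()   # gradients are accumulated in both models
optimizer.step()  # a single step updates the parameters of both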

Could I put model1 and model2 in an nn.ModuleList and pass its parameters() generator to the optimizer?

Yes you can do that.
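
A minimal sketch of the nn.ModuleList variant (model1 and model2 are placeholder modules):

import torch.nn as nn
import torch.optim as optim

model1 = nn.Linear(10, 20)
model2 = nn.Linear(20, 1)

# ModuleList registers both models, so a single parameters() generator covers them
models = nn.ModuleList([model1, model2])
optimizer = optim.Adam(models.parameters(), lr=1e-3)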


When I construct an optimizer like that, the torch.nn.utils.clip_grad_norm_ function is not available anymore. Is there any way to make both of them work?

Seems to be this issue. Let’s keep the discussion in the other thread to keep this one clean. :wink:

Hi @ptrblck and @smth, I just wanted to make sure that the method of concatenating the lists of parameters from two models would work even if there is some highly non-linear transformation between them?
The case I am thinking about is that model 1 creates some outputs y_1, then a non-linear function f is applied, such that the inputs to model 2 are x_2 = f(y_1).
Or is there a way of forcing the optimizer to multiply by the gradient of that function in between?

Thanks for your time!

The optimizer will apply its updating scheme to the passed parameters, so it should work if you pass the parameters of both models to it.
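
As long as f is built from differentiable operations, autograd applies the chain rule through it during backward(), so the parameters of both models receive the correct gradients and the optimizer only has to update them. A minimal sketch (the models, shapes, and the choice of f = tanh are placeholders):

import torch
import torch.nn as nn
import torch.optim as optim

model1 = nn.Linear(10, 20)
model2 = nn.Linear(20, 1)
optimizer = optim.Adam(list(model1.parameters()) + list(model2.parameters()), lr=1e-3)

x = torch.randn(8, 10)
target = torch.randn(8, 1)

optimizer.zero_grad()
y1 = model1(x)
x2 = torch.tanh(y1)                        # non-linear transformation between the models
loss = nn.functional.mse_loss(model2(x2), target)
loss.backward()                            # autograd backpropagates through f automatically
optimizer.step()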


You can use itertools.chain

from itertools import chain
chain(net1.parameters(), net2.parameters())
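
For completeness, a minimal sketch of passing the chained iterator to an optimizer (net1 and net2 are placeholder modules):

from itertools import chain

import torch.nn as nn
import torch.optim as optim

net1 = nn.Linear(10, 20)
net2 = nn.Linear(20, 1)

# chain lazily concatenates both parameter generators; the optimizer
# stores the parameters in its param groups at construction time
optimizer = optim.Adam(chain(net1.parameters(), net2.parameters()), lr=1e-3)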

Is it possible to use a different learning rate for each of the models with the same optimizer?

You can use different learning rates and other hyperparameters as well.
Take a look at the official example:

optim.SGD([
    {'params': model.base.parameters()},
    {'params': model.classifier.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)

Reference to official docs.
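
Applied to the two-model case in this thread, a minimal sketch (model1 and model2 are placeholder modules):

import torch.nn as nn
import torch.optim as optim

model1 = nn.Linear(10, 20)
model2 = nn.Linear(20, 1)

optimizer = optim.SGD([
    {'params': model1.parameters()},              # uses the default lr of 1e-2
    {'params': model2.parameters(), 'lr': 1e-3},  # overrides the default lr for this group
], lr=1e-2, momentum=0.9)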


I believe all PyTorch optimizers provide the same parameter-group interface, so this should work with Adam as well.
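
For example, the same per-group setup with Adam, reusing the placeholder models from the sketch above:

optimizer = optim.Adam([
    {'params': model1.parameters()},               # uses the default lr of 1e-3
    {'params': model2.parameters(), 'lr': 1e-4},   # overrides the default lr
], lr=1e-3)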