Optimizer comparison

Hi all, I'm confused by these three ways of constructing an optimizer.

1. optimizer = torch.optim.Adam(list(model1.parameters()) + list(model2.parameters()), ...)

vs.

2. optimizer = torch.optim.SGD([{'params': model1.parameters()},
                                {'params': model2.parameters()}], ...)

vs.

3. optimizer = torch.optim.Adam(whole_network.parameters(), ...)

(whole_network includes model1 and model2)
Do these three optimizers work the same? Can anyone compare them?

The first and third approaches are basically the same; the only difference is that in 1. you explicitly concatenate the parameters of two separate models, while in 3. the parent module collects them for you.
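Here is a minimal sketch to illustrate (model1 and model2 are hypothetical small modules, just for demonstration):

import torch
import torch.nn as nn

# hypothetical sub-modules, only for illustration
model1 = nn.Linear(10, 5)
model2 = nn.Linear(5, 1)

# approach 1: concatenate the parameter lists of both models
opt1 = torch.optim.Adam(list(model1.parameters()) + list(model2.parameters()), lr=1e-3)

# approach 3: wrap both models in a parent module and pass its parameters
whole_network = nn.Sequential(model1, model2)
opt3 = torch.optim.Adam(whole_network.parameters(), lr=1e-3)

# both end up with a single parameter group sharing the same hyperparameters
print(len(opt1.param_groups), len(opt3.param_groups))  # 1 1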

The second one can be used for per-parameter options, e.g. giving each model its own learning rate.
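Continuing the sketch above (the learning rates here are arbitrary, just to show the syntax):

# approach 2: one dict per parameter group allows per-group options;
# reusing model1/model2 from the previous snippet
opt2 = torch.optim.SGD(
    [{'params': model1.parameters()},               # uses the default lr below
     {'params': model2.parameters(), 'lr': 1e-4}],  # overrides lr for model2
    lr=1e-3, momentum=0.9)

print([g['lr'] for g in opt2.param_groups])  # [0.001, 0.0001]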

Thank you so much!
It helped me a lot :grinning: