What is the difference between the following two ways of combining multi-model parameters into one optimizer?

  1. optim.Adam(list(net1.parameters()) + list(net2.parameters()),
    lr=cfg.learning_rate_apt,
    betas=(cfg.beta1, cfg.beta2))

  2. optim.Adam(itertools.chain(net1.parameters(), net2.parameters()),
    lr=cfg.learning_rate_apt,
    betas=(cfg.beta1, cfg.beta2))

Hi all,
I have used these two ways to build a single optimizer that updates the parameters of different modules of a GAN, and they gave different results.
I would like to understand the reason for this.
Thank you.
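To make the two constructs concrete, here is a minimal pure-Python sketch of the difference between them. The `net1_parameters`/`net2_parameters` functions below are stand-ins I made up for the real `net1.parameters()`/`net2.parameters()` calls, which likewise return fresh one-shot iterators each time they are called:

```python
import itertools

# Stand-ins for net1.parameters() / net2.parameters(): each call returns
# a fresh one-shot iterator, just like nn.Module.parameters() does.
def net1_parameters():
    return iter(["w1", "b1"])

def net2_parameters():
    return iter(["w2", "b2"])

# Way 1: concatenate into a list -- the result is a list and can be
# iterated over any number of times.
combined_list = list(net1_parameters()) + list(net2_parameters())

# Way 2: chain the iterators -- the result is itself a one-shot iterator.
combined_chain = itertools.chain(net1_parameters(), net2_parameters())

print(combined_list)         # the full combined sequence, reusable
print(list(combined_chain))  # same elements in the same order, first pass
print(list(combined_chain))  # empty: the chain is exhausted after one pass
```

So both forms yield the same parameters in the same order on the first traversal; the list can be traversed again afterwards, while the chain cannot.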