Issue with sharing block - getting warning - issue 40967


I am running the code from the VAE-GAN from the INTERSPEECH 2020 paper Voice Conversion Using Speech-to-Speech Neuro-Style Transfer. This also happens with the UNIT paper code.

PyTorch gives the following warning:
UserWarning: optimizer contains a parameter group with duplicate parameters; in future, this will cause an error; see Inconsistent behaviour when parameter appears multiple times in parameter list · Issue #40967 · pytorch/pytorch · GitHub for more information

The problem is with the shared block, and it appears this will become a hard error in a future release. I'm unsure how to fix it so I can continue using PyTorch for these kinds of structures. If it is going to error out, I need to understand how to make this work.

The block is shared:

shared_dim = opt.dim * 2 ** opt.n_downsample

# Initialize generator and discriminator
encoder = Encoder(dim=opt.dim, in_channels=opt.channels, n_downsample=opt.n_downsample)
shared_G = ResidualBlock(features=shared_dim)
G1 = Generator(dim=opt.dim, out_channels=opt.channels, n_upsample=opt.n_downsample, shared_block=shared_G)
G2 = Generator(dim=opt.dim, out_channels=opt.channels, n_upsample=opt.n_downsample, shared_block=shared_G)
D1 = Discriminator(input_shape)
D2 = Discriminator(input_shape)

The warning occurs when the Adam optimizer is defined:


optimizer_G = torch.optim.Adam(
    itertools.chain(encoder.parameters(), G1.parameters(), G2.parameters()),
    lr=opt.lr,
    betas=(opt.b1, opt.b2),
)
optimizer_D1 = torch.optim.Adam(D1.parameters(), lr=opt.lr, betas=(opt.b1, opt.b2))
optimizer_D2 = torch.optim.Adam(D2.parameters(), lr=opt.lr, betas=(opt.b1, opt.b2))

# Learning rate update schedulers

lr_scheduler_G = torch.optim.lr_scheduler.LambdaLR(
    optimizer_G, lr_lambda=LambdaLR(opt.n_epochs, opt.epoch, opt.decay_epoch).step
)

G1.parameters() and G2.parameters() both include the shared block's weights, and I believe these duplicate parameters are what trigger the warning. I haven't been able to find a remedy, yet this kind of block sharing appears in many newer papers.
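For what it's worth, the duplication is easy to see with small stand-in modules (these are hypothetical `nn.Linear` layers, not the paper's actual classes): chaining the parameters of two modules that share a submodule yields the shared parameters twice, and deduplicating by identity removes the repeats.

```python
import itertools
import torch.nn as nn

# Hypothetical stand-ins: two "generators" sharing one block, like shared_G above.
shared = nn.Linear(4, 4)
g1 = nn.Sequential(shared, nn.Linear(4, 4))
g2 = nn.Sequential(shared, nn.Linear(4, 4))

params = list(itertools.chain(g1.parameters(), g2.parameters()))
# nn.Parameter hashes by identity, and dict.fromkeys preserves order,
# so this drops the second occurrence of each shared weight/bias.
unique = list(dict.fromkeys(params))

print(len(params), len(unique))  # prints: 8 6  (shared weight and bias counted twice)
```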

Any suggestions on how to get around this warning?


So conceptually, what do you want to happen if the 'optimal' configuration for G1 and G2 happens to be at different values of the shared block?

If you duplicate the block, the two copies will end up with different values.

If you just want the shared block to be pulled in the respective descent direction with each step, without caring too much about the details, having two optimizers might be better.

If you care about one more than the other, you might use a weighted loss of those for G1 and G2 and filter out the duplicate parameters.

The last two options are not mutually exclusive: you could split the common part out into a third parameter group.
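A minimal sketch of that last suggestion, again with hypothetical small modules in place of the paper's encoder/generators: give the shared block its own parameter group, so every parameter appears exactly once and the warning disappears.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the generator-specific parts and the shared block.
g1_head = nn.Linear(4, 4)
g2_head = nn.Linear(4, 4)
shared = nn.Linear(4, 4)

# Three parameter groups; no parameter is listed twice, so no warning is raised.
optimizer_G = torch.optim.Adam(
    [
        {"params": g1_head.parameters()},
        {"params": g2_head.parameters()},
        {"params": shared.parameters()},  # the shared block as its own group
    ],
    lr=2e-4,
    betas=(0.5, 0.999),
)

# Sanity check: every parameter appears exactly once across all groups.
all_params = [p for group in optimizer_G.param_groups for p in group["params"]]
assert len(all_params) == len(set(all_params))
```

A separate group also lets you tune the shared block independently, e.g. give it its own `lr` in the group dict.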

Best regards