Progressive GAN training -> copying some pretrained weights to new model and training?

I’m working on a generative model based on the Progressive GAN paper, where the authors start with a GAN that generates small images, train at that scale, and then add layers to increase the resolution, training only the newly added layers at each step.


import torch

# netG and netD are the new, higher-resolution networks defined earlier;
# their additional layers were already initialized from a Gaussian at this point.

# Load the checkpoints of the previously trained 8x8 generator and discriminator
gO_load = torch.load("OutputModels/8x8_Gen_FINAL.pt")
dO_load = torch.load("OutputModels/8x8_Disc_FINAL.pt")

# Copy the pretrained generator weights for the layers that already existed
netG_state_dict = netG.state_dict()
filtered_g_dict = {}
filtered_g_dict['module.c0.weight'] = gO_load['module.c0.weight']
filtered_g_dict['module.b0.weight'] = gO_load['module.b0.weight']
filtered_g_dict['module.b0.bias'] = gO_load['module.b0.bias']
filtered_g_dict['module.b0.running_mean'] = gO_load['module.b0.running_mean']
filtered_g_dict['module.b0.running_var'] = gO_load['module.b0.running_var']
filtered_g_dict['module.b0.num_batches_tracked'] = gO_load['module.b0.num_batches_tracked']

netG_state_dict.update(filtered_g_dict)
netG.load_state_dict(netG_state_dict)

# Same for the discriminator
netD_state_dict = netD.state_dict()
filtered_d_dict = {}
filtered_d_dict['module.c9.weight'] = dO_load['module.c9.weight']
netD_state_dict.update(filtered_d_dict)
netD.load_state_dict(netD_state_dict)

# Freeze the pretrained layers so only the new ones train
for param in netG.module.c0.parameters():
    param.requires_grad = False
for param in netG.module.b0.parameters():
    param.requires_grad = False
for param in netD.module.c9.parameters():
    param.requires_grad = False

Here I load the weights of the previously trained layers into my two networks (which now have additional layers that were initialized from a Gaussian distribution before the code shown here runs).
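As an aside, instead of listing every key by hand, the same copy could be written generically by intersecting the checkpoint with the new model’s state dict. A rough sketch, assuming the shared layers keep the same names and shapes in both models (and that no reused name is one you actually want reinitialized):

# gO_load and netG are the checkpoint and new generator from the snippet above
netG_state_dict = netG.state_dict()
shared = {k: v for k, v in gO_load.items()
          if k in netG_state_dict and v.shape == netG_state_dict[k].shape}
netG_state_dict.update(shared)
netG.load_state_dict(netG_state_dict)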
When I begin training, the forward pass works and every tensor has the size and shape I expect, but the networks don’t seem to be updating at all.

Do I need to save/load the optimizer state as well (that doesn’t make sense to me, since I don’t want to train the copied layers any further)? Or do I need to set requires_grad manually somewhere else?

Thanks!

I should also add that I’m asking because when I don’t load the state dict for these layers before training, and instead just train the entire model (all layers) from scratch, I get decent performance, although it takes longer than training fewer layers would.

When I load the state dict from the saved model like this, I see no indication that anything is changing over time; it looks like nothing is being updated.
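For anyone hitting the same thing: a quick way to check whether a layer is actually moving is to snapshot its parameters and compare them after a training step. A minimal sketch using c1, one of the newly added generator layers in my model:

# Snapshot one of the new layers before a generator update
before = {n: p.detach().clone() for n, p in netG.module.c1.named_parameters()}

# ... run one training iteration here ...

for n, p in netG.module.c1.named_parameters():
    change = (p.detach() - before[n]).abs().max().item()
    print(n, "requires_grad:", p.requires_grad, "max change:", change)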

It appears that the following code fixed the issue:

import torch
import torch.optim as optim

# Load the 8x8 checkpoints
gO_load = torch.load("OutputModels/8x8_Gen_FINAL.pt")
dO_load = torch.load("OutputModels/8x8_Disc_FINAL.pt")

# Copy the pretrained generator layers into the new model
netG_state_dict = netG.state_dict()
filtered_g_dict = {}
filtered_g_dict['module.c0.weight'] = gO_load['module.c0.weight']
filtered_g_dict['module.b0.weight'] = gO_load['module.b0.weight']
filtered_g_dict['module.b0.bias'] = gO_load['module.b0.bias']
filtered_g_dict['module.b0.running_mean'] = gO_load['module.b0.running_mean']
filtered_g_dict['module.b0.running_var'] = gO_load['module.b0.running_var']
filtered_g_dict['module.b0.num_batches_tracked'] = gO_load['module.b0.num_batches_tracked']

netG_state_dict.update(filtered_g_dict)
netG.load_state_dict(netG_state_dict)

# Freeze the pretrained generator layers and train only the new ones
for param in netG.module.c0.parameters():
    param.requires_grad = False
for param in netG.module.b0.parameters():
    param.requires_grad = False
for param in netG.module.c1.parameters():
    param.requires_grad = True
for param in netG.module.b1.parameters():
    param.requires_grad = True
for param in netG.module.c8.parameters():
    param.requires_grad = True

# Re-create the generator optimizer *after* loading and freezing,
# passing only the trainable parameters (lr and beta1 are defined earlier)
optimizerG = optim.Adam(filter(lambda p: p.requires_grad, netG.parameters()),
                        lr=lr * 1.25, betas=(beta1, 0.999))

# Same for the discriminator
netD_state_dict = netD.state_dict()
filtered_d_dict = {}
filtered_d_dict['module.c9.weight'] = dO_load['module.c9.weight']

netD_state_dict.update(filtered_d_dict)
netD.load_state_dict(netD_state_dict)

for param in netD.module.c9.parameters():
    param.requires_grad = False

optimizerD = optim.Adam(filter(lambda p: p.requires_grad, netD.parameters()),
                        lr=lr, betas=(beta1, 0.999))

Can someone confirm that this is the right way to handle this? Thanks!

Yeah, the problem in the first snippet was that after loading the state dict you had to create the optimizer again, which is what you did in the second snippet.
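In other words, the order matters: load the weights, set the requires_grad flags, and only then construct the optimizer. A minimal sketch with toy layer names (not your actual model):

import torch.nn as nn
import torch.optim as optim

# Toy stand-in for a grown generator: block 0 is pretrained, block 1 is new
model = nn.Sequential(nn.Linear(8, 16), nn.Linear(16, 32))

# 1) load the pretrained weights into the existing block
#    (e.g. via the state_dict update pattern from the snippets above)

# 2) freeze the pretrained block
for p in model[0].parameters():
    p.requires_grad = False

# 3) only now re-create the optimizer, after loading and freezing are done
optimizer = optim.Adam(model.parameters(), lr=2e-4, betas=(0.5, 0.999))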

You don’t need to filter the model parameters before passing them to the optimizer; setting requires_grad=False alone is enough to prevent gradient computation (and updates) for those layers.
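For example, a toy sketch (not your GAN code): even when the frozen layer’s parameters are handed to the optimizer, backward never fills in their .grad, so the step leaves them untouched:

import torch
import torch.nn as nn
import torch.optim as optim

frozen = nn.Linear(4, 4)
trainable = nn.Linear(4, 4)
for p in frozen.parameters():
    p.requires_grad = False

# Pass *all* parameters to the optimizer, with no filtering
opt = optim.Adam(list(frozen.parameters()) + list(trainable.parameters()), lr=1e-2)

x = torch.randn(2, 4)
loss = trainable(frozen(x)).sum()
loss.backward()
opt.step()

print(frozen.weight.grad)                # None: no gradient was computed
print(trainable.weight.grad.norm() > 0)  # tensor(True): the new layer got gradients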