Backward, step and zero_grad

I have a Generator and two classifiers. I want to train the Generator (G) with the classifiers' output loss, but I'm not sure whether that loss can actually reach the generator during training.
If I train like this, can loss_dis affect G (the Generator)?

import torch.optim as optim

G = Generator()
C1 = Classifier()
C2 = Classifier()

opt_g = optim.SGD(G.parameters(),
                  lr=lr, weight_decay=0.0005,
                  momentum=momentum)
opt_c1 = optim.SGD(C1.parameters(),
                   lr=lr, weight_decay=0.0005,
                   momentum=momentum)
opt_c2 = optim.SGD(C2.parameters(),
                   lr=lr, weight_decay=0.0005,
                   momentum=momentum)

feat = G(img)
output_t1 = C1(feat)
output_t2 = C2(feat)
loss_dis = cross_entropy_loss(output_t1, output_t2)  # loss between the two classifier outputs
loss_dis.backward()

opt_g.step()

opt_g.zero_grad()
opt_c1.zero_grad()
opt_c2.zero_grad()

I think you need to detach feat before passing it to the two classifiers (the discriminator), so that the backward call on loss_dis does not affect the gradients of the generator:

feat = G(img).detach()
...

Having said that, an alternative is to move the call to opt_g.zero_grad() so it runs before any of the operations that optimize the generator, because that will also clear the unwanted gradients. However, detaching feat obviously avoids computing those gradients in the first place, so it is more efficient than computing them and then clearing them (redundant computation).
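A minimal sketch of the detached variant, reusing the names from the snippet above (cross_entropy_loss stands in for whatever loss you compute between the two classifier outputs):

# Update only C1/C2 from loss_dis; because feat is detached,
# backward() never reaches G and its gradients stay untouched.
feat = G(img).detach()
output_t1 = C1(feat)
output_t2 = C2(feat)
loss_dis = cross_entropy_loss(output_t1, output_t2)

opt_c1.zero_grad()
opt_c2.zero_grad()
loss_dis.backward()
opt_c1.step()
opt_c2.step()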

@vmirly1
Thanks for the reply.
My question was somewhat confusing.
I want loss_dis to affect the generator.

The official GAN, the loss_dis does not affect the generator, since the generator has its own loss (loss_gen) which should be used for updating the generator. But, if you want to do it the other way on purpose, then you can leave the .detach(), and include the list of parameters of G in the optimizers defined for discriminator.
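A rough sketch of that, reusing the names from the snippet above: without the .detach(), loss_dis.backward() populates gradients for G as well as for C1/C2, so you can either step opt_g directly or build a single optimizer over all three parameter lists (opt_all is just an illustrative name here).

# Let loss_dis update the generator as well as the classifiers.
feat = G(img)                      # no detach, so gradients flow back into G
output_t1 = C1(feat)
output_t2 = C2(feat)
loss_dis = cross_entropy_loss(output_t1, output_t2)

opt_g.zero_grad()
opt_c1.zero_grad()
opt_c2.zero_grad()
loss_dis.backward()                # fills .grad for G, C1 and C2
opt_g.step()
opt_c1.step()
opt_c2.step()

# Or, equivalently, one optimizer over G plus both classifiers:
opt_all = optim.SGD(list(G.parameters()) + list(C1.parameters()) + list(C2.parameters()),
                    lr=lr, weight_decay=0.0005, momentum=momentum)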