Perplexity about Generator training in conditional GAN

When training the generator of my conditional GAN what I do is:

    output = disc(fake, real[1]).view(-1)
    lossG = criterion(output, torch.ones_like(output))
    gen.zero_grad()
    lossG.backward()
    opt_gen.step()

    if batch_idx == 0:
        print(
            f"Epoch [{epoch}/{num_epochs}] Batch {batch_idx}/{len(loader)} "
            f"Loss D: {lossD:.4f}, loss G: {lossG:.4f}"
        )

        step += 1

    loss_D.append(lossD.detach().numpy())
    loss_G.append(lossG.detach().numpy())

Since we are minimizing, should I instead write:

    output = -disc(fake, real[1]).view(-1)

?

Hi Frederico!

You don’t tell us what criterion is, so we have to make some
(hopefully reasonable) assumptions …

If lossG becomes smaller when disc does a worse job of
distinguishing fake from real (and you are only using
lossG.backward() to modify the weights of gen and not of
disc), then this should work as written. Backpropagation
of lossG will train gen to do a better job of fooling disc.

But if criterion returns a typical loss for the performance of
disc that becomes smaller when disc does a better job of
distinguishing fake from real, then …

No, not quite. This flips the sign of whatever it is that disc returns
and that might not be what you want. (It would depend on the details
of criterion.)

It is true – assuming that criterion becomes smaller when disc
performs better – that you will have to flip a sign somewhere so that
you train gen to do a better job of fooling disc.

Assuming again that you are only using lossG.backward() to update
gen (and not disc), the most straightforward thing to do is just flip the
sign of lossG:

    lossG = -criterion(output, torch.ones_like(output))
    lossG.backward()

Now lossG becomes smaller when disc does worse, which means
that gen did a better job of fooling disc, which is what you want to
train gen to do.
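To make the sign question concrete, here is a small numerical sketch (assuming criterion is nn.BCELoss, which the original post doesn't confirm). It shows that BCE against an all-ones target already gets smaller when disc is fooled, so in that case no extra minus sign is needed:

```python
import torch
import torch.nn as nn

# Stand-in discriminator outputs (probabilities that the input is real),
# not Frederico's actual networks -- just two hand-picked scores.
bce = nn.BCELoss()

fooled = torch.tensor([0.9])      # disc thinks the fake is (almost) real
not_fooled = torch.tensor([0.1])  # disc confidently spots the fake

loss_fooled = bce(fooled, torch.ones_like(fooled))
loss_not_fooled = bce(not_fooled, torch.ones_like(not_fooled))

# BCE with target 1 is -log(output): smaller when the generator fools
# the discriminator, so it is already a correct training signal for gen.
assert loss_fooled < loss_not_fooled
```

If instead lossG were a score that grows when disc does better, that is exactly the situation where you would flip its sign before calling backward().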

Best.

K. Frank

Hi Frank and thanks for the kind answer.

Long story short, I’ve been implementing both a (conditional) GAN and a Wasserstein GAN with gradient penalty, and looking at tutorials of people implementing those I was pretty confused, but now I think I’ve cleared up my confusion. The code I’ve posted above is the generator-training step of my standard GAN, and we don’t use the minus sign because we are optimizing the (non-saturating) objective

$$\max_\theta \; \mathbb{E}_{z}\bigl[\log D(G_\theta(z))\bigr],$$

which BCE with an all-ones target implements as minimizing $-\mathbb{E}_z[\log D(G_\theta(z))]$.

If we consider the GP-WGAN, then the objective function for the generator becomes

$$\min_\theta \; \mathbb{E}_{x \sim p_{\text{data}}}\bigl[D_w(x)\bigr] - \mathbb{E}_{z}\bigl[D_w(G_\theta(z))\bigr],$$

where the first term is ignored as we’re minimizing over $\theta$, leaving $\min_\theta \, -\mathbb{E}_z[D_w(G_\theta(z))]$. I think that’s where my confusion arose: in the code tutorials people have provided, they were putting a minus sign in front of the critic’s predictions for the fake samples.
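As a sketch of that sign convention (with a toy stand-in critic, not a real trained network), the WGAN generator step reduces to negating the mean critic score on fakes:

```python
import torch

# Placeholder critic: one scalar score per sample. In real WGAN-GP code
# this would be the trained critic network D_w.
def critic(x):
    return x.sum(dim=1)

fake = torch.randn(4, 8, requires_grad=True)  # stands in for G(z)

# Generator minimizes -E[critic(fake)]; the E[critic(real)] term is
# dropped because it does not depend on the generator's parameters theta.
lossG = -torch.mean(critic(fake))
lossG.backward()
```

This is where the minus sign on the critic’s fake-sample predictions comes from in those tutorials; it plays the role that the all-ones BCE target plays in the standard GAN.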

Tell me your thoughts about this, if you think that’s correct.

Also, in the WGAN case, how do we interpret the losses for the generator and the critic? I guess we can still say that if the critic loss is low then we’re separating p_data and p_G ‘in a good way’, while if the generator loss is low then we are succeeding in getting p_G close to p_data (in a more geometric sense this time, by considering the Wasserstein metric).
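One way to see that interpretation, sketched with a hand-written 1-Lipschitz "critic" rather than a trained one (and assuming the usual convention lossC = -(E[critic(real)] - E[critic(fake)])): the negated critic loss acts as an estimate of the Wasserstein distance between p_data and p_G, large when the two sample sets are well separated and near zero when they overlap:

```python
import torch

# Identity on 1-D samples is 1-Lipschitz, so it is an admissible critic.
def critic(x):
    return x

real = torch.full((100,), 5.0)        # samples from p_data
fake_far = torch.full((100,), 0.0)    # p_G far from p_data
fake_close = torch.full((100,), 4.9)  # p_G close to p_data

def critic_loss(real, fake):
    # Critic maximizes E[critic(real)] - E[critic(fake)], so its loss
    # is the negation; -critic_loss estimates the Wasserstein distance.
    return -(critic(real).mean() - critic(fake).mean())

# Well-separated distributions give a much more negative critic loss
# (larger estimated distance) than overlapping ones.
assert critic_loss(real, fake_far) < critic_loss(real, fake_close)
```

So a strongly negative critic loss suggests the critic still separates p_data and p_G easily, while a generator loss moving down suggests p_G is drifting toward p_data under that distance.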