I’m trying to figure out why we don’t put the generator model in eval() mode at the end when using fixed_noise as input, or when we are just training the discriminator. I believe the batch norm layers should behave differently in evaluation mode.
I tried training with eval() mode, but the model collapses to one particular image.
Because GAN training is highly unstable, .eval() mode is not as good as .train() mode, particularly for DCGAN.
For the model to be stable and good in eval() mode, you first have to stop training and run generation for a few mini-batches, so that the running_mean and running_var of the batch norm layers stabilize.
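To make the mechanism concrete, here is a minimal layer-level sketch, assuming a stock nn.BatchNorm2d with default settings: in train() mode every forward pass blends the current batch statistics into the running buffers using the layer's momentum, and this happens even under torch.no_grad(), so the buffers can settle without any weight update.

import torch
import torch.nn as nn

bn = nn.BatchNorm2d(3)        # momentum=0.1 by default
bn.train()

with torch.no_grad():         # the buffer update needs no gradients
    for _ in range(20):
        _ = bn(torch.randn(64, 3, 8, 8))
        # each forward pass blends the batch statistics into the buffers:
        # running_mean = (1 - momentum) * running_mean + momentum * batch_mean
        # running_var  = (1 - momentum) * running_var  + momentum * batch_var

print(bn.running_mean, bn.running_var)   # drifted toward the data statistics

bn.eval()                     # eval() now normalizes with these settled buffers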
Hi @smth, just trying to make it crystal clear: is this what you meant?
# Training phase
G.train()
D.train()
for epoch in range(n_epoch):
    ...  # train D and G

# Now run generation for a few mini-batches while keeping the mode as "train"
for _ in range(additional_iters):
    z = ...  # sample a z
    _ = G(z)

# Now switch to eval mode to do the actual generation
G.eval()
D.eval()
z = ...
samples2output = G(z)
Could you explain why we still need to manually warm up the running_mean and running_var for eval() mode? Doesn’t PyTorch already keep a running_mean and running_var during train() and use them for eval()?
My CycleGAN model does not use dropout but uses instance norm. It performs badly when switching to eval() mode.
I understand that we need stable running_mean and running_var for eval(). But I thought PyTorch already keeps a running_mean and running_var during train() and uses them for eval(). Is that not the case? I’m a bit confused about why we need to obtain a running_mean and running_var for eval() separately. Can’t we just use the running_mean and running_var from train()?
Also, I am using InstanceNorm2d, which is supposed to perform the same in train() and eval(), right?
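For reference, a quick check of the defaults (just a sketch, assuming stock nn.BatchNorm2d / nn.InstanceNorm2d layers): BatchNorm2d does keep running_mean and running_var during train() and uses them in eval(), while InstanceNorm2d has track_running_stats=False by default, so it normalizes each sample with its own statistics in both modes.

import torch
import torch.nn as nn

bn = nn.BatchNorm2d(64)
inn = nn.InstanceNorm2d(64)   # defaults: affine=False, track_running_stats=False

print(bn.track_running_stats, bn.running_mean is not None)   # True True
print(inn.track_running_stats, inn.running_mean is None)     # False True

# With the default settings, InstanceNorm2d normalizes each sample with its own
# statistics in both modes, so train() and eval() give identical outputs.
x = torch.randn(4, 64, 32, 32)
inn.train()
out_train = inn(x)
inn.eval()
out_eval = inn(x)
print(torch.allclose(out_train, out_eval))                    # True

If your instance norm layers happen to be constructed with track_running_stats=True, eval() switches to the running buffers and the two modes can diverge, which might be worth checking in your CycleGAN code.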
Very interesting! Thanks for sharing the tip
I have tried it in my code and it does not make much difference when using eval() during training. Just one question: do we need to train G in the additional_iters, @smth? Thanks
# Now run generation for a few mini-batches while keeping the mode as "train"
for _ in range(additional_iters):
    z = ...  # sample a z
    _ = G(z)
# Do we need to train G here, without training D?
I’m also curious about this. I mean, you must have to train the GAN during the additional iterations, right? Otherwise, how could the batchnorm parameters change?
Is another solution to just train the GAN for a few epochs at the end with a batch size of 1?
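One way to check the “how could they change” part (just a sketch; additional_iters, batch_size and nz are placeholders, and the noise shape is a DCGAN-style assumption): the running statistics are buffers, not learned parameters, so they are refreshed by every forward pass in train() mode and no backward pass or optimizer step is needed during the extra iterations.

import copy
import torch

G.train()
before = copy.deepcopy(dict(G.named_buffers()))    # snapshot of running_mean / running_var

with torch.no_grad():                              # forward passes only, no optimizer step
    for _ in range(additional_iters):
        z = torch.randn(batch_size, nz, 1, 1)      # hypothetical DCGAN-style noise shape
        _ = G(z)

after = dict(G.named_buffers())
print(any(not torch.equal(before[name], buf) for name, buf in after.items()))
# True: the buffers moved even though neither G nor D took a gradient step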