Why don't we put models in .train() or .eval() modes in the DCGAN example?

I’m trying to figure out why we don’t put the generator model in eval() mode at the end, when using fixed_noise as input, or when we are just training the discriminator. I believe the batch norm layers should behave differently in evaluation mode.

I tried training with eval() mode, but the model collapses to one particular image. :confused:
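For reference, this is roughly what I tried (just a sketch, using the names from the DCGAN tutorial like netG and fixed_noise):

# when generating samples from fixed_noise inside the training loop
netG.eval()                   # batchnorm now uses its running statistics
with torch.no_grad():
    fake = netG(fixed_noise)  # these samples all collapse to the same image for me
netG.train()                  # switch back before the next training step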

Thank you


I am confused about this too. Did you solve the problem?

Because GAN training is highly unstable, .eval() mode is not as good as .train() mode, particularly for DCGAN.

For a model to be stable and good in eval() mode, you first have to stop training and run generation for a few mini-batches, so that the running_mean and running_std of batchnorm stabilize.


Hi @smth, just trying to make it crystal clear: is this what you meant?

# Training phase
G.train()
D.train()
for epoch in range(n_epoch):
    ....  # train D and G

# Now run generation for a few mini-batches while keeping the mode as "train"
for _ in range(additional_iters):
    z = ...  # sample a z
    _ = G(z)

# Now switch to eval mode to do the actual generation
G.eval()
D.eval()
z = ...
samples2output = G(z)

@haoyangz yes, that’s what I meant.


Could you explain why we still need to manually collect a running_mean and running_std for eval() mode? Doesn’t PyTorch already keep a running_mean and running_std during train() and use them for eval()?

My CycleGAN model does not use dropout but uses InstanceNorm. It performs badly when switching to eval() mode.


Read my comment above (two comments above), where I explain why.

I understand that we need a stable running_mean and running_std for eval(). But I thought PyTorch already keeps a running_mean and running_std during train() and uses them for eval(). Is that not the case? I’m a bit confused about why we need to collect a running_mean and running_std for eval() separately. Can we not use the running_mean and running_std from train()?
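This is my mental model, as a small standalone check I put together (not from my CycleGAN code, just plain nn.BatchNorm2d):

import torch
import torch.nn as nn

bn = nn.BatchNorm2d(3)             # running_mean starts at 0, running_var at 1
x = torch.randn(8, 3, 16, 16) * 5 + 2

bn.train()
_ = bn(x)                          # train-mode forward updates running_mean/running_var
print(bn.running_mean)             # has moved towards the batch mean (~2)

bn.eval()
y = bn(x)                          # eval-mode forward normalizes with the stored running stats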

Also, I am using InstanceNorm2d, which is supposed to behave the same in train() and eval(), right?
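(For what it’s worth, I construct it with the default arguments, something like this, where the channel count is just an example:)

import torch.nn as nn

# default InstanceNorm2d: affine=False, track_running_stats=False,
# so it should normalize with per-sample statistics in both train() and eval()
norm = nn.InstanceNorm2d(64)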

Thanks!

Because GAN training is highly unstable, .eval() mode is not as good as .train() mode, particularly for DCGAN.

Very interesting! Thanks for sharing the tip.
I have tried it in my code and the results are not much different from just using eval() during training. Just one question: do we need to train G during the additional_iters, @smth? Thanks

# Now run generation for a few mini-batches while keeping the mode as "train"
for _ in range(additional_iters):
    z = ...  # sample a z
    _ = G(z)
    # Do we need to train G here, without training D?

I’m also curious about this. I mean, you must have to train the GAN during the additional iterations, right? Otherwise how could the batchnorm parameters change?

Is another solution to just train the GAN for a few epochs at the end with a batch size of 1?
