DCGAN Tutorial: Weight Initialisation

I am currently going through the DCGAN tutorial. It performs weight initialisation using the following method.

Why did the author initialize the conv layers with weights drawn from a normal distribution with mean 0, but the batch norm layers with weights drawn from a normal distribution with mean 1?

What is the intuition behind using two different normal distributions for initialising the weights?

import torch.nn as nn

# custom weights initialization called on netG and netD
def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        # conv (and conv-transpose) layers: weights ~ N(0, 0.02)
        nn.init.normal_(m.weight.data, 0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        # batch norm layers: scale (gamma) ~ N(1, 0.02), shift (beta) = 0
        nn.init.normal_(m.weight.data, 1.0, 0.02)
        nn.init.constant_(m.bias.data, 0)
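
For reference, the tutorial calls this function on the networks right after they are constructed, via Module.apply, which visits every submodule and runs weights_init on it. A minimal sketch with a stand-in model (the real netG in the tutorial is larger; this toy module is purely for illustration):

import torch.nn as nn

# toy stand-in for netG: one conv-transpose layer followed by batch norm
netG = nn.Sequential(
    nn.ConvTranspose2d(100, 64, 4, 1, 0, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(True),
)

# apply() walks netG and all of its submodules and calls weights_init on each,
# so the class-name checks above pick out the conv and batch norm layers
netG.apply(weights_init)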

Usually you initialize weights close to zero by sampling them from a random distribution, as was done here for the conv layers.
The weight and bias in BatchNorm work as the rescaling parameters gamma and beta from the original paper.
Since BatchNorm uses the batch statistics (mean and std) to normalize the activations, the normalized activations will have a mean close to zero and a stddev of 1.
After the normalization, gamma and beta can rescale and shift the activations again, i.e. the normalization could in principle be undone. However, if you start with gamma close to 1 (the mean of the initialization above) and beta equal to 0, the first batch(es) will be normalized essentially using just their own batch statistics.
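
To make that concrete, here is a small sketch (a toy BatchNorm2d module and random input, purely for illustration, not code from the tutorial) showing that with gamma close to 1 and beta equal to 0 the layer's output is essentially just the batch-normalized activations:

import torch
import torch.nn as nn

torch.manual_seed(0)

bn = nn.BatchNorm2d(3)
# mimic the tutorial's init: gamma ~ N(1, 0.02), beta = 0
nn.init.normal_(bn.weight.data, 1.0, 0.02)
nn.init.constant_(bn.bias.data, 0)

# activations with a clearly non-zero mean and non-unit std
x = torch.randn(8, 3, 16, 16) * 5 + 2
out = bn(x)  # module is in training mode, so batch statistics are used

# per-channel output stats: roughly mean 0 and std 1, since gamma ~ 1 and beta = 0
print(out.mean(dim=(0, 2, 3)))
print(out.std(dim=(0, 2, 3)))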
