How do "fan_in" and "fan_out" work in "torch.nn.init"?

Hello everyone

I am trying to initialize the layers of my network using the kaiming_normal_ initializer. This method has an argument called mode with fan_in and fan_out options.

I read the docs and found that it depends on the number of filters or channels. For example, I have a Conv2d layer whose weight has shape [64, 3, 4, 4]. When I calculate the values, I get fan_in=48 and fan_out=1024, which is a huge difference, and std = gain/math.sqrt(fan) comes out as follows:

fan_in enabled: std=0.2
fan_out enabled: std=0.04

This std is then passed to the tensor.normal_ method.
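To make the arithmetic above concrete, here is a minimal sketch that recomputes the fans and both std values for a [64, 3, 4, 4] Conv2d weight and compares them with what kaiming_normal_ actually produces. It assumes gain = sqrt(2) (the ReLU gain), which is consistent with the 0.2 and 0.04 quoted above; the variable names are only illustrative.

```python
import math

import torch
import torch.nn as nn

# Conv2d whose weight has shape [out_channels, in_channels, kH, kW] = [64, 3, 4, 4]
conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=4)
out_channels, in_channels, kernel_h, kernel_w = conv.weight.shape

# Number of elements in one kernel window
receptive_field = kernel_h * kernel_w

fan_in = in_channels * receptive_field    # 3 * 16
fan_out = out_channels * receptive_field  # 64 * 16

# Assumption: ReLU gain, i.e. sqrt(2)
gain = nn.init.calculate_gain("relu")
std_fan_in = gain / math.sqrt(fan_in)
std_fan_out = gain / math.sqrt(fan_out)

# Compare with the empirical std of weights drawn by kaiming_normal_
torch.manual_seed(0)
nn.init.kaiming_normal_(conv.weight, mode="fan_in", nonlinearity="relu")
empirical_std_fan_in = conv.weight.std().item()

nn.init.kaiming_normal_(conv.weight, mode="fan_out", nonlinearity="relu")
empirical_std_fan_out = conv.weight.std().item()

print(f"fan_in={fan_in}, fan_out={fan_out}")
print(f"mode='fan_in':  expected std={std_fan_in:.4f}, empirical={empirical_std_fan_in:.4f}")
print(f"mode='fan_out': expected std={std_fan_out:.4f}, empirical={empirical_std_fan_out:.4f}")
```

The empirical values will only match approximately, since they are estimated from a finite number of sampled weights.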

The problem is that I could not find any specific explanation of which mode should be chosen, either in the paper I am trying to implement or in other resources.

Best regards


As far as I know, kaiming_normal_ (also known as he_normal) is generally used with mode='fan_in'.

There are two parts:

  • As Avinash points out, the default mode 'fan_in' is probably a good choice.
  • For some intuition as to why: each output is a weighted sum of fan_in inputs. For a linear layer, this is one row of the matrix multiplication; for convolutions, it is the number of in-channels * kernel size (the number of elements in the kernel window).
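The weighted-sum intuition can be checked numerically. The following is a rough sketch, assuming unit-variance Gaussian inputs and a plain linear layer with no nonlinearity (so the gain is 1); the layer sizes and the helper name output_variance are made up for illustration. With mode='fan_in' the output variance should stay close to the input variance, while mode='fan_out' changes it by roughly fan_in / fan_out.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes: each output is a weighted sum of 512 inputs
fan_in, fan_out = 512, 128
x = torch.randn(4096, fan_in)  # inputs with variance ~1


def output_variance(mode):
    """Initialize a bias-free linear layer with the given mode and
    return the variance of its outputs for the inputs x."""
    layer = nn.Linear(fan_in, fan_out, bias=False)
    # nonlinearity="linear" gives gain = 1, so std = 1 / sqrt(fan)
    nn.init.kaiming_normal_(layer.weight, mode=mode, nonlinearity="linear")
    with torch.no_grad():
        return layer(x).var().item()


var_fan_in = output_variance("fan_in")    # expected near 1
var_fan_out = output_variance("fan_out")  # expected near fan_in / fan_out = 4

print(f"input variance:             {x.var().item():.3f}")
print(f"output variance (fan_in):   {var_fan_in:.3f}")
print(f"output variance (fan_out):  {var_fan_out:.3f}")
```

This matches the note in the torch.nn.init docs that 'fan_in' preserves the magnitude of the variance in the forward pass, while 'fan_out' preserves it in the backward pass.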

When @ptrblck and I implemented StyleGAN for PyTorch, I came across the idea (though I don't think the StyleGAN authors necessarily invented it) of not applying the multiplier during init, but instead applying it to the weight each time before using it. This has the effect of applying the scaling to both the initial values and the gradient updates. (Even if it is not that efficient without a hand-made convolution kernel.)
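A minimal sketch of that runtime-scaling idea, for a linear layer rather than a convolution. ScaledLinear is a hypothetical class written for this post, not the code from the StyleGAN port, and the gain of sqrt(2) is an assumption: the weights are stored with unit variance and multiplied by gain / sqrt(fan_in) on every forward pass.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class ScaledLinear(nn.Module):
    """Hypothetical linear layer that applies the Kaiming multiplier
    at runtime instead of baking it into the initial weights."""

    def __init__(self, in_features, out_features, gain=math.sqrt(2.0)):
        super().__init__()
        # Stored weights are plain N(0, 1); no scaling at init time
        self.weight = nn.Parameter(torch.randn(out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features))
        # Same multiplier that kaiming_normal_ with mode='fan_in' would use as std
        self.scale = gain / math.sqrt(in_features)

    def forward(self, x):
        # Scale the weight on every use, so gradients w.r.t. the stored
        # weight are multiplied by self.scale as well
        return F.linear(x, self.weight * self.scale, self.bias)


torch.manual_seed(0)
layer = ScaledLinear(512, 128)
x = torch.randn(1024, 512)

out = layer(x)
out.sum().backward()

stored_std = layer.weight.std().item()                       # close to 1
effective_std = (layer.weight * layer.scale).std().item()    # close to layer.scale

print(f"stored weight std:    {stored_std:.4f}")
print(f"effective weight std: {effective_std:.4f} (scale={layer.scale:.4f})")
print(f"grad shape:           {tuple(layer.weight.grad.shape)}")
```

The effective weights have the same distribution as with a regular Kaiming init, but because the optimizer updates the unscaled parameter, the multiplier also affects the size of each update.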

Best regards



Why does ResNet in torchvision apply kaiming_normal_ with mode 'fan_out'? Is there a specific reason to do so?