Gain value of Kaiming initialization

I know that the default initialization of layers is torch.nn.init.kaiming_uniform_(tensor, a=math.sqrt(5)),
where a is the negative-slope parameter from which the gain for the nonlinearity is computed.
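
For reference, this matches what reset_parameters() does in layers like nn.Linear and nn.Conv2d; a minimal sketch (the layer shape is just an example):

```python
import math
import torch.nn as nn

layer = nn.Linear(128, 64)  # example layer; the shape is arbitrary

# Roughly what nn.Linear.reset_parameters() applies to the weight:
nn.init.kaiming_uniform_(layer.weight, a=math.sqrt(5))
```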

In my VGG all the nonlinearities are ReLU, so according to the Kaiming initialization paper I should set a=0. But when I use that initialization, the loss blows up to NaN.
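
For concreteness, this is roughly how I apply it (a sketch; the helper name and the tiny model are placeholders for my actual VGG):

```python
import torch.nn as nn

def kaiming_relu_init(m):
    # Fan-in Kaiming uniform with the ReLU gain, as in the paper;
    # passing nonlinearity='relu' makes the a argument irrelevant.
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.kaiming_uniform_(m.weight, nonlinearity='relu')
        if m.bias is not None:
            nn.init.zeros_(m.bias)

model = nn.Sequential(  # stand-in for my VGG
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.ReLU(),
)
model.apply(kaiming_relu_init)
```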

But when I use the default initialization instead, my net trains successfully.

What is the problem here? Why does PyTorch set the default to a=sqrt(5)?

Have a look at this answer.


Thanks for solving my problem! :laughing: