Clarity on default initialization in pytorch

Hi,

Let me explain it step by step.

  1. Here is kaiming_uniform_. It computes gain = sqrt(2 / (1 + negative_slope^2)), std = gain / sqrt(fan_in), and bound = sqrt(3) * std = gain * sqrt(3 / fan_in).

    The default initialization passes negative_slope = sqrt(5), so gain = sqrt(2 / (1 + 5)) = sqrt(2/6) = 1/sqrt(3).
    Substituting this into the bound formula gives bound = [1/sqrt(3)] * [sqrt(3 / fan_in)], which simplifies to bound = 1/sqrt(fan_in), i.e. bound^2 = 1 / fan_in.

  2. In the linear implementation code you referenced (originally shown as an image), the weights are documented as uniform on [-sqrt(k), sqrt(k)].

So what we have here is k = 1/in_features, which in Kaiming terms is k = 1/fan_in. We want the interval [-sqrt(k), sqrt(k)], and from step 1 we have bound^2 = 1/fan_in = k, so sqrt(k) is exactly the Kaiming bound.

For simplicity: substitute sqrt(5) into the gain formula, obtain the bound in kaiming_uniform_, and note that bound^2 is the k used in linear.
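If it helps, here is a small sanity check of the arithmetic above. It is only a sketch: it does not call PyTorch, it just re-implements the two formulas in plain Python, and the function names `kaiming_uniform_bound` and `linear_doc_bound` are made up for this example.

```python
import math


def kaiming_uniform_bound(fan_in: int, a: float) -> float:
    """Bound from step 1 (hypothetical helper, mirrors the formulas in the post)."""
    gain = math.sqrt(2.0 / (1.0 + a ** 2))  # leaky_relu gain with negative_slope = a
    std = gain / math.sqrt(fan_in)
    return math.sqrt(3.0) * std             # uniform bound = sqrt(3) * std


def linear_doc_bound(in_features: int) -> float:
    """Bound from step 2 (hypothetical helper): sqrt(k) with k = 1 / in_features."""
    k = 1.0 / in_features
    return math.sqrt(k)


# Compare both bounds for a few layer widths.
for fan_in in (1, 16, 300, 4096):
    b_kaiming = kaiming_uniform_bound(fan_in, a=math.sqrt(5))
    b_linear = linear_doc_bound(fan_in)
    assert math.isclose(b_kaiming, b_linear, rel_tol=1e-12)
    print(fan_in, b_kaiming, b_linear)
```

The two columns printed should agree up to floating point rounding for every fan_in, which is the claim bound = 1/sqrt(fan_in) = sqrt(k).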

Edit: Add some related posts

  1. Kaiming init of conv and linear layers, why gain = sqrt(5) · Issue #15314 · pytorch/pytorch · GitHub
  2. Why the default negative_slope for kaiming_uniform initialization of Convolution and Linear layers is √5?

Bests
