Clarity on default initialization in pytorch

Nikronic · June 9, 2020, 1:30pm

Hi,

Let me explain it step by step.

Here is kaiming_uniform_.

[Screenshots: kaiming_uniform_ source]

Here negative_slope = sqrt(5) (the a argument of kaiming_uniform_), so the leaky_relu gain is gain = sqrt(2 / (1 + negative_slope^2)) = sqrt(2/6) = 1/sqrt(3).
Substituting this into the bound formula, bound = gain * sqrt(3 / fan_in), gives bound = [1/sqrt(3)] * [sqrt(3 / fan_in)], which simplifies to bound = 1/sqrt(fan_in), or equivalently bound^2 = 1 / fan_in.
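To double-check the arithmetic, here is a small sketch in plain Python. It re-implements the leaky_relu gain and the kaiming_uniform_ bound (fan_in mode) by hand; the helper names leaky_relu_gain and kaiming_uniform_bound are hypothetical, not PyTorch APIs. It then compares the result with 1/sqrt(fan_in).

```python
import math

def leaky_relu_gain(negative_slope: float) -> float:
    # Gain formula for 'leaky_relu' (the one torch.nn.init.calculate_gain uses)
    return math.sqrt(2.0 / (1 + negative_slope ** 2))

def kaiming_uniform_bound(fan_in: int, a: float) -> float:
    # kaiming_uniform_ with mode='fan_in': std = gain / sqrt(fan_in), bound = sqrt(3) * std
    gain = leaky_relu_gain(a)
    std = gain / math.sqrt(fan_in)
    return math.sqrt(3.0) * std

a = math.sqrt(5)
gain = leaky_relu_gain(a)
print(gain, 1 / math.sqrt(3))  # both approximately 0.57735

for fan_in in (3, 64, 784):
    # The Kaiming bound and 1/sqrt(fan_in) should agree up to rounding
    print(fan_in, kaiming_uniform_bound(fan_in, a), 1 / math.sqrt(fan_in))
```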
In the Linear implementation code you referenced:

[Screenshot: nn.Linear weight/bias initialization]

So here k = 1/in_features, which in Kaiming terms is k = 1/fan_in, since the fan_in of a Linear weight of shape (out_features, in_features) is in_features. The weights are drawn from [-sqrt(k), sqrt(k)], and k = 1/fan_in = bound^2 from step 1, so sqrt(k) is exactly the Kaiming bound.
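A quick empirical check of this range: create a default nn.Linear and confirm that its weight (and bias) values fall inside [-sqrt(k), sqrt(k)] with k = 1/in_features. This is a minimal sketch; the layer sizes are arbitrary choices for illustration.

```python
import math
import torch.nn as nn

in_features, out_features = 100, 50
layer = nn.Linear(in_features, out_features)

k = 1.0 / in_features      # k as documented for nn.Linear, equal to 1/fan_in here
bound = math.sqrt(k)

# A Linear weight has shape (out_features, in_features), so fan_in is dim 1
fan_in = layer.weight.shape[1]

w_min, w_max = layer.weight.min().item(), layer.weight.max().item()
b_min, b_max = layer.bias.min().item(), layer.bias.max().item()
print(f"fan_in = {fan_in}, bound = sqrt(k) = {bound:.4f}")
print(f"weight range: [{w_min:.4f}, {w_max:.4f}]")
print(f"bias range:   [{b_min:.4f}, {b_max:.4f}]")
```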

For simplicity: plug sqrt(5) into the gain formula, compute the bound in kaiming_uniform_, and compare it with Linear's range. You get bound = sqrt(k), i.e. bound^2 = k = 1/fan_in.
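The same equivalence can be seen by sampling: with the same random seed, kaiming_uniform_ with a=sqrt(5) and a plain uniform_ on [-sqrt(k), sqrt(k)] should produce the same weights, up to floating-point rounding of the bound. A minimal sketch, assuming a CPU float tensor and the default fan_in mode; the shapes are arbitrary.

```python
import math
import torch
import torch.nn as nn

out_features, in_features = 4, 10

# Route 1: kaiming_uniform_ with a=sqrt(5), as Linear does for its weight
torch.manual_seed(0)
w_kaiming = torch.empty(out_features, in_features)
nn.init.kaiming_uniform_(w_kaiming, a=math.sqrt(5))

# Route 2: sample U(-sqrt(k), sqrt(k)) directly with k = 1/in_features
torch.manual_seed(0)
w_direct = torch.empty(out_features, in_features)
k = 1.0 / in_features
nn.init.uniform_(w_direct, -math.sqrt(k), math.sqrt(k))

# Same seed and a single uniform draw each, so the tensors should match closely
print(torch.allclose(w_kaiming, w_direct))
```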

Edit: Added some related posts

Bests