I was wondering why weights are initialized the way they are. For torch.nn.Linear, the weights are initialized with samples from Uniform(-sqrt(k), sqrt(k)) with k = 1/in_features (see "Linear — PyTorch 1.7.0 documentation"). In the source code, "kaiming_uniform" appears. However, to my un…

Linked thread: "Clarity on default initialization in pytorch". According to the documentation for torch.nn, the default initialization uses a uniform distribution bounded by 1/sqrt(in_features), but this code ap…

What is the theoretical reason for PyTorch's default weight initialization?

Nikronic (Nikan Doosti) December 29, 2020, 6:29pm 2

Hi,

I think this thread and its references can answer your question.

Feel free to ask if it did not help.

Bests
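
The two descriptions in the question (a Uniform(-sqrt(k), sqrt(k)) bound with k = 1/in_features in the docs, and "kaiming_uniform" in the source) can be compared numerically. The following is only a minimal sketch in plain Python, assuming that the default reset of nn.Linear calls kaiming_uniform with a leaky-ReLU negative slope of a = sqrt(5); the helper names `kaiming_uniform_bound` and `documented_bound` are hypothetical and re-derive the formulas rather than calling PyTorch.

```python
import math


def kaiming_uniform_bound(fan_in: int, a: float = math.sqrt(5)) -> float:
    """Half-width b of Uniform(-b, b) for Kaiming-uniform init.

    Hypothetical helper; `a` is the leaky-ReLU negative slope.
    ASSUMPTION: nn.Linear's default uses a = sqrt(5).
    """
    gain = math.sqrt(2.0 / (1.0 + a ** 2))  # leaky-ReLU gain
    std = gain / math.sqrt(fan_in)          # target standard deviation
    return math.sqrt(3.0) * std             # Uniform(-b, b) has std b / sqrt(3)


def documented_bound(in_features: int) -> float:
    """Bound as stated in the docs: sqrt(k) with k = 1 / in_features."""
    k = 1.0 / in_features
    return math.sqrt(k)


# Compare both bounds for a few layer widths.
for n in (1, 16, 784):
    print(n, kaiming_uniform_bound(n), documented_bound(n))
```

Under that assumption, the gain is sqrt(2 / 6) = sqrt(1/3), so the factor sqrt(3) cancels and both bounds reduce to 1/sqrt(in_features). This only shows that the two descriptions agree; it does not by itself explain why a = sqrt(5) was chosen.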
