What is the theoretical reason for PyTorch's default weight initialization?

Hi,

I think this thread and its references can answer your question.

Feel free to ask if it did not help.

Bests
