Did the default weight initialization method change? (torch 1.1 vs torch 1.3)

I noticed some differences in weight initialization between torch 1.1 and torch 1.3 while running deep learning experiments.
Do torch 1.1 and torch 1.3 really use different default weight initialization methods?

The default initialization for batch norm layers changed: the weight (the scale parameter) is now set to 1s instead of being sampled from a uniform distribution on [0, 1). That was done in this PR, which shipped with 1.2, if I’m not mistaken.
Besides that, I’m unaware of any changes.
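If you want to check this on your own install, or get the older behavior back for a comparison run, something like the following might help. It is only a minimal sketch: `describe_bn_init` and `legacy_bn_init_` are hypothetical helper names I made up (not part of PyTorch), and the U(0, 1) re-initialization is my assumption of what the older default looked like.

```python
import torch
import torch.nn as nn

# Batch norm layer types to inspect / re-initialize.
BN_TYPES = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)


def describe_bn_init(bn):
    """Return (min weight, max weight, bias is all zeros) for a batch norm layer."""
    w = bn.weight.detach()
    b = bn.bias.detach()
    return w.min().item(), w.max().item(), bool((b == 0).all())


def legacy_bn_init_(module):
    """Hypothetical helper: re-draw batch norm weights from U(0, 1).

    Assumption: this mimics the older default; the bias is reset to zeros.
    Layers created with affine=False have no weight/bias and are skipped.
    """
    if isinstance(module, BN_TYPES) and module.affine:
        nn.init.uniform_(module.weight, 0.0, 1.0)
        nn.init.zeros_(module.bias)


if __name__ == "__main__":
    torch.manual_seed(0)
    model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())

    # What the installed version does by default.
    print(torch.__version__, describe_bn_init(model[1]))

    # Apply the legacy-style init to every batch norm layer in the model.
    model.apply(legacy_bn_init_)
    print(describe_bn_init(model[1]))
```

Running both experiments with the same explicit init (via `model.apply(...)`) takes the version-dependent default out of the picture, so any remaining difference has to come from somewhere else.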


Thank you very much for your answer!
