Hi,
I think the motivation is already explained in the link @harsha_g referenced. For further clarification, I have also explained how the values are obtained, especially sqrt(5), in this post: Clarity on default initialization in pytorch
To summarize, both do the same thing. Older versions of PyTorch used the convention from Lua Torch, and later, once the newer init modules had been added to PyTorch, the same initialization was expressed through them. For instance, in your example, both approaches sample from the same uniform distribution, which can be achieved with kaiming_uniform_, as I have explained in the referenced link.
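As a rough sketch of why the two match: the older convention samples from U(-1/sqrt(fan_in), 1/sqrt(fan_in)), while kaiming_uniform_ with the leaky_relu gain samples from U(-bound, bound) with bound = sqrt(3) * gain / sqrt(fan_in) and gain = sqrt(2 / (1 + a^2)). With a = sqrt(5) the gain becomes sqrt(1/3), so the sqrt(3) cancels. The snippet below only reproduces that arithmetic in plain Python. The helper names and the fan_in value are my own for illustration, not part of the PyTorch API.

```python
import math

def kaiming_uniform_bound(fan_in, a=0.0):
    # Hypothetical helper: mirrors the bound formula used for Kaiming
    # uniform init with the leaky_relu gain (negative slope `a`).
    gain = math.sqrt(2.0 / (1.0 + a ** 2))
    std = gain / math.sqrt(fan_in)
    return math.sqrt(3.0) * std

def legacy_bound(fan_in):
    # Hypothetical helper: the older Lua Torch style bound, 1 / sqrt(fan_in).
    return 1.0 / math.sqrt(fan_in)

fan_in = 128  # assumed layer input size, chosen only for this example
new = kaiming_uniform_bound(fan_in, a=math.sqrt(5))
old = legacy_bound(fan_in)
print(new, old, math.isclose(new, old))
```

So passing a = sqrt(5) is just a way to get the old 1/sqrt(fan_in) bound out of the new function.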
Best