Xavier Initialization PyTorch vs MxNet

I am porting an MxNet implementation of a paper to PyTorch.

mx.init.Xavier(rnd_type="uniform", factor_type="avg", magnitude=0.0003)

and

torch.nn.init.xavier_uniform_(array, gain=0.0003) 

Should be pretty much the same, right?

But the docs and source code define magnitude and gain differently.

Even after rescaling gain and magnitude to account for that, I am still getting different ranges of numbers, in both cases starting from an empty array and initializing it.

The attached image shows the docs of both PyTorch and MxNet.

Am I missing something?
How can I make sure that both PyTorch and MxNet functions are initializing a specific input array in the same way?

I can’t find the usage of magnitude in the docs.
Based on the default value and the source code, it seems to be the 3 inside the sqrt.

If that’s the case, you won’t get the same output for gain, magnitude = 0.0003.

For magnitude = 0.0003 with factor_type="avg", you would need to use gain = sqrt(0.0003 / 3) = 0.01.

Based on the source code of MxNet:

factor = (fan_in + fan_out) / 2.0
...
scale = np.sqrt(self.magnitude / factor)
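To make those two source lines concrete, here is a minimal sketch of the bound they produce for factor_type="avg". The helper name `mxnet_xavier_bound` and the layer sizes are my own illustration, not part of MxNet, and I am assuming the uniform samples are drawn from [-scale, scale].

```python
import math

def mxnet_xavier_bound(fan_in, fan_out, magnitude=3.0):
    """Hypothetical helper (not an MxNet function): bound implied by the
    quoted source lines for factor_type="avg"."""
    factor = (fan_in + fan_out) / 2.0     # factor = (fan_in + fan_out) / 2.0
    return math.sqrt(magnitude / factor)  # scale = sqrt(magnitude / factor)

# Made-up layer sizes, chosen only so the numbers come out round.
default_bound = mxnet_xavier_bound(256, 128)         # default magnitude=3
small_bound = mxnet_xavier_bound(256, 128, 0.0003)   # magnitude from the question
print(default_bound, small_bound)
```

With these sizes the default magnitude gives a bound of 0.125, and magnitude=0.0003 shrinks it by a factor of 100.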

Their docs show the formula:
c = \sqrt{\frac{3.}{0.5 * (n_{in} + n_{out})}}
which suggests that the default magnitude=3 is the 3 in the numerator of the fraction.
Multiplying numerator and denominator by 2 to get rid of the 0.5, the numerator inside the sqrt becomes 2 * magnitude (6 by default) over n_{in} + n_{out}.

In PyTorch, gain acts differently: the uniform bound is gain * sqrt(6 / (fan_in + fan_out)), so gain sits outside the sqrt.
If we move it inside the sqrt, the numerator becomes gain^2 * 6, which should equal MxNet's magnitude * 2. That gives gain = sqrt(magnitude / 3), so our magnitude of 0.0003 corresponds to a gain of sqrt(0.0003 / 3) = 0.01.
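The conversion can be checked numerically without either framework installed. This is only a sketch under stated assumptions: all three function names are hypothetical, the PyTorch bound is taken from the xavier_uniform_ docs (gain * sqrt(6 / (fan_in + fan_out))), the MxNet bound from the source lines quoted above with factor_type="avg", and the layer sizes are made up.

```python
import math

def pytorch_xavier_uniform_bound(fan_in, fan_out, gain=1.0):
    # Hypothetical helper: bound documented for torch.nn.init.xavier_uniform_.
    return gain * math.sqrt(6.0 / (fan_in + fan_out))

def mxnet_xavier_avg_bound(fan_in, fan_out, magnitude=3.0):
    # Hypothetical helper: bound from the quoted MxNet source, factor_type="avg".
    return math.sqrt(magnitude / ((fan_in + fan_out) / 2.0))

def gain_from_magnitude(magnitude):
    # From gain^2 * 6 == magnitude * 2.
    return math.sqrt(magnitude / 3.0)

fan_in, fan_out = 256, 128   # made-up layer sizes
magnitude = 0.0003

matched_gain = gain_from_magnitude(magnitude)
mx_bound = mxnet_xavier_avg_bound(fan_in, fan_out, magnitude)
pt_matched = pytorch_xavier_uniform_bound(fan_in, fan_out, matched_gain)
pt_naive = pytorch_xavier_uniform_bound(fan_in, fan_out, gain=0.0003)

print(matched_gain, mx_bound, pt_matched, pt_naive)
```

With the converted gain the two bounds agree, while passing gain=0.0003 directly gives a much smaller range. If the sampled ranges still differ after matching gain, it may be worth checking that both frameworks derive the same fan_in and fan_out from the array's shape.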

Which is still weird given that I am getting different ranges of results.

This may be a question better suited to the MxNet forum, so I will ask it there. Once I have a solution, I will come back here and post either a link to it or my own explanation.

Thank you @ptrblck for taking the time to check it out!