# Something about linear.py in nn modules

In the official documentation, the weights of the Linear layer are initialized from a uniform distribution with bound `1/sqrt(in_features)`, but I find the formula in linear.py is a different one, as shown in the snippet below. I want to know why `a=math.sqrt(5)` when I choose ReLU as the activation function. Why?
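Roughly, `reset_parameters` in linear.py at that commit reads as follows (paraphrased from the linked source; see the GitHub link further down for the exact code):

```python
def reset_parameters(self):
    # Default weight init: Kaiming uniform with a=sqrt(5)
    init.kaiming_uniform_(self.weight, a=math.sqrt(5))
    if self.bias is not None:
        fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
        bound = 1 / math.sqrt(fan_in)
        init.uniform_(self.bias, -bound, bound)
```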

The initialization done is the same as in the docs. The implementation simply uses the `kaiming_uniform_` function (torch.nn.init — PyTorch 1.7.0 documentation) and passes the right parameters to get a uniform distribution in (-1/`sqrt(in_features)`, 1/`sqrt(in_features)`). In Kaiming uniform initialization, there is an extra `sqrt(3)` in the bound. So they’ve made the `gain` value 1/`sqrt(3)` using

```python
import math
import torch

# gain = sqrt(2 / (1 + 5)) = 1/sqrt(3)
gain = torch.nn.init.calculate_gain(nonlinearity='leaky_relu', param=math.sqrt(5))
```

Seems like a workaround to avoid writing another function that might duplicate code in `kaiming_uniform_`.

I can understand the operation of initializing weights. What I can’t understand is that the docs (torch.nn.Linear) say the weights of the Linear layer are initialized with bound `1/sqrt(in_features)`, but linear.py uses `kaiming_uniform_`, which you can find at this link: https://github.com/pytorch/pytorch/blob/4ff1823fac8833d4bdfe333c49fd486143175395/torch/nn/modules/linear.py#L35

And the weirdest thing for me is the call `init.kaiming_uniform_(self.weight, a=math.sqrt(5))`, which means the weights of the fully connected layer are initialized by default using the formula corresponding to LeakyReLU. I want to know why it chooses `a=math.sqrt(5)` instead of the negative slope chosen by the user.

Hey. I realise now that my answer wasn’t clear earlier. They’ve used the Kaiming uniform initialisation, yes, but they’ve chosen the parameter `a` such that `gain` is set to `1/sqrt(3)`. The way they’ve done that is using the `calculate_gain` function (pytorch/init.py at 4ff1823fac8833d4bdfe333c49fd486143175395 · pytorch/pytorch · GitHub). With `nonlinearity='leaky_relu'`, the gain factor is `math.sqrt(2.0 / (1 + param ** 2))`. Choose `a = param = math.sqrt(5)`; then the gain factor is `1/sqrt(3)`.
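To see the numbers work out, here is a quick check (the `fan_in` value is just an arbitrary example):

```python
import math

a = math.sqrt(5)
gain = math.sqrt(2.0 / (1 + a ** 2))    # sqrt(2/6) = 1/sqrt(3)
fan_in = 256                            # arbitrary in_features
bound = gain * math.sqrt(3.0 / fan_in)  # the bound used by kaiming_uniform_
print(math.isclose(bound, 1 / math.sqrt(fan_in)))  # True
```

So the extra `sqrt(3)` inside `kaiming_uniform_` cancels, and the weights end up uniform in (-1/`sqrt(in_features)`, 1/`sqrt(in_features)`), exactly as the docs state.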

Maybe I didn’t make my question clear. I have understood the operation of Kaiming initialization. What I don’t understand is why, when I use ReLU as the activation function, the fully connected layer still uses the default scheme `init.kaiming_uniform_(self.weight, a=math.sqrt(5))` to initialize its weights, which is different from its docs. I have also checked resnet.py (vision/resnet.py at 146dd8556c4f3b2ad21fad9312f4602cd049d6da · pytorch/vision · GitHub), in which ReLU is used as the activation function, but the FC layer is initialized by the default scheme, too.
I think the initialization parameter `a` should be set to `0` when ReLU is used as the activation function, but now it is `sqrt(5)`, which is the value you would use for LeakyReLU; see the sketch below for the difference.
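For comparison, here is what the two choices of `a` give as the uniform bound inside `kaiming_uniform_` (a quick sketch; `fan_in` is arbitrary):

```python
import math

def kaiming_uniform_bound(a, fan_in):
    gain = math.sqrt(2.0 / (1 + a ** 2))  # leaky_relu gain formula
    return gain * math.sqrt(3.0 / fan_in)

fan_in = 256
print(kaiming_uniform_bound(0, fan_in))             # sqrt(6/fan_in), the ReLU-style bound
print(kaiming_uniform_bound(math.sqrt(5), fan_in))  # 1/sqrt(fan_in), the documented bound
```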

`init.kaiming_uniform_(self.weight, a=math.sqrt(5))` in linear.py. Check it at: pytorch/linear.py at 4ff1823fac8833d4bdfe333c49fd486143175395 · pytorch/pytorch · GitHub

The Linear class has no knowledge about activations, so this `reset_parameters` function had to be universal. It is also invoked from the constructor, thus it has no arguments; you’re supposed to manually re-initialize the parameters after construction if the defaults don’t work well.
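For example, if your network uses ReLU, you can override the default after construction (a minimal sketch; the layer sizes are arbitrary):

```python
import torch.nn as nn
import torch.nn.init as init

layer = nn.Linear(128, 64)
# Re-initialize with the ReLU gain instead of the default a=math.sqrt(5)
init.kaiming_uniform_(layer.weight, nonlinearity='relu')
init.zeros_(layer.bias)
```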

As @googlebot mentions, this initialization is universal, invoked in the constructor of any linear layer. The `nonlinearity` argument passed to the `kaiming_uniform_` function is not based on the nonlinearity that follows the Linear layer. The whole idea is to use the `kaiming_uniform_` function to get the uniform initialization we want, by passing the right set of arguments (`a` and `nonlinearity`).
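You can also verify empirically that the default initialization lands in the documented range (a quick sanity check; the layer sizes are arbitrary):

```python
import math
import torch.nn as nn

layer = nn.Linear(100, 10)
bound = 1 / math.sqrt(100)  # documented bound: 1/sqrt(in_features) = 0.1
assert layer.weight.abs().max().item() <= bound
```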

You can maybe think about this as PyTorch devs trying to avoid writing any code that duplicates statements in `kaiming_uniform_`.

Thank you very much for your patient replies. I understand it now.