How can we do weight initialization for nn.Linear?

  • I wrote a function for weight initialization, as follows:

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                print(m)
                # He-style init, scaled by the conv layer's fan-out
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
                if m.bias is not None:
                    m.bias.data.zero_()
            elif isinstance(m, nn.BatchNorm2d):
                print(m)
                m.weight.data.fill_(1)
                m.bias.data.zero_()
            elif isinstance(m, nn.BatchNorm1d):
                print(m)
                m.weight.data.fill_(1)
                m.bias.data.zero_()
    Obviously, this function will not initialize nn.Linear (a possible nn.Linear branch is sketched after this question). But when I call this function in two different places, that is,


self._initialize_weights()
self.fc = nn.Linear(in_feature, out_feature)

or

self.fc = nn.Linear(in_feature, out_feature)
self._initialize_weights()

Intuitively, the two orderings should have the same effect; however, the results are different. Why? Please give some reasons if possible. Thank you!
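
For reference, a minimal sketch of an extra branch that would cover nn.Linear in the same function. The fan-in based He-style init here is just one common choice and is not part of the original code:

    elif isinstance(m, nn.Linear):
        print(m)
        n = m.in_features  # fan-in of the linear layer
        m.weight.data.normal_(0, math.sqrt(2. / n))
        if m.bias is not None:
            m.bias.data.zero_()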

Most likely your model contains some other layers (e.g. conv layers) that will be initialized by calling this method. Each of these initializations makes a call to the pseudorandom number generator to sample the parameters, which advances the generator's state. The nn.Linear parameters will thus still be sampled from the default init scheme in both cases, but they will have different values, because the RNG state differs depending on whether the other layers were initialized before or after the linear layer was created. If you would like to sample exactly the same values, you could set the seed right before creating the nn.Linear using torch.manual_seed(your_seed).
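
To illustrate this, here is a small self-contained sketch (the layer sizes are arbitrary). Creating the conv layer consumes draws from the default RNG, so the two linear layers would normally receive different values; reseeding right before each nn.Linear restores the generator state and makes them identical:

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    fc_a = nn.Linear(10, 5)

    conv = nn.Conv2d(3, 16, 3)  # its default init advances the RNG state
    torch.manual_seed(0)        # reset the RNG before creating the second layer
    fc_b = nn.Linear(10, 5)

    print(torch.equal(fc_a.weight, fc_b.weight))  # True
    print(torch.equal(fc_a.bias, fc_b.bias))      # True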