Manual weight reset differs from first initialization

Hey all,
I narrowed my problem down to the following:

  1. I set a random seed and create a new model (this should call the standard init for every parameter)
  2. I save this model (to look at those parameters later on)
  3. I again set the original random seed
  4. I reset the model parameters via
import torch.nn as nn

def weight_reset(m):
    # Layer types whose default initialization should be re-run
    resettable = (
        nn.Conv1d, nn.Conv2d, nn.Conv3d,
        nn.ConvTranspose1d, nn.ConvTranspose2d, nn.ConvTranspose3d,
        nn.Linear,
        nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d,
        nn.GroupNorm,
    )
    if isinstance(m, resettable):
        m.reset_parameters()

# apply() calls weight_reset on every submodule and on the model itself
model.apply(weight_reset)
  5. I compare the parameters for the original and the re-initialized model
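
A minimal sketch of the five steps above, assuming a small stand-in model instead of the ResNet from this post. The names `SEED`, `make_model`, and `compare_to_reference` are made up for illustration, and the layer list in `weight_reset` is shortened to the types this toy model uses.

```python
import copy

import torch
import torch.nn as nn

SEED = 0  # hypothetical seed value


def weight_reset(m):
    # Re-run the default init for the layer types used in this toy model.
    if isinstance(m, (nn.Conv2d, nn.Linear, nn.BatchNorm2d)):
        m.reset_parameters()


def make_model():
    # Small stand-in for the ResNet from the post (assumption).
    return nn.Sequential(
        nn.Conv2d(3, 8, kernel_size=3),
        nn.BatchNorm2d(8),
        nn.Flatten(),
        nn.Linear(8, 2),
    )


def compare_to_reference(model, reference):
    # Map each parameter name to True if it matches the saved tensor exactly.
    return {
        name: torch.equal(param, reference[name])
        for name, param in model.named_parameters()
    }


torch.manual_seed(SEED)                          # step 1: seed, then build
model = make_model()
reference = copy.deepcopy(model.state_dict())    # step 2: keep a real copy
torch.manual_seed(SEED)                          # step 3: same seed again
model.apply(weight_reset)                        # step 4: manual reset
result = compare_to_reference(model, reference)  # step 5: compare by name
for name, same in result.items():
    print(name, same)
```

The `copy.deepcopy` matters: the tensors returned by `state_dict()` share storage with the live parameters, so without a copy the reference would change along with the model. Whether a mismatch shows up depends on the model; this stand-in only illustrates the procedure.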

Result: Most parameters are equal, but some conv-layers differ (not all!)
The model is mostly identical to the ResNet18 in torchvision. (To simplify things, I removed the custom initialization from both the ResNet constructor and the weight-reset method.)
Here is the output I get when I compare the parameters:

conv1.weight tensor(False)
bn1.weight tensor(True)
bn1.bias tensor(True)
layer1.0.conv1.weight tensor(False)
layer1.0.bn1.weight tensor(True)
layer1.0.bn1.bias tensor(True)
layer1.0.conv2.weight tensor(False)
layer1.0.bn2.weight tensor(True)
layer1.0.bn2.bias tensor(True)
layer1.1.conv1.weight tensor(False)
layer1.1.bn1.weight tensor(True)
layer1.1.bn1.bias tensor(True)
layer1.1.conv2.weight tensor(False)
layer1.1.bn2.weight tensor(True)
layer1.1.bn2.bias tensor(True)
layer2.0.conv1.weight tensor(False)
layer2.0.bn1.weight tensor(True)
layer2.0.bn1.bias tensor(True)
layer2.0.conv2.weight tensor(False)
layer2.0.bn2.weight tensor(True)
layer2.0.bn2.bias tensor(True)
layer2.0.downsample.0.weight tensor(False)
layer2.0.downsample.1.weight tensor(True)
layer2.0.downsample.1.bias tensor(True)
layer2.1.conv1.weight tensor(False)
layer2.1.bn1.weight tensor(True)
layer2.1.bn1.bias tensor(True)
layer2.1.conv2.weight tensor(False)
layer2.1.bn2.weight tensor(True)
layer2.1.bn2.bias tensor(True)
layer3.0.conv1.weight tensor(False)
layer3.0.bn1.weight tensor(True)
layer3.0.bn1.bias tensor(True)
layer3.0.conv2.weight tensor(False)
layer3.0.bn2.weight tensor(True)
layer3.0.bn2.bias tensor(True)
layer3.0.downsample.0.weight tensor(False)
layer3.0.downsample.1.weight tensor(True)
layer3.0.downsample.1.bias tensor(True)
layer3.1.conv1.weight tensor(False)
layer3.1.bn1.weight tensor(True)
layer3.1.bn1.bias tensor(True)
layer3.1.conv2.weight tensor(False)
layer3.1.bn2.weight tensor(True)
layer3.1.bn2.bias tensor(True)
layer4.0.conv1.weight tensor(False)
layer4.0.bn1.weight tensor(True)
layer4.0.bn1.bias tensor(True)
layer4.0.conv2.weight tensor(False)
layer4.0.bn2.weight tensor(True)
layer4.0.bn2.bias tensor(True)
layer4.0.downsample.0.weight tensor(False)
layer4.0.downsample.1.weight tensor(True)
layer4.0.downsample.1.bias tensor(True)
layer4.1.conv1.weight tensor(False)
layer4.1.bn1.weight tensor(True)
layer4.1.bn1.bias tensor(True)
layer4.1.conv2.weight tensor(False)
layer4.1.bn2.weight tensor(True)
layer4.1.bn2.bias tensor(True)
fc.weight tensor(False)
fc.bias tensor(False)

When I do the manual reset twice and compare the outputs, I can see that the output is identical.
Does anyone have an idea what I am missing here?

The order of the layer initialization seems to be different, so that resetting the seed won’t yield the same results.
After removing the init methods and resetting the seed before each layer creation (in the ResNet class as well as in BasicBlock, conv1x1, and conv3x3) I get the same results.

That being said, I wouldn’t rely on the seed to yield the exact same results in a setup like this, where the order of operations might differ.
I.e. the order in which the modules are passed to model.apply() is apparently different from the order in which these layers are created in the ResNet class.
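
To make the ordering effect concrete, here is a toy sketch (not the actual ResNet code). `ToyBlock` is a hypothetical module in which one layer is constructed before another but registered after it, so the random numbers are consumed in a different order at construction time than during `apply()`.

```python
import copy

import torch
import torch.nn as nn


class ToyBlock(nn.Module):
    def __init__(self):
        super().__init__()
        shortcut = nn.Linear(4, 4)   # constructed first: draws from the RNG first
        self.main = nn.Linear(4, 4)  # constructed second, but registered first
        self.shortcut = shortcut     # registered second


def weight_reset(m):
    if isinstance(m, nn.Linear):
        m.reset_parameters()


torch.manual_seed(0)
block = ToyBlock()
initial = copy.deepcopy(block.state_dict())

torch.manual_seed(0)
# apply() visits child modules in registration order: main, then shortcut.
block.apply(weight_reset)
after_reset = block.state_dict()

# Does `main` get its original weights back?
main_unchanged = torch.equal(after_reset["main.weight"], initial["main.weight"])
# Or does it receive the values that `shortcut` drew at construction time?
main_got_shortcut = torch.equal(after_reset["main.weight"], initial["shortcut.weight"])
print(main_unchanged, main_got_shortcut)
```

On CPU this should show that `main` ends up with the values `shortcut` originally received: the random stream is the same, but it is handed out to the layers in a different order.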


Thank you for the quick answer!
Then this must be the explanation.
So as long as I want to perfectly reproduce the initial weights it’s safest to construct a new model or re-load the initial state-dict, right?
I’m just glad there is no bug in the weight_reset method, so I can still use it as long as I don’t need a perfect reproduction of the initial weights.

Yes, I would recommend setting the seed once in a script, initializing the model, and storing the state_dict.
This state_dict is then the reference for this particular seed.
Making sure that all operations are executed in the same order is often too complicated.
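
A minimal sketch of that workflow, assuming a single `nn.Linear` as a stand-in model and an in-memory copy as the reference. The in-place perturbation only simulates training.

```python
import copy

import torch
import torch.nn as nn

torch.manual_seed(0)                           # set the seed once
model = nn.Linear(4, 2)                        # stand-in model (assumption)
reference = copy.deepcopy(model.state_dict())  # reference weights for this seed

# Simulate training by perturbing the parameters in place.
with torch.no_grad():
    for p in model.parameters():
        p.add_(1.0)

changed = not torch.equal(model.weight, reference["weight"])

# Restore the exact initial weights from the stored reference.
model.load_state_dict(reference)
restored = all(
    torch.equal(p, reference[name]) for name, p in model.named_parameters()
)
print(changed, restored)
```

To keep the reference across runs, it could also be written to disk with `torch.save(reference, path)` and read back with `torch.load(path)`, where `path` is whatever file name you choose.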