Hey all,

I narrowed my problem down to the following:

- I set a random seed and create a new model (this should call the standard init for every parameter)
- I save this model (to look at those parameters later on)
- I again set the original random seed
- I reset the model parameters via

```
def weight_reset(m):
    if isinstance(m, (
        nn.Conv1d, nn.Conv2d, nn.Conv3d,
        nn.ConvTranspose1d, nn.ConvTranspose2d, nn.ConvTranspose3d,
        nn.Linear,
        nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d,
        nn.GroupNorm,
    )):
        m.reset_parameters()

model.apply(weight_reset)
```

- I compare the parameters for the original and the re-initialized model

Result: the batch-norm parameters all match, but every conv weight differs, and so do fc.weight and fc.bias (see the output below).

The model is mostly identical to the ResNet18 in torchvision. (To simplify things, I removed the custom initialization from both the ResNet constructor and the weight-reset method.)
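For context, the custom initialization I removed is roughly the loop that torchvision's ResNet constructor runs on top of the per-layer defaults (paraphrased here as a standalone function):

```
import torch
import torch.nn as nn

def custom_resnet_init(model):
    # roughly what torchvision's ResNet constructor does on top of the defaults
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            nn.init.kaiming_normal_(m.weight, mode="fan_out", nonlinearity="relu")
        elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
            nn.init.constant_(m.weight, 1)
            nn.init.constant_(m.bias, 0)
```

Since `kaiming_normal_` draws from the global RNG, keeping this on one side but not the other would desynchronize the random state, which is why I removed it from both.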

Here is the output I get when I compare the parameters:

```
conv1.weight tensor(False)
bn1.weight tensor(True)
bn1.bias tensor(True)
layer1.0.conv1.weight tensor(False)
layer1.0.bn1.weight tensor(True)
layer1.0.bn1.bias tensor(True)
layer1.0.conv2.weight tensor(False)
layer1.0.bn2.weight tensor(True)
layer1.0.bn2.bias tensor(True)
layer1.1.conv1.weight tensor(False)
layer1.1.bn1.weight tensor(True)
layer1.1.bn1.bias tensor(True)
layer1.1.conv2.weight tensor(False)
layer1.1.bn2.weight tensor(True)
layer1.1.bn2.bias tensor(True)
layer2.0.conv1.weight tensor(False)
layer2.0.bn1.weight tensor(True)
layer2.0.bn1.bias tensor(True)
layer2.0.conv2.weight tensor(False)
layer2.0.bn2.weight tensor(True)
layer2.0.bn2.bias tensor(True)
layer2.0.downsample.0.weight tensor(False)
layer2.0.downsample.1.weight tensor(True)
layer2.0.downsample.1.bias tensor(True)
layer2.1.conv1.weight tensor(False)
layer2.1.bn1.weight tensor(True)
layer2.1.bn1.bias tensor(True)
layer2.1.conv2.weight tensor(False)
layer2.1.bn2.weight tensor(True)
layer2.1.bn2.bias tensor(True)
layer3.0.conv1.weight tensor(False)
layer3.0.bn1.weight tensor(True)
layer3.0.bn1.bias tensor(True)
layer3.0.conv2.weight tensor(False)
layer3.0.bn2.weight tensor(True)
layer3.0.bn2.bias tensor(True)
layer3.0.downsample.0.weight tensor(False)
layer3.0.downsample.1.weight tensor(True)
layer3.0.downsample.1.bias tensor(True)
layer3.1.conv1.weight tensor(False)
layer3.1.bn1.weight tensor(True)
layer3.1.bn1.bias tensor(True)
layer3.1.conv2.weight tensor(False)
layer3.1.bn2.weight tensor(True)
layer3.1.bn2.bias tensor(True)
layer4.0.conv1.weight tensor(False)
layer4.0.bn1.weight tensor(True)
layer4.0.bn1.bias tensor(True)
layer4.0.conv2.weight tensor(False)
layer4.0.bn2.weight tensor(True)
layer4.0.bn2.bias tensor(True)
layer4.0.downsample.0.weight tensor(False)
layer4.0.downsample.1.weight tensor(True)
layer4.0.downsample.1.bias tensor(True)
layer4.1.conv1.weight tensor(False)
layer4.1.bn1.weight tensor(True)
layer4.1.bn1.bias tensor(True)
layer4.1.conv2.weight tensor(False)
layer4.1.bn2.weight tensor(True)
layer4.1.bn2.bias tensor(True)
fc.weight tensor(False)
fc.bias tensor(False)
```

When I run the manual reset twice (re-seeding each time) and compare the two results, the parameters are identical, so the reset itself is deterministic.

Does anyone have an idea what I am missing here?
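For reference, here is a minimal, self-contained version of the whole procedure, using a toy `Sequential` model in place of the ResNet (names are illustrative):

```
import copy
import torch
import torch.nn as nn

# step 1: seed and build a fresh model (default init runs for every parameter)
torch.manual_seed(0)
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.Linear(8, 2))

# step 2: keep a copy of the freshly initialized parameters
saved_state = copy.deepcopy(model.state_dict())

# step 3: re-seed and reset every parameterized layer
torch.manual_seed(0)

def weight_reset(m):
    if isinstance(m, (nn.Conv2d, nn.Linear, nn.BatchNorm2d)):
        m.reset_parameters()

model.apply(weight_reset)

# step 4: compare the re-initialized parameters with the saved copy
for name, param in model.state_dict().items():
    print(name, torch.equal(param, saved_state[name]))
```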