I think the PRNG is the reason. We can specify our own initialization, but I believe this kind of behaviour is somewhat hidden and undesirable.
Hi @ptrblck, I have just checked that a simple model gives the same results with or without unused registered parameters. That's great. But I ran into confusion when checking whether the model is updating its parameters. To do this, I used the following lines:
a = list(model.parameters())[0].clone()
loss.backward()
optimizer.step()
b = list(model.parameters())[0].clone()
print(torch.equal(a, b))
What should the output be when the model updates its parameters? It should be "False", because after loss.backward() and optimizer.step() the parameters are updated, so a and b would not be the same. When I run the model without any unused parameters, it gives "False". But when there are unused parameters, it gives "True". Could you please tell me the reason? Due to lack of knowledge I couldn't find it out.
Could you check which parameter you are comparing via list(model.parameters())[0].clone()? If the first parameter (accessed via [0]) is the unused one, it would not be updated, which would explain your output. You could use dict(model.named_parameters()) to also get the name of each parameter besides its value.
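To make that check concrete, here is a minimal sketch (the model is a made-up toy, not your actual model) that snapshots every named parameter before the optimizer step and reports which ones actually changed:

```python
import torch
import torch.nn as nn

# Toy model with one used layer and one registered-but-unused layer.
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.used = nn.Linear(4, 1)
        self.unused = nn.Linear(4, 4)  # never called in forward()

    def forward(self, x):
        return self.used(x)

model = Net()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Snapshot all parameters by name before the update.
before = {name: p.detach().clone() for name, p in model.named_parameters()}

loss = model(torch.randn(8, 4)).mean()
loss.backward()
optimizer.step()

# Report which parameters actually changed.
for name, p in model.named_parameters():
    changed = not torch.equal(before[name], p)
    print(f"{name}: {'updated' if changed else 'unchanged'}")
```

The unused layer's parameters never receive a gradient, so the optimizer leaves them untouched; comparing only `[0]` can therefore give "True" or "False" depending on which parameter happens to come first.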
I think those unused layers may change the order in which layers draw from the PRNG during initialization, and thereby disturb the initialization weights of the layers that follow them.
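A quick way to see this effect: compare the weights of the same layer in two models that differ only by an extra layer registered first. This is a minimal sketch under the assumption that layers are initialized from the default global PRNG in registration order:

```python
import torch
import torch.nn as nn

class Plain(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

class WithUnused(nn.Module):
    def __init__(self):
        super().__init__()
        self.extra = nn.Linear(4, 4)  # unused, but registered (and initialized) first
        self.fc = nn.Linear(4, 2)

torch.manual_seed(0)
a = Plain()
torch.manual_seed(0)
b = WithUnused()

# The extra layer consumed random draws before fc was created,
# so fc starts from different weights despite the identical seed.
print(torch.equal(a.fc.weight, b.fc.weight))
```

This prints False even though both models were built from the same seed, which is the hidden behaviour mentioned above; re-seeding or initializing each layer explicitly avoids it.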