Unused model parameters affect optimization for Adam

I think the PRNG is the reason. We can specify our own initialization, but I believe this kind of behaviour is somewhat hidden and undesirable.
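As a minimal sketch of what I mean (the toy layers and sizes are made up for illustration): an unused layer still consumes random numbers when it is constructed, so every layer created after it starts from different weights:

    import torch
    import torch.nn as nn

    def build(with_unused):
        # Start both builds from the same PRNG state
        torch.manual_seed(0)
        first = nn.Linear(4, 4)                    # used layer, created first
        if with_unused:
            _unused = nn.Linear(4, 4)              # registered but never used in forward
        second = nn.Linear(4, 2)                   # used layer, created afterwards
        return first, second

    a_first, a_second = build(with_unused=False)
    b_first, b_second = build(with_unused=True)

    print(torch.equal(a_first.weight, b_first.weight))    # True: created before the unused layer
    print(torch.equal(a_second.weight, b_second.weight))  # False: the unused layer advanced the PRNG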

Hi @ptrblck, I have just checked that a simple model gives the same results with or without unused registered parameters. That’s great. But I got confused when I went on to check whether the model is updating its parameters. To do this, I have used the following lines:

        # snapshot the first parameter before the update
        a = list(model.parameters())[0].clone()
        loss.backward()
        optimizer.step()
        # snapshot the same parameter after the update
        b = list(model.parameters())[0].clone()
        # prints True if the parameter did not change
        print(torch.equal(a.data, b.data))

What should the output be when the model updates its parameters? It should be ‘False’, because after the backward pass and the optimizer step the parameters would be updated, so a and b would not be the same. When I run the model without any unused parameters, it gives ‘False’. But when there are unused parameters, it gives ‘True’. Could you please tell me the reason for this? I couldn’t figure it out myself.

Could you check which parameter you are comparing via:

list(model.parameters())[0].clone()

If the first parameter (accessed via [0]) is the unused parameter, it will not be updated, which would explain why your comparison prints True.
You could use dict(model.named_parameters()) to also get the name of each parameter in addition to its values.
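As a rough sketch (assuming model, criterion, optimizer and a batch x, y already exist), you could snapshot every named parameter and compare after the step:

    # Snapshot every parameter by name before the update
    before = {name: p.detach().clone() for name, p in model.named_parameters()}

    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()

    # Parameters that are never used in forward get no gradient and stay unchanged
    for name, p in model.named_parameters():
        status = "unchanged" if torch.equal(before[name], p) else "updated"
        print(name, status)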

Thanks @ptrblck. Got the point.

I think those unused layers may change the order in which the layers consume random numbers during initialization, and therefore disturb the initial weights of some of the layers.
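If that is the case, one way to sidestep it is to make the initialization explicit, so that each layer’s weights no longer depend on what was constructed before it. A rough sketch (the name-based seeding scheme is just an assumption for illustration):

    import zlib
    import torch
    import torch.nn as nn

    def reinit(model, base_seed=42):
        # Derive a deterministic seed from each layer's name, so its initial
        # weights depend only on the layer itself and not on how many other
        # (possibly unused) layers consumed random numbers before it.
        for name, module in model.named_modules():
            if isinstance(module, nn.Linear):
                torch.manual_seed(base_seed + zlib.crc32(name.encode()))
                nn.init.xavier_uniform_(module.weight)
                if module.bias is not None:
                    nn.init.zeros_(module.bias)

    reinit(model)  # model is assumed to be an existing nn.Module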