Unused model parameters affect optimization for Adam

I think the PRNG is the reason: the unused parameters still consume random numbers when they are initialized, which shifts the initialization of the layers created after them. We can specify our own initialization, but I believe this kind of behaviour is somewhat hidden and undesirable.
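
A minimal sketch of what I mean, assuming the default nn.Linear initialization and made-up names (Net, fc1, fc2, unused); the unused layer draws random numbers at construction time, so every layer created after it starts from different weights:

    import torch
    import torch.nn as nn

    class Net(nn.Module):
        def __init__(self, extra=False):
            super().__init__()
            torch.manual_seed(0)                    # same seed for both variants
            self.fc1 = nn.Linear(10, 10)
            if extra:
                self.unused = nn.Linear(10, 10)     # registered but never used in forward()
            self.fc2 = nn.Linear(10, 2)

        def forward(self, x):
            return self.fc2(torch.relu(self.fc1(x)))

    plain = Net(extra=False)
    with_unused = Net(extra=True)

    # fc1 is built right after the seed is set, so it matches in both models
    print(torch.equal(plain.fc1.weight, with_unused.fc1.weight))   # True
    # fc2 is built after the unused layer consumed random numbers, so it differs
    print(torch.equal(plain.fc2.weight, with_unused.fc2.weight))   # False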

Hi @ptrblck, I have just checked that a simple model gives the same results with or without unused registered parameters. That's great. But I ran into confusion when checking whether the model is updating its parameters. To do this, I have used the following lines:

        # snapshot the first parameter before the update
        a = list(model.parameters())[0].clone()
        loss.backward()
        optimizer.step()
        # snapshot the same parameter after the update
        b = list(model.parameters())[0].clone()
        # True means the parameter did not change, False means it did
        print(torch.equal(a.data, b.data))

What should the output be when the model updates its parameters? It should be 'False', because after loss.backward() and optimizer.step() the parameters would be updated, so a and b would not be the same. When I run the model without any unused parameters, it prints 'False'. But when there are unused parameters, it prints 'True'. Could you please tell me the reason for this? Due to my lack of knowledge I couldn't figure it out.

Could you check which parameter you are comparing via:

list(model.parameters())[0].clone()

If the first parameter (accessed via [0]) is the unused parameter, it will not be updated, which would explain why your check prints 'True'.
You could use dict(model.named_parameters()) to also get the name of the parameter besides the values.
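
A small sketch along those lines, assuming model, optimizer and loss already exist in your training loop; it compares every named parameter before and after the update instead of only the first one:

    # snapshot every parameter before the update
    before = {name: p.detach().clone() for name, p in model.named_parameters()}

    loss.backward()
    optimizer.step()

    for name, p in model.named_parameters():
        updated = not torch.equal(before[name], p.detach())
        # unused parameters receive no gradient, so Adam skips them in step()
        print(f"{name}: updated={updated}, has_grad={p.grad is not None}")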

Thanks @ptrblck. Got the point.

I think those unused layers change the order in which random numbers are drawn during initialization, and therefore disturb the initial weights of the layers created after them.
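
One possible workaround (my own sketch, not something from this thread, using a hypothetical seeded_init helper) is to re-initialize each used layer from its own seed after construction, so an extra unused layer can no longer shift the random draws:

    import torch
    import torch.nn as nn

    def seeded_init(layer, seed):
        # hypothetical helper: re-seed right before initializing this layer, so its
        # starting weights no longer depend on what was constructed before it
        torch.manual_seed(seed)
        nn.init.kaiming_uniform_(layer.weight, a=5 ** 0.5)
        nn.init.zeros_(layer.bias)

    # two toy models: same used layers, one carries an extra unused layer
    used_only = nn.ModuleDict({"fc1": nn.Linear(10, 10),
                               "fc2": nn.Linear(10, 2)})
    with_extra = nn.ModuleDict({"fc1": nn.Linear(10, 10),
                                "unused": nn.Linear(10, 10),
                                "fc2": nn.Linear(10, 2)})

    for seed, name in enumerate(["fc1", "fc2"]):
        seeded_init(used_only[name], seed)
        seeded_init(with_extra[name], seed)
        print(name, torch.equal(used_only[name].weight, with_extra[name].weight))  # True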