Untrained parameters affect final model performance

I wrote some deep learning models in PyTorch with fixed random seeds. Some layers that are defined but never used still affect the final model performance. For example, this is part of my code:

self.linear_1 = nn.Linear(27, d_model//2)
self.linear_2 = nn.Linear(d_model//2, d_model)

Although I define self.linear_1 and self.linear_2, they are never called in the forward method. However, I get different results if I remove them. That seems weird to me, since they are not part of the model's computation. Can anyone explain it?

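That’s expected. Randomly initializing these layers draws numbers from the pseudo-random number generator (PRNG) even if the layers are never used in the forward method, so removing them changes the order of PRNG calls, and everything that draws from the PRNG afterwards, such as the initialization of later layers, receives different values.
Seeding only guarantees that the same sequence of PRNG calls will generate the same random number sequence.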
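A minimal sketch of this effect, assuming the default CPU generator and PyTorch's standard nn.Linear initialization. The class names, layer sizes, and the build helper are hypothetical and not taken from the original code:

```python
import torch
import torch.nn as nn


class WithUnused(nn.Module):
    def __init__(self):
        super().__init__()
        # Never called in forward, but its random init still consumes PRNG numbers.
        self.unused = nn.Linear(27, 8)
        self.used = nn.Linear(4, 2)

    def forward(self, x):
        return self.used(x)


class WithoutUnused(nn.Module):
    def __init__(self):
        super().__init__()
        self.used = nn.Linear(4, 2)

    def forward(self, x):
        return self.used(x)


def build(cls, seed=0):
    # Hypothetical helper: reseed, then construct the model.
    torch.manual_seed(seed)
    return cls()


a = build(WithUnused)
b = build(WithoutUnused)
c = build(WithoutUnused)

# Same seed and same sequence of PRNG calls: identical weights.
same_arch_match = torch.equal(b.used.weight, c.used.weight)
# Same seed, but the unused layer was initialized first: different weights.
diff_arch_match = torch.equal(a.used.weight, b.used.weight)

print("same architecture, same seed:", same_arch_match)
print("extra unused layer, same seed:", diff_arch_match)
```

The two models compute the same function in forward, yet the layer that is actually used starts from different weights, which is enough to change the final result of training.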


Thanks for the kind explanation. The performance decreased after I removed those unused layers, and it seems that there is no way to fix it.

Yes, you are right that there is no proper way of “fixing” the expected noise by changing random seeds.
You should expect to see the same kind of variation in your final model accuracy simply by using a different initial seed. If the standard deviation of the final accuracy across seeds is too large, you would have to check your training routine and try to stabilize the training.
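One way to check this is to repeat the run over several seeds and look at the spread of the final accuracy. The sketch below is only an illustration: train_and_eval is a hypothetical stand-in for a real training routine, and the toy data, model, and hyperparameters are assumptions, not part of the original code.

```python
import statistics

import torch
import torch.nn as nn


def train_and_eval(seed: int) -> float:
    """Hypothetical stand-in for a full training run; returns final accuracy."""
    torch.manual_seed(seed)
    # Toy synthetic binary classification data (assumption for illustration).
    x = torch.randn(256, 4)
    y = (x.sum(dim=1) > 0).float().unsqueeze(1)

    model = nn.Linear(4, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.BCEWithLogitsLoss()

    for _ in range(50):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()

    with torch.no_grad():
        pred = (model(x) > 0).float()
    return (pred == y).float().mean().item()


seeds = [0, 1, 2, 3, 4]
accs = [train_and_eval(s) for s in seeds]
mean_acc = statistics.mean(accs)
std_acc = statistics.stdev(accs)
print(f"accuracy over {len(seeds)} seeds: {mean_acc:.3f} +/- {std_acc:.3f}")
```

If removing the unused layers changes the accuracy by an amount comparable to this spread, the difference is most likely seed noise rather than a real effect of the layers.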
