Some linear layers have a valid grad_sample, others don't

I have two linear layers defined in the model as follows:

self.feature_reduction = nn.Linear(in_features = 512, out_features = 256)
self.class_embed = nn.Linear(in_features = self.hidden_dim, out_features = 2)

They both work fine in a non-DP setting, but with Opacus, param.grad_sample for self.class_embed shows up as None for some reason. Consequently, in optimizer.step(), I get a “Per sample gradient is not initialized. Not updated in backward pass?” error.

I saw an issue on the PyTorch forum saying that if a layer is initialized but never used, its grad_sample comes up as None and we get the aforementioned error. But I know for sure that the self.class_embed layer is called every time, so how is it possible that it doesn’t have a valid per-sample gradient?
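For reference, here is a minimal sketch of the check I can run after loss.backward() to see which parameters end up without a per-sample gradient (assuming model is the module already wrapped by Opacus's PrivacyEngine):

# After loss.backward(), every trainable parameter of a model wrapped by
# Opacus should carry a .grad_sample tensor of shape (batch_size, *param.shape).
for name, param in model.named_parameters():
    if not param.requires_grad:
        continue
    grad_sample = getattr(param, "grad_sample", None)
    if grad_sample is None:
        print(f"MISSING grad_sample: {name}")
    else:
        print(f"{name}: grad_sample shape {tuple(grad_sample.shape)}")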

I cannot see the rest of your code, but I assume the error lies there, as these two lines look correct.
Are you certain you use the layer in the forward pass?
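One common way a layer can look like it is “used” but still end up with grad_sample as None is when its parameters are accessed through the functional API instead of calling the module itself: Opacus attaches its hooks to the module’s forward call, so bypassing it means no per-sample gradients are recorded. A hypothetical sketch of the pattern to watch out for (class_embed here stands in for your layer):

import torch
from torch import nn
import torch.nn.functional as F

class Head(nn.Module):
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.class_embed = nn.Linear(hidden_dim, 2)

    def forward(self, x):
        # Calling the module directly triggers Opacus's hooks, so
        # grad_sample gets populated for class_embed's parameters:
        return self.class_embed(x)

        # By contrast, using the parameters through the functional API
        # bypasses the module's forward (and Opacus's hooks), which
        # leaves grad_sample as None:
        # return F.linear(x, self.class_embed.weight, self.class_embed.bias)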

Hi @Anirban_Nath

Do you mind sharing a reproducible example here so we can debug this error? You can use the template Colab.

@tls430 @ashkan_software Hi. I don’t think I will be able to reproduce my code in the Colab template because the error is oddly specific to one LayerNorm and one linear layer among the hundreds of other layers in my code. My code is too large to reproduce here, but I have made sure that the layers in question are being called through their respective forward functions. It is pretty weird that these two layers are the only ones causing problems. Is there any other way I can communicate my problem?

Hi @Anirban_Nath

Do you think you can simplify your code and send it? Maybe instead of having hundreds of layers, just stick to a few (including the ones you think are causing the problem), so that we have a small example. In order for us to help, we need to see the code and where the issue is happening :wink:

Also, a small example may give you some clues, and you might even find the bug yourself!
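To illustrate, here is a minimal sketch of the kind of self-contained example that would help, assuming a toy model with just the two layers you posted and random data (the sizes, noise_multiplier, and max_grad_norm values are placeholders):

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

class ToyModel(nn.Module):
    def __init__(self, hidden_dim=256):
        super().__init__()
        self.feature_reduction = nn.Linear(in_features=512, out_features=256)
        self.class_embed = nn.Linear(in_features=hidden_dim, out_features=2)

    def forward(self, x):
        x = torch.relu(self.feature_reduction(x))
        return self.class_embed(x)

model = ToyModel()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
dataset = TensorDataset(torch.randn(64, 512), torch.randint(0, 2, (64,)))
data_loader = DataLoader(dataset, batch_size=8)

privacy_engine = PrivacyEngine()
model, optimizer, data_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=data_loader,
    noise_multiplier=1.0,
    max_grad_norm=1.0,
)

criterion = nn.CrossEntropyLoss()
for x, y in data_loader:
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()  # raises the "per sample gradient" error if grad_sample is missing
    break

If the two layers behave correctly in an isolated script like this, the problem is almost certainly in how they are called in the full model, and trimming the model down step by step should help narrow it down.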