I am facing an issue where the gradients are 0 even though the loss is not zero: the loss stays at 1 while all gradients are 0. I'm using the MSE loss function. Can anyone please help me debug this?
I think the code is well-written, even though it has some weird parts…
The loss can get stuck while the gradients are non-zero, but the opposite case (non-zero loss with zero gradients) shouldn't happen if your model is properly defined.
Check every layer in your model for NaNs or similar anomalies.
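For reference, a minimal sketch of that check, assuming a generic PyTorch model object named model (the actual model class isn't shown in this thread):

```python
import torch

# Hypothetical sketch: after a backward pass, scan every parameter of `model`
# and its gradient for NaN/Inf values.
for name, p in model.named_parameters():
    if torch.isnan(p).any() or torch.isinf(p).any():
        print(f"parameter {name} contains NaN/Inf")
    if p.grad is not None and (torch.isnan(p.grad).any() or torch.isinf(p.grad).any()):
        print(f"gradient of {name} contains NaN/Inf")
```

torch.autograd.set_detect_anomaly(True) can also help by flagging the first operation that produces a NaN during backward.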
I see. You are right. Didn’t notice that:)
Also, there are many unknowns in the question: what is the range of values in tgts, what does the network look like, what is the final layer activation, etc.?
@thecho7 there are weird parts because the problem itself is unusual. I need the weights to be interpretable: each weight value ranges between 0 and 1, but at the end I want them to be either 0 or 1. After training I have to read off a formula from the learned weights of the network.
Hi,
Maybe you're cutting the computation graph somewhere in the forward pass.
I think you can check this by filling each parameter's .grad with a value other than zero,
then calling backward on the loss and seeing whether the gradients change at all.
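A rough sketch of that test, assuming model, inputs, tgts, and criterion come from your (unshown) training loop. Note that backward accumulates into .grad, so a parameter whose gradient is exactly zero would also leave the sentinel unchanged:

```python
import torch

SENTINEL = 123.0

# Hypothetical sketch: seed every .grad with a sentinel value, run backward,
# and check whether backward wrote anything into the gradients at all.
for p in model.parameters():
    p.grad = torch.full_like(p, SENTINEL)

loss = criterion(model(inputs), tgts)
loss.backward()

for name, p in model.named_parameters():
    untouched = torch.allclose(p.grad, torch.full_like(p, SENTINEL))
    print(name, "untouched by backward" if untouched else "changed by backward")
```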
Just a hypothesis: can you plot the distribution of self.layer_and_weights at every training iteration?
Is there a chance that after some iterations of training, self.layer_and_weights goes to 0?
Can you verify that?
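A hedged sketch of how one might track that, assuming layer_and_weights is a tensor attribute on the model and that num_steps, inputs, tgts, criterion, and optimizer come from the existing training loop (you could also keep full snapshots and plot histograms instead of these summaries):

```python
import matplotlib.pyplot as plt

# Hypothetical sketch: after each optimizer step, record how close
# model.layer_and_weights is to zero, then plot the trend over training.
mean_abs, frac_zero = [], []
for step in range(num_steps):
    loss = criterion(model(inputs), tgts)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    w = model.layer_and_weights.detach()
    mean_abs.append(w.abs().mean().item())
    frac_zero.append((w == 0).float().mean().item())

plt.plot(mean_abs, label="mean |w|")
plt.plot(frac_zero, label="fraction exactly 0")
plt.xlabel("training iteration")
plt.legend()
plt.show()
```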
@mMagmer yes, you are right, I guess. The gradients don't change at all after doing what you suggested. But why doesn't this happen with other examples? And how do I find out what is cutting the computation graph?
Now that I think about it, that test is not the right way to check the computation graph.
I want to say it's in the util.tnorm_n_inputs part, but I'm not sure.
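One way to locate the cut (a sketch assuming you can add prints inside the model's forward pass): inspect requires_grad and grad_fn of the intermediate tensors. A tensor that should depend on the trainable weights but has grad_fn=None marks where the graph stops; common causes are .detach(), .item(), .data, torch.no_grad(), or a round trip through NumPy or Python floats.

```python
# Hypothetical helper: print the autograd state of an intermediate tensor.
def check(name, t):
    print(f"{name}: requires_grad={t.requires_grad}, grad_fn={t.grad_fn}")

# Inside forward(), for example right after the util.tnorm_n_inputs call
# (arguments left exactly as they are in your code):
#   h = util.tnorm_n_inputs(...)
#   check("tnorm_n_inputs output", h)
```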