I am using softmax at the end of my model.
However, after some training, softmax is giving negative probabilities, and in some situations I have encountered NaNs as probabilities as well.
One solution I found while searching is to use a normalized softmax; however, I cannot find any PyTorch implementation for this.
Can someone please let me know whether a normalized softmax is available, or how to achieve this, so that the forward and backward passes are smooth?
Please note that I am already using `torch.nn.utils.clip_grad_norm_(model.parameters(), 40)` to avoid exploding gradients.
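
For reference, here is my own rough sketch of what I *think* "normalized softmax" might mean: L2-normalizing the logits before applying the built-in (numerically stable) softmax, so their magnitude stays bounded. This is just my guess at the technique, not a built-in PyTorch function:

```python
import torch
import torch.nn.functional as F

def normalized_softmax(logits, dim=-1, eps=1e-8):
    # Assumption: "normalized softmax" means dividing the logits by their
    # L2 norm before the standard softmax, which keeps the inputs bounded
    # and avoids the overflow that can produce NaNs.
    norm = logits.norm(p=2, dim=dim, keepdim=True).clamp_min(eps)
    return F.softmax(logits / norm, dim=dim)

# Extreme logits that could destabilize training upstream of softmax
logits = torch.tensor([[1e4, -1e4, 3.0]])
probs = normalized_softmax(logits)
print(probs)               # finite, non-negative
print(probs.sum(dim=-1))   # rows sum to 1
```

Would something like this be a reasonable approach, or is there a standard way to do it?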