However, after some training the softmax output contains negative probabilities. In some situations I have encountered NaNs as probabilities as well.
One solution I found while searching is to use a normalized softmax; however, I cannot find any PyTorch implementation of this.
Can someone please let me know whether a normalized softmax is available, or how to achieve this so that the forward and backward passes stay stable?
Please note that I am already using torch.nn.utils.clip_grad_norm_(model.parameters(), 40) to avoid exploding gradients.
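For context, here is a minimal sketch of where that clipping call sits in my update step (the toy model and loss here are just placeholders, not my actual A3C network):

```python
import torch
import torch.nn as nn

# Placeholder model/optimizer just to show the ordering of the calls
model = nn.Linear(8, 4)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

logits = model(torch.randn(16, 8))
loss = nn.functional.cross_entropy(logits, torch.randint(0, 4, (16,)))

optimizer.zero_grad()
loss.backward()
# Clip the global gradient norm to 40 before the optimizer step
torch.nn.utils.clip_grad_norm_(model.parameters(), 40)
optimizer.step()
```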
The focus of PyTorch is not on data preprocessing; issues like this are usually handled with third-party toolkits. You can use scikit-learn's StandardScaler. You could also look into skorch (usually you don't need that package, just scikit-learn).
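For example, assuming your inputs live in a NumPy array `X` (a placeholder name), scaling them before feeding the model could look like this:

```python
import numpy as np
import torch
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix; replace with your own data
X = np.random.randn(100, 8).astype(np.float32)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)             # zero mean, unit variance per feature
X_tensor = torch.from_numpy(X_scaled).float()  # pass this tensor to the model
```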
Thanks, I am trying to apply a sigmoid before the softmax. Also, could you please let me know what a good value for the clip threshold in torch.nn.utils.clip_grad_norm_ is when training an A3C?
softmax is already normalized internally (the max value is subtracted before exponentiation), so negative softmax outputs are mathematically impossible. You should recheck what you are looking at. You could also try LayerNorm(elementwise_affine=True) or x.clamp_(-10.0, 10.0) before the softmax.
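As a quick sketch of those two suggestions (the tensor sizes here are made up, and the large logits are only there to mimic your instability):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 6) * 50.0  # deliberately large logits

# Option 1: clamp the logits to a bounded range before softmax
probs_clamped = torch.softmax(logits.clamp(-10.0, 10.0), dim=-1)

# Option 2: normalize the activations with LayerNorm before the softmax head
layer_norm = nn.LayerNorm(6, elementwise_affine=True)
probs_ln = torch.softmax(layer_norm(logits), dim=-1)

# Both outputs are valid probability distributions: non-negative, summing to 1
print(probs_clamped.min().item() >= 0,
      torch.allclose(probs_clamped.sum(dim=-1), torch.ones(4)))
```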