However, after some training the softmax output contains negative probabilities. In some situations I have encountered NaNs as probabilities as well.
One solution I found while searching is to use a normalized softmax; however, I cannot find any PyTorch implementation of this.
Can someone please let me know whether a normalized softmax is available, or how to achieve this so that the forward and backward passes stay stable?
Please note that I am already using torch.nn.utils.clip_grad_norm_(model.parameters(), 40) to avoid exploding gradients.
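For context, here is a minimal sketch of where that clipping call sits in my update step (the toy model and loss here are just placeholders, not my actual A3C network):

```python
import torch
import torch.nn as nn

# Placeholder model/optimizer just to show the ordering of the calls
model = nn.Linear(8, 4)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

logits = model(torch.randn(16, 8))
loss = nn.functional.cross_entropy(logits, torch.randint(0, 4, (16,)))

optimizer.zero_grad()
loss.backward()
# Clip the global gradient norm to 40 before the optimizer step
torch.nn.utils.clip_grad_norm_(model.parameters(), 40)
optimizer.step()
```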
The focus of PyTorch is not on data preprocessing; issues like this are usually handled with third-party toolkits. You can use scikit-learn's StandardScaler. You could also look into skorch (usually you don't need that package, just scikit-learn).
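For example, assuming your inputs live in a NumPy array `X` (a placeholder name), scaling them before feeding the model could look like this:

```python
import numpy as np
import torch
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix; replace with your own data
X = np.random.randn(100, 8).astype(np.float32)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)             # zero mean, unit variance per feature
X_tensor = torch.from_numpy(X_scaled).float()  # pass this tensor to the model
```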
Thanks, I am trying to apply a sigmoid before the softmax. Also, could you please let me know what a good value for the clip threshold in torch.nn.utils.clip_grad_norm_ is when training an A3C?
softmax is already normalized internally (the max value is subtracted before exponentiation), so negative softmax outputs are mathematically impossible. You should recheck what you are looking at. You could also try LayerNorm(elementwise_affine=True) or x.clamp_(-10.0, 10.0) before the softmax.
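As a quick sketch of those two suggestions (the tensor sizes here are made up, and the large logits are only there to mimic your instability):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 6) * 50.0  # deliberately large logits

# Option 1: clamp the logits to a bounded range before softmax
probs_clamped = torch.softmax(logits.clamp(-10.0, 10.0), dim=-1)

# Option 2: normalize the activations with LayerNorm before the softmax head
layer_norm = nn.LayerNorm(6, elementwise_affine=True)
probs_ln = torch.softmax(layer_norm(logits), dim=-1)

# Both outputs are valid probability distributions: non-negative, summing to 1
print(probs_clamped.min().item() >= 0,
      torch.allclose(probs_clamped.sum(dim=-1), torch.ones(4)))
```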