Gradients are becoming nan

I am trying to train a Siamese network for a sentence similarity task. I apply the same LSTM, with pack_padded_sequence, to both sentences, take the norm of the difference between the two final outputs as the similarity, compute the error against the actual similarity score, and backpropagate. After some time (still within the first epoch) the gradients become very small and then turn into nan.

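The gradient of norm at 0 is inf. In your setup this happens whenever the two final outputs are identical, because their difference is then the zero vector, and the inf becomes nan as it is propagated backward.
We fixed this instability 2 days ago in the master branch, so that norm now uses the subgradient at 0 instead.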
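
If you need a workaround before moving to a build with that fix, one common approach is to add a small epsilon under the square root so the distance is never differentiated exactly at 0. Below is a minimal sketch of that idea, not the fix that went into master. The helper names `naive_distance` and `safe_distance` and the value `eps=1e-8` are my own assumptions for illustration, and the naive version spells the norm out as an explicit square root so that it shows the problem regardless of which PyTorch version you run.

```python
import torch

def naive_distance(a, b):
    # Hypothetical helper: Euclidean distance written as an explicit sqrt.
    # At a == b the sqrt gradient is inf, and inf * 0 gives nan in backward.
    return torch.sqrt(((a - b) ** 2).sum(dim=-1))

def safe_distance(a, b, eps=1e-8):
    # Hypothetical helper: same distance with a small eps (assumed value)
    # under the sqrt, so the gradient stays finite when a == b.
    return torch.sqrt(((a - b) ** 2).sum(dim=-1) + eps)

# Two identical "final outputs" (batch of 2, hidden size 4) to hit the bad case.
h1 = torch.ones(2, 4, requires_grad=True)
h2 = torch.ones(2, 4)

naive_distance(h1, h2).sum().backward()
naive_grad = h1.grad.clone()

h1.grad = None  # reset before the second backward pass
safe_distance(h1, h2).sum().backward()
safe_grad = h1.grad.clone()

print("naive grad has nan:", torch.isnan(naive_grad).any().item())
print("safe grad is finite:", torch.isfinite(safe_grad).all().item())
```

The epsilon slightly biases very small distances, so it should stay well below the distances you care about.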

It’ll be part of the next release. If you are interested, you can also install the master branch from source by following the instructions:

Okay, thanks a lot for the reply.