I am trying to train a Siamese network for a sentence similarity task. I apply the same LSTM (with pack_padded_sequence) to both sentences, take the norm of the difference between the two final outputs as the similarity, compute the error against the actual similarity score, and backpropagate. After some time (still within the first epoch) the gradients become very small and then turn into NaN.
The gradient of `norm` at 0 is inf.
We fixed this instability 2 days ago in the `master` branch, so that for `norm`, the subgradient is used instead.
It'll be part of the next release, or if you are interested you can install the `master` branch from source via the instructions: https://github.com/pytorch/pytorch#from-source
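If building from source is not an option, a common workaround is to add a small epsilon under the square root. The snippet below is only a minimal sketch, not the fix that went into `master`: it writes the Euclidean distance out by hand to show where the NaN comes from when the two encodings are identical, and `naive_distance`, `safe_distance`, `grad_wrt_first`, and the `eps` value are hypothetical names and choices made up for illustration.

```python
import torch

def naive_distance(a, b):
    # Euclidean distance written out by hand; sqrt has an infinite derivative at 0.
    return torch.sqrt(((a - b) ** 2).sum())

def safe_distance(a, b, eps=1e-8):
    # Assumed workaround: eps keeps the sqrt argument strictly positive.
    return torch.sqrt(((a - b) ** 2).sum() + eps)

def grad_wrt_first(distance_fn):
    # Two identical "encodings", as when both sentences map to the same vector.
    a = torch.ones(3, requires_grad=True)
    b = torch.ones(3)
    distance_fn(a, b).backward()
    return a.grad

naive_grad = grad_wrt_first(naive_distance)  # inf * 0 in the chain rule
safe_grad = grad_wrt_first(safe_distance)    # finite, because sqrt never sees 0

print(naive_grad)
print(safe_grad)
```

The epsilon slightly biases the distance (it is never exactly 0), so pick it small relative to the distances you expect to see.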
Okay. Thanks a lot for the reply.