I need to compute a softmax over a two-dimensional matrix w of size batch * seq_length. The sequences have different lengths, which are denoted by a mask matrix mask_d, also of size batch * seq_length.

I have written the following code; however, it produces all-nan values after a couple of iterations. Is there a better way to implement this, or is there an existing softmax implementation in PyTorch that can handle a batch of variable-length sequences via a mask and is numerically stable?
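
Roughly, the function looks like this (a simplified sketch rather than the exact code; w_max, w_sum, and mask_d are the names used below):

```python
import torch

def masked_softmax(w, mask_d):
    # w, mask_d: batch x seq_length; mask_d is 1 for real tokens, 0 for padding
    w_max = torch.max(w, dim=1, keepdim=True)[0]
    w_max.data[w_max.data < 0] = 0            # clamp the per-row max at 0
    w_exp = torch.exp(w - w_max) * mask_d     # zero out padded positions
    w_sum = torch.sum(w_exp, dim=1, keepdim=True)
    # If every unmasked entry in a row is strongly negative, the clamped max
    # lets exp() underflow to 0, so w_sum is 0 and 0/0 produces nan.
    return w_exp / w_sum
```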

But if I just want Softmax instead of LogSoftmax, what should I do? Also, Softmax does not let me do batched operations over variable-length sequences, so I have to define my own softmax operation.
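
As far as I understand, exponentiating the LogSoftmax output recovers the ordinary softmax (a quick check, assuming a recent PyTorch with the dim argument):

```python
import torch
import torch.nn.functional as F

w = torch.randn(4, 10)
# exp() of log-softmax recovers the ordinary softmax
p = F.log_softmax(w, dim=1).exp()
print(torch.allclose(p, F.softmax(w, dim=1)))  # True
```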

Thanks, but I think you misunderstood my question. I am working with a batch_size * max_sequence_length matrix, and the sequences are of variable lengths; that's why I need a mask matrix to mask out the padding elements. It seems that neither Softmax nor LogSoftmax supports this masked softmax operation. And if I use LogSoftmax, should I apply exp() to its output to convert it back to a softmax, and do you mean that this will work?
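
In other words, is something like the following what you have in mind (a sketch, assuming mask_d is a 0/1 tensor and every row has at least one unmasked position)?

```python
import torch
import torch.nn.functional as F

def masked_softmax(w, mask_d):
    # mask_d: 1 for real tokens, 0 for padding (batch x seq_length).
    # Padded positions are pushed to -inf so they get zero probability;
    # going through log_softmax keeps the computation numerically stable.
    # Note: a fully masked row would still come out as nan.
    w_masked = w.masked_fill(mask_d == 0, float('-inf'))
    return F.log_softmax(w_masked, dim=1).exp()
```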

I guess w_sum * mask_d is zero in the last step; if you print it, you can find out. Also, I'm wondering why you do w_max.data[w_max.data < 0] = 0.
You may try smoothing tricks, like adding eps * seq_length to w_sum and eps to w. But I think it would be better to find the cause of this problem in your model.
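
One way to implement that smoothing (a sketch with stand-in tensors; eps = 1e-6 is a hypothetical value to tune for your model):

```python
import torch

eps = 1e-6                                   # hypothetical value; tune for your model
w = torch.randn(4, 10)                       # stand-in scores, batch x seq_length
mask_d = (torch.rand(4, 10) > 0.3).float()   # stand-in 0/1 mask
w_max = torch.max(w, dim=1, keepdim=True)[0].clamp(min=0)

w_exp = torch.exp(w - w_max) * mask_d
# eps in the numerator and eps * seq_length in the denominator keep the
# sum strictly positive, so the division can no longer produce nan
w_sum = torch.sum(w_exp, dim=1, keepdim=True) + eps * w.size(1)
p = (w_exp + eps) / w_sum
```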