Seq2seq masking

I am testing my seq2seq model on the training data as a sanity check. I implemented masked backpropagation so that sentences of different lengths can be batched together. But even if I remove the masking, the network eventually does well on the training data. So how can I tell, from the performance I see during evaluation, whether my masking is correct? Thanks!
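For reference, here is a minimal sketch of a masked loss over a padded batch. It assumes the decoder returns per-token log-probabilities and that a `lengths` list holds each target's true length. `make_mask`, `masked_nll` and `lengths` are hypothetical names, not taken from the post. Padded steps are skipped entirely, so they contribute nothing to the loss or its gradient. The sum is divided by the number of real tokens rather than by batch_size × max_len.

```python
import math

def make_mask(lengths, max_len):
    # 1.0 for real tokens, 0.0 for padded positions
    return [[1.0 if t < n else 0.0 for t in range(max_len)] for n in lengths]

def masked_nll(log_probs, targets, lengths):
    """Mean negative log-likelihood over real (non-padded) tokens.

    log_probs: [batch][time][vocab] log-probabilities from the decoder
    targets:   [batch][time] target ids; values at padded steps are arbitrary
    lengths:   true (unpadded) length of each target sequence
    """
    max_len = len(targets[0])
    mask = make_mask(lengths, max_len)
    total, count = 0.0, 0.0
    for b in range(len(targets)):
        for t in range(max_len):
            if mask[b][t] == 0.0:
                continue  # padding: contributes neither to the loss nor its gradient
            total -= log_probs[b][t][targets[b][t]]
            count += mask[b][t]
    # Divide by the number of real tokens, not batch_size * max_len
    return total / count

# Two targets of length 2 and 1, padded to length 2, vocabulary of size 2.
lp = [[[math.log(0.5), math.log(0.5)], [math.log(0.25), math.log(0.75)]],
      [[math.log(0.5), math.log(0.5)], [math.log(0.9), math.log(0.1)]]]
tgt = [[0, 1], [1, 0]]  # tgt[1][1] is a padding position
loss = masked_nll(lp, tgt, [2, 1])
```

Skipping padded steps instead of multiplying by a 0/1 mask also avoids `0 * inf = nan` when a padded position happens to hold a log-probability of `-inf`.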

When I commented out the masking, the loss decreased less over the same number of iterations. Is that a valid sanity check?
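Comparing loss curves with and without the mask is an indirect test. An unmasked model can still fit the training data, partly by learning to predict the padding. A more direct check is an invariance test: a sequence's loss must not change when it is padded further, or when the predictions and target ids at padded positions are replaced with garbage. The sketch below assumes the same hypothetical `masked_nll(log_probs, targets, lengths)` interface. To test a real implementation, you would swap in your own loss function.

```python
import math
import random

def masked_nll(log_probs, targets, lengths):
    # Replace this with your own masked loss; it must ignore steps t >= lengths[b].
    total, count = 0.0, 0
    for b, n in enumerate(lengths):
        for t in range(n):
            total -= log_probs[b][t][targets[b][t]]
            count += 1
    return total / count

def unmasked_nll(log_probs, targets):
    # Mean NLL over every position, padding included (what you get without a mask).
    vals = [-lp[t][y] for lp, tg in zip(log_probs, targets) for t, y in enumerate(tg)]
    return sum(vals) / len(vals)

def random_log_probs(steps, vocab, rng):
    # Random normalized log-probabilities (log-softmax of Gaussian logits).
    out = []
    for _ in range(steps):
        logits = [rng.gauss(0.0, 1.0) for _ in range(vocab)]
        log_z = math.log(sum(math.exp(x) for x in logits))
        out.append([x - log_z for x in logits])
    return out

def padding_invariance_check(seq_len=4, pad_to=9, vocab=5, seed=0):
    rng = random.Random(seed)
    lp = random_log_probs(seq_len, vocab, rng)
    tgt = [rng.randrange(vocab) for _ in range(seq_len)]
    alone = masked_nll([lp], [tgt], [seq_len])
    # Same sequence, padded with garbage predictions and garbage target ids.
    extra = pad_to - seq_len
    padded_lp = lp + random_log_probs(extra, vocab, rng)
    padded_tgt = tgt + [rng.randrange(vocab) for _ in range(extra)]
    padded = masked_nll([padded_lp], [padded_tgt], [seq_len])
    unmasked = unmasked_nll([padded_lp], [padded_tgt])
    return alone, padded, unmasked

alone, padded, unmasked = padding_invariance_check()
# With correct masking, `alone` and `padded` match; `unmasked` generally does not.
print(alone, padded, unmasked)
```

In a full model the idea is the same. Run one sentence on its own, then run it inside a padded batch, and compare that sentence's loss and gradients. If the model attends over padded source positions, or runs a backward RNN over them, the two results will differ even when the loss mask is correct. That difference is itself a useful signal that the encoder side needs masking too.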

Also, in general, is there any way to validate an implementation other than testing it on the training data?
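One common way to validate an implementation independently of training performance is a finite-difference gradient check on a tiny input. For a masked loss, the same check also confirms that the gradient with respect to padded positions is exactly zero. The sketch below writes the loss over raw logits (log-softmax plus masked cross-entropy). `analytic_grad` stands in for whatever backward pass you implemented. All function names here are hypothetical.

```python
import math
import random

def masked_ce_from_logits(logits, targets, lengths):
    # logits: [batch][time][vocab]; mean cross-entropy over real tokens only
    total, count = 0.0, 0
    for b, n in enumerate(lengths):
        for t in range(n):
            z = logits[b][t]
            m = max(z)
            log_z = m + math.log(sum(math.exp(x - m) for x in z))
            total += log_z - z[targets[b][t]]
            count += 1
    return total / count

def analytic_grad(logits, targets, lengths):
    # Hand-derived gradient: (softmax - onehot) / count at real steps, 0 at padding
    count = sum(lengths)
    grad = [[[0.0] * len(v) for v in seq] for seq in logits]
    for b, n in enumerate(lengths):
        for t in range(n):
            z = logits[b][t]
            m = max(z)
            exps = [math.exp(x - m) for x in z]
            s = sum(exps)
            for k in range(len(z)):
                onehot = 1.0 if k == targets[b][t] else 0.0
                grad[b][t][k] = (exps[k] / s - onehot) / count
    return grad

def numeric_grad(logits, targets, lengths, eps=1e-6):
    # Central finite differences, perturbing one logit at a time
    grad = [[[0.0] * len(v) for v in seq] for seq in logits]
    for b in range(len(logits)):
        for t in range(len(logits[b])):
            for k in range(len(logits[b][t])):
                orig = logits[b][t][k]
                logits[b][t][k] = orig + eps
                up = masked_ce_from_logits(logits, targets, lengths)
                logits[b][t][k] = orig - eps
                down = masked_ce_from_logits(logits, targets, lengths)
                logits[b][t][k] = orig  # restore the original value
                grad[b][t][k] = (up - down) / (2 * eps)
    return grad

rng = random.Random(1)
B, T, V = 2, 4, 3
lengths = [4, 2]  # the second sequence has 2 padded steps
logits = [[[rng.gauss(0, 1) for _ in range(V)] for _ in range(T)] for _ in range(B)]
targets = [[rng.randrange(V) for _ in range(T)] for _ in range(B)]

ga = analytic_grad(logits, targets, lengths)
gn = numeric_grad(logits, targets, lengths)
max_diff = max(abs(a - n) for sa, sn in zip(ga, gn)
               for va, vn in zip(sa, sn) for a, n in zip(va, vn))
pad_grad = max(abs(x) for t in range(lengths[1], T) for x in gn[1][t])
print(max_diff, pad_grad)  # max_diff should be tiny; pad_grad should be 0.0
```

If you use PyTorch, `torch.autograd.gradcheck` automates this comparison for double-precision inputs.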