MaskZeroCriterion

I don't think there is any implementation of a masked criterion. The best way is to mask the score vector/logits yourself, as per this thread, using masked_select. gather doesn't work for variable-length sequences.
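
Roughly what I mean, as a minimal sketch rather than a tested recipe. It assumes logits of shape (batch, seq_len, vocab), targets of shape (batch, seq_len), and that padded steps are marked with index 0 in the targets. The function name `masked_cross_entropy` and the `pad_idx` parameter are made up for illustration, and the boolean mask assumes a reasonably recent PyTorch.

```python
import torch
import torch.nn.functional as F

def masked_cross_entropy(logits, targets, pad_idx=0):
    # logits:  (batch, seq_len, vocab) unnormalized scores
    # targets: (batch, seq_len) class indices, pad_idx at padded steps (assumption)
    vocab = logits.size(-1)
    mask = targets != pad_idx  # (batch, seq_len), True where the step is real

    # Broadcast the mask over the vocab dimension, keep only real steps,
    # then restore the (num_real_steps, vocab) layout.
    kept_logits = logits.masked_select(mask.unsqueeze(-1)).view(-1, vocab)
    kept_targets = targets.masked_select(mask)

    # Mean loss over non-padded steps only (NaN if every step is padding).
    return F.cross_entropy(kept_logits, kept_targets)

# Two sequences of true lengths 3 and 2, padded to length 4 with index 0.
logits = torch.randn(2, 4, 5)
targets = torch.tensor([[1, 2, 3, 0],
                        [4, 1, 0, 0]])
loss = masked_cross_entropy(logits, targets)
```

Note this treats class 0 as padding, so it can't also be a real label. As far as I can tell, in recent PyTorch versions passing `ignore_index=0` to `F.cross_entropy` on the flattened logits and targets should give the same value, which makes a handy sanity check.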

Having said that, I'm also interested to know if there's a better approach to this.