Ignore padding area in loss computation

An alternative way to implement option B:

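import torch
# Assumes loss is per-element (e.g. reduction="none") and loss_mask is a bool tensor, True at non-padding positions.
loss_masked = torch.masked_select(loss, loss_mask)
loss_masked.mean()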
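
For context, here is a minimal end-to-end sketch of the snippet above. The toy shapes, the random logits, and the names `logits`, `targets`, `masked_mean`, and `manual_mean` are illustrative assumptions, not part of the original; the convention that `True` marks a real (non-padding) position is also assumed.

```python
import torch
import torch.nn.functional as F

# Hypothetical toy batch: 2 sequences, 3 positions, 4 classes.
torch.manual_seed(0)
logits = torch.randn(2, 3, 4)
targets = torch.tensor([[1, 2, 0], [3, 0, 0]])

# Assumed convention: True = real token, False = padding.
loss_mask = torch.tensor([[True, True, True], [True, False, False]])

# Per-position loss; reduction="none" returns one value per position.
loss = F.cross_entropy(
    logits.reshape(-1, 4), targets.reshape(-1), reduction="none"
).reshape(2, 3)

# Keep only the non-padding entries (returns a 1-D tensor), then average.
loss_masked = torch.masked_select(loss, loss_mask)
masked_mean = loss_masked.mean()

# Equivalent check: zero out padding, divide by the number of real tokens.
manual_mean = (loss * loss_mask).sum() / loss_mask.sum()
```

Both formulations average over the real tokens only, so padding does not dilute the loss. One caveat worth guarding against: if a mask is entirely `False`, `loss_masked` is empty and its mean is NaN.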