NLL Loss Without Sum

Hi all!
I am implementing the model from the paper Gulcehre, Caglar, et al. "Pointing the Unknown Words." arXiv preprint arXiv:1603.08148 (2016).
To do so, I need a special NLL loss that does not sum over its elements.

# Copy branch: the switch decides whether to copy from the source; the
# attention distribution over source positions acts as the copy "vocabulary".
copySwitchValue = sigmoid(CopySwitch(concat([DecGRUState, attentionState], dim=1)))
copyVocabProb = attentionProb
copyOutProb = copyVocabProb * copySwitchValue + 0.000001  # epsilon avoids log(0)
copyLoss = LogLoss(copyOutProb, trgCopyId) * copyMask

# Shortlist branch: a standard softmax over the output vocabulary.
readoutState = ReadOut(concat([wordEmbed, attentionState, DecGRUState], dim=1))
scoringResult = Scoring.BuildNet(readoutState)
vocabProb = softmax(scoringResult) * (1 - copySwitchValue) + 0.000001  # epsilon avoids log(0)
vocabLoss = LogLoss(vocabProb, trgWordId) * (1 - copyMask)
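
To make the intent concrete, here is a rough PyTorch sketch of the copy branch's per-token loss (the tensor names and random inputs are placeholders of mine, standing in for the quantities above). torch.gather picks out the probability of the target index at each position, so nothing gets summed away:

import torch

T, B, S = 18, 64, 20  # target length, batch size, source length

# Placeholder tensors with the shapes discussed below.
copy_out_prob = torch.rand(T, B, S).softmax(dim=-1)  # (T, B, S) probabilities
trg_copy_id = torch.randint(0, S, (T, B))            # (T, B) target indices
copy_mask = torch.randint(0, 2, (T, B)).float()      # (T, B) 0/1 mask

# gather along dim 2 selects, for every (t, b), the probability assigned
# to the correct source position; the result keeps the (T, B) shape.
picked = copy_out_prob.gather(2, trg_copy_id.unsqueeze(2)).squeeze(2)
copy_loss = -torch.log(picked) * copy_mask  # per-token NLL, shape (T, B)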

For example, suppose the source sentence has 20 words, the target has 18 words, the batch size is 64, and the vocabulary size is 30000. That gives copyOutProb of shape (18, 64, 20), vocabProb of shape (18, 64, 30000), and copyMask of shape (18, 64).

What I want is two LogLoss value matrices of shape (18, 64), so that I can mask one with copyMask and the other with (1 - copyMask). Once masked, they can be summed to obtain the final loss value. The current NLLLoss sums everything up immediately, so is there a way to do this kind of loss masking?
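
In other words, something like the following minimal sketch, assuming PyTorch (all tensor names and the random inputs are placeholders for the quantities above; recent versions of F.nll_loss take reduction='none', while older versions used reduce=False for the same effect):

import torch
import torch.nn.functional as F

T, B, S, V = 18, 64, 20, 30000  # target len, batch size, source len, vocab size

# Placeholder tensors with the shapes from the example above.
copy_out_prob = torch.rand(T, B, S).softmax(dim=-1)
vocab_prob = torch.rand(T, B, V).softmax(dim=-1)
trg_copy_id = torch.randint(0, S, (T, B))
trg_word_id = torch.randint(0, V, (T, B))
copy_mask = torch.randint(0, 2, (T, B)).float()

# reduction='none' keeps one loss value per element; fold the flat
# (T*B,) result back to (T, B) so the masks apply elementwise.
copy_loss = F.nll_loss(copy_out_prob.log().view(T * B, S),
                       trg_copy_id.view(T * B),
                       reduction='none').view(T, B) * copy_mask

vocab_loss = F.nll_loss(vocab_prob.log().view(T * B, V),
                        trg_word_id.view(T * B),
                        reduction='none').view(T, B) * (1 - copy_mask)

loss = (copy_loss + vocab_loss).sum()  # final scalar loss

Note that nll_loss expects log-probabilities, hence the .log() calls; the gather version sketched earlier is equivalent and skips the reshaping.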

Thanks.