Using `ignore_index` only at test time

I am training a language model. At test time I want to ignore a certain class index (-1) when computing the loss, but during training I want every class index to contribute to the loss.

So at test time, I map every instance of the to-be-ignored vocab items to that index (-1) and then pass ignore_index=-1 to torch.nn.CrossEntropyLoss. But that gives me this error:

/opt/conda/conda-bld/pytorch_1565287148058/work/aten/src/THCUNN/
void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long)
[with Dtype = float, Acctype = float]: block: [0,0,0], thread: [4,0,0]
Assertion `t >= 0 && t < n_classes` failed.

Is this because I am training and testing my model with a different number of classes? How can I fix this? Thank you for your help!
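For reference, here is a simplified sketch of my test-time setup (the vocab size, the set of ignored ids, and the tensor shapes are placeholders, not my actual values):

```python
import torch
import torch.nn as nn

vocab_size = 10000           # placeholder vocab size
ignore_ids = {3, 7, 42}      # placeholder: vocab ids to be ignored at test time

logits = torch.randn(32, vocab_size)          # model outputs for a batch
target = torch.randint(0, vocab_size, (32,))  # gold token ids

# map every to-be-ignored vocab id to -1
for vid in ignore_ids:
    target[target == vid] = -1

# test-time criterion: entries equal to -1 should not contribute to the loss
test_criterion = nn.CrossEntropyLoss(ignore_index=-1)
loss = test_criterion(logits, target)
```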

Are you sure you are using the right criterion?
This dummy code snippet seems to work:

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss(ignore_index=-1)
output = torch.randn(10, 10, requires_grad=True)
target = torch.randint(0, 10, (10,))
target[0] = -1  # this sample should be ignored in the loss
loss = criterion(output, target)
loss.backward()

The gradients at output.grad[0] are all zeros as expected.
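One thing worth checking: that assert usually fires because the target contains an index outside [0, n_classes) *other than* the ignore_index, e.g. a vocab id that was not remapped to -1, or an id that is valid for the training vocab but not for the test-time output dimension. A quick sanity check on the target (here n_classes is a placeholder for your model's output dimension):

```python
import torch

n_classes = 10                         # placeholder: model output dimension
target = torch.tensor([0, 3, -1, 9])   # example target batch with an ignored entry

# entries equal to the ignore_index (-1) are skipped by the criterion;
# every remaining entry must lie in [0, n_classes)
valid = target[target != -1]
assert bool(valid.min() >= 0) and bool(valid.max() < n_classes), \
    "target contains indices outside the criterion's valid range"
```

If this assertion fails on your real targets, the CUDA assert is coming from those out-of-range indices rather than from ignore_index itself.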

If that doesn’t work, could you please post a code snippet so that we could have a look?