When I set the meaningless label index to -100 (the same as the default ignore_index of NLLLoss), an error occurs (RuntimeError: CUDA error: device-side assert triggered). But when I set both the ignore_index of NLLLoss and the meaningless label index to 255, no error is reported. I am completely confused. What is happening? Can anyone explain? Thanks a lot.
Could you post a minimal code snippet that reproduces this issue? This dummy code works fine:
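import torch
import torch.nn as nn

# Logits for 10 samples and 10 classes
output = torch.randn(10, 10, requires_grad=True)
# Random class indices in [0, 10)
target = torch.randint(0, 10, (10,))
# Mark one sample with the default ignore_index
target[0] = -100
output = output.to('cuda')
target = target.to('cuda')
# ignore_index defaults to -100 here
criterion = nn.CrossEntropyLoss()
loss = criterion(output, target)
loss.backward()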
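In case it helps narrow things down, here is a rough sketch of a check you could run on the CPU before calling the loss. A device-side assert in the loss is often caused by a target value that is neither a valid class index in [0, num_classes) nor equal to the criterion's ignore_index, so it can be worth listing such values. Note that `find_invalid_targets` is a hypothetical helper written for this post, not a PyTorch function, and the tensor shapes and values are made up for illustration.

```python
import torch
import torch.nn as nn


def find_invalid_targets(target, num_classes, ignore_index=-100):
    """Return the unique target values that are neither valid class
    indices in [0, num_classes) nor equal to ignore_index."""
    valid = (target >= 0) & (target < num_classes)
    ignored = target == ignore_index
    return torch.unique(target[~(valid | ignored)])


num_classes = 10
output = torch.randn(4, num_classes)          # made-up logits
target = torch.tensor([1, 255, -100, 3])      # made-up labels

# With the default ignore_index (-100), the value 255 is out of range.
print(find_invalid_targets(target, num_classes))

# With ignore_index=255, the value -100 is the one that is out of range.
print(find_invalid_targets(target, num_classes, ignore_index=255))

# The loss only accepts targets for which the check above returns nothing.
clean_target = torch.tensor([1, 255, 2, 3])
criterion = nn.CrossEntropyLoss(ignore_index=255)
loss = criterion(output, clean_target)
```

If the check returns any values for your real targets, then the criterion's ignore_index and the value used in the labels probably do not match somewhere in your pipeline. Running the failing batch on the CPU usually gives a more readable error message than the CUDA assert as well.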