When I set the meaningless label index to -100 (the default ignore_index of NLLLoss), I get an error (RuntimeError: CUDA error: device-side assert triggered). But when I set both the ignore_index of NLLLoss and the meaningless label index to 255, no error is reported. I am completely confused. What is going on? Can anyone explain? Thanks a lot.
Could you post a minimal code snippet to reproduce this issue? This dummy code works fine:
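import torch
import torch.nn as nn

output = torch.randn(10, 10, requires_grad=True)
target = torch.randint(0, 10, (10,))
target[0] = -100  # mark one sample with the default ignore_index
output = output.to('cuda')
target = target.to('cuda')
criterion = nn.CrossEntropyLoss()  # ignore_index defaults to -100
loss = criterion(output, target)
loss.backward()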
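If the snippet above works but your real code still fails, one common cause of this assert is a target value that is neither a valid class index in [0, num_classes) nor equal to the ignore_index actually in use. Here is a minimal sketch for checking that on the CPU; `check_targets` is a hypothetical helper written for this post, not part of PyTorch, and the tensors are made-up example data:

```python
import torch
import torch.nn as nn

def check_targets(target, num_classes, ignore_index=-100):
    """Return the sorted unique target values that are neither valid
    class indices nor equal to ignore_index (hypothetical helper)."""
    valid = ((target >= 0) & (target < num_classes)) | (target == ignore_index)
    return torch.unique(target[~valid]).tolist()

num_classes = 10
output = torch.randn(4, num_classes)
target = torch.tensor([1, -100, 255, 3])  # example labels with two "special" values

# With the default ignore_index (-100), 255 is reported as out of range.
bad_default = check_targets(target, num_classes)
# With ignore_index=255, -100 is the value reported as out of range.
bad_255 = check_targets(target, num_classes, ignore_index=255)
print(bad_default, bad_255)

# Once every label is either valid or ignored, the loss computes normally.
clean = torch.tensor([1, -100, 2, 3])
loss = nn.CrossEntropyLoss()(output, clean)
print(loss.item())
```

Running the failing batch on the CPU, or with the environment variable CUDA_LAUNCH_BLOCKING=1 set, usually gives a more readable error than the asynchronous device-side assert.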