I am trying to understand how ignore_index works with the cross entropy loss. I am working on a segmentation problem, and if all of the segmentation target values are -100, I don't want the loss to propagate, since the segmentation does not exist for that particular case.
So I tested the following code:
import torch
import torch.nn as nn

# float target of the same shape as the input, filled with -100
target_tensor = torch.ones(960, 960) * -100
target_tensor.requires_grad = True
input_tensor = torch.randn(960, 960, requires_grad=True)

loss = nn.CrossEntropyLoss(ignore_index=-100, reduction='mean')
output = loss(input_tensor, target_tensor)
output.backward()
Now when I inspect output, it is -707187.4375. According to the docs, the input gradient should be 0 since every target value is -100, but it shows nonzero values:
print(input_tensor.grad)
tensor([[-0.0546, 0.1607, -0.0571, ..., 0.0920, 0.0534, -0.1826],
[ 0.1276, 0.1603, 0.1651, ..., 0.0619, 0.1603, -0.2230],
[ 0.1149, -0.0944, 0.1460, ..., 0.1027, 0.1783, -0.0054],
...,
[ 0.0059, 0.1577, 0.1724, ..., 0.1459, 0.1738, 0.1727],
[ 0.1814, 0.1275, -0.2737, ..., -0.2048, -0.0896, 0.1874],
[ 0.0066, 0.0401, -0.6887, ..., 0.0292, 0.1217, 0.1145]])
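For what it's worth, the value I see seems consistent with the float target being interpreted as per-class probabilities rather than class indices (my assumption, verified with this small check):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
input_tensor = torch.randn(960, 960)
target_tensor = torch.ones(960, 960) * -100

# If the float target is treated as per-class probabilities, cross entropy
# becomes -sum(p * log_softmax(x)) over the class dim, averaged over rows.
manual = -(target_tensor * F.log_softmax(input_tensor, dim=1)).sum(dim=1).mean()
builtin = F.cross_entropy(input_tensor, target_tensor, ignore_index=-100)
print(torch.allclose(manual, builtin))
```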
Can anyone tell me what's wrong, or show me an example where ignore_index is used and the loss is not propagated?
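For comparison, here is a minimal sketch (my own toy example, not from my segmentation setup) where the target holds integer class indices of dtype long, which is the case the docs describe for ignore_index:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# 4 samples, 3 classes: input is (N, C), target is (N,) of class indices
input_tensor = torch.randn(4, 3, requires_grad=True)
target = torch.tensor([0, -100, 2, -100])  # samples 1 and 3 are ignored

loss = nn.CrossEntropyLoss(ignore_index=-100, reduction='mean')
output = loss(input_tensor, target)
output.backward()

print(input_tensor.grad)
# rows 1 and 3 are exactly zero: ignored targets contribute no gradient
```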