I need to obtain the loss of individual points, but I would prefer to do so in batches of size greater than 1. I have attempted to do so using
loss = torch.nn.functional.nll_loss(output, label, reduction='none')
but I suspect this is incorrect. Although I do get a plausible loss for each point, the downstream process that uses these losses is performing significantly worse than I'd expect, and I believe the cause is this step being performed incorrectly.
Could you describe why you think using
reduction='none' is wrong? It will return the loss for each sample in the batch without applying the default mean reduction.
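To make that concrete, here is a minimal sketch (with made-up shapes, 4 samples and 3 classes) showing that reduction='none' yields one loss value per sample rather than a scalar:

```python
import torch
import torch.nn.functional as F

# toy log-probabilities for 4 samples over 3 classes
output = torch.randn(4, 3).log_softmax(dim=1)
label = torch.tensor([0, 1, 2, 0])

# one loss per sample, no averaging
loss = F.nll_loss(output, label, reduction='none')
print(loss.shape)  # torch.Size([4])
```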
It may well be correct; I'm just trying to locate the point of failure in my system and thought I might have been using the wrong method. If
reduction='none' truly returns per-element losses, then the failure must be elsewhere.
You can verify it quickly by comparing the mean of the raw per-sample losses against the already averaged loss returned by the default reduction:
import torch

batch_size = 64
nb_classes = 10

output = torch.randn(batch_size, nb_classes, requires_grad=True)
output = output.log_softmax(dim=1)
label = torch.randint(0, nb_classes, (batch_size,))

# per-sample losses, averaged manually
loss_raw = torch.nn.functional.nll_loss(output, label, reduction='none')
print(loss_raw.mean())
# tensor(2.6052, grad_fn=<MeanBackward0>)

# default reduction='mean'
loss = torch.nn.functional.nll_loss(output, label)
print(loss)
# tensor(2.6052, grad_fn=<NllLossBackward0>)
and both should print the same value.
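If you prefer an automated check over eyeballing printed values, the same comparison can be done with torch.allclose (a standard PyTorch utility, not something from the snippet above), which tolerates tiny floating-point differences between the two reduction paths:

```python
import torch
import torch.nn.functional as F

batch_size = 64
nb_classes = 10

output = torch.randn(batch_size, nb_classes).log_softmax(dim=1)
label = torch.randint(0, nb_classes, (batch_size,))

loss_raw = F.nll_loss(output, label, reduction='none')
loss_mean = F.nll_loss(output, label)  # default reduction='mean'

# the manually averaged per-sample losses match the built-in mean reduction
print(torch.allclose(loss_raw.mean(), loss_mean))  # True
```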