I need to obtain the loss for individual points, but I would prefer to do so in batches larger than 1. I have attempted this using
loss = torch.nn.functional.nll_loss(output, label, reduction='none')
but I suspect this is incorrect. Although I do get a plausible loss for each point, the downstream process that consumes these losses is performing significantly worse than I'd expect, and I believe this step being done incorrectly is the cause.
Could you describe why you think using reduction='none' is wrong? It returns the loss for each sample in the batch instead of applying the default 'mean' reduction.
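As a quick sanity check (a minimal sketch with random log-probabilities, not your actual model), you can confirm that reduction='none' yields exactly -log p(label) for each sample, and that averaging those values reproduces the default 'mean' reduction:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# nll_loss expects log-probabilities, e.g. from log_softmax
output = F.log_softmax(torch.randn(4, 3), dim=1)  # batch of 4, 3 classes
label = torch.tensor([0, 2, 1, 1])

per_sample = F.nll_loss(output, label, reduction='none')  # shape [4]
mean_loss = F.nll_loss(output, label)                     # default 'mean'

# per-sample NLL is just the negative log-probability of the true class
manual = -output[torch.arange(4), label]
assert torch.allclose(per_sample, manual)
assert torch.allclose(per_sample.mean(), mean_loss)
```

If both assertions pass in your setup, the per-sample losses themselves are correct and the issue lies in how they are used afterwards (for example, gradients or weighting applied downstream).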
It may well be correct; I'm just trying to locate the point of failure in my system, and I thought I might be using the wrong method. But if reduction='none' truly yields per-element losses, then the failure must lie elsewhere.