What is the correct input for BCEWithLogitsLoss?

Hello, this is the first time I am implementing nn.BCEWithLogitsLoss, and I was wondering whether my input is correct, because I experienced a sudden spike in the classification loss during training. I read from Logit Explanation that the input should lie in (-inf, inf), and my input looks something like this:

torch.Size([1, 4]) tensor([[-1376.6078, -2134.1909, -1130.7600,  -517.1730]], device='cuda:0',
       grad_fn=<ViewBackward>)
torch.Size([1, 4]) tensor([[-1015.1113, -2017.0060,  -598.0123,  -647.5376]], device='cuda:0',
       grad_fn=<ViewBackward>)
torch.Size([1, 4]) tensor([[ -948.6944, -2063.7595,  -120.7232,   -31.2307]], device='cuda:0',
       grad_fn=<ViewBackward>)
torch.Size([1, 4]) tensor([[-1494.8126, -2984.7998,  -173.2264,  -605.0916]], device='cuda:0',
       grad_fn=<ViewBackward>)
torch.Size([1, 4]) tensor([[ -759.2620, -6767.8813, -6867.0396,   155.1411]], device='cuda:0',
       grad_fn=<ViewBackward>)
torch.Size([1, 4]) tensor([[-1216.1967, -1960.8824,   781.1366,  -871.1536]], device='cuda:0',
       grad_fn=<ViewBackward>)

and my target is something like this:

tensor([[0., 0., 1., 0.]], device='cuda:0', grad_fn=<FloorBackward>)
tensor([[0., 0., 0., 1.]], device='cuda:0', grad_fn=<FloorBackward>)
tensor([[0., 1., 0., 0.]], device='cuda:0', grad_fn=<FloorBackward>)
tensor([[0., 0., 0., 1.]], device='cuda:0', grad_fn=<FloorBackward>)
tensor([[0., 0., 0., 1.]], device='cuda:0', grad_fn=<FloorBackward>)
tensor([[0., 0., 0., 1.]], device='cuda:0', grad_fn=<FloorBackward>)
tensor([[0., 0., 1., 0.]], device='cuda:0', grad_fn=<FloorBackward>)
tensor([[0., 0., 0., 1.]], device='cuda:0', grad_fn=<FloorBackward>)
tensor([[0., 0., 1., 0.]], device='cuda:0', grad_fn=<FloorBackward>)
tensor([[0., 0., 0., 1.]], device='cuda:0', grad_fn=<FloorBackward>)
tensor([[0., 0., 1., 0.]], device='cuda:0', grad_fn=<FloorBackward>)

So is this the correct input format? If it is wrong, can you provide an example of how the input should look? If anyone is wondering, my last layer is just a simple conv2d, and I then reshape its output to [1, 4], as shown above. Thank you

The input and target shapes look as if you were dealing with a multi-label classification use case, i.e. each sample might belong to zero, one, or more classes.
If that’s the case, then your approach should be correct.
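As a minimal sketch of that multi-label setup (with made-up values, assuming the [1, 4] shapes from the question), note that a target row may contain several 1s:

```python
import torch
import torch.nn as nn

# Multi-label setup: each sample can belong to several classes at once.
logits = torch.randn(1, 4)                 # raw scores, anywhere in (-inf, inf)
target = torch.tensor([[1., 0., 1., 0.]])  # two active classes (made-up example)

criterion = nn.BCEWithLogitsLoss()
loss = criterion(logits, target)
print(loss.item())
```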

However, the example targets seem to be one-hot encoded, which looks like a multi-class classification, where each sample belongs to one class only.
In that case you should use nn.CrossEntropyLoss with class indices as your target tensor, e.g. via target = torch.argmax(target, 1).
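A minimal sketch of that conversion, using made-up one-hot targets matching the shapes in the question:

```python
import torch
import torch.nn as nn

logits = torch.randn(3, 4, requires_grad=True)  # raw model outputs (batch of 3, 4 classes)
one_hot = torch.tensor([[0., 0., 1., 0.],
                        [0., 0., 0., 1.],
                        [0., 1., 0., 0.]])      # made-up one-hot targets

# Convert one-hot rows to class indices, as nn.CrossEntropyLoss expects.
target = torch.argmax(one_hot, dim=1)
print(target)  # tensor([2, 3, 1])

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, target)
loss.backward()
```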

Hello @ptrblck (and luminoussin)!

Is it a concern that the target contains a grad_fn? That strikes
me as odd.

Best.

K. Frank

That is an unusual use case, but apparently there might be some valid use cases that require the targets to get gradients:

import torch
import torch.nn as nn

# Both the logits and the (float) targets require gradients here.
x = torch.randn(2, 4, requires_grad=True)
y = torch.randint(0, 2, (2, 4)).float().requires_grad_(True)

criterion = nn.BCEWithLogitsLoss()
loss = criterion(x, y)
loss.backward()

print(x.grad)
> tensor([[ 0.0799,  0.0592, -0.0284, -0.0983],
          [ 0.0537, -0.0384, -0.0770, -0.0801]])
print(y.grad)
> tensor([[-0.0715,  0.0134, -0.1531,  0.1629],
          [ 0.0353, -0.1015,  0.0592,  0.0722]])

However, I don’t know when you would need to update the targets.

Hi, about that target with grad_fn: you can forget about it. I just printed a similar variable that looks like the target, so it carries that grad_fn tracking. For my problem, I was at first actually looking for multi-class classification, but after reading both descriptions I think I will try multi-label, with future development in mind. However, since I only have training data for multi-class classification, do you think it is still possible to train it as a multi-label classification?

It should be possible, but the performance might be worse than with a standard multi-class classification.
Also, you might of course get multi-label outputs and would have to deal with them somehow. I.e., is your current use case suitable for getting e.g. two predicted classes even though the target only contains a single class?
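As a small illustration of what such multi-label outputs look like (with hypothetical logits), applying a sigmoid and a threshold can yield more than one predicted class per sample:

```python
import torch

logits = torch.tensor([[2.0, -1.0, 3.0, -2.0]])  # hypothetical raw outputs

probs = torch.sigmoid(logits)  # per-class probabilities
preds = probs > 0.5            # 0.5 is a common (but tunable) threshold
print(preds)  # tensor([[ True, False,  True, False]]) -- two classes predicted
```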

I don’t think it is possible, then. I only have training data consisting of single predicted classes, so I suppose the best way is to build a multi-class classification for now. Thank you for your help so far