What is the correct input for BCEWithLogitsLoss?

Hello, this is the first time I am implementing nn.BCEWithLogitsLoss, and I was wondering whether my input is correct, because I experienced a sudden spike in the classification loss during training. I read from Logit Explanation that the input should lie in (-inf, inf), and my input looks something like this:

torch.Size([1, 4]) tensor([[-1376.6078, -2134.1909, -1130.7600,  -517.1730]], device='cuda:0',
       grad_fn=<ViewBackward>)
torch.Size([1, 4]) tensor([[-1015.1113, -2017.0060,  -598.0123,  -647.5376]], device='cuda:0',
       grad_fn=<ViewBackward>)
torch.Size([1, 4]) tensor([[ -948.6944, -2063.7595,  -120.7232,   -31.2307]], device='cuda:0',
       grad_fn=<ViewBackward>)
torch.Size([1, 4]) tensor([[-1494.8126, -2984.7998,  -173.2264,  -605.0916]], device='cuda:0',
       grad_fn=<ViewBackward>)
torch.Size([1, 4]) tensor([[ -759.2620, -6767.8813, -6867.0396,   155.1411]], device='cuda:0',
       grad_fn=<ViewBackward>)
torch.Size([1, 4]) tensor([[-1216.1967, -1960.8824,   781.1366,  -871.1536]], device='cuda:0',
       grad_fn=<ViewBackward>)

and my target is something like this:

tensor([[0., 0., 1., 0.]], device='cuda:0', grad_fn=<FloorBackward>)
tensor([[0., 0., 0., 1.]], device='cuda:0', grad_fn=<FloorBackward>)
tensor([[0., 1., 0., 0.]], device='cuda:0', grad_fn=<FloorBackward>)
tensor([[0., 0., 0., 1.]], device='cuda:0', grad_fn=<FloorBackward>)
tensor([[0., 0., 0., 1.]], device='cuda:0', grad_fn=<FloorBackward>)
tensor([[0., 0., 0., 1.]], device='cuda:0', grad_fn=<FloorBackward>)
tensor([[0., 0., 1., 0.]], device='cuda:0', grad_fn=<FloorBackward>)
tensor([[0., 0., 0., 1.]], device='cuda:0', grad_fn=<FloorBackward>)
tensor([[0., 0., 1., 0.]], device='cuda:0', grad_fn=<FloorBackward>)
tensor([[0., 0., 0., 1.]], device='cuda:0', grad_fn=<FloorBackward>)
tensor([[0., 0., 1., 0.]], device='cuda:0', grad_fn=<FloorBackward>)

So is this the correct input format? If it is wrong, can you provide an example of how the input should look? If anyone is wondering, my last layer is just a simple conv2d, and I then reshape its output to [1, 4], as shown above. Thank you

The input and target shapes look as if you were dealing with a multi-label classification use case, i.e. each sample might belong to zero, one, or more classes.
If that’s the case, then your approach should be correct.
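As a minimal sketch of that multi-label setup (with made-up values, assuming the [1, 4] shapes from the question), note that a target row may contain several 1s:

```python
import torch
import torch.nn as nn

# Multi-label setup: each sample can belong to several classes at once.
logits = torch.randn(1, 4)                 # raw scores, anywhere in (-inf, inf)
target = torch.tensor([[1., 0., 1., 0.]])  # two active classes (made-up example)

criterion = nn.BCEWithLogitsLoss()
loss = criterion(logits, target)
print(loss.item())
```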

However, the example targets seem to be one-hot encoded, which looks like a multi-class classification, where each sample belongs to one class only.
In that case you should use nn.CrossEntropyLoss with class indices as your target tensor, e.g. via target = torch.argmax(target, 1).
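A minimal sketch of that conversion, using made-up one-hot targets matching the shapes in the question:

```python
import torch
import torch.nn as nn

logits = torch.randn(3, 4, requires_grad=True)  # raw model outputs (batch of 3, 4 classes)
one_hot = torch.tensor([[0., 0., 1., 0.],
                        [0., 0., 0., 1.],
                        [0., 1., 0., 0.]])      # made-up one-hot targets

# Convert one-hot rows to class indices, as nn.CrossEntropyLoss expects.
target = torch.argmax(one_hot, dim=1)
print(target)  # tensor([2, 3, 1])

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, target)
loss.backward()
```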

Hello @ptrblck (and luminoussin)!

Is it a concern that the target contains a grad_fn? That strikes
me as odd.

Best.

K. Frank

That is an unusual use case, but apparently there might be some valid use cases that require the targets to get gradients:

import torch
import torch.nn as nn

# Both the logits and the (float) targets require gradients here.
x = torch.randn(2, 4, requires_grad=True)
y = torch.randint(0, 2, (2, 4)).float().requires_grad_(True)

criterion = nn.BCEWithLogitsLoss()
loss = criterion(x, y)
loss.backward()

print(x.grad)
> tensor([[ 0.0799,  0.0592, -0.0284, -0.0983],
          [ 0.0537, -0.0384, -0.0770, -0.0801]])
print(y.grad)
> tensor([[-0.0715,  0.0134, -0.1531,  0.1629],
          [ 0.0353, -0.1015,  0.0592,  0.0722]])

However, I don’t know when you would need to update the targets.

Hi, about that target with grad_fn: you can forget about it. I just printed a similar variable that looks like the target, so it carries that grad_fn tracking. For my problem, I was at first actually looking for multi-class classification, but after reading both descriptions I think I will try multi-label, with future development in mind. However, since I only have training data for multi-class classification, do you think it is still possible to train it as a multi-label classification?

It should be possible, but the performance might be worse than with a standard multi-class classification.
Also, you might of course get multi-label outputs and would have to deal with them somehow. I.e., is your current use case suitable for getting e.g. two predicted classes even though the target only contains a single class?
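As a small illustration of what such multi-label outputs look like (with hypothetical logits), applying a sigmoid and a threshold can yield more than one predicted class per sample:

```python
import torch

logits = torch.tensor([[2.0, -1.0, 3.0, -2.0]])  # hypothetical raw outputs

probs = torch.sigmoid(logits)  # per-class probabilities
preds = probs > 0.5            # 0.5 is a common (but tunable) threshold
print(preds)  # tensor([[ True, False,  True, False]]) -- two classes predicted
```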

I don’t think it is possible, then. I only have training data consisting of single predicted classes, so I suppose the best way is to build a multi-class classification for now. Thank you for your help so far