logpt = logpt.gather(1, target) IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)

I found this implementation of focal loss on GitHub and am using it for an imbalanced-dataset binary classification problem.

# IMPLEMENTATION CREDIT: https://github.com/clcarwin/focal_loss_pytorch
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

class FocalLoss(nn.Module):
    def __init__(self, gamma=0.5, alpha=None, size_average=True):
        super(FocalLoss, self).__init__()
        self.gamma = gamma
        self.alpha = alpha
        if isinstance(alpha, (float, int)): self.alpha = torch.Tensor([alpha, 1 - alpha])
        if isinstance(alpha, list): self.alpha = torch.Tensor(alpha)
        self.size_average = size_average

    def forward(self, input, target):
        if input.dim() > 2:
            input = input.view(input.size(0), input.size(1), -1)  # N,C,H,W => N,C,H*W
            input = input.transpose(1, 2)                         # N,C,H*W => N,H*W,C
            input = input.contiguous().view(-1, input.size(2))    # N,H*W,C => N*H*W,C
        target = target.view(-1, 1)

        logpt = F.log_softmax(input)     # note: the implicit dim is deprecated in newer PyTorch
        logpt = logpt.gather(1, target)  # pick the log-probability of the target class
        logpt = logpt.view(-1)
        pt = Variable(logpt.data.exp())  # pt = softmax probability of the target class

        if self.alpha is not None:
            if self.alpha.type() != input.data.type():
                self.alpha = self.alpha.type_as(input.data)
            at = self.alpha.gather(0, target.data.view(-1))  # per-sample class weight
            logpt = logpt * Variable(at)

        loss = -1 * (1 - pt)**self.gamma * logpt  # focal term down-weights easy examples
        if self.size_average: return loss.mean()
        else: return loss.sum()
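
For reference, this matches the focal loss from Lin et al., "Focal Loss for Dense Object Detection" (2017): FL(pt) = -alpha_t * (1 - pt)^gamma * log(pt), where pt is the softmax probability of the target class.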

I also define:

gamma = args.gamma
alpha = args.alpha

criterion = FocalLoss(gamma, alpha)
m = nn.Sigmoid()

I use the criterion as follows in the training phase:

for i_batch, sample_batched in enumerate(dataloader_train):
    #pdb.set_trace()
    feats = torch.stack(sample_batched['image'])
    labels = torch.as_tensor(sample_batched['label']).cuda()
    print('feats shape: ', feats.shape)
    print('labels shape: ', labels.shape)
    output = model(feats)
    loss = criterion(m(output[:,1]-output[:,0]), labels.float())

The error is:

train: True test: False
preparing datasets and dataloaders......
creating models......

=>Epoches 1, learning rate = 0.0010000, previous best = 0.0000
training...
feats shape:  torch.Size([64, 419, 512])
labels shape:  torch.Size([64])
main_classifier.py:86: UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.
  logpt = F.log_softmax(input)
Traceback (most recent call last):
  File "main_classifier.py", line 346, in <module>
    loss = criterion(m(output[:,1]-output[:,0]), labels.float())
  File "/home/jalal/research/venv/dpcc/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "main_classifier.py", line 87, in forward
    logpt = logpt.gather(1,target)
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)

How should I fix this error?

Is this implementation of FocalLoss correct wrt newer versions of PyTorch?

The dimension error is similar to the one mentioned in your other post.
This time gather expects to be used on dim1, while logpt seems to have a single dimension.
Usually focal loss is used in a segmentation use case, so your output would have 4 dimensions.
Check the input to this criterion and whether you’ve flattened it into a single dimension.
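
A standalone sketch that reproduces the error (assuming a flattened 1-dim input, as in your case):

import torch

logpt = torch.randn(64)                        # 1-dim tensor, like a flattened input
target = torch.zeros(64, 1, dtype=torch.long)  # class indices
logpt.gather(1, target)                        # IndexError: Dimension out of range
                                               # (expected to be in range of [-1, 0], but got 1)

logpt_2d = torch.randn(64, 2)                  # with a class dimension, gather works
logpt_2d.gather(1, target)                     # -> shape [64, 1]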


Hi Piotr, thanks a lot for your response. I was actually doubtful whether the warning from the other post was causing the error here.

So, I am using focal loss here for binary classification. My model is a Vision Transformer which outputs two values.

I use

output = model(feats)
loss = criterion(m(output[:,1]-output[:,0]), labels.float())

for feeding into the focal loss. I followed the same methodology we used for BCEWithLogitsLoss. Am I wrong? I am not exactly sure how to feed my input to the focal loss criterion. I am also noticing that the majority of its use cases are multi-class (many-class) classification rather than a simple binary implementation.

Also, I printed the shape of items passed onto the criterion:

print('feats shape: ', feats.shape)
print('labels shape: ', labels.shape)
output = model(feats)
print('output shape: ', output.shape)
print('m(output[:,1]-output[:,0]) shape: ', m(output[:,1]-output[:,0]).shape)
print('labels shape: ', labels.shape)
loss = criterion(m(output[:,1]-output[:,0]), labels.float())

and the result is:

train: True test: False
preparing datasets and dataloaders......
creating models......

=>Epoches 1, learning rate = 0.0010000, previous best = 0.0000
training...
feats shape:  torch.Size([64, 419, 512])
labels shape:  torch.Size([64])
output shape:  torch.Size([64, 2])
m(output[:,1]-output[:,0]) shape:  torch.Size([64])
labels shape:  torch.Size([64])
Traceback (most recent call last):
  File "main_classifier.py", line 350, in <module>
    loss = criterion(m(output[:,1]-output[:,0]), labels.float())
  File "/home/jalal/research/venv/dpcc/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "main_classifier.py", line 87, in forward
    logpt = F.log_softmax(input, dim=1)
  File "/home/jalal/research/venv/dpcc/lib/python3.8/site-packages/torch/nn/functional.py", line 1769, in log_softmax
    ret = input.log_softmax(dim)
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)

From what I see here, it seems I am flattening them. So, what do you suggest?

You could always define a binary use case as a “2 class multi-class” use case (although it might not be the most intuitive way).

In any case, you are indexing the outputs here and are subtracting them:

output[:,1]-output[:,0]

which yields the 1-dim tensor.
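
To make the shapes concrete (a quick standalone sketch):

import torch

output = torch.randn(64, 2)         # [batch_size, nb_classes]
diff = output[:, 1] - output[:, 0]  # indexing drops the class dimension
print(diff.shape)                   # torch.Size([64]) -- a 1-dim tensor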
Could you explain this usage a bit more, i.e. why is the indexing/subtraction used (is it a common use case?), and what is m() in this line of code?

I forgot to include m:
m = nn.Sigmoid()
I learned this from other PyTorch forum posts about weighted BCEWithLogitsLoss for the same exact model (a Vision Transformer with two outputs), linked here: BCELoss are unsafe to autocast - #8 by ptrblck, Predicted labels stuck at 1 for test set where class 0 is 20% of data - #9 by mMagmer, and Predicted labels stuck at 1 for test set where class 0 is 20% of data - #2 by mMagmer.
So I thought it would make sense to use the same strategy here as well. I am honestly not 100% sure about its soundness.
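
For reference, the identity this trick relies on (as I understand it) is that, with two logits, the softmax probability of class 1 equals the sigmoid of the logit difference. A quick check:

import torch

z = torch.randn(64, 2)                        # two logits per sample
p_softmax = torch.softmax(z, dim=1)[:, 1]     # softmax probability of class 1
p_sigmoid = torch.sigmoid(z[:, 1] - z[:, 0])  # sigmoid of the logit difference
print(torch.allclose(p_softmax, p_sigmoid))   # True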

Also, here are the output shape and the shape of m(output[:,1]-output[:,0]):

output shape:  torch.Size([64, 2])
m(output[:,1]-output[:,0]) shape:  torch.Size([64])

Here’s my output from the Vision Transformer:

tensor([[-0.4628, -0.0162],
        [-0.1771, -0.2762],
        [-0.3501, -0.3124],
        [-0.0345, -0.2116],
        [-0.6834, -0.6267],
        [-0.3947, -0.3422],
        [-0.5291, -0.3093],
        [-0.3404, -0.4409],
        [-0.4053, -0.0817],
        [-0.2567, -0.5358],
        [-0.4409, -0.4376],
        [-0.3592, -0.5107],
        [-0.6554, -0.0408],
        [-0.6338, -0.7211],
        [-0.2038, -0.3258],
        [-0.3502, -0.2161],
        [-0.2310, -0.4300],
        [ 0.1375, -0.4513],
        [-0.1515, -0.2475],
        [-0.2232, -0.5464],
        [-0.5991, -0.0105],
        [-0.6468, -0.3417],
        [-0.9478, -0.5296],
        [-0.3018,  0.0058],
        [-0.4747, -0.0496],
        [-0.1090, -0.1725],
        [-0.3093, -0.3793],
        [-0.2367,  0.0939],
        [-0.4250, -0.1503],
        [-0.4808, -0.9099],
        [-0.6547, -0.1873],
        [-0.4889, -0.2087],
        [-0.4146, -0.0471],
        [-0.3048, -0.1532],
        [-0.5915, -0.7724],
        [-0.6641, -0.3917],
        [-0.3719, -0.2148],
        [-0.0768, -0.5107],
        [-0.6068, -0.4270],
        [-0.5275,  0.0754],
        [-0.3668, -0.2665],
        [-0.0615, -0.4781],
        [-0.6371, -0.2831],
        [-0.5597, -0.4243],
        [-0.2276, -0.1467],
        [-0.3069,  0.0041],
        [-0.1659, -0.4976],
        [-0.6002, -0.4510],
        [-0.2321, -0.2460],
        [-0.4541,  0.1983],
        [-0.3305, -0.3162],
        [-0.5350, -0.0780],
        [-0.4779, -0.3603],
        [-0.1400, -0.4827],
        [-0.4159, -0.1576],
        [-0.5064, -0.7692],
        [-0.8219, -0.3282],
        [-0.5917, -0.6336],
        [-0.2134, -0.2807],
        [-0.6567, -0.5691],
        [-0.3580,  0.1714],
        [-0.2116, -0.3069],
        [-0.5027, -0.0743],
        [-0.6859, -0.1410]], device='cuda:0', grad_fn=<AddmmBackward0>)

and here’s the m(output[:,1]-output[:,0]):

tensor([0.6098, 0.4752, 0.5094, 0.4558, 0.5142, 0.5131, 0.5547, 0.4749, 0.5802,
        0.4307, 0.5008, 0.4622, 0.6490, 0.4782, 0.4696, 0.5335, 0.4504, 0.3569,
        0.4760, 0.4199, 0.6430, 0.5757, 0.6031, 0.5763, 0.6047, 0.4841, 0.4825,
        0.5819, 0.5683, 0.3943, 0.6147, 0.5696, 0.5908, 0.5378, 0.4549, 0.5677,
        0.5392, 0.3932, 0.5448, 0.6463, 0.5251, 0.3973, 0.5876, 0.5338, 0.5202,
        0.5771, 0.4178, 0.5373, 0.4965, 0.6576, 0.5036, 0.6123, 0.5294, 0.4151,
        0.5642, 0.4347, 0.6210, 0.4895, 0.4832, 0.5219, 0.6293, 0.4762, 0.6055,
        0.6329], device='cuda:0', grad_fn=<SigmoidBackward0>)

Thanks for the links. I took a quick look at the posts and they seem right for nn.BCELoss. However, I’m not a huge fan of single-dimension inputs to nn.BCELoss (but this might be my personal pet peeve :wink: ).
In any case, your focal loss implementation expects the same inputs as nn.CrossEntropyLoss, if I’m not misinterpreting the posted code.
I.e., input should contain logits (remove the sigmoid) and should have the shape [batch_size, nb_classes, *], while the target should contain class indices in [0, nb_classes-1] and have the shape [batch_size, *]. Note that * denotes additional dimensions (e.g. for a segmentation use case).
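
A minimal sketch of the expected call (shapes taken from your logs; the gamma value is arbitrary):

import torch

output = torch.randn(64, 2)          # raw logits: [batch_size, nb_classes]
labels = torch.randint(0, 2, (64,))  # class indices in [0, 1]: [batch_size], dtype torch.long

criterion = FocalLoss(gamma=2.0)     # FocalLoss as posted above
loss = criterion(output, labels)     # no sigmoid, no indexing/subtraction; targets must be long, not float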


Thanks a lot for your answer and clarification. I can confirm that using the same arguments as for nn.CrossEntropyLoss fixed the problem.

Since I initially used this other implementation of FocalLoss (Is this a correct implementation and use of focal loss for binary classification on vision transformer output? If it is correct, why are all train and val preds still stuck at zero?), which required a sigmoid, I mistakenly used the same line of code with the implementation discussed in this post.

#loss = criterion(m(output[:,1]-output[:,0]), labels.float())  # old: sigmoid of logit difference + float targets
loss = criterion(output, labels)  # raw logits + integer class indices