logpt = logpt.gather(1, target)
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)

I forgot to include the definition of m:
m = nn.Sigmoid()
I learned this from other PyTorch forum posts about a weighted BCEWithLogitsLoss for the exact same model (a Vision Transformer with two outputs), linked here: BCELoss are unsafe to autocast - #8 by ptrblck, Predicted labels stuck at 1 for test set where class 0 is 20% of data - #9 by mMagmer, and Predicted labels stuck at 1 for test set where class 0 is 20% of data - #2 by mMagmer.
So, I thought it would make sense to use the same strategy here as well. I am honestly not 100% sure about its soundness.
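
For reference, here is a minimal sketch of that strategy as I understand it from those posts; the shapes match mine, but the pos_weight value is just a placeholder:

import torch
import torch.nn as nn

output = torch.randn(64, 2)            # stand-in for the ViT logits, shape [64, 2]
target = torch.randint(0, 2, (64,))    # binary labels, shape [64]

# Collapse the two logits into a single binary logit and let
# BCEWithLogitsLoss apply the sigmoid internally (safe under autocast,
# unlike Sigmoid + BCELoss).
criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor(4.0))  # placeholder weight
loss = criterion(output[:, 1] - output[:, 0], target.float())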

Also, here are the output shape and the shape of m applied to the difference:

output shape:  torch.Size([64, 2])
m(output[:,1]-output[:,0]) shape:  torch.Size([64])
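
That shape is what triggers the IndexError above: gather(1, target) expects logpt to be 2-D ([batch, num_classes]), but after taking the sigmoid of the difference the tensor is 1-D ([64]), so dimension 1 no longer exists. Here is a sketch of a binary focal loss that works directly on the 1-D probability with no gather; the gamma value and the eps clamp are my own assumptions, not from the original code:

import torch

def binary_focal_loss(p1, target, gamma=2.0, eps=1e-8):
    # p1: m(output[:, 1] - output[:, 0]), shape [batch]; probability of class 1
    # target: binary labels, shape [batch]
    # pt is the probability the model assigned to the true class
    pt = torch.where(target == 1, p1, 1.0 - p1)
    logpt = torch.log(pt.clamp(min=eps))  # no gather needed on a 1-D tensor
    return -(((1.0 - pt) ** gamma) * logpt).mean()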

Here's my output from the Vision Transformer:

tensor([[-0.4628, -0.0162],
        [-0.1771, -0.2762],
        [-0.3501, -0.3124],
        [-0.0345, -0.2116],
        [-0.6834, -0.6267],
        [-0.3947, -0.3422],
        [-0.5291, -0.3093],
        [-0.3404, -0.4409],
        [-0.4053, -0.0817],
        [-0.2567, -0.5358],
        [-0.4409, -0.4376],
        [-0.3592, -0.5107],
        [-0.6554, -0.0408],
        [-0.6338, -0.7211],
        [-0.2038, -0.3258],
        [-0.3502, -0.2161],
        [-0.2310, -0.4300],
        [ 0.1375, -0.4513],
        [-0.1515, -0.2475],
        [-0.2232, -0.5464],
        [-0.5991, -0.0105],
        [-0.6468, -0.3417],
        [-0.9478, -0.5296],
        [-0.3018,  0.0058],
        [-0.4747, -0.0496],
        [-0.1090, -0.1725],
        [-0.3093, -0.3793],
        [-0.2367,  0.0939],
        [-0.4250, -0.1503],
        [-0.4808, -0.9099],
        [-0.6547, -0.1873],
        [-0.4889, -0.2087],
        [-0.4146, -0.0471],
        [-0.3048, -0.1532],
        [-0.5915, -0.7724],
        [-0.6641, -0.3917],
        [-0.3719, -0.2148],
        [-0.0768, -0.5107],
        [-0.6068, -0.4270],
        [-0.5275,  0.0754],
        [-0.3668, -0.2665],
        [-0.0615, -0.4781],
        [-0.6371, -0.2831],
        [-0.5597, -0.4243],
        [-0.2276, -0.1467],
        [-0.3069,  0.0041],
        [-0.1659, -0.4976],
        [-0.6002, -0.4510],
        [-0.2321, -0.2460],
        [-0.4541,  0.1983],
        [-0.3305, -0.3162],
        [-0.5350, -0.0780],
        [-0.4779, -0.3603],
        [-0.1400, -0.4827],
        [-0.4159, -0.1576],
        [-0.5064, -0.7692],
        [-0.8219, -0.3282],
        [-0.5917, -0.6336],
        [-0.2134, -0.2807],
        [-0.6567, -0.5691],
        [-0.3580,  0.1714],
        [-0.2116, -0.3069],
        [-0.5027, -0.0743],
        [-0.6859, -0.1410]], device='cuda:0', grad_fn=<AddmmBackward0>)

and here's m(output[:,1]-output[:,0]):

tensor([0.6098, 0.4752, 0.5094, 0.4558, 0.5142, 0.5131, 0.5547, 0.4749, 0.5802,
        0.4307, 0.5008, 0.4622, 0.6490, 0.4782, 0.4696, 0.5335, 0.4504, 0.3569,
        0.4760, 0.4199, 0.6430, 0.5757, 0.6031, 0.5763, 0.6047, 0.4841, 0.4825,
        0.5819, 0.5683, 0.3943, 0.6147, 0.5696, 0.5908, 0.5378, 0.4549, 0.5677,
        0.5392, 0.3932, 0.5448, 0.6463, 0.5251, 0.3973, 0.5876, 0.5338, 0.5202,
        0.5771, 0.4178, 0.5373, 0.4965, 0.6576, 0.5036, 0.6123, 0.5294, 0.4151,
        0.5642, 0.4347, 0.6210, 0.4895, 0.4832, 0.5219, 0.6293, 0.4762, 0.6055,
        0.6329], device='cuda:0', grad_fn=<SigmoidBackward0>)
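
On the soundness question: for a two-logit head, the sigmoid of the logit difference is mathematically identical to the softmax probability of class 1, since sigmoid(b - a) = exp(b) / (exp(a) + exp(b)). A quick sanity check (output here is a stand-in for the real logits):

import torch

output = torch.randn(64, 2)  # stand-in for the ViT logits
p_sigmoid = torch.sigmoid(output[:, 1] - output[:, 0])
p_softmax = torch.softmax(output, dim=1)[:, 1]
print(torch.allclose(p_sigmoid, p_softmax))  # True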