I forgot the `m`:
m = nn.Sigmoid()
I learned this from other PyTorch forum posts about a weighted BCEWithLogitsLoss for the exact same model (a Vision Transformer with two outputs), linked here: BCELoss are unsafe to autocast - #8 by ptrblck, and here: Predicted labels stuck at 1 for test set where class 0 is 20% of data - #9 by mMagmer, and Predicted labels stuck at 1 for test set where class 0 is 20% of data - #2 by mMagmer.
So I thought it would make sense to use the same strategy here as well, though I am honestly not 100% sure about its soundness.
Also, here are the shape of `output` and the shape of `m` applied to the difference:
output shape: torch.Size([64, 2])
m(output[:,1]-output[:,0]) shape: torch.Size([64])
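As a minimal sketch of what is going on (using a random tensor in place of the actual Vision Transformer output, and a hypothetical `targets` tensor), the sigmoid of the difference between the two logits collapses the `[64, 2]` output into a single per-sample probability of shape `[64]`:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the Vision Transformer output:
# a batch of 64 samples with two logits each, shape [64, 2].
torch.manual_seed(0)
output = torch.randn(64, 2)

m = nn.Sigmoid()

# Sigmoid of the logit difference gives one probability per sample
# (shape [64]) that class 1 scores higher than class 0.
prob = m(output[:, 1] - output[:, 0])
print(output.shape)  # torch.Size([64, 2])
print(prob.shape)    # torch.Size([64])

# As discussed in the linked BCELoss/autocast thread, it is numerically
# safer to feed the raw difference into nn.BCEWithLogitsLoss than to
# apply Sigmoid followed by nn.BCELoss.
criterion = nn.BCEWithLogitsLoss()
targets = torch.randint(0, 2, (64,)).float()  # hypothetical binary labels
loss = criterion(output[:, 1] - output[:, 0], targets)
```

This is only an illustration of the shape transformation, not the exact training code from the linked threads.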
Here’s my output from Vision Transformer:
tensor([[-0.4628, -0.0162],
[-0.1771, -0.2762],
[-0.3501, -0.3124],
[-0.0345, -0.2116],
[-0.6834, -0.6267],
[-0.3947, -0.3422],
[-0.5291, -0.3093],
[-0.3404, -0.4409],
[-0.4053, -0.0817],
[-0.2567, -0.5358],
[-0.4409, -0.4376],
[-0.3592, -0.5107],
[-0.6554, -0.0408],
[-0.6338, -0.7211],
[-0.2038, -0.3258],
[-0.3502, -0.2161],
[-0.2310, -0.4300],
[ 0.1375, -0.4513],
[-0.1515, -0.2475],
[-0.2232, -0.5464],
[-0.5991, -0.0105],
[-0.6468, -0.3417],
[-0.9478, -0.5296],
[-0.3018, 0.0058],
[-0.4747, -0.0496],
[-0.1090, -0.1725],
[-0.3093, -0.3793],
[-0.2367, 0.0939],
[-0.4250, -0.1503],
[-0.4808, -0.9099],
[-0.6547, -0.1873],
[-0.4889, -0.2087],
[-0.4146, -0.0471],
[-0.3048, -0.1532],
[-0.5915, -0.7724],
[-0.6641, -0.3917],
[-0.3719, -0.2148],
[-0.0768, -0.5107],
[-0.6068, -0.4270],
[-0.5275, 0.0754],
[-0.3668, -0.2665],
[-0.0615, -0.4781],
[-0.6371, -0.2831],
[-0.5597, -0.4243],
[-0.2276, -0.1467],
[-0.3069, 0.0041],
[-0.1659, -0.4976],
[-0.6002, -0.4510],
[-0.2321, -0.2460],
[-0.4541, 0.1983],
[-0.3305, -0.3162],
[-0.5350, -0.0780],
[-0.4779, -0.3603],
[-0.1400, -0.4827],
[-0.4159, -0.1576],
[-0.5064, -0.7692],
[-0.8219, -0.3282],
[-0.5917, -0.6336],
[-0.2134, -0.2807],
[-0.6567, -0.5691],
[-0.3580, 0.1714],
[-0.2116, -0.3069],
[-0.5027, -0.0743],
[-0.6859, -0.1410]], device='cuda:0', grad_fn=<AddmmBackward0>)
and here’s m(output[:,1]-output[:,0]):
tensor([0.6098, 0.4752, 0.5094, 0.4558, 0.5142, 0.5131, 0.5547, 0.4749, 0.5802,
0.4307, 0.5008, 0.4622, 0.6490, 0.4782, 0.4696, 0.5335, 0.4504, 0.3569,
0.4760, 0.4199, 0.6430, 0.5757, 0.6031, 0.5763, 0.6047, 0.4841, 0.4825,
0.5819, 0.5683, 0.3943, 0.6147, 0.5696, 0.5908, 0.5378, 0.4549, 0.5677,
0.5392, 0.3932, 0.5448, 0.6463, 0.5251, 0.3973, 0.5876, 0.5338, 0.5202,
0.5771, 0.4178, 0.5373, 0.4965, 0.6576, 0.5036, 0.6123, 0.5294, 0.4151,
0.5642, 0.4347, 0.6210, 0.4895, 0.4832, 0.5219, 0.6293, 0.4762, 0.6055,
0.6329], device='cuda:0', grad_fn=<SigmoidBackward0>)