Predicted labels stuck at 1 for test set where class 0 is 20% of data

@ptrblck it turns out I was too quick to judge that this method works.

When I use it now, all val_preds are stuck at 0 instead.

Here’s the code:

import torch
import torch.nn as nn

class Classifier(nn.Module):

    def __init__(self, n_class, batch_size):
        super(Classifier, self).__init__()
        self.batch_size = batch_size
        self.transformer = VisionTransformer()
        # reduction='none' keeps the per-sample losses so they can be weighted manually
        self.criterion = nn.BCEWithLogitsLoss(reduction='none')

    def forward(self, X, labels, mask):
        out = self.transformer(X)
        labels = torch.tensor(labels, dtype=torch.float32)  # BCEWithLogitsLoss needs float labels
        weight = torch.tensor([0.2, 0.8])  # is this the correct assignment of weights?
        weight_ = weight[labels.data.view(-1).long()].view_as(labels)  # per-sample weight, indexed by class
        m = nn.Sigmoid()
        with torch.cuda.amp.autocast():
            loss = self.criterion(m(out[:, 1] - out[:, 0]), labels.cuda())
            loss_class_weighted = loss * weight_.cuda()
            loss = loss_class_weighted.mean()

        pred_labels = out.data.max(1)[1]  # same as out.argmax(dim=1)
        labels = labels.int()
        return pred_labels, labels, loss
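
For reference, here is a standalone sketch (with made-up toy labels) of what the weight lookup line computes: each sample gets the weight stored at the index of its class, so class-0 samples end up with 0.2 and class-1 samples with 0.8.

import torch

weight = torch.tensor([0.2, 0.8])
labels = torch.tensor([1., 0., 1., 1.])  # toy float labels
weight_ = weight[labels.view(-1).long()].view_as(labels)
print(weight_)  # tensor([0.8000, 0.2000, 0.8000, 0.8000])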

Do you know what accounts for all val_preds getting stuck at 0 now, and at 1 previously?

Also:

  1. Are the weights I have selected correct if class 0 is 20% of the data and class 1 is 80%? weight = torch.tensor([0.2, 0.8])
  2. I am not exactly sure what the logic is behind out[:,1]-out[:,0], proposed by mMagmer (see the quick check below).
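
For question 2, my understanding (please correct me if I'm wrong) is that with two logits, the sigmoid of the difference equals the softmax probability of class 1, so BCEWithLogitsLoss on out[:,1]-out[:,0] should be equivalent to cross-entropy on both logits. A quick self-contained check with toy logits:

import torch

z = torch.randn(4, 2)  # toy two-logit outputs
p_softmax = torch.softmax(z, dim=1)[:, 1]     # softmax probability of class 1
p_sigmoid = torch.sigmoid(z[:, 1] - z[:, 0])  # sigmoid of the logit difference
print(torch.allclose(p_softmax, p_sigmoid))   # True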

Also, here’s an example of out from the transformer. With a batch size of 16, I have:

transformer out:  tensor([[ 0.5873, -0.5521],
        [ 0.6407, -0.6954],
        [ 0.1806, -0.3317],
        [-0.1862, -0.1044],
        [ 0.0688, -0.7443],
        [-0.1022, -0.3273],
        [ 0.3243, -0.5698],
        [ 0.1828, -0.3642],
        [ 0.0833, -1.0877],
        [ 0.0405, -0.1679],
        [ 0.2729, -0.3107],
        [ 0.2521, -0.7700],
        [ 0.3601, -0.4803],
        [-0.0508, -0.4775],
        [ 0.2773, -0.6211],
        [ 0.1521, -0.6477]], device='cuda:0', grad_fn=<AddmmBackward0>)
labels:  tensor([1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1], device='cuda:0',
       dtype=torch.int32)
pred labels:  tensor([0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], device='cuda:0')
loss:  tensor(0.3672, device='cuda:0', grad_fn=<MeanBackward0>)
epoch is 0
train accuracy: 0.19

train micro precision: 0.19
train micro recall: 0.19
train micro F1-score: 0.19

train macro precision: 0.59
train macro recall: 0.51
train macro F1-score: 0.17
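
Side note: if I understand correctly, for single-label predictions the micro-averaged precision, recall, and F1 all reduce to plain accuracy, which would explain why all four numbers above are 0.19. A toy check with made-up labels, assuming sklearn-style metrics:

from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [1, 1, 0, 1]
y_pred = [0, 1, 0, 0]  # 2 of 4 correct, so accuracy is 0.5
for score in (precision_score, recall_score, f1_score):
    print(score(y_true, y_pred, average='micro'))  # 0.5 each time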

As you can see, in the train phase not all train_preds are stuck at zero or one, but in the validation phase everything is stuck at 1 using the weighted BCEWithLogitsLoss.

val epoch preds:   [tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1)]
evaluator.get_scores 0.8088235294117647

Here’s an example of the transformer out from the evaluation phase:

transformer out:  tensor([[-0.1766,  1.3507],
        [-0.1280,  1.2671],
        [ 0.0400,  1.4123],
        [-0.1593,  1.4637],
        [-0.2360,  1.3756],
        [-0.2181,  1.3562],
        [-0.1042,  1.3980],
        [-0.0483,  1.4103],
        [-0.2289,  1.2945],
        [-0.0376,  1.4060],
        [-0.2179,  1.2876],
        [-0.1700,  1.3776],
        [ 0.1045,  1.4502],
        [-0.1199,  1.3978],
        [-0.1731,  1.3738],
        [-0.1940,  1.2998]], device='cuda:0')
labels:  tensor([1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1], device='cuda:0',
       dtype=torch.int32)
pred labels:  tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], device='cuda:0')
loss:  tensor(0.2717, device='cuda:0')
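
To double-check, the stuck predictions do follow directly from those logits: the second logit is larger in every row, so the argmax is always 1 and the sigmoid of the logit difference is always above 0.5. For example, on the first two rows pasted above:

import torch

out = torch.tensor([[-0.1766, 1.3507],
                    [-0.1280, 1.2671]])   # first two eval rows from above
print(out.argmax(dim=1))                     # tensor([1, 1])
print(torch.sigmoid(out[:, 1] - out[:, 0]))  # ~0.82 and ~0.80, both above 0.5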

Do you know what could be fixed?