Predicted labels stuck at 1 for test set where class 0 is 20% of data

So, my class 0 makes up 20% and class 1 makes up 80% of the entire dataset, and I use stratified 5-fold train/val/test splits with a 60/20/20 division.
However, my predicted labels for test set are stuck at 1.
Any thoughts on how to fix this? I used the weighted loss suggested in the thread linked in the code comment below, except I am not sure if I used it correctly.

X here is an intermediate representation of shape N×D, where N is the number of patches in my images and D is the ResNet-18 last-layer feature size (a 512-dimensional 1D tensor). My batch size is 16.

I am also unsure whether pred_labels = out.argmax(dim=1) is correct. Also, I couldn’t directly use BCELoss, since my out looks something like this:

One instance from the val phase:

x is:  tensor([[ 0.0154,  0.0957],
        [-0.0446,  0.0787],
        [-0.0272,  0.0265],
        [ 0.1656, -0.1316],
        [ 0.0931,  0.0134],
        [ 0.2454,  0.2079],
        [ 0.5408, -0.0578],
        [ 0.2008,  0.1331],
        [ 0.1812,  0.0924],
        [ 0.1018,  0.0468],
        [ 0.1389,  0.2432],
        [-0.3082, -0.2986],
        [ 0.1031,  0.0219],
        [ 0.0910, -0.1239],
        [ 0.4165,  0.0890],
        [ 0.2290, -0.0611]], device='cuda:0')
loss:  tensor([0.6538, 0.6334, 0.7203, 0.5555, 0.7338, 0.7120, 1.0366, 0.7276, 0.7385,
        0.7210, 0.7466, 0.6884, 0.7346, 0.8064, 0.8703, 0.5585],
       device='cuda:0')
weight_:  tensor([0.9000, 0.9000, 0.1000, 0.1000, 0.9000, 0.9000, 0.9000, 0.9000, 0.9000,
        0.9000, 0.1000, 0.9000, 0.9000, 0.9000, 0.9000, 0.1000])
val pred:  tensor([1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0], device='cuda:0')
val label:  tensor([1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0])
self.criterion = nn.CrossEntropyLoss(reduce=False)
#self.criterion = nn.BCELoss(reduce=False)
def forward(self, X, labels):

    stacked_X = torch.stack(X)
    out = self.transformer(stacked_X)
   
    with torch.autocast('cuda'):
        # https://discuss.pytorch.org/t/unclear-about-weighted-bce-loss/21486/2 
        labels = torch.tensor(labels)
        weight = torch.tensor([0.1, 0.9]) # how exactly do we decide on these weights?
        weight_ = weight[labels.data.view(-1).long()].view_as(labels)
        loss = self.criterion(out, torch.tensor(labels).cuda())
        print('loss: ', loss)
        print('weight_: ', weight_)
        loss_class_weighted = loss * weight_.cuda()
        loss_class_weighted = loss_class_weighted.mean()
   
   
    #pred = out.data.max(1)[1]
    
    pred_labels = out.argmax(dim=1)
    
    return pred_labels, labels, loss_class_weighted
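
For reference, here is a minimal sketch of the same class weighting done through CrossEntropyLoss’s built-in weight argument instead of weighting each sample manually. The inverse-frequency weights and their normalization here are only an assumption (which is exactly what I’m unsure about); the tensors are placeholders, not my actual pipeline.

import torch
import torch.nn as nn

# Class frequencies from my data: class 0 is 20%, class 1 is 80%.
class_freq = torch.tensor([0.2, 0.8])
class_weights = 1.0 / class_freq                      # inverse-frequency heuristic
class_weights = class_weights / class_weights.sum()   # optional normalization -> tensor([0.8000, 0.2000])

# `weight` has one entry per class; CrossEntropyLoss scales each sample's loss
# by the weight of its target class internally (move the tensor to the model's device).
criterion = nn.CrossEntropyLoss(weight=class_weights)

out = torch.randn(16, 2)               # placeholder [batch, 2] logits
labels = torch.randint(0, 2, (16,))    # placeholder int64 class indices
loss = criterion(out, labels)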

If I use BCELoss with reduce=False instead, I get this error:

  loss = self.criterion(out, torch.tensor(labels).cuda())
Traceback (most recent call last):
  File "main_classifier.py", line 250, in <module>
    pred,label,loss = trainer.train(sample_batched, model)
    pred, labels, loss = model.forward(feats, labels)
    return self.module(*inputs[0], **kwargs[0])
  File "/home/jalal/research/venv/dpcc/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
    loss = self.criterion(out, torch.tensor(labels).cuda())
  File "/home/jalal/research/venv/dpcc/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jalal/research/venv/dpcc/lib/python3.8/site-packages/torch/nn/modules/loss.py", line 603, in forward
    return F.binary_cross_entropy(input, target, weight=self.weight, reduction=self.reduction)
  File "/home/jalal/research/venv/dpcc/lib/python3.8/site-packages/torch/nn/functional.py", line 2906, in binary_cross_entropy
    raise ValueError(
ValueError: Using a target size (torch.Size([16])) that is different to the input size (torch.Size([16, 2])) is deprecated. Please ensure they have the same size.

Unlike cross-entropy, which takes raw model scores in (-∞, +∞), BCELoss accepts the output of a sigmoid layer as input.
If you want, you can use

m = nn.Sigmoid()
loss = nn.BCELoss()
input = torch.randn(3, requires_grad=True)
target = torch.empty(3).random_(2)
output = loss(m(input[:,1]-input[:,0]), target)

with the same model, or you can change your model's output shape to match what BCELoss expects.
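
For the second option, here is a minimal sketch. The single-logit head and tensors are illustrative placeholders, not your actual model, and BCEWithLogitsLoss is used instead of sigmoid + BCELoss, but the idea of a single output score is the same.

import torch
import torch.nn as nn

# Single-logit head: one score per sample, so a binary loss can be used directly.
head = nn.Linear(512, 1)                       # 512 = the ResNet-18 feature size mentioned above

features = torch.randn(16, 512)                # placeholder batch of patch features
labels = torch.randint(0, 2, (16,)).float()    # binary cross-entropy targets must be float

logits = head(features).squeeze(1)             # shape [16]
loss = nn.BCEWithLogitsLoss()(logits, labels)  # sigmoid is applied internally
pred_labels = (logits > 0).long()              # logit > 0 is equivalent to sigmoid(logit) > 0.5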


Unfortunately, that BCELoss didn’t work, and even BCEWithLogitsLoss doesn’t work. Do you know how I can fix it?

class Classifier(nn.Module):
    
    def __init__(self, n_class, batch_size):
        super(Classifier, self).__init__()
        self.batch_size = batch_size
        self.transformer = VisionTransformer()
        #self.criterion = nn.CrossEntropyLoss(reduce=False)
        #self.criterion = nn.BCELoss(reduce=False)
        self.criterion = nn.BCEWithLogitsLoss(reduce=False)



    def forward(self, X, labels):

        stacked_X = torch.stack(X)
        out = self.transformer(stacked_X)
       
        with torch.autocast('cuda'):
            # https://discuss.pytorch.org/t/unclear-about-weighted-bce-loss/21486/2 
            labels = torch.tensor(labels)
            weight = torch.tensor([0.1, 0.9]) # how do we decide on these weights?
            weight_ = weight[labels.data.view(-1).long()].view_as(labels)
            m = nn.Sigmoid()
            print('sig: ', m(out[:,1]-out[:,0]))
            loss = self.criterion(torch.cuda.LongTensor(m(out[:,1]-out[:,0])), torch.tensor(labels).cuda())
            print('loss: ', loss)
            print('weight_: ', weight_)
            loss_class_weighted = loss * weight_.cuda()
            loss_class_weighted = loss_class_weighted.mean()
       
       
        #pred = out.data.max(1)[1]
        
        pred_labels = out.argmax(dim=1)
        
        return pred_labels, labels, loss_class_weighted

The error is:

        [ 0.2062, -0.2541],
        [-0.1909,  0.0930],
        [-0.1987, -0.3082],
        [-0.1557,  0.1971],
        [-0.0801, -0.0162],
        [ 0.3868,  0.2435],
        [ 0.7702,  0.0296],
        [ 0.1791, -0.1098],
        [-0.2040,  0.0221],
        [ 0.1514, -0.0552],
        [-0.0038, -0.0221],
        [-0.1212,  0.2830],
        [-0.1849, -0.3254],
        [ 0.0826, -0.2480],
        [ 0.1392, -0.2806]], device='cuda:0', grad_fn=<AddmmBackward0>)
sig:  tensor([0.4937, 0.3869, 0.5705, 0.4727, 0.5873, 0.5160, 0.4642, 0.3229, 0.4283,
        0.5563, 0.4485, 0.4954, 0.5997, 0.4649, 0.4181, 0.3966],
       device='cuda:0', grad_fn=<SigmoidBackward0>)
Traceback (most recent call last):
  File "main_classifier.py", line 250, in <module>
    pred,label,loss = trainer.train(sample_batched, model)
  
    pred, labels, loss = model.forward(feats, labels)

    return self.module(*inputs[0], **kwargs[0])
  File "/home/jalal/research/venv/dpcc/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
 
    loss = self.criterion(torch.cuda.LongTensor(m(out[:,1]-out[:,0])), torch.tensor(labels).cuda())
TypeError: expected TensorOptions(dtype=long int, device=cuda, layout=Strided, requires_grad=false (default), pinned_memory=false (default), memory_format=(nullopt)) (got TensorOptions(dtype=float, device=cuda:0, layout=Strided, requires_grad=false (default), pinned_memory=false (default), memory_format=(nullopt)))

If I do exactly what you suggested,

class Classifier(nn.Module):
    
    def __init__(self, n_class, batch_size):
        super(Classifier, self).__init__()
        self.batch_size = batch_size
        self.transformer = VisionTransformer()
        #self.criterion = nn.CrossEntropyLoss(reduce=False)
        self.criterion = nn.BCELoss(reduce=False)
        #self.criterion = nn.BCEWithLogitsLoss(reduce=False)



    def forward(self, X, labels):

        stacked_X = torch.stack(X)
        out = self.transformer(stacked_X)
       
        with torch.autocast('cuda'):
            # https://discuss.pytorch.org/t/unclear-about-weighted-bce-loss/21486/2 
            labels = torch.tensor(labels)
            weight = torch.tensor([0.1, 0.9]) # how do we decide on these weights?
            weight_ = weight[labels.data.view(-1).long()].view_as(labels)
            m = nn.Sigmoid()
            print('sig: ', m(out[:,1]-out[:,0]))
            #loss = self.criterion(torch.cuda.LongTensor(m(out[:,1]-out[:,0])), torch.tensor(labels).cuda())
            loss = self.criterion(m(out[:,1]-out[:,0]), torch.tensor(labels).cuda())

            print('loss: ', loss)
            print('weight_: ', weight_)
            loss_class_weighted = loss * weight_.cuda()
            loss_class_weighted = loss_class_weighted.mean()
       
       
        #pred = out.data.max(1)[1]
        
        pred_labels = out.argmax(dim=1)
        
        return pred_labels, labels, loss_class_weighted

I get the following error:

x is:  tensor([[ 0.9785,  0.2433],
        [ 0.9771, -0.0478],
        [ 0.7964, -0.0357],
        [ 0.8349, -0.0047],
        [ 1.0577, -0.3201],
        [ 1.1935,  0.0038],
        [ 0.9705, -0.4818],
        [ 1.0076,  0.3583],
        [ 0.8457, -0.2120],
        [ 0.8271,  0.4590],
        [ 0.8754,  0.0792],
        [ 0.7807,  0.3120],
        [ 0.5872,  0.1169],
        [ 1.1021,  0.2229],
        [ 0.7981,  0.0122],
        [ 0.9969,  0.5169]], device='cuda:0', grad_fn=<AddmmBackward0>)
sig:  tensor([0.3240, 0.2641, 0.3032, 0.3016, 0.2014, 0.2333, 0.1897, 0.3432, 0.2578,
        0.4090, 0.3108, 0.3849, 0.3845, 0.2933, 0.3131, 0.3823],
       device='cuda:0', grad_fn=<SigmoidBackward0>)
 UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  loss = self.criterion(m(out[:,1]-out[:,0]), torch.tensor(labels).cuda())
Traceback (most recent call last):
  File "main_classifier.py", line 250, in <module>
    pred,label,loss = trainer.train(sample_batched, model)

    pred, labels, loss = model.forward(feats, labels)
  File "/home/jalal/research/venv/dpcc/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/jalal/research/venv/dpcc/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
 
    loss = self.criterion(m(out[:,1]-out[:,0]), torch.tensor(labels).cuda())
  File "/home/jalal/research/venv/dpcc/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jalal/research/venv/dpcc/lib/python3.8/site-packages/torch/nn/modules/loss.py", line 603, in forward
    return F.binary_cross_entropy(input, target, weight=self.weight, reduction=self.reduction)
  File "/home/jalal/research/venv/dpcc/lib/python3.8/site-packages/torch/nn/functional.py", line 2915, in binary_cross_entropy
    return torch._C._nn.binary_cross_entropy(input, target, weight, reduction_enum)
RuntimeError: torch.nn.functional.binary_cross_entropy and torch.nn.BCELoss are unsafe to autocast.
Many models use a sigmoid layer right before the binary cross entropy layer.
In this case, combine the two layers using torch.nn.functional.binary_cross_entropy_with_logits
or torch.nn.BCEWithLogitsLoss.  binary_cross_entropy_with_logits and BCEWithLogits are
safe to autocast.

The error message explains why your code is failing:

RuntimeError: torch.nn.functional.binary_cross_entropy and torch.nn.BCELoss are unsafe to autocast.
Many models use a sigmoid layer right before the binary cross entropy layer.
In this case, combine the two layers using torch.nn.functional.binary_cross_entropy_with_logits
or torch.nn.BCEWithLogitsLoss.  binary_cross_entropy_with_logits and BCEWithLogits are
safe to autocast.
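
In practice, a minimal sketch of the fix under autocast (keeping the two-logit output and the f_1 - f_0 trick, with placeholder tensors rather than your actual model) is to drop the explicit sigmoid and pass raw logits to BCEWithLogitsLoss:

import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss(reduction='none')   # reduction='none' is the non-deprecated form of reduce=False

out = torch.randn(16, 2, device='cuda')              # placeholder two-logit model output
labels = torch.randint(0, 2, (16,), device='cuda').float()

with torch.autocast('cuda'):
    logits = out[:, 1] - out[:, 0]                   # one logit per sample; no nn.Sigmoid() here
    loss = criterion(logits, labels)                 # sigmoid + BCE fused internally, safe under autocast
print(loss.shape)                                    # per-sample losses: torch.Size([16])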

@ptrblck so I was too quick to conclude that this method works.

When I use it now, all val_preds are stuck at 0 instead.

Here’s the code:

class Classifier(nn.Module):
    
    def __init__(self, n_class, batch_size):
        super(Classifier, self).__init__()
        self.batch_size = batch_size
        self.transformer = VisionTransformer()
        self.criterion = nn.BCEWithLogitsLoss(reduce=False) # weighted loss
    
    def forward(self, X, labels, mask):
        out = self.transformer(X)
        labels = torch.tensor(labels, dtype=torch.float32) # we need float labels for BCEWithLogitsLoss
        weight = torch.tensor([0.2, 0.8]) # is this the correct assignment of weights?
        weight_ = weight[labels.data.view(-1).long()].view_as(labels)
        m = nn.Sigmoid()
        with torch.cuda.amp.autocast():
            loss = self.criterion(m(out[:,1]-out[:,0]), labels.cuda())    
            loss_class_weighted = loss * weight_.cuda()
            loss_class_weighted = loss_class_weighted.mean()
            loss = loss_class_weighted
       
        pred_labels = out.data.max(1)[1]
        #pred_labels = out.argmax(dim=1)
        labels = labels.int()
        return pred_labels, labels, loss

Do you know what accounts for all val_preds getting stuck at 0 or previously at 1?

Also:

  1. Are the weights I have selected correct if class 0 is 20% of the data and class 1 is 80% of the data? weight = torch.tensor([0.2, 0.8])
  2. I am not exactly sure what the logic is behind out[:,1]-out[:,0] as proposed by mMagmer.

Also, here’s an example of out from the transformer. For example, if my batch size is 16, I have:

transformer out:  tensor([[ 0.5873, -0.5521],
        [ 0.6407, -0.6954],
        [ 0.1806, -0.3317],
        [-0.1862, -0.1044],
        [ 0.0688, -0.7443],
        [-0.1022, -0.3273],
        [ 0.3243, -0.5698],
        [ 0.1828, -0.3642],
        [ 0.0833, -1.0877],
        [ 0.0405, -0.1679],
        [ 0.2729, -0.3107],
        [ 0.2521, -0.7700],
        [ 0.3601, -0.4803],
        [-0.0508, -0.4775],
        [ 0.2773, -0.6211],
        [ 0.1521, -0.6477]], device='cuda:0', grad_fn=<AddmmBackward0>)
labels:  tensor([1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1], device='cuda:0',
       dtype=torch.int32)
pred labels:  tensor([0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], device='cuda:0')
loss:  tensor(0.3672, device='cuda:0', grad_fn=<MeanBackward0>)
epoch is 0
train accuracy: 0.19

train micro precision: 0.19
train micro recall: 0.19
train micro F1-score: 0.19

train macro precision: 0.59
train macro recall: 0.51
train macro F1-score: 0.17

As you can see, in the train phase not all train_preds are stuck at either zero or one, but in the validation phase everything is stuck at 1 using the weighted BCEWithLogitsLoss.

val epoch preds:   [tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1), tensor(1)]
evaluator.get_scores 0.8088235294117647

Here’s an example of the transformer out from the evaluation phase:

transformer out:  tensor([[-0.1766,  1.3507],
        [-0.1280,  1.2671],
        [ 0.0400,  1.4123],
        [-0.1593,  1.4637],
        [-0.2360,  1.3756],
        [-0.2181,  1.3562],
        [-0.1042,  1.3980],
        [-0.0483,  1.4103],
        [-0.2289,  1.2945],
        [-0.0376,  1.4060],
        [-0.2179,  1.2876],
        [-0.1700,  1.3776],
        [ 0.1045,  1.4502],
        [-0.1199,  1.3978],
        [-0.1731,  1.3738],
        [-0.1940,  1.2998]], device='cuda:0')
labels:  tensor([1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1], device='cuda:0',
       dtype=torch.int32)
pred labels:  tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], device='cuda:0')
loss:  tensor(0.2717, device='cuda:0')

Do you know what needs to be fixed?

I think a good litmus test in this situation would be to check whether you can fit your model on a very small fake/generated dataset (e.g., only 4 data points) and reach 100% accuracy. If it cannot, then something is broken in the training loop.

You may also want to check if there are any unusual differences between training/validation (e.g., missing preprocessing steps in one stage).
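
A minimal sketch of such an overfitting sanity check (the linear model, optimizer, and fake data here are placeholders for your actual classifier and pipeline):

import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(4, 512)                  # 4 fake samples of 512-dim features
y = torch.randint(0, 2, (4,))            # fake binary labels

model = nn.Linear(512, 2)                # placeholder for the real classifier
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

for step in range(500):
    optimizer.zero_grad()
    out = model(X)
    loss = criterion(out, y)
    loss.backward()
    optimizer.step()

acc = (out.argmax(dim=1) == y).float().mean()
print(f'final loss={loss.item():.4f}  accuracy={acc.item():.2f}')   # should reach 1.00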

@ptrblck why are we using BCEWithLogitsLoss if the transformer output has 2 values instead of one, and why do we do the following: out[:,1]-out[:,0]? I can’t follow closely and cannot find a similar example that does this subtraction. I understand the reason for using sigmoid, but not the subtraction.

transformer out:  tensor([[ 0.2196, -0.0613],
        [ 0.4505,  0.1374],
        [ 0.3460,  0.0719],
        [ 0.4002,  0.0553],
        [ 0.4386, -0.1715],
        [ 0.2193,  0.2193],
        [ 0.5641,  0.1940],
        [ 0.4318, -0.0859]], device='cuda:0')

^here the batch size is 8.

Here’s another example with a print of all the intermediate values of out:

transformer out:  tensor([[-0.1355, -0.9723],
        [-0.0794, -0.9947],
        [-0.1470, -1.1221],
        [-0.2700, -1.3382],
        [-0.1400, -1.1675],
        [-0.1970, -1.1121],
        [-0.1418, -1.3136],
        [-0.1814, -1.2491]], device='cuda:0')
out[:,1]:  tensor([-0.9723, -0.9947, -1.1221, -1.3382, -1.1675, -1.1121, -1.3136, -1.2491],
       device='cuda:0')
out[:,0]:  tensor([-0.1355, -0.0794, -0.1470, -0.2700, -0.1400, -0.1970, -0.1418, -0.1814],
       device='cuda:0')
out[:,1]-out[:,0]:  tensor([-0.8368, -0.9153, -0.9751, -1.0682, -1.0276, -0.9151, -1.1718, -1.0676],
       device='cuda:0')
m(out[:,1]-out[:,0]):  tensor([0.3022, 0.2859, 0.2739, 0.2557, 0.2636, 0.2859, 0.2365, 0.2559],
       device='cuda:0')

If we have two classes with scores f_0 and f_1, then for the cross-entropy loss the softmax probability of class 0 is

eta_0 = SoftMax(f)[0] = exp(f_0)/(exp(f_0)+exp(f_1)) = 1/(1+exp(f_1-f_0)) = sigmoid(f_0-f_1)

and likewise eta_1 = 1 - eta_0 = sigmoid(f_1-f_0). Hence the two-class cross-entropy loss is equivalent to BCE with logits applied to the single score (f_1-f_0), which is why the code feeds out[:,1]-out[:,0] to the binary loss.
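
A quick numerical check of this equivalence, with random placeholder tensors:

import torch
import torch.nn as nn

torch.manual_seed(0)
out = torch.randn(8, 2)                        # fake two-logit output [f_0, f_1]
labels = torch.randint(0, 2, (8,))             # fake binary targets

ce  = nn.CrossEntropyLoss(reduction='none')(out, labels)
bce = nn.BCEWithLogitsLoss(reduction='none')(out[:, 1] - out[:, 0], labels.float())

print(torch.allclose(ce, bce, atol=1e-6))      # True: per-sample losses match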