BCELoss are unsafe to autocast

I am trying to use autocast, but I am getting the following error:

RuntimeError: torch.nn.functional.binary_cross_entropy and torch.nn.BCELoss are unsafe to autocast.
Many models use a sigmoid layer right before the binary cross entropy layer.
In this case, combine the two layers using torch.nn.functional.binary_cross_entropy_with_logits
or torch.nn.BCEWithLogitsLoss.  binary_cross_entropy_with_logits and BCEWithLogits are
safe to autocast.

This is telling me to use BCEWithLogitsLoss instead of BCELoss, but my last layer contains tanh instead of sigmoid. How can I use BCEWithLogitsLoss if I don’t want a sigmoid layer?
Thanks

nn.BCELoss expects probabilities in the range [0, 1] as inputs and will yield an error otherwise:

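import torch
import torch.nn as nn

criterion = nn.BCELoss()
x = torch.randn(10, 1)
# tanh outputs lie in [-1, 1], so any negative value triggers the error below
criterion(torch.tanh(x), torch.randint(0, 2, (10, 1)).float())
> RuntimeError: all elements of input should be between 0 and 1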

so you would have to use an activation function that produces values in this range (such as sigmoid), and could thus switch to nn.BCEWithLogitsLoss, which applies the sigmoid internally.
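If the tanh output has to stay, one possible workaround is to rescale it into [0, 1]. The sketch below assumes the tanh output is meant as a score in [-1, 1] (an assumption, not something stated in the question) and relies on the identity (tanh(x) + 1) / 2 = sigmoid(2x), so the same loss can be computed from logits with nn.BCEWithLogitsLoss. The tensors are random placeholders:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(10, 1)                          # hypothetical pre-activation values
target = torch.randint(0, 2, (10, 1)).float()   # float targets in {0, 1}

# Rescale tanh from [-1, 1] to [0, 1] so that nn.BCELoss accepts it.
probs = (torch.tanh(x) + 1) / 2
loss_probs = nn.BCELoss()(probs, target)

# Same quantity from logits, since (tanh(x) + 1) / 2 == sigmoid(2 * x).
loss_logits = nn.BCEWithLogitsLoss()(2 * x, target)

print(loss_probs.item(), loss_logits.item())
```

Only the second form avoids nn.BCELoss, which is the one the autocast error message objects to.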

Hi Piotr, using sigmoid and then BCEWithLogitsLoss doesn’t work for me. Do you know what could be the problem? See my post: "Predicted labels stuck at 1 for test set where class 0 is 20% of data" (#3 by Mona_Jalal).

As you see in that post, I have converted the values to [0, 1] using sigmoid: sig: tensor([0.4937, 0.3869, 0.5705, 0.4727, 0.5873, 0.5160, 0.4642, 0.3229, 0.4283, 0.5563, 0.4485, 0.4954, 0.5997, 0.4649, 0.4181, 0.3966],

That’s expected, since nn.BCEWithLogitsLoss expects logits (remove the sigmoid activation) while nn.BCELoss expects probabilities (use sigmoid here).
Generally using logits and nn.BCEWithLogitsLoss is the recommended approach as it has better numerical stability.
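To illustrate the stability point, here is a minimal sketch that compares the two paths for a single large logit. The value 30.0 is an arbitrary choice for the illustration. In float32 the sigmoid of such a logit rounds to 1, so the probability-based loss no longer reflects the true value, while the logit-based loss stays close to the logit itself:

```python
import torch
import torch.nn.functional as F

logit = torch.tensor([30.0])   # a confident (and wrong) prediction
target = torch.tensor([0.0])

# Probability path: sigmoid saturates in float32 before the log is taken.
prob = torch.sigmoid(logit)
loss_from_prob = F.binary_cross_entropy(prob, target)

# Logit path: the loss is computed directly from the logit.
loss_from_logit = F.binary_cross_entropy_with_logits(logit, target)

print('prob:', prob.item())
print('loss from probability:', loss_from_prob.item())
print('loss from logit:', loss_from_logit.item())
```

The same loss of precision is more likely in float16, which is presumably why autocast refuses the probability-based variant.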

Do you mean:
self.criterion = nn.BCEWithLogitsLoss(reduce=False)
with:

stacked_X = torch.stack(X)
out = self.transformer(stacked_X)

with torch.autocast('cuda'):
    # https://discuss.pytorch.org/t/unclear-about-weighted-bce-loss/21486/2
    labels = torch.tensor(labels)
    weight = torch.tensor([0.1, 0.9])  # how to decide on these weights?
    weight_ = weight[labels.data.view(-1).long()].view_as(labels)

    loss = self.criterion(out[:,1]-out[:,0], torch.tensor(labels).cuda())

Because I am still getting an error:

return F.binary_cross_entropy_with_logits(input, target,
  File "/home/jalal/research/venv/dpcc/lib/python3.8/site-packages/torch/nn/functional.py", line 2982, in binary_cross_entropy_with_logits
    return torch.binary_cross_entropy_with_logits(input, target, weight, pos_weight, reduction_enum)
RuntimeError: result type Float can't be cast to the desired output type Long

Could you please show how to use it in my context?

Could you post an executable code snippet using random tensors, which would reproduce this issue, so that I could debug it, please?

Yeah, definitely. Thanks a lot for your time. Here it is:

import torch
out = torch.tensor([[-0.2422, -0.1971],
        [-0.4763,  0.2703],
        [-0.5091,  0.1275],
        [-0.4697,  0.3374],
        [-0.1803, -0.2947],
        [-0.6293,  0.1251],
        [-0.1562,  0.3778],
        [-0.5306,  0.1107],
        [-0.3032,  0.4657],
        [-0.8656,  0.2032],
        [-0.3078,  0.2820],
        [-0.4398,  0.1044],
        [-0.4838, -0.0568],
        [-0.7053,  0.4384],
        [-0.4647, -0.0030],
        [-0.4993,  0.2855]], device='cuda:0')
labels = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0]
labels = torch.tensor(labels)


with torch.autocast('cuda'):
        criterion = torch.nn.BCEWithLogitsLoss(reduce=False)
        weight = torch.tensor([0.1, 0.9]) # how to decide on these weights?
        weight_ = weight[labels.data.view(-1).long()].view_as(labels)
        m = torch.nn.Sigmoid()
        print('out[:,1]-out[:,0]: ', out[:,1]-out[:,0])
        print('m(out[:,1]-out[:,0]): ', m(out[:,1]-out[:,0]))
        #loss = self.criterion(torch.cuda.LongTensor(m(out[:,1]-out[:,0])),        labels.cuda())
        #loss = criterion(out[:,1]-out[:,0], labels.cuda()) #I think this is wrong with a high probability
        loss = criterion(m(out[:,1]-out[:,0]), labels.cuda())

        print('loss: ', loss)
        print('weight_: ', weight_)
        loss_class_weighted = loss * weight_.cuda()
        loss_class_weighted = loss_class_weighted.mean()

Error/output is:

out[:,1]-out[:,0]:  tensor([ 0.0451,  0.7466,  0.6366,  0.8071, -0.1144,  0.7544,  0.5340,  0.6413,
         0.7689,  1.0688,  0.5898,  0.5442,  0.4270,  1.1437,  0.4617,  0.7848],
       device='cuda:0')
m(out[:,1]-out[:,0]):  tensor([0.5113, 0.6784, 0.6540, 0.6915, 0.4714, 0.6801, 0.6304, 0.6550, 0.6833,
        0.7444, 0.6433, 0.6328, 0.6052, 0.7584, 0.6134, 0.6867],
       device='cuda:0')
/home/jalal/research/venv/dpcc/lib/python3.8/site-packages/torch/nn/_reduction.py:42: UserWarning: size_average and reduce args will be deprecated, please use reduction='none' instead.
  warnings.warn(warning.format(ret))
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Input In [16], in <module>
     28 print('m(out[:,1]-out[:,0]): ', m(out[:,1]-out[:,0]))
     29 #loss = self.criterion(torch.cuda.LongTensor(m(out[:,1]-out[:,0])), labels.cuda())
     30 #loss = criterion(out[:,1]-out[:,0], labels.cuda()) #I think this is wrong with a high probability
---> 31 loss = criterion(m(out[:,1]-out[:,0]), torch.FloatTensor.cuda(labels))
     33 print('loss: ', loss)
     34 print('weight_: ', weight_)

File ~/research/venv/dpcc/lib/python3.8/site-packages/torch/nn/modules/module.py:1102, in Module._call_impl(self, *input, **kwargs)
   1098 # If we don't have any hooks, we want to skip the rest of the logic in
   1099 # this function, and just call forward.
   1100 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1101         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102     return forward_call(*input, **kwargs)
   1103 # Do not call functions when jit is used
   1104 full_backward_hooks, non_full_backward_hooks = [], []

File ~/research/venv/dpcc/lib/python3.8/site-packages/torch/nn/modules/loss.py:704, in BCEWithLogitsLoss.forward(self, input, target)
    703 def forward(self, input: Tensor, target: Tensor) -> Tensor:
--> 704     return F.binary_cross_entropy_with_logits(input, target,
    705                                               self.weight,
    706                                               pos_weight=self.pos_weight,
    707                                               reduction=self.reduction)

File ~/research/venv/dpcc/lib/python3.8/site-packages/torch/nn/functional.py:2982, in binary_cross_entropy_with_logits(input, target, weight, size_average, reduce, reduction, pos_weight)
   2979 if not (target.size() == input.size()):
   2980     raise ValueError("Target size ({}) must be the same as input size ({})".format(target.size(), input.size()))
-> 2982 return torch.binary_cross_entropy_with_logits(input, target, weight, pos_weight, reduction_enum)

RuntimeError: result type Float can't be cast to the desired output type Long

Thanks!

nn.BCEWithLogitsLoss expects FloatTensor targets, so use:

labels = torch.tensor(labels, dtype=torch.float32)

and it should work.
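Putting the pieces of this thread together, a corrected version of the snippet might look like the sketch below. It makes a few assumptions: only four rows of `out` are kept and the labels are shortened to match, everything stays on the CPU so it runs without a GPU, the sigmoid is dropped because nn.BCEWithLogitsLoss expects raw logits, and `reduction='none'` replaces the deprecated `reduce=False`. Treating `out[:,1]-out[:,0]` as the logit for class 1 assumes the two columns are two-class scores:

```python
import torch

# Four rows taken from the `out` tensor posted above (CPU for portability).
out = torch.tensor([[-0.2422, -0.1971],
                    [-0.4763,  0.2703],
                    [-0.7053,  0.4384],
                    [-0.4993,  0.2855]])
# Float targets, as nn.BCEWithLogitsLoss requires (shortened example labels).
labels = torch.tensor([1, 1, 0, 0], dtype=torch.float32)

criterion = torch.nn.BCEWithLogitsLoss(reduction='none')
weight = torch.tensor([0.1, 0.9])   # per-class weights used in the thread
weight_ = weight[labels.long()]     # look up one weight per sample

logits = out[:, 1] - out[:, 0]      # raw logits, no sigmoid applied
loss = criterion(logits, labels)    # per-sample losses
loss_class_weighted = (loss * weight_).mean()

print('loss: ', loss)
print('loss_class_weighted: ', loss_class_weighted)
```

On a GPU, the tensors would be moved with `.cuda()` and the loss computation wrapped in `with torch.autocast('cuda'):` as in the original snippet.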

Thanks a lot. This resolved the issue.