Crash in BCEWithLogitsLoss

I’m training a network that consists of convolutional layers followed by Linear layers at the end, which outputs a single binary value (1.0 = true, 0.0 = false). I’m training with mixed precision, using BCEWithLogitsLoss and the Adam optimizer, on PyTorch 1.12.1.
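For context, the training step is set up roughly like this (a simplified sketch; model, loader, and the learning rate are placeholders, not my exact code):

criterion = torch.nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()

for images, labels in loader:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        logits = model(images)            # float16 activations under autocast
        loss = criterion(logits, labels)  # labels are expected to be floating point
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()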

Training generally works fine, but sometimes, at a seemingly random point in the middle of training, it suddenly aborts with the following error:

RuntimeError: Subtraction, the `-` operator, with two bool tensors is not supported. Use the `^` or `logical_xor()` operator instead.

This happens in `return torch.binary_cross_entropy_with_logits`.

Am I doing something wrong? Since the error happens randomly, the network seems to be fine, generally. So can it still be my fault? Any tips?

Network structure looks something like this:

        # data: input image with 128x80 resolution
        data = self.convs(data)  # some convolutional layers
        data = F.interpolate(data, scale_factor = 1.0 / 2.0, mode="bilinear", align_corners=False)  # 64x40
        data = self.convs(data)
        data = F.interpolate(data, scale_factor = 1.0 / 2.0, mode="bilinear", align_corners=False)  # 32x20
        data = self.convs(data)
        data = F.interpolate(data, scale_factor = 1.0 / 2.0, mode="bilinear", align_corners=False)  # 16x10
        data = self.convs(data)
        data = F.interpolate(data, scale_factor = 1.0 / 2.0, mode="bilinear", align_corners=False)  # 8x5
        data = self.convs(data)
        data = F.interpolate(data, scale_factor = 1.0 / 2.0, mode="bilinear", align_corners=False)  # 4x2
        data = data.view(data.shape[0], -1)
        data = self.linearLayers(data)  # uses Linear layers to get down to 1 element
        return data

This error is raised if you pass the model output or the target as a BoolTensor to nn.BCEWithLogitsLoss, as seen here:

criterion = nn.BCEWithLogitsLoss()

output = torch.randn(10, 1, requires_grad=True)
target = torch.randint(0, 2, (10, 1)).float()

# works
loss = criterion(output, target)

# fails
loss = criterion(output, target.bool())
# RuntimeError: Subtraction, the `-` operator, with two bool tensors is not supported. Use the `^` or `logical_xor()` operator instead.

loss = criterion(output.bool(), target)
# RuntimeError: Negation, the `-` operator, on a bool tensor is not supported. If you are trying to invert a mask, use the `~` or `logical_not()` operator instead.

Based on the error message it seems the target tensor is a BoolTensor, so check how it’s created and make sure it’s a floating-point tensor with the same dtype as the model output.
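If you cannot immediately find where the bool values come from, a minimal sketch of an explicit cast right before the loss call looks like this (reusing the output/target names from the snippet above; the cast is a workaround, finding the source is still the real fix):

# cast the (possibly bool/int) target to the dtype of the model output
target = target.to(output.dtype)
loss = criterion(output, target)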

Thanks for your reply! I’ve added a print statement like this:

print(networkOutputTensor, labelTensor)
loss = self.loss_fn(networkOutputTensor, labelTensor)

Here’s the output:

tensor([-5.6328, -5.1094, -6.1445,  9.6719, -5.9961, 13.7109, -5.1367, 32.8438,
         3.5176, -4.2031, 13.0859,  0.6255, 29.5312, 16.6406, -6.0039, -4.9219],
       device='cuda:0', dtype=torch.float16, grad_fn=<SelectBackward0>)
tensor([0., 0., 0., 1., 0., 1., 0., 1., 1., 0., 1., 0., 1., 1., 0., 0.],
       device='cuda:0', dtype=torch.float64)

So it would seem both tensors are float. Still, I randomly get the runtime error mentioned above in the middle of training. The loss function is “BCEWithLogitsLoss()”.

Any ideas? :smiling_face_with_tear:

Edit: At some place later in the code I have this, though:

        for batch in range(networkOutputTensor.shape[0]):
          if (torch.sigmoid(networkOutputTensor[batch]) > 0.5) and (labelTensor[batch] <= 0.5):
            falsePositives += 1  # predicted positive, label says negative
          elif (torch.sigmoid(networkOutputTensor[batch]) <= 0.5) and (labelTensor[batch] > 0.5):
            notDetected += 1  # predicted negative, label says positive

But this shouldn’t be a problem, right? The runtime error callstack clearly points to the loss function.

Would you share the message you got?

Since the error seems to be raised “randomly”, I would assume the print statement will show a BoolTensor right before the failure occurs.
Also, make sure you are checking the right line of code, as these issues are sometimes caused in e.g. the validation or testing loop, which can look “random” because those only run at specific intervals.
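If printing every iteration is too noisy, a small debugging sketch like the following (using the variable names from your print statement) would also stop right at the offending batch and show the unexpected dtype:

# fail loudly with the dtype as soon as a non-floating-point label
# tensor reaches the loss function
assert labelTensor.is_floating_point(), \
    f"expected float labels, got {labelTensor.dtype}"
loss = self.loss_fn(networkOutputTensor, labelTensor)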

Oh wow, you’re right!! I ran training with the print statement until it failed, and this is the output:

tensor([ 4.9922,  7.8398,  7.8320,  5.6445,  3.8691, 11.3438,  7.5859, 12.3906,
         8.8203, 12.3281,  5.8672, 12.6562,  6.7852,  6.2578, 13.4375,  5.0234],
       device='cuda:0', dtype=torch.float16, grad_fn=<SelectBackward0>)
tensor([True, True, True, True, True, True, True, True, True, True, True, True,
        True, True, True, True], device='cuda:0')
Traceback (most recent call last):
  File "train.py", line 124, in <module>
    train(0, 1);
  File "train.py", line 93, in train
    falsePositives, notDetected = model.train(images[:, :, :, 0:3], images[:, :, :, 3:6], images[:, :, :, 6:9])
  File "/home/ubuntu/classify/model/classify.py", line 96, in update
    loss_G = self.loss(modelOutput, label)
  File "/usr/lib/python3/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/lib/python3/dist-packages/torch/nn/modules/loss.py", line 714, in forward
    return F.binary_cross_entropy_with_logits(input, target,
  File "/usr/lib/python3/dist-packages/torch/nn/functional.py", line 3150, in binary_cross_entropy_with_logits
    return torch.binary_cross_entropy_with_logits(input, target, weight, pos_weight, reduction_enum)
RuntimeError: Subtraction, the `-` operator, with two bool tensors is not supported. Use the `^` or `logical_xor()` operator instead.

So it does seem that the labels are sometimes bool; usually they’re float. I’ll have to dig deeper to figure out where the bool tensors come from.

Thanks a lot for your help!!!


Good to hear my assumption was correct. Let us know once you were able to narrow down the issue and the operation creating these BoolTensors (e.g. if the Dataset created them or another operation).

I’ve found the issue: One branch in my dataset produced bool labels… :smiling_face_with_tear:
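For anyone hitting the same problem, this is roughly the kind of fix it needed; the MyDataset name and the label handling are made up for illustration, not my actual dataset code:

import torch
from torch.utils.data import Dataset

class MyDataset(Dataset):  # hypothetical dataset, for illustration only
    def __init__(self, samples):
        self.samples = samples  # list of (image, raw_label) pairs

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        image, raw_label = self.samples[idx]
        # raw_label may be a Python bool or a bool tensor in some branches;
        # cast it so every batch reaches BCEWithLogitsLoss as a float tensor
        label = torch.as_tensor(raw_label, dtype=torch.float32)
        return image, label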

Now I’m embarrassed. Thanks a ton for your on-point feedback. Your insistence that a bool tensor had to be involved was key to solving the issue.
