"binary_cross_entropy" not implemented for 'Long'

Hello, I am implementing a segmentation task with masks and images as input and one class as output. I am trying to use the binary cross-entropy from PyTorch, but I get this error:

> RuntimeError: "binary_cross_entropy" not implemented for 'Long'

Indeed, I have formatted my mask as a long type; otherwise, this message appears from the loss:

> RuntimeError: Found dtype Float but expected Long

Has someone already seen this error and knows where it comes from?
Note that I am on torch '1.6.0'.

Thank you in advance

binary_cross_entropy expects FloatTensors for both the model output and the target, as seen here:

import torch
import torch.nn.functional as F

F.binary_cross_entropy(torch.sigmoid(torch.randn(10, 10)), torch.rand(10, 10))  # works
F.binary_cross_entropy(torch.sigmoid(torch.randn(10, 10)), torch.rand(10, 10).long())  # RuntimeError: Found dtype Long but expected Float

Are you sure the second RuntimeError is raised by this loss function and not nn.CrossEntropyLoss?

Well, yes, I am using nn.BCELoss(), but it does seem to call binary_cross_entropy, as the stack trace contains:

  File "C:\Users\gueganj\Miniconda3\envs\pytorch_env\lib\site-packages\torch\nn\modules\loss.py", line 529, in forward
    return F.binary_cross_entropy(input, target, weight=self.weight, reduction=self.reduction)

  File "C:\Users\gueganj\Miniconda3\envs\pytorch_env\lib\site-packages\torch\nn\functional.py", line 2484, in binary_cross_entropy
    input, target, weight, reduction_enum)

RuntimeError: Found dtype Float but expected Long

Actually, I am not really sure I understand the difference between using nn and functional. Are there any subtleties to know here to make it work?

Also note that the dtype of my tensors is torch.int64.

The error message is (wrongly) raised if you pass the input as a LongTensor, which won't work since gradients aren't calculated for int/long types.
How are you creating the input to this loss function?
If you are calling output.long() on the model output, note that this operation will detach the tensor from the computation graph.
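A minimal sketch of this detaching behavior, with a random tensor standing in for the model output:

```python
import torch

# Random tensor standing in for the model output
out = torch.sigmoid(torch.randn(4, requires_grad=True))
print(out.requires_grad)       # True: still attached to the computation graph

detached = out.long()
print(detached.requires_grad)  # False: .long() detached it, so backward() can't reach the model
```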

First, I map my mask to 1s and 0s (because when loading, the values are 255 and some image noise/artifacts appear):

mask = np.where(mask>mask.mean(), 1, 0)
mask = torch.from_numpy(mask) 
mask = mask.long()
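For reference, a self-contained version of this preprocessing, with a dummy array standing in for the loaded mask:

```python
import numpy as np
import torch

# Dummy array standing in for the loaded mask (values around 0 and 255)
mask = (np.random.rand(8, 8) * 255).astype(np.float32)
mask = np.where(mask > mask.mean(), 1, 0)  # binarize: 1 above the mean, 0 below
mask = torch.from_numpy(mask).long()
print(mask.dtype)  # torch.int64
```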

It is now torch.int64; should I cast it to float?

Then, for predictions, I do :

out = model(inputs)
_, predictions = torch.max(out, 1)

predictions is now torch.int64; should I cast the result to float here as well? Should I do something like:

predictions = predictions.float()
predictions.requires_grad = True
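(Checking what that would actually do, with a random tensor as a stand-in for the model output:)

```python
import torch

out = torch.randn(2, 3, requires_grad=True)  # stand-in for model(inputs)
_, predictions = torch.max(out, 1)           # argmax indices: int64, not differentiable
predictions = predictions.float()
predictions.requires_grad = True             # allowed, but predictions is a new leaf tensor
print(predictions.grad_fn)                   # None: no path back to the model output
```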


It depends what your use case is.
Based on this code:

out = model(inputs)
_, predictions = torch.max(out, 1)

it seems you are dealing with a multi-class classification (or segmentation) and the out tensor would have the shape [batch_size, nb_classes, *].
If that’s the case, you should use nn.CrossEntropyLoss and the target tensor should be a LongTensor in the shape [batch_size, *] containing the class indices in the range [0, nb_classes-1].

On the other hand, you are using nn.BCELoss, which is used for a binary or multi-label classification.

Could you explain your use case a bit?

I think you just caught a mistake on my side! Indeed, I want to do a binary segmentation, but I think I wrongly designed my model by outputting 2 values instead of 1 and then taking the max. I am kind of new to the field, so an error happened quickly … :sweat:

So what I should do is have only one output from my model (a softmax probability or a logit?) and then give it to nn.BCELoss()?

Thank you for your big help (as always)

You can

  • use a single output unit without any activation function at the end and pass this logit to nn.BCEWithLogitsLoss. For this the targets should have the same shape as the model output and be FloatTensors. To get the predicted label you can apply torch.sigmoid and use a threshold via preds = output > threshold.
  • use two output units (treat the binary segmentation as a multi-class segmentation) and pass the logits to nn.CrossEntropyLoss. The target would be the LongTensor as described before. To get the predicted classes you can use preds = torch.argmax(output, dim=1).
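A minimal sketch of both options, with made-up shapes (a 2-sample batch of 4×4 masks):

```python
import torch
import torch.nn as nn

batch, h, w = 2, 4, 4  # hypothetical batch size and mask size

# Option 1: single output channel + nn.BCEWithLogitsLoss
logits1 = torch.randn(batch, 1, h, w, requires_grad=True)
target1 = torch.randint(0, 2, (batch, 1, h, w)).float()  # same shape, FloatTensor
loss1 = nn.BCEWithLogitsLoss()(logits1, target1)
preds1 = torch.sigmoid(logits1) > 0.5                    # threshold for predictions

# Option 2: two output channels + nn.CrossEntropyLoss
logits2 = torch.randn(batch, 2, h, w, requires_grad=True)
target2 = torch.randint(0, 2, (batch, h, w))             # class indices, LongTensor
loss2 = nn.CrossEntropyLoss()(logits2, target2)
preds2 = torch.argmax(logits2, dim=1)                    # argmax for predictions
```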

Thank you, it is indeed what I should have done! (Still, the error message was a bit confusing in my case.)

But should I expect different results between these two methods (BCE+threshold and CE+argmax), assuming my threshold is not badly chosen?

I would expect some differences, since the model architecture would have to be changed and these loss functions also use different loss calculations.
E.g., for a binary classification your last layer would have 2 output units when using nn.CrossEntropyLoss, but only one when using nn.BCEWithLogitsLoss. While this might be a "small" difference, I would still expect to get different results.

Hi ptrblck,

Thanks for the solution. It took me ages to find this page and implement the solution. This is not obvious and is quite different from other common AI/ML frameworks. Is there a webpage/document where someone can find this information in the first place?

It would be great if such examples were added to the PyTorch tutorials/examples.


If you mean the expected shapes and dtypes for each loss function, you could check the docs, as they explain the expected inputs. If you think the docs are lacking a proper explanation, all feedback is welcome.


For me it was nn.BCEWithLogitsLoss and not nn.BCELoss, and then all the other instructions, like:

"For this the targets should have the same shape as the model output and be FloatTensors. To get the predicted label you can apply torch.sigmoid and use a threshold via preds = output > threshold."

Thanks anyway.