Hello, I am implementing a segmentation task with masks and images as input and one class as output. I am trying to use the binary cross entropy loss from PyTorch, but I get this error:
> RuntimeError: "binary_cross_entropy" not implemented for 'Long'
Indeed, I have formatted my mask as a long type; otherwise this message appears from the loss:
> RuntimeError: Found dtype Float but expected Long
Has anyone already run into this error and knows where it comes from? Note that I am on torch '1.6.0'.
binary_cross_entropy expects FloatTensors as the model output and target as seen here:

```python
F.binary_cross_entropy(torch.sigmoid(torch.randn(10, 10)), torch.rand(10, 10)) # works
F.binary_cross_entropy(torch.sigmoid(torch.randn(10, 10)), torch.rand(10, 10).long()) # RuntimeError: Found dtype Long but expected Float
```
Are you sure the second RuntimeError is raised by this loss function and not nn.CrossEntropyLoss?
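To make the dtype behavior concrete, here is a small self-contained sketch (shapes and the seed are illustrative). It also shows that `nn.BCELoss` and `F.binary_cross_entropy` compute the same value, since the module simply wraps the functional call in its `forward`:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
output = torch.sigmoid(torch.randn(10, 10))  # probabilities in [0, 1]
target = torch.rand(10, 10)                  # float target, same shape

# The module and the functional API compute the same loss
# (both default to reduction='mean').
loss_module = torch.nn.BCELoss()(output, target)
loss_functional = F.binary_cross_entropy(output, target)
print(torch.allclose(loss_module, loss_functional))  # True

# A long target raises the dtype error from the question:
try:
    F.binary_cross_entropy(output, target.long())
except RuntimeError as e:
    print("RuntimeError:", e)
```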
Well, yes, I am using nn.BCELoss(), but it seems to call binary_cross_entropy, as the stack trace shows:

```
File "C:\Users\gueganj\Miniconda3\envs\pytorch_env\lib\site-packages\torch\nn\modules\loss.py", line 529, in forward
    return F.binary_cross_entropy(input, target, weight=self.weight, reduction=self.reduction)
File "C:\Users\gueganj\Miniconda3\envs\pytorch_env\lib\site-packages\torch\nn\functional.py", line 2484, in binary_cross_entropy
    input, target, weight, reduction_enum)
RuntimeError: Found dtype Float but expected Long
```
Actually, I am not really sure I understand the difference between using nn or functional. Are there any subtleties to know here to make it work?
Also note that the dtype of my tensors is torch.int64.
The error message is (wrongly) raised, if you pass the input as a LongTensor, which won’t work since gradients aren’t calculated for int/long types.
How are you creating the input to this loss function?
If you are calling output.long() on the model output, note that this operation will detach the tensor from the computation graph.
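A minimal sketch of why that cast is a problem (tensor shapes are illustrative): calling .long() produces an integer tensor, which cannot carry gradients, so the model output gets detached from the computation graph. Keeping everything as float works:

```python
import torch

logits = torch.randn(4, 1, 8, 8, requires_grad=True)  # stand-in for a model output
probs = torch.sigmoid(logits)

# Casting to long detaches the tensor from the computation graph:
# gradients are not tracked for int/long dtypes.
detached = probs.long()
print(detached.requires_grad)  # False

# Float output and float target work, and gradients flow back:
target = torch.randint(0, 2, (4, 1, 8, 8)).float()
loss = torch.nn.functional.binary_cross_entropy(probs, target)
loss.backward()
print(logits.grad is not None)  # True
```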
It depends what your use case is.
Based on this code:

```python
out = model(inputs)
_, predictions = torch.max(out, 1)
```
it seems you are dealing with a multi-class classification (or segmentation) and the out tensor would have the shape [batch_size, nb_classes, *].
If that’s the case, you should use nn.CrossEntropyLoss and the target tensor should be a LongTensor in the shape [batch_size, *] containing the class indices in the range [0, nb_classes-1].
On the other hand you are using nn.BCEWithLogitsLoss, which is used for a binary or multi-label classification.
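A small sketch of the nn.CrossEntropyLoss setup for multi-class segmentation (batch size, class count, and spatial size are made up for illustration):

```python
import torch

batch_size, nb_classes, H, W = 2, 3, 4, 4

# Model output: raw logits with shape [batch_size, nb_classes, H, W].
out = torch.randn(batch_size, nb_classes, H, W, requires_grad=True)

# Target: LongTensor of class indices in [0, nb_classes-1],
# shape [batch_size, H, W] (no class dimension).
target = torch.randint(0, nb_classes, (batch_size, H, W))

criterion = torch.nn.CrossEntropyLoss()
loss = criterion(out, target)
loss.backward()

# Predicted classes via the max over the class dimension:
_, predictions = torch.max(out, 1)
print(predictions.shape)  # torch.Size([2, 4, 4])
```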
I think you just caught a mistake on my side! Indeed, I want to do a binary segmentation, but I think I wrongly designed my model by outputting 2 values instead of 1 and then taking the max. I am kind of new to the field, so an error happened quickly…
So what I should do is have only one output from my model (a softmax probability or a logit?) and then give it to nn.BCELoss()?
You have two options:
- Use a single output unit without any activation function at the end and pass this logit to nn.BCEWithLogitsLoss. For this the targets should have the same shape as the model output and be FloatTensors. To get the predicted label you can apply torch.sigmoid and use a threshold via preds = output > threshold.
- Use two output units (treat the binary segmentation as a multi-class segmentation) and pass the logits to nn.CrossEntropyLoss. The target would be the LongTensor as described before. To get the predicted classes you can use preds = torch.argmax(output, dim=1).
I would expect some differences, since the model architecture would have to be changed and also these loss functions use a different loss calculation.
E.g. for a binary classification your last layer would have 2 output units using nn.CrossEntropyLoss, while only one using nn.BCEWithLogitsLoss. While this might be a “small” difference I would still expect to get different results.
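The two setups above can be sketched side by side; the shapes here are illustrative, and the random logits stand in for model outputs:

```python
import torch

batch_size, H, W = 2, 4, 4

# Option 1: single output channel + nn.BCEWithLogitsLoss.
# Target is a FloatTensor with the same shape as the output.
logits_1 = torch.randn(batch_size, 1, H, W, requires_grad=True)
target_f = torch.randint(0, 2, (batch_size, 1, H, W)).float()
loss_1 = torch.nn.BCEWithLogitsLoss()(logits_1, target_f)
preds_1 = torch.sigmoid(logits_1) > 0.5  # threshold the probabilities

# Option 2: two output channels + nn.CrossEntropyLoss.
# Target is a LongTensor of class indices without the channel dim.
logits_2 = torch.randn(batch_size, 2, H, W, requires_grad=True)
target_l = target_f.squeeze(1).long()
loss_2 = torch.nn.CrossEntropyLoss()(logits_2, target_l)
preds_2 = torch.argmax(logits_2, dim=1)

print(preds_1.shape)  # torch.Size([2, 1, 4, 4])
print(preds_2.shape)  # torch.Size([2, 4, 4])
```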
Thanks for the solution. It took me ages to find this page and implement the solution. This is not obvious and totally different from other common AI/ML frameworks. Is there a webpage/document where someone can find this information in the first place?
It would be great if such examples were added to the PyTorch tutorials/examples.
If you mean the expected shapes and dtypes for each loss function, you could check the docs as they explain each expected input. If you think the docs are lacking a proper explanation, all feedback is welcome.
For me it was BCEWithLogitsLoss and not BCELoss, and then all the other instructions, like: "For this the targets should have the same shape as the model output and be FloatTensors. To get the predicted label you can apply torch.sigmoid and use a threshold via preds = output > threshold".