CrossEntropyLoss for Image Segmentation Error

Lukas_Lansche · April 30, 2020, 4:17pm

Hi, I’m a little stuck with CrossEntropyLoss.
I have a dataset of 500 images, all labeled pixelwise for semantic segmentation. The dataset contains 5 classes, and the problem is that one class covers about 84% of all pixels.
That’s why I wanted to use CrossEntropyLoss with class weights, to weight the other classes higher.
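A minimal sketch of how class weights can be passed to CrossEntropyLoss to counter this kind of imbalance. It assumes inverse pixel-frequency weights, which is one common heuristic and not something prescribed in this thread; the helper name inverse_frequency_weights and the toy 8x8 label maps are hypothetical. In practice you would accumulate the per-class pixel counts over the whole training set rather than one batch.

```python
import torch
import torch.nn as nn

def inverse_frequency_weights(labels: torch.Tensor, num_classes: int) -> torch.Tensor:
    """Hypothetical helper: per-class weights proportional to 1 / pixel frequency."""
    # Count how many pixels belong to each class (labels hold class indices 0..C-1).
    counts = torch.bincount(labels.flatten(), minlength=num_classes).float()
    # Guard against division by zero for classes that never appear.
    counts = counts.clamp(min=1.0)
    weights = counts.sum() / counts
    # Normalise so the weights average to 1, which keeps the loss on a familiar scale.
    return weights / weights.mean()

num_classes = 5
# Toy label maps (assumption): batch of 2, 8x8 pixels, dominated by class 0.
labels = torch.zeros(2, 8, 8, dtype=torch.long)
labels[:, 0:2, 0:2] = 1
labels[:, 2:4, 0:2] = 2
labels[:, 4:6, 0:2] = 3
labels[:, 6:8, 0:2] = 4

weights = inverse_frequency_weights(labels, num_classes)
criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(2, num_classes, 8, 8)  # [N, C, H, W] raw model scores
loss = criterion(logits, labels)            # target: [N, H, W] class indices
print(weights)
print(loss.item())
```

The rare classes end up with a much larger weight than the dominant class, so their pixels contribute more to the loss.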

I’m stuck with this error:

    Traceback (most recent call last):
      File "C:\Users\lukas\anaconda3\envs\Python\lib\site-packages\segmentation_models_pytorch\utils\train.py", line 47, in run
        loss, y_pred = self.batch_update(x, y)
      File "C:\Users\lukas\anaconda3\envs\Python\lib\site-packages\segmentation_models_pytorch\utils\train.py", line 104, in batch_update
        loss = self.loss(prediction, y.long())
      File "C:\Users\lukas\anaconda3\envs\Python\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
        result = self.forward(*input, **kwargs)
      File "C:\Users\lukas\anaconda3\envs\Python\lib\site-packages\torch\nn\modules\loss.py", line 916, in forward
        ignore_index=self.ignore_index, reduction=self.reduction)
      File "C:\Users\lukas\anaconda3\envs\Python\lib\site-packages\torch\nn\functional.py", line 2021, in cross_entropy
        return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
      File "C:\Users\lukas\anaconda3\envs\Python\lib\site-packages\torch\nn\functional.py", line 1840, in nll_loss
        ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
    RuntimeError: 1only batches of spatial targets supported (non-empty 3D tensors) but got targets of size: : [2, 5, 928, 928]

    Process finished with exit code 1

Both tensors, input and target, have the same size, [2, 5, 928, 928], with batch_size = 2, number_classes = 5 and image size [928, 928].

Any suggestions on what I’m doing wrong? I also tried passing the target as a tensor that is not one-hot encoded, but that ended in another error …

KFrank · April 30, 2020, 6:37pm

Hi Lukas!

Your one-hot target of shape [2, 5, 928, 928] is your problem. CrossEntropyLoss expects integer
class labels for its target. So target should have shape
[2, 928, 928], and the target values should be integers
that run from 0 to 4.
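A small sketch of the target format described above: a one-hot mask of shape [N, C, H, W] is collapsed into class indices of shape [N, H, W] with argmax over the class dimension, and the result has dtype torch.long. The spatial size is shrunk from 928 to 16 (an assumption, just to keep the example light), and the random one-hot target stands in for a real mask.

```python
import torch
import torch.nn as nn

N, C, H, W = 2, 5, 16, 16  # the post uses H = W = 928; smaller here for speed

logits = torch.randn(N, C, H, W)  # model output: raw, unnormalised scores

# A one-hot target of shape [N, C, H, W], like the one in the original post.
class_idx = torch.randint(0, C, (N, H, W))
one_hot = torch.nn.functional.one_hot(class_idx, num_classes=C)  # [N, H, W, C]
one_hot = one_hot.permute(0, 3, 1, 2).float()                   # [N, C, H, W]

# CrossEntropyLoss wants class indices: shape [N, H, W], dtype torch.long, values 0..C-1.
target = one_hot.argmax(dim=1).long()

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, target)
print(target.shape, target.dtype)  # torch.Size([2, 16, 16]) torch.int64
```

Note that the input keeps its class dimension (dim 1) while the target drops it.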

Try again not one-hotting your target. If it still doesn’t work, please
post the code where you instantiate CrossEntropyLoss, the code
where you call your loss function, the shape of your input and
target, and any error messages you get.

Good luck.

K. Frank

Lukas_Lansche · April 30, 2020, 7:54pm

Hi Frank, thank you for your reply. Here is my function where I call the loss function:

    def batch_update_cross(self, x, y):
        self.optimizer.zero_grad()
        prediction = self.model.forward(x)
        depth = y.shape[1]
        merged_mask = torch.zeros([2, 928, 928])
        for j in range(depth):
            merged_mask[y[:, j, :, :] == 1] = int(j + 1)
        y = merged_mask.long().cuda()
        loss = self.loss(prediction, y)
        loss.backward()
        self.optimizer.step()
        return loss, prediction
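A sketch of the same mask-merging loop with the issues that come up later in this thread addressed: the label written for channel j is j rather than j + 1, so labels run from 0 to depth - 1, and the merged mask takes its batch size, spatial size and device from y instead of the hard-coded [2, 928, 928] on the CPU. It assumes y is a one-hot mask [N, C, H, W] with exactly one 1 per pixel; merge_one_hot is a hypothetical helper name. Pixels with no 1 in any channel silently stay 0, i.e. class 0.

```python
import torch

def merge_one_hot(y: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: one-hot mask [N, C, H, W] -> class indices [N, H, W]."""
    n, depth, h, w = y.shape
    # Size, dtype and device follow the input instead of a fixed [2, 928, 928] CPU tensor.
    merged_mask = torch.zeros((n, h, w), dtype=torch.long, device=y.device)
    for j in range(depth):
        # Write j (not j + 1) so the labels run from 0 to depth - 1.
        merged_mask[y[:, j, :, :] == 1] = j
    return merged_mask

# Tiny check with a 1x3x2x2 one-hot mask.
y = torch.zeros(1, 3, 2, 2)
y[0, 0, 0, 0] = 1
y[0, 1, 0, 1] = 1
y[0, 2, 1, 0] = 1
y[0, 2, 1, 1] = 1
print(merge_one_hot(y).tolist())  # [[[0, 1], [2, 2]]]
```

Inside batch_update_cross this would replace the loop with y = merge_one_hot(y). For a valid one-hot mask it gives the same result as y.argmax(dim=1).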

I first tried it with:

    y = merged_mask.int().cuda()

but then I got this error:

    RuntimeError: Expected object of scalar type Long but got scalar type Int for argument #2 'target' in call to _thnn_nll_loss2d_forward

So I changed .int() to .long(), and the following error appeared:

    C:/w/1/s/tmp_conda_3.7_100118/conda/conda-bld/pytorch_1579082551706/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:104: block: [9,0,0], thread: [526,0,0] Assertion `t >= 0 && t < n_classes` failed.
    Traceback (most recent call last):
      File "C:/Users/lukas/Documents/SchneidbiegeRoboter/PyTorch/Pytroch_train_model.py", line 150, in <module>
        m, o = train(train_idx=training_idx, val_idx=validation_idx, retrain=settings.retrain)
      File "C:/Users/lukas/Documents/SchneidbiegeRoboter/PyTorch/Pytroch_train_model.py", line 91, in train
        train_logs = train_epoch.run(train_loader)
      File "C:\Users\lukas\anaconda3\envs\Python\lib\site-packages\segmentation_models_pytorch\utils\train.py", line 47, in run
        loss, y_pred = self.batch_update_cross(x, y)
      File "C:\Users\lukas\anaconda3\envs\Python\lib\site-packages\segmentation_models_pytorch\utils\train.py", line 94, in batch_update_cross
        loss.backward()
      File "C:\Users\lukas\anaconda3\envs\Python\lib\site-packages\torch\tensor.py", line 195, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph)
      File "C:\Users\lukas\anaconda3\envs\Python\lib\site-packages\torch\autograd\__init__.py", line 99, in backward
        allow_unreachable=True)  # allow_unreachable flag
    RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

Now, there is one issue I can think of: my class labels aren’t 0 to 4, they are 1 to 5. Does this make any difference?

Cheers Lukas

KFrank · April 30, 2020, 8:17pm

Hello Lukas!

Lukas_Lansche:

I first tried it with:

    y = merged_mask.int().cuda()

but then I got this error:

    RuntimeError: Expected object of scalar type Long but got scalar type Int for argument #2 'target' in call to _thnn_nll_loss2d_forward

This makes sense. CrossEntropyLoss requires long for its
target.

Your second error (CUDNN_STATUS_NOT_INITIALIZED) looks like you’re having a cuda / gpu error.

First try running your code entirely on your cpu. (Get rid of any
.cuda() calls so that all of your tensors – model, input, target,
etc. – are on the cpu.) See if you can get this to work.

Then try something simple on the gpu. Create two FloatTensors,
move them to the gpu, and then try to add them together, or the
like.
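A minimal sketch of the two checks suggested above: first run a trivial operation on the CPU, then the same operation on the GPU if one is visible. The helper name smoke_test is hypothetical; torch.cuda.synchronize() is called on the GPU path so that an asynchronous CUDA failure is reported at this point rather than at some later, unrelated call.

```python
import torch

def smoke_test(device: torch.device) -> torch.Tensor:
    """Hypothetical helper: add two small float tensors on `device`."""
    a = torch.randn(3, 3, device=device)
    b = torch.randn(3, 3, device=device)
    c = a + b
    if device.type == "cuda":
        # Wait for the GPU so any CUDA error surfaces here.
        torch.cuda.synchronize()
    return c.cpu()

# Step 1: everything on the CPU.
print(smoke_test(torch.device("cpu")).shape)

# Step 2: the same trivial operation on the GPU, if one is available.
if torch.cuda.is_available():
    print(smoke_test(torch.device("cuda")).shape)
else:
    print("CUDA not available; only the CPU path was exercised.")
```

Setting the environment variable CUDA_LAUNCH_BLOCKING=1 before starting Python makes kernel launches synchronous, which can also help attribute a CUDA error to the call that actually caused it.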

If this works, but your real code keeps giving the cuDNN error, then
there will be more debugging to do.

As for your labels running from 1 to 5: yes, this will also be a problem
(but it doesn’t explain the cuda error). The target for CrossEntropyLoss
must be (long) integer class labels that run from 0 to nClass - 1. If
they include a value of nClass (or larger), you will get an error – “index
out of range”, or something similar, I think.
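A sketch of shifting 1-based labels down to the 0 to nClass - 1 range that CrossEntropyLoss expects, with an explicit range check so a bad label produces a readable error before the loss is called. The helper name to_zero_based and the random toy labels are hypothetical. Per the last post in this thread, passing a label of 5 with 5 classes on the CPU gives "IndexError: Target 5 is out of bounds."

```python
import torch
import torch.nn as nn

def to_zero_based(target: torch.Tensor, num_classes: int) -> torch.Tensor:
    """Hypothetical helper: shift labels 1..num_classes to 0..num_classes-1 and check them."""
    target = target.long() - 1
    lo, hi = int(target.min()), int(target.max())
    if lo < 0 or hi >= num_classes:
        raise ValueError(f"labels must lie in [0, {num_classes - 1}], got [{lo}, {hi}]")
    return target

num_classes = 5
logits = torch.randn(2, num_classes, 4, 4)
labels_1_to_5 = torch.randint(1, num_classes + 1, (2, 4, 4))  # 1-based, like in the post

criterion = nn.CrossEntropyLoss()
target = to_zero_based(labels_1_to_5, num_classes)
loss = criterion(logits, target)
print(target.min().item(), target.max().item(), loss.item())
```

In this thread the CUDNN_STATUS_NOT_INITIALIZED error went away once the labels were fixed, which suggests the earlier device-side assertion (t >= 0 && t < n_classes) was the real failure and the cuDNN error only a follow-on symptom.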

Good luck.

K. Frank

Lukas_Lansche · May 2, 2020, 7:00am

Hi Frank,
Thank you so much for your advice. I got it running!
I did exactly what you said: I tried it on the cpu and got the following error:

    IndexError: Target 5 is out of bounds.

So I rewrote my class labels to run from 0 to nClass - 1, tried again, and it worked on both the cpu and cuda. Thank you so much!