1only batches of spatial targets supported (non-empty 3D tensors) but got targets of size: : [1, 3, 375, 1242]


(Xiaoyu Song) #1

My batch size is 1 and my image is 375 x 1242 x 3. I’ve converted the numpy image to a tensor image of shape 3 x 375 x 1242.
When I call

criterion = nn.CrossEntropyLoss().cuda()
loss = criterion(outputs, labels.long())

The error is the one shown in the title:

RuntimeError: 1only batches of spatial targets supported (non-empty 3D tensors) but got targets of size: : [1, 3, 375, 1242]

I can’t figure out where the problem is. Can someone help me? Thank you.

Here is the more detailed code:

for epoch in range(2):
    # loop over the dataset multiple times
    running_loss = 0.0
    for i, data in enumerate(dataloader):
        #get the inputs
        inputs, labels = data['image'], data['semantic']
        print(inputs.size())
        print(labels.size())
        # put to GPU
        labels = labels.cuda().float()
        #zero the parameter gradients
        optimizer.zero_grad()
        
        #forward + backward +optimize
        outputs = unet(inputs.cuda())
        outputs.cuda()
        print(outputs.size())
        print(labels.long().size())
        
        loss = criterion(outputs, labels.long())
        loss.backward()
        optimizer.step()
        
        #print statistics
        running_loss += loss.item()
        if i % 20 == 19:
            # print every 20 mini-batches
            print('[%d, %5d] loss: %.3f' % (epoch + 1, i + 1, running_loss / 20))
            running_loss = 0.0
print('Finish Training')

(Xiaoyu Song) #2

I checked the documentation for nn.CrossEntropyLoss and found that the target must have the shape [N, H, W]. My target is an RGB mask with an extra channel dimension, so it doesn’t work.
When I change my target dataset to grayscale images, it works fine.
If I want to use an RGB image as the target, which loss should I use instead?
Thank you.


#3

nn.CrossEntropyLoss is usually used for classification use cases.
It seems you are trying to reconstruct the image somehow. If your targets are normalized tensors with values in [0, 1], you could use nn.BCELoss.


(Xiaoyu Song) #4

Yes, I’m trying to do an image segmentation with the image and the image-mask. I will try your recommendation, thank you :wink:


#5

Thanks for the info.
In that case I would stick to a classification criterion.
Could you post a sample mask with its values?
Usually your mask should have the shape [batch_size, h, w] and contain the class indices for each pixel.
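
For reference, here is a minimal sketch of the shapes nn.CrossEntropyLoss expects in a segmentation setup. The class count and the tiny spatial size are placeholders picked to keep the example quick; they are not taken from the code posted above.

```python
import torch
import torch.nn as nn

# Assumed sizes for illustration only (the images in this thread are 375 x 1242).
batch_size, nb_classes, h, w = 1, 5, 4, 6

criterion = nn.CrossEntropyLoss()

# Model output: raw logits, one channel per class -> [batch_size, nb_classes, h, w]
outputs = torch.randn(batch_size, nb_classes, h, w, requires_grad=True)

# Target: one class index per pixel, dtype long, no channel dim -> [batch_size, h, w]
target = torch.randint(0, nb_classes, (batch_size, h, w), dtype=torch.long)

loss = criterion(outputs, target)
loss.backward()

# A long target that still carries a channel dimension (like an RGB mask)
# is rejected with a RuntimeError.
bad_target = torch.randint(0, nb_classes, (batch_size, 3, h, w), dtype=torch.long)
try:
    criterion(outputs, bad_target)
    raised = False
except RuntimeError:
    raised = True
```

The only difference between the working call and the failing one is the extra channel dimension in the target.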


(Xiaoyu Song) #6

In fact, my sample mask has the shape [batch_size, channel, h, w], the same size as the sample image, so the problem is the channel dimension: my sample mask is an RGB image. In this case, which loss function should I use?


#7

I guess your mask has some kind of color code in RGB, e.g. red ([255, 0, 0]) means car, while blue ([0, 0, 255]) means building.
If that’s the case, could you post a sample image or post the mapping directly? We would need to transform the RGB color-coded mask into a class index mask.


(Xiaoyu Song) #8

Hi, thanks for replying.
I have two types of mask images. One is in RGB, the other is not.

Now I’ve run into a new problem: do I need to normalize my images to values in [0, 1]?

I have tried to build a U-Net, and every time after training, the predicted mask is always a red image.
Do you know what the problem might be?


#9

Your input can and should probably be normalized to properly train the model.
The mask however, should most likely not be normalized, as it contains some kind of class information.
How is the other mask stored if not as RGB? Does it store the class indices directly for each pixel?

Could you post some values and the shape of an example mask you are currently using?
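
A small sketch of what that could look like, assuming the image is loaded as an H x W x 3 uint8 array and the mask already holds one class index per pixel. The random data and sizes are stand-ins, not the actual dataset.

```python
import numpy as np
import torch

# Dummy data standing in for a loaded image and an index mask (assumed shapes).
h, w, nb_classes = 4, 6, 3
rng = np.random.default_rng(0)
image_np = rng.integers(0, 256, size=(h, w, 3), dtype=np.uint8)     # H x W x 3
mask_np = rng.integers(0, nb_classes, size=(h, w), dtype=np.int64)  # H x W

# Input: channels first, float, scaled to [0, 1].
image = torch.from_numpy(image_np).permute(2, 0, 1).float() / 255.0

# Mask: left as integer class indices, only converted to a long tensor.
mask = torch.from_numpy(mask_np).long()

print(image.shape, image.dtype)  # channels-first float image
print(mask.shape, mask.dtype)    # integer mask, values untouched
```

Scaling the mask the same way would turn the class indices into fractions, which a classification loss cannot use as targets.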


(Xiaoyu Song) #10

Sure, here is one sample of my RGB mask:
[[[107 142 35]
[107 142 35]
[107 142 35]

[ 70 130 180]
[ 70 130 180]
[ 70 130 180]]

[[107 142 35]
[107 142 35]
[107 142 35]

[ 70 130 180]
[ 70 130 180]
[ 70 130 180]]

[[107 142 35]
[107 142 35]
[107 142 35]

[ 70 130 180]
[ 70 130 180]
[ 70 130 180]]

[[128 64 128]
[128 64 128]
[128 64 128]

[128 64 128]
[128 64 128]
[128 64 128]]

[[128 64 128]
[128 64 128]
[128 64 128]

[128 64 128]
[128 64 128]
[128 64 128]]

[[128 64 128]
[128 64 128]
[128 64 128]

[128 64 128]
[128 64 128]
[128 64 128]]]

The shape is (375, 1242, 3)


(Xiaoyu Song) #11


This is one example of my mask


#12

Thanks for the example.
Are you using these RGB values directly for your classes?
If so, I would recommend using a mapping such that each pixel contains only a class index in the range [0, nb_classes-1].


(Xiaoyu Song) #13

Sorry, I’m new to PyTorch. What do you mean by mapping?
I’ve put a link to my code below, if you are interested.


#14

I mean something like a key value pair between your color codes in RGB and the corresponding class index.
E.g. [128, 64, 128] would map to class 0.
Have a look at this post for another example using grayscale images.
Could you post the classes for each separate color in your segmentation mask?
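
As a rough sketch of such a key-value mapping: the colors below are the ones from the sample mask posted above, but the class indices assigned to them are arbitrary placeholders, and the helper name rgb_to_class_index is made up for this example.

```python
import numpy as np

# Hypothetical mapping: colors taken from the sample mask in this thread,
# class indices chosen arbitrarily for illustration.
color_to_class = {
    (128, 64, 128): 0,
    (107, 142, 35): 1,
    (70, 130, 180): 2,
}

def rgb_to_class_index(mask_rgb, mapping, ignore_index=255):
    """Convert an H x W x 3 RGB mask to an H x W array of class indices.

    Pixels whose color is not in the mapping are set to ignore_index.
    """
    index_mask = np.full(mask_rgb.shape[:2], ignore_index, dtype=np.int64)
    for color, class_index in mapping.items():
        matches = np.all(mask_rgb == np.array(color, dtype=mask_rgb.dtype), axis=-1)
        index_mask[matches] = class_index
    return index_mask

mask_rgb = np.array(
    [[[107, 142, 35], [70, 130, 180]],
     [[128, 64, 128], [128, 64, 128]]],
    dtype=np.uint8,
)
index_mask = rgb_to_class_index(mask_rgb, color_to_class)
print(index_mask)
# [[1 2]
#  [0 0]]
```

The result has shape [h, w], so a batch of these converted with torch.from_numpy(...).long() has the [batch_size, h, w] layout the criterion expects. If unmapped pixels are marked with 255 as here, passing ignore_index=255 to nn.CrossEntropyLoss would skip them.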


(Xiaoyu Song) #15

Thank you for your explanation. I’m using the KITTI semantic segmentation dataset, which conforms to the Cityscapes dataset; it has 30 classes. But I couldn’t find the key-value pairs between my colour codes and the corresponding class indices.
I have seen the post you mentioned, and I don’t know the class mapping values either. Could you help?


#17

I’m not sure where to find the mapping. Here it seems a mapping is given for 11 classes.
However, if you can’t find the right mapping, you could also just get all unique color codes and create your own mapping.
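
One way this could be done, assuming the masks are loaded as H x W x 3 numpy arrays. The function name build_color_mapping and the two tiny masks are invented for the example. Collecting colors over all masks, rather than per image, keeps the class indices consistent across the dataset.

```python
import numpy as np

def build_color_mapping(masks):
    """Collect every distinct RGB color in a list of H x W x 3 masks
    and assign each one a class index in [0, nb_classes - 1]."""
    pixels = np.concatenate([m.reshape(-1, 3) for m in masks], axis=0)
    colors = np.unique(pixels, axis=0)  # unique rows, sorted lexicographically
    return {tuple(int(v) for v in color): idx for idx, color in enumerate(colors)}

# Two tiny stand-in masks using colors from the sample posted in this thread.
mask_a = np.array([[[107, 142, 35], [70, 130, 180]]], dtype=np.uint8)
mask_b = np.array([[[128, 64, 128], [107, 142, 35]]], dtype=np.uint8)

mapping = build_color_mapping([mask_a, mask_b])
print(mapping)
# {(70, 130, 180): 0, (107, 142, 35): 1, (128, 64, 128): 2}
```

The indices produced this way are arbitrary, so they only need to be used consistently for training and for decoding predictions; they will not match any official label IDs.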


(Xiaoyu Song) #18

Thanks a lot for the effort you put in. I really appreciate it. At the moment I am getting CUDA error: out of memory. I will sort out the memory issue and update you.


(Xiaoyu Song) #19

Sorry to bother again, I have more questions:

  1. In the U-Net example, if I do the image mapping and have, say, 10 classes for the labels, and the last layer of the network is log_softmax, does that mean the output of the network is a probability map for every pixel?
  2. If so, after training with NLLLoss and the Adam optimizer, the weights in the network are optimized. If I now feed the network a random training image, the output is a probability map. To visualize it, can I simply call imshow(output), or do I have to remap it back?

#20
  1. Yes, you will get the log probabilities with each channel corresponding to the class index.

  2. plt.imshow should work. I would try to transform the output using torch.exp to get the probabilities in the range [0, 1], since the colormap might look more “natural”.
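
A sketch of both options, using a random tensor as a stand-in for the network output. The sizes and the palette are assumptions for illustration; the palette rows would have to follow whatever color-to-index mapping was used for training.

```python
import torch
import torch.nn.functional as F

# Stand-in for the network output: log-probabilities of shape
# [batch_size, nb_classes, h, w]. Sizes are assumed for illustration.
batch_size, nb_classes, h, w = 1, 3, 4, 5
log_probs = F.log_softmax(torch.randn(batch_size, nb_classes, h, w), dim=1)

# Probabilities in [0, 1]; each pixel's class probabilities sum to 1.
probs = torch.exp(log_probs)

# Most likely class per pixel -> [batch_size, h, w]
pred = log_probs.argmax(dim=1)

# Hypothetical palette: row k is the RGB color used to draw class k.
palette = torch.tensor(
    [[128, 64, 128], [107, 142, 35], [70, 130, 180]], dtype=torch.uint8
)
pred_rgb = palette[pred]  # [batch_size, h, w, 3]

# With matplotlib available, a single class probability map could be shown via
# plt.imshow(probs[0, 0].numpy()), and the color-coded prediction via
# plt.imshow(pred_rgb[0].numpy()).
```

Showing probs[0, k] gives a heat map for class k, while pred_rgb remaps the argmax prediction back to the color coding of the masks.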


(Xiaoyu Song) #21

Hi, I’m wondering how to create the mapping. I saw your mapping code in the post; how did you find the mapping relationship?
And how do I create my own mapping? Do you mean that I should check all the colors in my mask image and then find the HTML code? :open_mouth: