CrossEntropyLoss

Hi,
I'm doing road segmentation in lidar point clouds (I am trying to reproduce the method described here: https://ieeexplore.ieee.org/abstract/document/7995848).
There is a 400x200 grid into which I bin the points (those that fall inside the selected area), and then 6 statistics are computed for each cell, so the result is a 6x400x200 tensor.
The authors sent me the ground truth. I projected the classes from the GT onto the corresponding points in the point cloud, which, to enlarge the dataset, I then rotated or flipped about the x-axis. The GT contains 3 classes: 0 - road, 1 - not road, 2 - unknown (the camera this GT was created from did not see that area, but the lidar has data there). I also use class '2' for points that I cannot assign to class 0 or 1 after rotation.
I also compared the GT generated from the 0° rotation against the original GT, and they were identical.
The GT has values {0, 1, 2}; I also tried {0, 1, 2, 3 - points that were added by rotation}.
So I pass this 6x400x200 input through the network; the output is Cx400x200, where C should be the number of classes (so 3). Then I apply CrossEntropyLoss.
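In code, the shapes look roughly like this (a simplified sketch; the Conv2d is just a stand-in for my actual network):

import torch
import torch.nn as nn

model = nn.Conv2d(6, 3, kernel_size=1)     # placeholder for the real network
inp = torch.randn(1, 6, 400, 200)          # 6 statistics per grid cell
out = model(inp)                           # (1, 3, 400, 200): one score per class and cell
gt = torch.randint(0, 3, (1, 400, 200))    # class index 0/1/2 per cell
loss = nn.CrossEntropyLoss()(out, gt)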
The CrossEntropyLoss docs list an ignore_index argument, and I want to ask: should I set ignore_index to 2 (the value I do not want counted in the loss), since those are the points for which I do not know whether they are road or not? Do I understand this parameter correctly? The word "index" confuses me a bit (I would have thought it refers to some position in the input tensor that CrossEntropyLoss should ignore).
Also, if I do not use this parameter, I get an error like this:

Traceback (most recent call last):
  File "./baseCNN.py", line 205, in <module>
    loss.backward() #see doc
  File "/home/michal/miniconda3/lib/python3.7/site-packages/torch/tensor.py", line 198, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/michal/miniconda3/lib/python3.7/site-packages/torch/autograd/__init__.py", line 100, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED (createCuDNNHandle at /pytorch/aten/src/ATen/cudnn/Handle.cpp:9)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x46 (0x7f92384d5536 in /home/michal/miniconda3/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x1099f38 (0x7f91d13f0f38 in /home/michal/miniconda3/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #2: at::native::getCudnnHandle() + 0xe54 (0x7f91d13f2714 in /home/michal/miniconda3/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #3: <unknown function> + 0xf1325c (0x7f91d126a25c in /home/michal/miniconda3/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0xf142f1 (0x7f91d126b2f1 in /home/michal/miniconda3/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #5: <unknown function> + 0xf1832b (0x7f91d126f32b in /home/michal/miniconda3/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #6: at::native::cudnn_convolution_backward_input(c10::ArrayRef<long>, at::Tensor const&, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::ArrayRef<long>, long, bool, bool) + 0xb2 (0x7f91d126f882 in /home/michal/miniconda3/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #7: <unknown function> + 0xf7f3a0 (0x7f91d12d63a0 in /home/michal/miniconda3/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #8: <unknown function> + 0xfc3c38 (0x7f91d131ac38 in /home/michal/miniconda3/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #9: at::native::cudnn_convolution_backward(at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::ArrayRef<long>, long, bool, bool, std::array<bool, 2ul>) + 0x4fa (0x7f91d1270f1a in /home/michal/miniconda3/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #10: <unknown function> + 0xf7f6cb (0x7f91d12d66cb in /home/michal/miniconda3/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #11: <unknown function> + 0xfc3c94 (0x7f91d131ac94 in /home/michal/miniconda3/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #12: <unknown function> + 0x2c809b6 (0x7f920f4619b6 in /home/michal/miniconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #13: <unknown function> + 0x2cd0444 (0x7f920f4b1444 in /home/michal/miniconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #14: torch::autograd::generated::CudnnConvolutionBackward::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&) + 0x378 (0x7f920f079918 in /home/michal/miniconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #15: <unknown function> + 0x2d89c05 (0x7f920f56ac05 in /home/michal/miniconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #16: torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&) + 0x16f3 (0x7f920f567f03 in /home/michal/miniconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #17: torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&, bool) + 0x3d2 (0x7f920f568ce2 in /home/michal/miniconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #18: torch::autograd::Engine::thread_init(int) + 0x39 (0x7f920f561359 in /home/michal/miniconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #19: torch::autograd::python::PythonEngine::thread_init(int) + 0x38 (0x7f921ba8c998 in /home/michal/miniconda3/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #20: <unknown function> + 0xc819d (0x7f922561a19d in /home/michal/miniconda3/bin/../lib/libstdc++.so.6)
frame #21: <unknown function> + 0x76db (0x7f923dddc6db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #22: clone + 0x3f (0x7f923db0588f in /lib/x86_64-linux-gnu/libc.so.6)

I am getting an accuracy of around 57%, but I do not understand whether I am using this parameter the right way; I also tried looking into the implementation, but that did not help me.
In case ignore_index is not the right parameter for me, is there some loss function that can do this (ignore certain GT values while computing the loss)?

So basically, you want to ignore the loss for class 2, right?
In that case, you can use the weight argument of nn.CrossEntropyLoss() by setting it to torch.Tensor([1, 1, 0]).
https://pytorch.org/docs/master/generated/torch.nn.CrossEntropyLoss.html
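A minimal sketch on dummy data, showing both the weight route and the ignore_index=2 you originally asked about (as far as I can tell, both simply skip the cells whose target is 2):

import torch
import torch.nn as nn

preds = torch.randn(1, 3, 400, 200)        # raw class scores from the network
gt = torch.randint(0, 3, (1, 400, 200))    # targets in {0, 1, 2}

# ignore_index refers to a target *value* (a class id), not a position in the tensor
loss_weight = nn.CrossEntropyLoss(weight=torch.Tensor([1, 1, 0]))(preds, gt)
loss_ignore = nn.CrossEntropyLoss(ignore_index=2)(preds, gt)
print(loss_weight, loss_ignore)            # cells labelled 2 are excluded from the mean in both cases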

Thank you very much, that helped me.
But the error I posted above still persists.
I also ran this:

print(result.shape, outputFromNetwork.shape)   # despite the names: result = predictions, outputFromNetwork = targets
loss = criterion(result, outputFromNetwork)

and I get this output:
torch.Size([1, 3, 400, 200]) torch.Size([1, 400, 200])
and then, when calling the criterion, I get the error mentioned above.
Am I doing something wrong with the dimensions, or with CUDA, or something else? I really do not know what I am doing wrong.

Could you check whether it throws an error without the GPU?
Also, could you check whether the target values stay within 0 to 2?
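Something along these lines (an untested sketch, reusing the variable names from your snippet) should narrow it down:

print(outputFromNetwork.unique(), outputFromNetwork.dtype)   # expect values in {0, 1, 2} and dtype torch.int64

# re-run the loss on CPU copies: an out-of-range target on CUDA often surfaces
# later as a confusing cuDNN error, while the CPU path gives a readable message
cpu_loss = torch.nn.CrossEntropyLoss()(result.cpu(), outputFromNetwork.cpu())
cpu_loss.backward()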

So I was indeed doing something wrong with CUDA on the server side.
Regarding the weight parameter: it works for me, but if I understand it correctly, it just excludes the model output that represents class '2' from the loss.
Is there something, or some other way, so that the loss is simply not computed at the places where the GT is '2'?
As I currently understand this parameter, it does something like (class-0 output of the model compared against the GT where the value is 0) + (class-1 output of the model compared against the GT where the value is 1).
But at the places where the GT is class '2', I do not know whether it should be class 0 or 1.
Or am I wrong, and it does exactly what I want to achieve?

So, you want the loss such that, if the GT is 2, the loss for that node is zero.
The weight parameter does pretty much exactly that.
Here's an example. The preds could be any 4 nodes out of the (400, 200) grid:

import torch
import torch.nn as nn

preds = torch.Tensor([[0.45, 0.2, 0.35],
                      [0.1, 0.5, 0.4],
                      [0.3, 0.2, 0.5],
                      [0.1, 0.75, 0.15]])

gt = torch.Tensor([0, 1, 2, 2]).long()

criterion = nn.CrossEntropyLoss(weight=torch.Tensor([1, 1, 0]), reduction='none')
loss = criterion(preds, gt)
print(loss)

would give

tensor([0.9872, 0.9459, 0.0000, 0.0000])

As you can see in the code block above, the loss for a node whose ground truth is 2 is 0. This means that if you use criterion = nn.CrossEntropyLoss(weight=torch.Tensor([1, 1, 0])) with the default mean reduction, it yields a scalar loss of about 0.9665, the average of [0.9872, 0.9459], instead of a scalar loss of 0.48335, the average of [0.9872, 0.9459, 0, 0].
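You can check that directly by switching back to the default mean reduction (continuing the example above):

criterion_mean = nn.CrossEntropyLoss(weight=torch.Tensor([1, 1, 0]))   # default reduction='mean'
print(criterion_mean(preds, gt))                                       # tensor(0.9665), i.e. (0.9872 + 0.9459) / 2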
Is this what you want? Or am I missing something?

Hi,
so it does exactly what I want, you are right, thank you very much.
I also tried it in my demo, and I found that it does not count the loss where the GT is set to this class, but it does still count the contribution of this slice of the CrossEntropyLoss input: [2][…][…]. If I understand it correctly, that slice is the prediction for class 2, which I think is useless for me. Is there a way to have 3 classes, but count only the predictions of 2 of the classes in the loss?
Or should I change the model output so that it has only two prediction channels instead of three? Would the GT with 3 classes still work then?

What I mean is this:

preds = torch.Tensor(
        [[[[0.45, 0.2, 0.0], [0.1, 0.9995, 0.4], [0.3, 0.2, 0.5]],     # channel 0 (class 0)
          [[0.5, 0.6, 0.0], [0.6, 0.4, 0.3], [0.0, 0.1, 0.9]],         # channel 1 (class 1)
          [[0.4, 0.22, 0.0], [0.51, 0.65, 0.94], [0.23, 0.52, 0.56]],  # channel 2 (class 2)
         ]]).float()

gt = torch.Tensor(
    [[[0, 1, 2], [1, 2, 0], [0, 0, 0]]   # gt == 2 at positions (0, 2) and (1, 1)
     ]).long()
criterion = torch.nn.CrossEntropyLoss(weight=torch.Tensor([1, 1, 0]), reduction='none')
loss = criterion(preds, gt)

now gives the same result as if the preds tensor were:

preds = torch.Tensor(   # differs from the first preds only at the cells where gt == 2
        [[[[0.45, 0.2, 0.5], [0.1, 0.1, 0.4], [0.3, 0.2, 0.5]],
          [[0.5, 0.6, 0.9], [0.6, 0.0, 0.3], [0.0, 0.1, 0.9]],
          [[0.4, 0.22, 0.9], [0.51, 0.0, 0.94], [0.23, 0.52, 0.56]],
         ]]).float()

which is really great for me.
But what I would like to achieve for my work is that it would also give the same result if the preds were:

preds = torch.Tensor(
        [[[[0.45, 0.2, 0.5], [0.1, 0.1, 0.4], [0.3, 0.2, 0.5]],
          [[0.5, 0.6, 0.9], [0.6, 0.0, 0.3], [0.0, 0.1, 0.9]],
          [[0.0, 0.9, 0.0], [0.0, 0.0, 0.3], [0.0, 0.0, 0.0]],   # for example: only the class-2 channel differs from the previous preds
         ]]).float()

You will not get the same value of the loss.
That is because nn.CrossEntropyLoss() aggregates the values across all the channels.
Refer to this doc, https://pytorch.org/docs/master/generated/torch.nn.CrossEntropyLoss.html
If you scroll down, you will find the formula
loss(x, class) = -x[class] + log(sum_j exp(x[j]))

The values that go inside the summation are different, hence the loss will not be the same.
For example, considering the first element of the 2nd and 3rd examples:
-0.45 + log(sum(exp([0.45, 0.5, 0.4]))) is different from
-0.45 + log(sum(exp([0.45, 0.5, 0.0]))).
And this really affects the value of the loss.
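A quick numerical check in plain Python, using the first cell (gt = 0) of your 2nd and 3rd preds:

import math

x_2nd = [0.45, 0.5, 0.4]   # scores across the 3 channels in the 2nd example
x_3rd = [0.45, 0.5, 0.0]   # same cell in the 3rd example: only the class-2 score differs

print(-0.45 + math.log(sum(math.exp(v) for v in x_2nd)))   # ≈ 1.099
print(-0.45 + math.log(sum(math.exp(v) for v in x_3rd)))   # ≈ 0.989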
