I’m using my last NN layer as a softmax layer for outputting a 2D normalised heatmap (a probability distribution over the pixels of an image, peaked at the correct pixel). I had to implement this layer myself.

Assuming this layer is correct (which it seems to be), how do I get the cross entropy between my NN output from this layer and the target heatmap (a one-hot 2D array marking the correct pixel)?

If I reshape my tensor to use Torch’s current CrossEntropyLoss, will autograd automatically know how to differentiate it?

Yep. Basically, if you do an operation on a Variable and PyTorch doesn’t complain, either when you do the operation or during .backward(), then it is a fairly safe bet that autograd was able to differentiate it properly.
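As a concrete illustration (shapes and names are made up, not from the question): reshape the score map so that each pixel position becomes one "class" for nn.CrossEntropyLoss, and autograd tracks the view without any extra work:

```python
import torch
import torch.nn as nn

# Hypothetical sketch: a (N, 1, H, W) score map reshaped so that each pixel
# position is one "class"; autograd differentiates through the view.
n, h, w = 4, 8, 8
x = torch.randn(n, 1, h, w, requires_grad=True)
logits = x.view(n, h * w)                # (N, H*W): one class per pixel
target = torch.randint(0, h * w, (n,))   # index of the correct pixel per sample

loss = nn.CrossEntropyLoss()(logits, target)
loss.backward()                          # gradients flow back through the view
```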

I’m writing this reply for any human in the distant future, in some not-so-distant galaxy, who needs something similar.
The module I ended up using is generic for any number of channels. It takes a (N_BATCHES, N_CHANNELS, WIDTH, DEPTH) tensor, i.e. a batch of N_BATCHES images with N_CHANNELS each. It outputs another (N_BATCHES, N_CHANNELS, WIDTH, DEPTH) tensor containing the log probability for each batch element and each channel (computed and normalized across WIDTH and DEPTH).

import torch
import torch.nn.functional as F

class SoftmaxLogProbability2D(torch.nn.Module):
    def __init__(self):
        super(SoftmaxLogProbability2D, self).__init__()

    def forward(self, x):
        orig_shape = x.data.shape  # (N, C, H, W)
        seq_x = []
        for channel_ix in range(orig_shape[1]):
            # softmax over the flattened spatial dims, then reshape back
            softmax_ = F.softmax(
                x[:, channel_ix, :, :].contiguous()
                .view((orig_shape[0], orig_shape[2] * orig_shape[3])), dim=1
            ).view((orig_shape[0], orig_shape[2], orig_shape[3]))
            seq_x.append(softmax_.log())
        x = torch.stack(seq_x, dim=1)
        return x
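For reference, the same per-channel spatial softmax can be written with F.log_softmax (numerically more stable than calling softmax followed by log), and the cross entropy against a one-hot heatmap then reduces to picking out the log-probability at the target pixel. A hedged sketch with made-up shapes:

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: batch of 4, 2 channels, 8x8 heatmaps.
n, c, h, w = 4, 2, 8, 8
x = torch.randn(n, c, h, w, requires_grad=True)  # raw scores

# Log-softmax over the flattened spatial dims, reshaped back: the same
# computation as the module above, in one fused, numerically stable call.
log_probs = F.log_softmax(x.view(n, c, h * w), dim=2).view(n, c, h, w)

# One-hot target heatmap: one correct pixel per sample and channel.
target = torch.zeros(n, c, h, w)
target[:, :, 3, 5] = 1.0

# Cross entropy with a one-hot target is just the negative log-probability
# at the target pixel, averaged over batch and channels.
loss = -(log_probs * target).sum(dim=(2, 3)).mean()
loss.backward()
```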

Hello, I’m still confused about how cross entropy loss in PyTorch works on 1D data. Here is my situation:
-I have 3 classes
-Input = (N, C, W), output = (N, W) --> Input (64, 3, 640), output (64, 640)
-I actually tried nn.CrossEntropyLoss, but something was wrong with the dimensions, so I tried to unsqueeze the tensors and treat them as images.

After unsqueeze --> Input = (N, C, H, W), output = (N, H, W) --> Input (64, 3, 1, 640), output (64, 1, 640)
-I followed this implementation:

-However I got this error: RuntimeError: invalid argument 3: only batches of spatial targets supported (3D tensors) but got targets of dimension: 4 at /opt/conda/conda-bld/pytorch-nightly_1538995270066/work/aten/src/THNN/generic/SpatialClassNLLCriterion.c:59

I’m not sure what Input and output are.
Based on the shapes, it seems that Input is actually your model prediction, while output is the target. Is this correct?
Here is a small example of using nn.CrossEntropyLoss for images, e.g. in a segmentation use case:
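(The original snippet did not survive; a minimal stand-in sketch with illustrative shapes:)

```python
import torch
import torch.nn as nn

# Illustrative segmentation setup: 3 classes, batch of 4 images, 10x10 pixels.
criterion = nn.CrossEntropyLoss()
output = torch.randn(4, 3, 10, 10, requires_grad=True)  # (N, C, H, W) raw logits
target = torch.randint(0, 3, (4, 10, 10))               # (N, H, W) long class indices

loss = criterion(output, target)
loss.backward()
```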

Hello @ptrblck, thank you for the fast response. I have read the documentation again and again, and I am now choosing size (N, C, W) for the input and (N, W) for the target, so it will be (64, 3, 640) and (64, 640), as stated here, using log softmax followed by NLLLoss. However, I got this kind of error:

Good to hear it’s working!
However, let’s check the last error message, as I have the feeling there might be a silent bug.
The RuntimeError states that your target contains invalid values; it should contain values in the range [0, n_classes-1].
Basically, your target tensor should contain the class indices for all samples, not probabilities!

How did you normalize the target to get the probabilities?

Based on the output shape of your model, it looks like you are dealing with three classes.
The target tensor should thus contain long values in [0, 2].
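In code, using the shapes from your post (random data, purely for illustration):

```python
import torch
import torch.nn as nn

# (N, C, W) prediction with 3 classes, (N, W) target of long indices in [0, 2].
criterion = nn.CrossEntropyLoss()
prediction = torch.randn(64, 3, 640, requires_grad=True)
target = torch.randint(0, 3, (64, 640))  # long dtype by default

loss = criterion(prediction, target)
loss.backward()
```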

Sorry, I tried to reproduce it but failed. However, I had the same situation yesterday in a Jupyter notebook, so I can still see the log, but with different sizes. I think the problem here is that my target dimension is wrong. So here is the code.

The shapes of x and m look alright in case you have two classes.
Did you fail to reproduce the error? If that’s the case, it’s alright.
I just wanted to make sure I’m not misunderstanding your normalization of the target.

Basically, your target tensor should contain the class indices for all samples, not probabilities!

Hi, is there any loss function in PyTorch similar to CrossEntropyLoss but takes the ground truth probabilities as input instead of class indices? If not, any suggestion on how to implement it? Thanks!
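One way to write this by hand is to minimize the batch mean of -sum_c p_c * log(softmax(logits)_c); the function name below is made up for illustration:

```python
import torch
import torch.nn.functional as F

def soft_cross_entropy(logits, target_probs):
    """Cross entropy with a probability distribution as target:
    mean over the batch of -sum_c p_c * log(softmax(logits)_c)."""
    log_probs = F.log_softmax(logits, dim=1)
    return -(target_probs * log_probs).sum(dim=1).mean()

logits = torch.randn(8, 5, requires_grad=True)
target = torch.softmax(torch.randn(8, 5), dim=1)  # valid probability rows
loss = soft_cross_entropy(logits, target)
loss.backward()
```

Alternatively, nn.KLDivLoss (which takes log-probabilities as input and probabilities as target) optimizes the same objective up to a constant, since KL divergence differs from cross entropy only by the entropy of the target.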

Hello @ptrblck, sorry, this may be off-topic for this thread, but if we implement a loss function in the new PyTorch 1.0 (or 0.4), is it necessary to implement a backward() function?
Suppose I want to implement a dice coefficient loss like this:
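(The original snippet did not survive; a hedged stand-in, a soft dice loss built only from PyTorch operations:)

```python
import torch

def dice_loss(pred, target, eps=1e-6):
    # Soft dice: 1 - 2|A∩B| / (|A| + |B|), with pred as probabilities in [0, 1]
    # and target as a {0, 1} mask; eps avoids division by zero.
    intersection = (pred * target).sum()
    return 1 - (2 * intersection + eps) / (pred.sum() + target.sum() + eps)

pred = torch.rand(4, 1, 10, 10, requires_grad=True)
target = torch.randint(0, 2, (4, 1, 10, 10)).float()
loss = dice_loss(pred, target)
loss.backward()  # pure PyTorch ops, so autograd builds the backward pass itself
```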

During training it runs smoothly with no errors, but I just don’t understand whether it really does backpropagation and calculates gradients. How can we check that?
I work on segmentation, so I need to sum the losses, like loss = entropy + dice coeff. Does that really backpropagate, or does it only compute the entropy loss and not the dice, since dice does not implement a backward() function?

You only need to implement the backward function yourself if you use non-PyTorch operations (e.g. numpy), or if you would like to speed up the backward pass and think you have a more performant backward implementation than the one autograd generates for pure PyTorch operations.

Basically, if you just use PyTorch operations, you don’t need to define backward, as autograd is able to track all operations and create the backward pass.
Your method looks fine! If you want to check for gradients, just call dice_coeff.backward() and inspect some layers’ gradients. Something like this should work:
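(The original snippet did not survive; a minimal stand-in using a made-up toy model:)

```python
import torch
import torch.nn as nn

# Toy setup, purely illustrative: a tiny conv "model" and a dice-style loss.
model = nn.Sequential(nn.Conv2d(1, 2, 3, padding=1), nn.Sigmoid())
x = torch.randn(4, 1, 10, 10)
labels = torch.randint(0, 2, (4, 2, 10, 10)).float()

output = model(x)
intersection = (output * labels).sum()
dice_coeff = (2 * intersection + 1e-6) / (output.sum() + labels.sum() + 1e-6)
loss = 1 - dice_coeff
loss.backward()

# Every parameter should now have a populated .grad tensor.
for name, param in model.named_parameters():
    print(name, param.grad.abs().sum().item())
```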

You can just call loss.backward() on the summed loss.
However, is there a reason you use labels.data for the dice loss instead of the tensor directly?
It probably won’t cause any errors, but the use of .data is generally not recommended.

No problem @ptrblck, I just still remember the old PyTorch syntax and need to adapt to version 1.0. Still learning. Thank you so much for the explanation.

I have a similar problem statement where each pixel at the end of the convolution layers should have a label in [0, 9]. I wonder, shouldn’t NLLLoss2d work for you as well?