Patch based cross entropy loss

I am trying to implement a custom loss function for a variant of ViT, where the output is a prediction for each patch from the original image.

The input is of shape [BxCxHxW] and the label is of shape [BxNxRHxRW], where N is the number of classes and RH and RW are the numbers of patches along the height and width. This is easier to see with an example:

Input shape is [5x1x512x512] with 4 classes. Computing patches (patch size is 32x32) results in [5x256x1024] (ignoring the class token for now). The output from the transformer is reshaped into [5x4x16x16] which is also the final output of the model.

The label is of shape [5x4x16x16] where the second dimension is a probability vector of size 4 representing the probabilities of each class (this is done for all patches).

I am trying to compute the loss between the output and the label. Using nn.CrossEntropyLoss does not work because of the extra dimension. I created (borrowed from a similar project) a custom loss function:

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchCrossEntropy(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x, target):
        # Soft-label cross entropy: sum over dim=1, then average over the rest.
        loss = torch.sum(-target * F.log_softmax(x, dim=1), dim=1)
        return loss.mean()

my_criterion = PatchCrossEntropy()

# batch_size, num_patches and num_classes are defined elsewhere (values not shown).
x = np.random.RandomState(42).normal(size=(batch_size, num_patches, num_classes))
x = x / x.sum(axis=2, keepdims=True)
y = np.random.RandomState(41).normal(size=(batch_size, num_patches, num_classes))
y = y / y.sum(axis=2, keepdims=True)

label = torch.as_tensor(x)
target = torch.as_tensor(y)

loss1 = my_criterion(label, label)

tensor(-269.7687, dtype=torch.float64)

I expect the output to be 0, but it is not. I think the problem is the dim I am summing over. I also don’t know whether I should be using the softmax in the first place: the last layer of the model is already a softmax layer, so I think an extra one inside the loss function is not needed.
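On the softmax question: if the model's last layer already applies a softmax, then F.log_softmax inside the loss normalises the output a second time. A minimal sketch of a variant that takes probabilities directly, assuming the classes sit on dim=1 as in the [5x4x16x16] example. The name patch_ce_from_probs and the eps clamp are illustrative choices, not part of the code in the post.

```python
import torch

def patch_ce_from_probs(probs, target, eps=1e-12):
    """Soft-label cross entropy when `probs` already sums to 1 along dim=1."""
    # Clamp before the log so an exact zero does not produce -inf.
    log_probs = torch.log(probs.clamp_min(eps))
    # Sum over the class dimension, then average over batch and patches.
    return torch.sum(-target * log_probs, dim=1).mean()

# Toy data shaped like the example: [batch, classes, patches_h, patches_w].
torch.manual_seed(0)
probs = torch.softmax(torch.randn(5, 4, 16, 16), dim=1)
target = torch.softmax(torch.randn(5, 4, 16, 16), dim=1)
loss = patch_ce_from_probs(probs, target)
print(loss)
```

If the model returns raw scores instead, the log_softmax version from the post is the numerically safer choice.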

I can share more code if it helps illustrate the problem more, but this is part of a project so some parts of the code I might not be able to share.

Could you explain why you expect the output to be zero?

To test the loss function, I use the same tensor as both the input and the target. The output is supposed to be 0 because both are the same.

This is the part I don’t understand. Why do you expect the output of the specific loss function that you have defined in the code above to be zero when both its inputs are identical?

Correct me if I’m wrong, but as far as I know, CE outputs a loss of 0 when both of its inputs are the same. Since CE can’t take the dimensions of my tensors as defined above, I am implementing a custom CE loss function that accepts those dimensions. That is why I am expecting a zero.

The PyTorch implementation of Cross Entropy loss does not accept two identical inputs: it expects the shapes of the input and target to be different. Is this not the CE that you are talking about?

I am an idiot. The reason I thought the output was supposed to be 0 is that my calculations on paper were wrong, and I just copied the results. Sorry about the confusion.
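For reference, the cross entropy of a distribution with itself is its entropy, which is positive unless the distribution is one-hot. The quantity that is zero for identical inputs is the KL divergence. A small numerical sketch, assuming valid probabilities along dim=1 (the tensor names are illustrative):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 4, 16, 16)
p = F.softmax(logits, dim=1)           # valid probabilities along dim=1
log_p = F.log_softmax(logits, dim=1)

# Cross entropy of p with itself equals the entropy H(p), not zero.
ce_self = torch.sum(-p * log_p, dim=1).mean()
entropy = torch.sum(-p * torch.log(p), dim=1).mean()

# KL(p || p) = CE(p, p) - H(p) is what vanishes for identical inputs.
kl_self = F.kl_div(log_p, p, reduction="sum")

print(ce_self, entropy, kl_self)
```

The negative value in the original post is most likely a separate issue: dividing normal samples by their sum produces entries that can be negative, so the test tensors are not probability vectors, and they were normalised over axis=2 while the loss sums over dim=1.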

Now I don’t know how I am supposed to test the custom function. My labels are not class labels, they are probability vectors (basically, the labels are precomputed probability maps), so I don’t have the option of using the original CE loss.

I can generate new labels that match the input expected by PyTorch’s CE, but then I get an error because the labels still have more dimensions than it supports.
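One way to test the custom function, assuming a reasonably recent PyTorch (1.10 or later, as far as I know): nn.CrossEntropyLoss accepts class-probability targets with the same shape as the input, including extra spatial dimensions, as long as the classes are on dim=1. The sketch below compares it per patch against the formula from the post on toy tensors.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 4, 16, 16)                      # raw scores, classes on dim=1
target = F.softmax(torch.randn(5, 4, 16, 16), dim=1)    # soft labels, same shape

# Formula from the post, kept per patch: shape [5, 16, 16].
custom = torch.sum(-target * F.log_softmax(logits, dim=1), dim=1)

# Built-in loss with probability targets; reduction="none" keeps the same shape.
builtin = nn.CrossEntropyLoss(reduction="none")(logits, target)

print(torch.allclose(custom, builtin, atol=1e-5))
```

Note that the built-in loss expects raw scores, not softmax outputs, because it applies log_softmax internally.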

Will something like NLLLoss help?

This might be it, two questions:

  1. I need to compute the loss along the first dim, as this dimension contains the probability maps after reshaping. NLLLoss does not give me an option to compute the loss along a specific dim against the target.
  2. How do I test the loss function against toy inputs? I need to be sure that it’s exactly what I need.

Just tested it. I can’t apply it to my labels because of their shape:
only batches of spatial targets supported (3D tensors) but got targets of dimension: 4

It is what I want, I just need to figure out how to implement it on the first dimension.
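The error arises because NLLLoss expects class indices as targets, of shape [5x16x16] for an input of shape [5x4x16x16], and it always treats dim=1 of the input as the class dimension. A sketch of how it could be applied, assuming hard labels are acceptable; taking the argmax discards the soft probabilities, which may not be what is wanted here.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 4, 16, 16)                     # classes on dim=1
soft = F.softmax(torch.randn(5, 4, 16, 16), dim=1)     # soft labels, 4D

# NLLLoss wants class indices of shape [5, 16, 16], so collapse dim=1.
hard = soft.argmax(dim=1)

log_probs = F.log_softmax(logits, dim=1)
loss = F.nll_loss(log_probs, hard)                     # mean over all patches

# Same value from the soft-label formula with one-hot targets.
one_hot = F.one_hot(hard, num_classes=4).permute(0, 3, 1, 2).float()
check = torch.sum(-one_hot * log_probs, dim=1).mean()

print(loss, check)
```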

I am not sure I understand this. If your labels are a [3,4] tensor, what does this translate to? You need to compute the loss on the first row, on the first column, or something else?

You could use random arrays. I did the following to test the loss function from your code above:

import torch
import torch.nn.functional as F
x = torch.rand(3, 4, 5)
torch.sum(-x * F.log_softmax(x, dim=1), dim=1)

In the case of [3,4] I would need to compute the loss per row (every row is a dense map). In my case, labels are of shape [5x4x16x16] and I would need to compute the loss along the second dimension.

Something like:

x = torch.rand(5, 4, 16, 16)
for i in range(4):
    print("Computing loss for " + str(x[:, i, :, :]))


Yes, I want to do this operation but vectorised.
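For what it's worth, the sum over dim=1 in the custom loss is already the vectorised form of that loop. A small sketch on toy tensors that compares the two (variable names are illustrative):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 4, 16, 16)
target = F.softmax(torch.randn(5, 4, 16, 16), dim=1)
log_probs = F.log_softmax(logits, dim=1)

# Explicit loop over the 4 class channels, accumulating per-patch losses.
looped = torch.zeros(5, 16, 16)
for i in range(4):
    looped += -target[:, i, :, :] * log_probs[:, i, :, :]

# Vectorised form: one reduction over dim=1 replaces the loop.
vectorised = torch.sum(-target * log_probs, dim=1)

print(torch.allclose(looped, vectorised, atol=1e-6))
```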

Please correct me if I’m mistaken, but the concept seems a little strange to me. Assume a tiny dog is lying on the ground. The target label is therefore “dog”, but the majority of the image is “background”, so how do you designate the background region? It appears that the dog label would have to be applied to every image patch. Or perhaps you have taken steps to address such a situation.

Correct. In the label generation process, I assign a label to every patch for every class (including background). This is possible because the dataset I am working with comes with a segmentation mask, which is used when labelling every region.