Torch interpolation

I am new to deep learning and model training. In my training loop I am trying to compute the segmentation loss for an image, so before the cross-entropy loss calculation I used interpolate to resize the output that came out of the model.

The output of the model has shape (5, 36, 180, 320):
5 is the batch size, 36 is the channel size, and 180 x 320 is H x W.

The target image shape is (5, 1, 180, 320):
5 is the batch size, 1 is the channel, and 180 x 320 is H x W.
The target's single channel holds values from 0 to 35, which is the number of available segmentation classes.

Now I am resizing the output to (1, 180, 320)
so it can match the target image size, but I run into an issue.

I used it like this:

outs = F.interpolate(out, target.size()[1:], mode='bilinear', align_corners=False).squeeze(dim=1)
# out -> the model's output
# the size passed here is (1, 180, 320)

crit_loss = crit(outs, target.squeeze(dim=1))
# here crit is cross entropy loss
loss += (loss_coeff * crit_loss)

I get the error:

Input and output must have the same number of spatial dimensions, but got input with spatial dimensions of [180, 320] and output size of torch.Size([1, 180, 320]). Please provide input tensor

I have tried every way I could think of and nothing worked.
If I use the size (180, 320) inside interpolate, the squeezed output outs has shape (36, 180, 320), which causes an issue at the crit call. What should I do, and where did it go wrong? Please help me.

I don’t understand your use case and why you want to interpolate the model output.
Based on your description, the model output uses 36 channels corresponding to the number of classes. The target contains integer values in the expected [0, nb_classes-1] range but uses an unnecessary channel dimension. Remove it via squeeze(1) and the loss function should work.
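A minimal sketch of the shape contract described above, using random tensors with the shapes from the question (5 samples, 36 classes, 180 x 320):

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

# Shapes from the question: batch of 5, 36 classes, 180 x 320 resolution
output = torch.randn(5, 36, 180, 320)            # logits, [N, C, H, W]
target = torch.randint(0, 36, (5, 1, 180, 320))  # class indices, [N, 1, H, W]

# Squeeze only the target's channel dim; keep all 36 logit channels
loss = criterion(output, target.squeeze(1))
print(loss)  # a scalar tensor
```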

I am trying to develop a multi-head model which detects both the depth and the segmentation of the image.

I am actually trying to make some changes of my own to this repo.
I am using Cityscapes for segmentation and NYUDv2 for depth, so they are separate, disjoint datasets.

This is my changed part of the code so I can train on each dataset separately:

def train_seg(model, opts, crits, dataloader, loss_coeffs=(1.0,), grad_norm=0.0):

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    loss_meter = AverageMeter()
    pbar = tqdm(dataloader)

    for sample in pbar:
        loss = 0.0
        image, target = sample
        targets = [target]

        # FORWARD
        outputs, _ = model(image)
        outputs = [outputs]

        for out, target, crit, loss_coeff in zip(outputs, targets, crits, loss_coeffs):
            # Resize the output, drop the channel dim, then accumulate the weighted loss
            loss += loss_coeff * crit(
                F.interpolate(
                    out, size=target.size()[1:], mode="bilinear", align_corners=False
                ).squeeze(dim=1),
                target.squeeze(dim=1),
            )

        # BACKWARD
        for opt in opts:
            opt.zero_grad()
        loss.backward()
        if grad_norm > 0.0:
            torch.nn.utils.clip_grad_norm_(model.parameters(), grad_norm)
        # Run one optimizer step per head
        for opt in opts:
            opt.step()

        loss_meter.update(loss.item())
        pbar.set_description(
            "Loss {:.3f} | Avg. Loss {:.3f}".format(loss.item(), loss_meter.avg)
        )

The problem occurs at this line in the training loop:

loss += loss_coeff * crit(F.interpolate(out, size=target.size()[1:], mode="bilinear", align_corners=False).squeeze(dim=1), target.squeeze(dim=1))
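For reference, the mismatch can be reproduced in isolation: target.size()[1:] is (1, 180, 320), i.e. three values including the channel dim, while a bilinear interpolate on a 4D input expects a two-value spatial size. Slicing from index 2 instead gives (180, 320):

```python
import torch
import torch.nn.functional as F

out = torch.randn(5, 36, 180, 320)               # model logits
target = torch.randint(0, 36, (5, 1, 180, 320))  # class-index map

print(target.size()[1:])  # torch.Size([1, 180, 320]) -- still includes the channel dim

try:
    # The failing call: three "spatial" values for a 4D input
    F.interpolate(out, size=target.size()[1:], mode="bilinear", align_corners=False)
except ValueError as err:
    print(err)

# Slicing from index 2 keeps only H and W, which interpolate accepts
resized = F.interpolate(out, size=target.size()[2:], mode="bilinear", align_corners=False)
print(resized.shape)  # torch.Size([5, 36, 180, 320])
```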

For training segmentation alone I have made these changes.
My output shape is (5, 36, 180, 320)
and my target shape is (5, 1, 180, 320).

I want to downsample the output to the shape (1, 180, 320)
so that I can calculate the loss against the target.
The loss function used here is cross-entropy loss,
but I get this error:

Input and output must have the same number of spatial dimensions, but got input with spatial dimensions of [180, 320] and output size of torch.Size([1, 180, 320]). Please provide input tensor

I tried the squeeze part but it didn't work as I expected; it led to wrong segmentation.
What is my mistake here, or is this just the wrong way to train? Please help me with this; I have been trying to solve it for more than a week and I still cannot make sense of where it went wrong.

As already described, nn.CrossEntropyLoss is used for multi-class classification and segmentation use cases. The output is expected to have the shape [batch_size, nb_classes, *] containing logits while the target is expected to have the shape [batch_size, *] containing class indices in the range [0, nb_classes-1], where * denotes additional dimensions.
Your model output already has the desired shape, and I don’t know why you want to reduce the channel dimension, since you would lose the logits corresponding to each class.
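Putting that together with the resize from the question, a minimal sketch of the loss computation could look like this: keep all 36 logit channels, resize only the spatial dims, and squeeze only the target.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

crit = nn.CrossEntropyLoss()

out = torch.randn(5, 36, 180, 320, requires_grad=True)  # logits from the model
target = torch.randint(0, 36, (5, 1, 180, 320))         # segmentation map

# Resize the logits to the target's spatial size; note size=target.size()[2:],
# which is (180, 320), not target.size()[1:]
outs = F.interpolate(out, size=target.size()[2:], mode="bilinear", align_corners=False)

# Drop only the target's channel dimension; the logits keep all 36 channels
loss = crit(outs, target.squeeze(1))
loss.backward()  # loss is a scalar, so backward works
```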

Thanks for helping me understand it. I had just understood it the wrong way. My mistake.