Runtime error: input and target batch or spatial sizes don't match


I am trying to fine-tune the last two convolutional blocks of the VGG 16 model. I have added a conv2d layer with 512 filters on top of VGG 16, for visualization of the output. The data for fine-tuning has both RGB images and binary masks. The code for training is like this:

device = t.device('cuda:0' if t.cuda.is_available() else 'cpu')

Here is the loss and optimizer definition

criterion = nn.NLLLoss()
optimizer = t.optim.Adam(model.parameters(), 5e-4, (0.9, 0.999), eps=1e-08, weight_decay=1e-4)
start_epoch = 1
steps_loss = 50
my_start_time = time.time()

The training loop

total_steps = len(train_loader)
epochs = 30
print(f"{epochs} epochs, {total_steps} total_steps per epoch")
for epoch in range(epochs):
    print("----- TRAINING - EPOCH", epoch, "-----")
    epoch_loss = []
    time_train = []
    for i, (images, masks) in enumerate(train_loader, 1):
        start_time = time.time()
        images =
        masks = masks.type(t.LongTensor)
        masks = masks.reshape(masks.shape[0], masks.shape[2], masks.shape[3])
        masks =

        # Forward pass
        output1 = model(images)
        softmax = F.log_softmax(outputs1, dim=1)
        loss = criterion(softmax, masks)
        # Backward and optimize
        if steps_loss > 0 and i % steps_loss == 0:
            average = sum(epoch_loss) / len(epoch_loss)
            print(f'loss: {average:0.4} (epoch: {epoch}, step: {i})', "//Avg time/img: %.4f s" % (sum(time_train) / len(time_train) / batch_size))

            average_epoch_loss_train = sum(epoch_loss) / len(epoch_loss)

It throws the following runtime error:

RuntimeError Traceback (most recent call last)
29 output1 = model(images)
30 softmax = F.log_softmax(outputs1, dim=1)
---> 31 loss = criterion(softmax, masks)
33 # Backward and optimize

~/yes/lib/python3.7/site-packages/torch/nn/modules/ in __call__(self, *input, **kwargs)
530 result = self._slow_forward(*input, **kwargs)
531 else:
--> 532 result = self.forward(*input, **kwargs)
533 for hook in self._forward_hooks.values():
534 hook_result = hook(self, input, result)

~/yes/lib/python3.7/site-packages/torch/nn/modules/ in forward(self, input, target)
203 def forward(self, input, target):
--> 204 return F.nll_loss(input, target, weight=self.weight, ignore_index=self.ignore_index, reduction=self.reduction)

~/yes/lib/python3.7/site-packages/torch/nn/ in nll_loss(input, target, weight, size_average, ignore_index, reduce, reduction)
1838 ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
1839 elif dim == 4:
-> 1840 ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
1841 else:
1842 # dim == 3 or dim > 4

RuntimeError: input and target batch or spatial sizes don’t match: target [8 x 360 x 640], input [8 x 512 x 22 x 40] at /opt/conda/conda-bld/pytorch_1579022060824/work/aten/src/THCUNN/generic/

Please help me. I am very new to PyTorch. Sorry if my question sounds dumb. Thank you in advance for the help.

A couple of questions

  1. NLL loss is meant for classification tasks. Your target has shape [8 x 360 x 640]; are you trying to do segmentation? Can you clarify your task?
  2. The height and width of your images are not the same. Please print your extension to the VGG model, along with the input and target shapes, which would clarify this.

  1. In a way, I am trying to visualize the output of the CONV5-3 layer in VGG 16 by up-sampling it. My target is of size [360x640]; the 8 in [8x360x640] is the batch size.
  2. The model that I have built using VGG 16 looks like this.

class net(nn.Module):
    def __init__(self):
        super().__init__()  # required before assigning submodules
        vgg16 = models.vgg16(pretrained=True)
        encoder = list(vgg16.features.children())[:-1]
        self.encoder = nn.Sequential(*encoder)
        #for param in encoder.parameters():
            #param.requires_grad = False
        self.decoder = nn.Conv2d(512, 1, 1, padding=0, bias=False)

    def forward(self, x):
        e_x = self.encoder(x)
        d_x = self.decoder(e_x)
        d_x = nn.functional.interpolate(d_x, size=(480, 640), mode='bilinear', align_corners=False)
        d_x = d_x.squeeze(1)
        mi = t.min(d_x.view(-1, 480*640), 1)[0].view(-1, 1, 1)
        ma = t.max(d_x.view(-1, 480*640), 1)[0].view(-1, 1, 1)
        n_x = (d_x - mi) / (ma - mi)
        return e_x, n_x

Your model returns e_x, n_x. However, you call it as output1 = model(images), so output1 is the tuple (e_x, n_x). Its first element, e_x, has shape [8, 512, 22, 40], but it is n_x that you need to pass to your loss. Try unpacking the return value as _, output_1 = model(images).
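
To illustrate the unpacking point, here is a minimal sketch. TwoOutputNet is a hypothetical stand-in, not the model from the question, and its layer sizes are made up; it only mimics a forward() that returns two tensors.

```python
import torch
import torch.nn as nn


class TwoOutputNet(nn.Module):
    """Hypothetical stand-in that returns (features, prediction)."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Conv2d(3, 4, kernel_size=3, padding=1)
        self.decoder = nn.Conv2d(4, 1, kernel_size=1, bias=False)

    def forward(self, x):
        e_x = self.encoder(x)               # [N, 4, H, W]
        n_x = self.decoder(e_x).squeeze(1)  # [N, H, W]
        return e_x, n_x


model = TwoOutputNet()
images = torch.randn(2, 3, 6, 8)

both = model(images)           # a tuple holding both tensors
_, prediction = model(images)  # keep only the second tensor

print(type(both).__name__, tuple(both[0].shape), tuple(prediction.shape))
```

Assigning the call to a single name keeps the whole tuple, so the loss would not receive the tensor you intended.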

@charan_Vjy I tried changing it to _, output_1 = model(images), but I still get the same error.

Can you print out the shape of output_1?

The shape of output1 is [8, 480, 640] and the shape of softmax is [8, 512, 22, 40].

You are taking the log_softmax of outputs1 rather than output1. Also remember that the input to log_softmax (and then to NLLLoss) has to have the shape [8, no_of_classes, 480, 640].
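
As a rough sketch of that shape contract: for a 4-D input, NLLLoss expects log-probabilities of shape [N, C, H, W] and an integer target of shape [N, H, W]. The two-class setup and the small spatial size below are assumptions chosen only to keep the example quick.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

num_classes = 2  # assumption: background / foreground
logits = torch.randn(8, num_classes, 12, 16)         # [N, C, H, W]
target = torch.randint(0, num_classes, (8, 12, 16))  # [N, H, W], dtype int64

log_probs = F.log_softmax(logits, dim=1)  # normalize over the class dimension
loss = nn.NLLLoss()(log_probs, target)

print(tuple(log_probs.shape), tuple(target.shape), loss.item())
```

The spatial dimensions of the input and the target have to agree; only the class dimension is absent from the target.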

Actually, I have masks as images of size 480x640. Model() also returns tensors of shape [480, 640]. Now, I am trying to train the model with a set of training images along with their corresponding masks. Please let me know whether my implementation is fine or whether I should use a different loss function.
I also tried giving outputs1 directly to the criterion() function. It then shows this error: ValueError: Expected target size (8, 640), got torch.Size([8, 360, 640])

Yeah, do not use NLL loss here. It is better to use the MSE loss function: replace NLLLoss with torch.nn.MSELoss. When you use MSELoss, remove the log_softmax call. Note that target and output_1 must have the same size.
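
A minimal sketch of that suggestion, using made-up stand-in tensors rather than the real model output and masks. The shapes and the rectangular foreground region are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Stand-in for the model's normalized map n_x, shape [N, H, W]
prediction = torch.rand(8, 12, 16, requires_grad=True)

# Stand-in binary mask with the same shape, already floating point
target = torch.zeros(8, 12, 16)
target[:, 3:9, 4:12] = 1.0  # made-up foreground region

assert prediction.shape == target.shape  # MSELoss compares element by element

loss = nn.MSELoss()(prediction, target)
loss.backward()
print(loss.item(), tuple(prediction.grad.shape))
```

No log_softmax is applied here, since MSELoss compares the raw values directly.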

Yeah, I have used MSELoss() function in place of NLLLoss. However, there is still a problem with loss.backward(). It throws this error.

RuntimeError Traceback (most recent call last)
38 # Backward and optimize
39 optimizer.zero_grad()
---> 40 loss.backward()
41 optimizer.step()

~/yes/lib/python3.7/site-packages/torch/ in backward(self, gradient, retain_graph, create_graph)
193 products. Defaults to False.
194 """
--> 195 torch.autograd.backward(self, gradient, retain_graph, create_graph)
197 def register_hook(self, hook):

~/yes/lib/python3.7/site-packages/torch/autograd/ in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
97 Variable._execution_engine.run_backward(
98 tensors, grad_tensors, retain_graph, create_graph,
---> 99 allow_unreachable=True) # allow_unreachable flag

RuntimeError: expected dtype Float but got dtype Long

I solved the issue. Thank you.

May I ask how you solved the issue?

@Yoga_Hu If you are asking about the “input and target batch don’t match” error, I used MSE loss instead of NLL loss. If you are asking about the error “RuntimeError: expected dtype Float but got dtype Long”, I cast the model outputs to float using .float(). This solved the problem.
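
The thread does not show the final code, so the following is only a sketch of the dtype fix under assumptions: the tensors are random stand-ins, and both sides of the loss are cast with .float() so that they share one floating-point dtype.

```python
import torch
import torch.nn as nn

# Stand-ins: masks arrive as LongTensor, as in the training loop in the question
masks = torch.randint(0, 2, (4, 1, 6, 8)).type(torch.LongTensor)
prediction = torch.rand(4, 6, 8, requires_grad=True)  # stand-in model output

masks = masks.reshape(masks.shape[0], masks.shape[2], masks.shape[3])
target = masks.float()  # Long -> float32 so it matches the prediction

loss = nn.MSELoss()(prediction.float(), target)
loss.backward()
print(prediction.dtype, target.dtype, loss.dtype)
```

Calling .float() on a tensor that is already float32 is a no-op, so casting both sides is a cheap way to rule out a dtype mismatch.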

Thanks a lot! I asked about the “input and target batch don’t match error”. I will try the MSE loss.