Image input and mask target do not have the same size

Hi, good day!

I am training a neural network and my input image is a 4D tensor [batch_size, channel, height, width]. My target was also a 4D tensor, but I got an error saying the target should only be 3D, so I tried mask.squeeze(1) to get rid of the channel dimension. After I did that, I got another error, this time saying that my input size and target size mismatch.

This is my code:

epochs = 1
for epoch in tqdm(range(epochs)):
    for batch_idx, (img, mask) in tqdm(enumerate(train_gen)):
      img = torch.Tensor(img).view(-1, 3, 500, 500)
      mask = torch.Tensor(mask).view(-1, 1, 500, 500)
      mask = mask.squeeze(1)

      model.zero_grad()

      outputs = model(img)
      loss = loss_function(outputs, mask)
      loss.backward()
      optimizer.step()    # Does the update 

    print(f"Epoch: {epoch}. Loss: {loss.item()}")  # report the current epoch, not the total count

And my error is:

0%|          | 0/1 [00:00<?, ?it/s]
0it [00:06, ?it/s]
  0%|          | 0/1 [00:06<?, ?it/s]
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-13-ad775b024a80> in <module>()
     12       outputs = model(img)
     13       outputs = outputs.squeeze(1)
---> 14       loss = loss_function(outputs, mask)
     15       loss.backward()
     16       optimizer.step()    # Does the update

3 frames
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in nll_loss(input, target, weight, size_average, ignore_index, reduce, reduction)
   2264         ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
   2265     elif dim == 4:
-> 2266         ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
   2267     else:
   2268         # dim == 3 or dim > 4

RuntimeError: size mismatch (got input: [4, 60, 504, 504] , target: [4, 500, 500]

Does anyone have any idea how to solve this? Thank you!

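Your model seems to increase the spatial size of the input from 500 to 504: the error reports an output of [4, 60, 504, 504] against a target of [4, 500, 500], and the loss needs the height and width of both to match. Take a look at the layers (kernel size, stride, and padding in particular) and make sure the output size is as expected.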
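To find the layer that changes the size, it can help to trace the spatial size by hand. Below is a minimal sketch that applies the output-size formula from the torch.nn.Conv2d documentation to a list of layers. The layer list is purely hypothetical (the actual model was not posted) and only illustrates how a padding larger than the kernel needs can turn 500 into 504.

```python
def conv2d_out_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Spatial output size of a convolution along one dimension,
    following the formula given in the torch.nn.Conv2d documentation."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# Hypothetical layers, NOT the asker's model: (name, kernel_size, stride, padding)
layers = [
    ("conv1", 3, 1, 1),  # padding 1 with kernel 3 keeps the size
    ("conv2", 3, 1, 3),  # padding 3 with kernel 3 grows the size by 4
    ("conv3", 3, 1, 1),  # keeps the size again
]


def trace_sizes(size, layers):
    """Return the spatial size after each layer, in order."""
    sizes = []
    for name, kernel_size, stride, padding in layers:
        size = conv2d_out_size(size, kernel_size, stride, padding)
        sizes.append((name, size))
    return sizes


trace = trace_sizes(500, layers)
for name, size in trace:
    print(f"{name}: {size}")
```

With these made-up layers, the size first deviates from 500 at conv2, so that would be the layer to fix. For a real model, printing outputs.shape after each block during a forward pass gives the same information.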