RuntimeError: input and target batch or spatial sizes don't match

Hello, I am working on a U-Net implementation and I receive this error. I have tried changing sizes and various combinations to make it work, but nothing has helped. Here is the error:

  0%|          | 0/675 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\ASUS\AppData\Roaming\Python\Python37\site-packages\IPython\core\interactiveshell.py", line 3331, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-14-e260720ac9e3>", line 11, in <module>
    loss = criterion(outputs, batch_y)
  File "C:\Program Files\Python37\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Program Files\Python37\lib\site-packages\torch\nn\modules\loss.py", line 916, in forward
    ignore_index=self.ignore_index, reduction=self.reduction)
  File "C:\Program Files\Python37\lib\site-packages\torch\nn\functional.py", line 2021, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "C:\Program Files\Python37\lib\site-packages\torch\nn\functional.py", line 1840, in nll_loss
    ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: input and target batch or spatial sizes don't match: target [50 x 1 x 2], input [50 x 1 x 64 x 64] at C:/w/1/s/windows/pytorch/aten/src\THCUNN/generic/SpatialClassNLLCriterion.cu:23

My train function:

def train(net_):
    BATCH_SIZE = 50  # first thing to modify
    EPOCHS = 3
    optimizer = torch.optim.SGD(net_.parameters(), lr=0.01, momentum=0.99)
    criterion = nn.CrossEntropyLoss()
    for epoch in range(EPOCHS):
        for i in tqdm(range(0, len(train_X), BATCH_SIZE)):
            batch_X = train_X[i:i + BATCH_SIZE].view(-1, 1, 64, 64)
            batch_y = train_y[i:i + BATCH_SIZE].view(50, 1, -1)
            batch_X, batch_y = batch_X.to(device), batch_y.to(device)

            # net_.zero_grad() would zero only this network's gradients;
            # optimizer.zero_grad() is the usual choice
            optimizer.zero_grad()
            outputs = net_(batch_X)
            loss = criterion(outputs, batch_y)  # this is the line that raises the RuntimeError
            loss.backward()
            optimizer.step()

        print(f"Epoch: {epoch}. Loss: {loss}")

train(model)
batch_X.size()
Out[15]: torch.Size([50, 1, 64, 64])

batch_y.size()
Out[16]: torch.Size([50, 1, 2])

outputs.size()
Out[17]: torch.Size([50, 1, 64, 64])

When I remove .view(50, 1, -1), the error becomes:

 0%|          | 0/675 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\ASUS\AppData\Roaming\Python\Python37\site-packages\IPython\core\interactiveshell.py", line 3331, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-6-c11297e71625>", line 15, in <module>
    loss = criterion(outputs, batch_y)
  File "C:\Program Files\Python37\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Program Files\Python37\lib\site-packages\torch\nn\modules\loss.py", line 916, in forward
    ignore_index=self.ignore_index, reduction=self.reduction)
  File "C:\Program Files\Python37\lib\site-packages\torch\nn\functional.py", line 2021, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "C:\Program Files\Python37\lib\site-packages\torch\nn\functional.py", line 1840, in nll_loss
    ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: 1only batches of spatial targets supported (non-empty 3D tensors) but got targets of size: : [50, 2]

Thank you in advance ^^

I assume you are working on a multi-class segmentation use case, since you are using nn.CrossEntropyLoss.
If that’s the case, your output should have the shape [batch_size, nb_classes, height, width].
While the number of dimensions is correct, it seems you are only dealing with a single class.
Also, the target is expected to have the shape [batch_size, height, width] and contain the class indices in the range [0, nb_classes-1], while your target has the shape [batch_size, 1, 2].
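
Here is a minimal sketch of the expected shape contract, using random tensors and an assumed nb_classes of 3 just for illustration:

import torch
import torch.nn as nn

batch_size, nb_classes, height, width = 50, 3, 64, 64  # nb_classes=3 is just an example

output = torch.randn(batch_size, nb_classes, height, width)         # model logits
target = torch.randint(0, nb_classes, (batch_size, height, width))  # class index per pixel

criterion = nn.CrossEntropyLoss()
loss = criterion(output, target)  # shapes match, no RuntimeError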

Also, if you are working on a binary segmentation use case, you should either increase the number of output channels of the last conv to 2 or use nn.BCEWithLogitsLoss instead.
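
For the binary case, both options would look like this (again just a sketch with random tensors):

import torch
import torch.nn as nn

batch_size, height, width = 50, 64, 64

# Option 1: two output channels + nn.CrossEntropyLoss
output = torch.randn(batch_size, 2, height, width)         # logits for two classes
target = torch.randint(0, 2, (batch_size, height, width))  # class indices 0 or 1
loss = nn.CrossEntropyLoss()(output, target)

# Option 2: one output channel + nn.BCEWithLogitsLoss
output = torch.randn(batch_size, 1, height, width)                    # one logit per pixel
target = torch.randint(0, 2, (batch_size, 1, height, width)).float()  # float targets, same shape as output
loss = nn.BCEWithLogitsLoss()(output, target)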

I am working on multi-class segmentation.
I tried to equalize height and width, but I couldn't, because the reshape "is invalid for input of size":

I think my problem is in how train_y is defined. I am new to these concepts; I kind of did bungee jumping into deep learning, so I suppose I am missing some very fundamental things.
Here are my previously defined train_X and train_y:

import numpy as np
import torch

training_data = np.load("D:\\Neural_Networks\\coursera_v3\\coursera\\training_data.npy", allow_pickle=True)
# print(len(training_data))


# Images: scale pixel values to [0, 1]
X = torch.Tensor([i[0] for i in training_data]).view(-1, 64, 64)
X = X / 255.0
# Labels
y = torch.Tensor([i[1] for i in training_data])

VAL_PCT = 0.1  # reserve 10% of the data for validation
val_size = int(len(X) * VAL_PCT)
print(val_size)

train_X = X[:-val_size]
train_y = y[:-val_size]

# test_X = X[-val_size:]
# test_y = y[-val_size:]
train_y = train_y.long()  # nn.CrossEntropyLoss expects LongTensor targets
>  train_X.size()
>  torch.Size([33748, 64, 64])
>  train_y.size()
>  torch.Size([8750, 2])

By the way, data is images of dogs:

> training_data.size
> Out[6]: 24998
> training_data.ndim
> Out[7]: 2

/// Note: I think it was a mistake to try converting this from a CNN classification architecture :smiley: ; the data was set up for that one. Still, it is good practice and I am stuck. Thank you.

Based on the shape of train_y, your target might be one-hot encoded.
If that’s the case, you could transform it to the right shape and values via:

train_y = torch.argmax(train_y, 1)
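
A quick check of what that does, using a tiny hypothetical one-hot tensor (values made up for illustration):

import torch

train_y = torch.tensor([[1., 0.], [0., 1.], [1., 0.]])  # one-hot, shape [3, 2]
train_y = torch.argmax(train_y, 1)
print(train_y)         # tensor([0, 1, 0])
print(train_y.size())  # torch.Size([3])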