RuntimeError: only batches of spatial targets supported (3D tensors) but got targets of size: : [32] - Swin

I am doing transfer learning with the pretrained ‘swin_tiny_patch4_window7_224’ model, using input images of size 224x224x3.

I have modified the model's input and output layers:

# Modify the model's input and output layers

model.head = nn.Linear(model.head.in_features, 11)

Training code snippet:

# Training loop
for epoch in range(num_epochs):
    train_correct = 0
    train_total = 0
    train_loss = 0.0
    # Training phase
    for images, labels in train_loader:
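        images = images.to(device)  # assumed: right-hand side was lost in the post
        #labels2 = torch.argmax(labels, dim=1)
        labels = labels.to(device)  # assumed: right-hand side was lost in the post

        outputs = model(images)
        _, predicted = torch.max(outputs, 1)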
        loss = criterion(outputs, labels)

        train_loss += loss.item()
        train_total += labels.size(0)
        train_correct += (torch.argmax(outputs, dim=1) == labels).sum().item()

    # Calculate training accuracy and loss
    train_accuracy = train_correct / train_total
    train_loss /= len(train_loader)

Shapes printed from the code above:
Images shape: torch.Size([32, 3, 224, 224])
labels shape: torch.Size([32])
outputs.shape: torch.Size([32, 7, 7, 11])

I assume you are using nn.CrossEntropyLoss as your criterion.
If so, it seems you are working on a segmentation use case, since your model output is a 4D tensor, which the loss treats as having the shape [batch_size, nb_classes, height, width]. In this case the target should be a 3D tensor in the shape [batch_size, height, width] containing class indices in the range [0, nb_classes-1].

If you are not working on a segmentation use case, the output shape of your model is wrong.
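To illustrate the shape requirement described above, here is a minimal sketch using random tensors with the shapes reported in the question (the tensor names are made up for illustration). It only shows how nn.CrossEntropyLoss reacts to 2D versus 4D outputs when the targets are 1D; it does not involve the actual model.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
targets = torch.randint(0, 11, (32,))  # class indices, shape [32]

# Classification: logits of shape [batch_size, nb_classes] work with 1D targets.
logits_2d = torch.randn(32, 11)
loss = criterion(logits_2d, targets)
print(loss.shape)  # scalar loss

# A 4D output is treated as [batch_size, nb_classes, height, width],
# so the loss expects 3D targets and rejects the 1D ones.
logits_4d = torch.randn(32, 7, 7, 11)
raised = False
try:
    criterion(logits_4d, targets)
except (RuntimeError, ValueError) as err:
    raised = True
    print(err)
```

So for a classification setup the model output needs to be reduced to [batch_size, nb_classes] before it reaches the criterion.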

I am working on a classification task with 11 classes.
I have defined the output layer, loss, and optimizer as follows:

# Modify the model's input and output layers

model.head = nn.Linear(model.head.in_features, 11)

# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

Is there something that I am doing wrong?

I don’t know how the model is defined but would guess you are removing needed layers via:

model.head = nn.Linear(model.head.in_features, 11)

Check what model.head contained before you replaced it with a single linear layer. I think it might have included a flattening operation that makes sure the output of the model has a shape of [batch_size, nb_classes].

Before modification:

(norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(head): ClassifierHead(
  (global_pool): SelectAdaptivePool2d (pool_type=avg, flatten=Identity())
  (drop): Dropout(p=0.0, inplace=False)
  (fc): Linear(in_features=768, out_features=1000, bias=True)
  (flatten): Identity()
)

After modification:
(norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(head): Linear(in_features=768, out_features=11, bias=True)

I would again guess that SelectAdaptivePool2d is what flattens the activation. However, based on the code, flatten should be a bool value, while your printout shows an Identity module being passed to it, so you might need to dig into the code and check the internal shapes to see whether the activations are indeed flattened.
If so, this op is missing from your model after the replacement, which would explain the error.
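As a rough sketch of that idea, the snippet below compares a bare linear layer with a head that pools over the spatial dimensions first. PooledHead and the random features tensor are hypothetical stand-ins, and the channels-last layout [batch_size, height, width, channels] is an assumption inferred from the reported output shape [32, 7, 7, 11]; check the real activations entering model.head before relying on it.

```python
import torch
import torch.nn as nn


class PooledHead(nn.Module):
    """Hypothetical replacement head: average over H and W, then classify."""

    def __init__(self, in_features, num_classes):
        super().__init__()
        self.fc = nn.Linear(in_features, num_classes)

    def forward(self, x):
        # x: [batch_size, height, width, channels] (assumed channels-last)
        x = x.mean(dim=(1, 2))  # -> [batch_size, channels]
        return self.fc(x)       # -> [batch_size, num_classes]


# Stand-in for the activations that reach model.head (assumed shape).
features = torch.randn(32, 7, 7, 768)

# A bare linear layer only transforms the last dimension.
bare_linear = nn.Linear(768, 11)
print(bare_linear(features).shape)  # torch.Size([32, 7, 7, 11])

# Pooling first gives one score vector per image.
pooled_head = PooledHead(768, 11)
print(pooled_head(features).shape)  # torch.Size([32, 11])
```

Alternatively, replacing only the inner fc layer shown in the printout (model.head.fc) instead of the whole head would presumably leave the original pooling in place, but that is worth verifying by printing the output shape.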
