I am using transfer learning with the pretrained ‘swin_tiny_patch4_window7_224’ model and an input image size of 224x224x3.
I have modified the model's output layer:
```python
import torch.nn as nn

# Replace the classification head with an 11-class output layer
model.head = nn.Linear(model.head.in_features, 11)
```
Training code snippet:
```python
import torch

# Training loop
for epoch in range(num_epochs):
    train_correct = 0
    train_total = 0
    train_loss = 0.0

    # Training phase
    model.train()
    for images, labels in train_loader:
        images = images.to(device)
        print(images.shape)
        #labels2 = torch.argmax(labels, dim=1)
        labels = labels.to(device)
        print(labels.shape)

        optimizer.zero_grad()
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        train_loss += loss.item()
        train_total += labels.size(0)
        train_correct += (torch.argmax(outputs, dim=1) == labels).sum().item()

    # Calculate training accuracy and loss
    train_accuracy = train_correct / train_total
    train_loss /= len(train_loader)
```
The code above gives the following shapes:
```
Images shape: torch.Size([32, 3, 224, 224])
labels shape: torch.Size([32])
outputs.shape: torch.Size([32, 7, 7, 11])
```
I assume you are using `nn.CrossEntropyLoss` as your `criterion`.
If so, it seems you are working on a segmentation use case, since your model output is a 4D tensor, most likely in the shape `[batch_size, nb_classes, height, width]`. In this case the target should be a 3D tensor in the shape `[batch_size, height, width]` containing class indices in the range `[0, nb_classes-1]`.
If you are not working on a segmentation use case, the output shape of your model is wrong.
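The sketch below illustrates the shape contract of `nn.CrossEntropyLoss` described above. Random tensors stand in for real model outputs and labels, and the sizes are assumed to mirror the ones printed in the question. Classification expects `[N, C]` logits with `[N]` targets. Segmentation expects `[N, C, H, W]` logits with `[N, H, W]` targets. The `[32, 7, 7, 11]` output from the question, paired with `[32]` labels, is rejected.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
batch_size, nb_classes = 32, 11

# Classification: logits [N, C], targets [N] with class indices in [0, C-1]
cls_logits = torch.randn(batch_size, nb_classes)
cls_targets = torch.randint(0, nb_classes, (batch_size,))
cls_loss = criterion(cls_logits, cls_targets)

# Segmentation: logits [N, C, H, W], targets [N, H, W] with per-pixel class indices
seg_logits = torch.randn(batch_size, nb_classes, 7, 7)
seg_targets = torch.randint(0, nb_classes, (batch_size, 7, 7))
seg_loss = criterion(seg_logits, seg_targets)

# Output shape from the question: CrossEntropyLoss reads dim 1 (size 7) as the
# class dimension and expects a [32, 7, 11] target, so [32] labels are rejected.
bad_logits = torch.randn(batch_size, 7, 7, nb_classes)
try:
    criterion(bad_logits, cls_targets)
    mismatch_raised = False
except (RuntimeError, ValueError):
    mismatch_raised = True

print(cls_loss.shape, seg_loss.shape, mismatch_raised)
```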
I am working on a classification task with 11 classes. This is how I define the output layer, the loss function, and the optimizer:
```python
import torch
import torch.nn as nn

# Replace the classification head with an 11-class output layer
model.head = nn.Linear(model.head.in_features, 11)

# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
```
Is there something that I am doing wrong?
I don’t know how the model is defined, but I would guess you are removing needed layers via:
```python
model.head = nn.Linear(model.head.in_features, 11)
```
Check what `model.head` contained before you replaced it with a single linear layer. I think it might have applied a flattening operation to make sure the output of the model has the shape `[batch_size, nb_classes]`.
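The sketch below shows why a bare linear head can produce `[32, 7, 7, 11]`. `nn.Linear` only transforms the last dimension and treats all leading dimensions as batch dimensions. It assumes that the backbone features reaching the head are channels-last with the shape `[32, 7, 7, 768]`, where 768 is the `in_features` of the original head. This is inferred from the reported output shape, not taken from the model code.

```python
import torch
import torch.nn as nn

# Assumed channels-last feature map: [batch, height, width, channels]
features = torch.randn(32, 7, 7, 768)
fc = nn.Linear(768, 11)

# Linear applied directly: the 7x7 spatial grid survives
no_pool = fc(features)

# Global average pooling over height and width first, then Linear
pooled = features.mean(dim=(1, 2))  # [32, 768]
with_pool = fc(pooled)              # [32, 11]

print(no_pool.shape)    # torch.Size([32, 7, 7, 11])
print(with_pool.shape)  # torch.Size([32, 11])
```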
Before modification:
```
(norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(head): ClassifierHead(
  (global_pool): SelectAdaptivePool2d (pool_type=avg, flatten=Identity())
  (drop): Dropout(p=0.0, inplace=False)
  (fc): Linear(in_features=768, out_features=1000, bias=True)
  (flatten): Identity()
)
```
After modification:
```
(norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(head): Linear(in_features=768, out_features=11, bias=True)
```
I would again guess that `SelectAdaptivePool2d` might be flattening the activation. However, based on the code, `flatten` should be a `bool` value, while the printed module shows `flatten=Identity()`. You might need to dig into the code and check the internal shapes to see whether the activations are indeed flattened.
If so, this operation would be missing from your code, which would explain the errors.
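If the missing pooling is indeed the cause, one option is to replace only the final linear layer inside the existing head instead of the whole head. The printout above shows that `model.head` has an `fc` attribute. With the real model this would therefore presumably be `model.head.fc = nn.Linear(model.head.fc.in_features, 11)`. The timm model is not reproduced here, so the sketch below uses a hypothetical `ToyHead` instead. It only mimics the printed structure: global average pooling over the spatial dims of an assumed channels-last input, dropout, and `fc`. It illustrates the idea and is not timm's implementation.

```python
import torch
import torch.nn as nn

class ToyHead(nn.Module):
    """Hypothetical stand-in for a classifier head: pool, dropout, fc."""

    def __init__(self, in_features: int, num_classes: int):
        super().__init__()
        self.drop = nn.Dropout(p=0.0)
        self.fc = nn.Linear(in_features, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [batch, height, width, channels]; average over height and width
        x = x.mean(dim=(1, 2))
        return self.fc(self.drop(x))

head = ToyHead(768, 1000)  # original 1000-class head, as in the printout
features = torch.randn(32, 7, 7, 768)

# Swap only the final layer so the pooling step stays in place
head.fc = nn.Linear(head.fc.in_features, 11)
logits = head(features)
print(logits.shape)  # torch.Size([32, 11])
```

Depending on your timm version, two alternatives should achieve the same without replacing head modules by hand: create the model with `timm.create_model('swin_tiny_patch4_window7_224', pretrained=True, num_classes=11)`, or call `model.reset_classifier(11)`. Check the documentation for the version you have installed.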