I’m training a CNN image classifier. The network classifies RGB images, resized to 256 x 256, into five categories numbered 0 to 4.
But the network is behaving strangely during training. Although the loss function drops smoothly, the model returns identical answers for all the samples in the batch most of the time. Even more strangely, eventually it starts answering only 2.
Here’s a typical training output with batches of 10 images.
LABELS OUTPUT CORRECT
tensor([2, 0, 2, 2, 2, 0, 2, 2, 2, 4]) tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) 2 / 10
tensor([2, 2, 2, 2, 3, 4, 1, 2, 2, 2]) tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) 0 / 10
tensor([2, 2, 2, 0, 2, 4, 3, 1, 2, 2]) tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) 1 / 10
tensor([3, 4, 2, 2, 0, 4, 4, 3, 2, 0]) tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) 2 / 10
tensor([1, 2, 2, 4, 2, 0, 1, 0, 0, 0]) tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) 4 / 10
tensor([2, 2, 2, 3, 2, 0, 0, 1, 2, 2]) tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) 2 / 10
tensor([1, 1, 0, 1, 2, 2, 1, 1, 0, 1]) tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) 2 / 10
tensor([0, 2, 1, 3, 3, 2, 1, 0, 2, 2]) tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) 2 / 10
tensor([2, 3, 2, 2, 3, 1, 0, 1, 0, 2]) tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) 2 / 10
tensor([3, 2, 3, 1, 1, 2, 0, 4, 2, 2]) tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) 1 / 10
tensor([2, 1, 0, 3, 1, 2, 2, 1, 2, 0]) tensor([2, 2, 2, 2, 2, 0, 2, 2, 0, 2]) 2 / 10
tensor([3, 0, 2, 1, 3, 1, 2, 4, 2, 2]) tensor([2, 2, 2, 2, 2, 2, 2, 2, 2, 2]) 4 / 10
tensor([2, 2, 1, 2, 1, 1, 1, 4, 3, 2]) tensor([2, 2, 2, 2, 2, 2, 2, 2, 2, 2]) 4 / 10
# Remaining predictions are always [2, 2, 2...]
# Loss function is not shown, but it declines smoothly and looks well behaved
Although 2 is the most common category in the labels (about 50% of the images), I don’t see why the CNN should ‘concentrate’ on a single answer (0 in the above sample) or always predict 2 at the end.
I expected to get more varied results in the output tensors even if the accuracy wasn’t good enough. What am I doing wrong?
Here’s my code for the network…
import torch
import torch.nn as nn
import torch.nn.functional as F

class CNN(nn.Module):
    def __init__(self, n_layers=3, n_categories=5):
        super(CNN, self).__init__()
        # n_layers is the number of input channels (3 for RGB)
        self.conv1 = nn.Conv2d(n_layers, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.conv3 = nn.Conv2d(16, 16, 5)
        self.fc1 = nn.Linear(16 * 28 * 28, 200)
        self.fc2 = nn.Linear(200, 84)
        self.fc3 = nn.Linear(84, n_categories)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        x = x.view(-1, 16 * 28 * 28)  # flatten to (batch, features)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)  # raw logits; CrossEntropyLoss applies log-softmax itself
        return x
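As a sanity check on the hard-coded `16 * 28 * 28` in `fc1`, the spatial size can be traced by hand. The sketch below is framework-free arithmetic, assuming the defaults used above (stride 1 and no padding for the convolutions, floor rounding for the pooling); `conv_out` and `flat_features` are hypothetical helper names, not part of PyTorch.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a conv or pooling layer (floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


def flat_features(side, out_channels=16):
    """Trace one spatial side through conv(5) -> pool(2, 2) three times, then flatten."""
    for _ in range(3):
        side = conv_out(side, kernel=5)            # nn.Conv2d(..., 5): stride 1, no padding
        side = conv_out(side, kernel=2, stride=2)  # nn.MaxPool2d(2, 2)
    return out_channels * side * side


print(flat_features(256))  # 12544, i.e. 16 * 28 * 28
```

For a 256 x 256 input the side length goes 252, 126, 122, 61, 57, 28, so the flattened size does match what `fc1` expects.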
…the optimizer, loss function and dataloader…
from torch.utils.data import DataLoader
from torchvision.transforms import v2

model = CNN()
transforms = v2.Compose([
    v2.ToImageTensor(),
    v2.ConvertImageDtype(),
    v2.Resize((256, 256), antialias=True)
])
dataset = UBCDataset(transforms=transforms)
full_dataloader = DataLoader(dataset, batch_size=10, shuffle=False)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
…and the training loop that produced the above output. The loop also prints the loss for each batch; I left those values out of the output above, but they decline smoothly as expected.
batches = iter(full_dataloader)
print("LABELS OUTPUT CORRECT")
for X, y in batches:
    model.train()
    pred = model(X)
    loss = loss_fn(pred, y)
    loss.backward()
    optimizer.step()
    #optimizer.zero_grad()
    print(f"{y} {pred.argmax(1)} {int(sum(y == pred.argmax(1)))} / {len(y)} {loss.item()}")
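One detail of the loop above worth isolating: with `optimizer.zero_grad()` commented out, nothing ever clears the gradients, and PyTorch's `backward()` adds into each parameter's `.grad` rather than overwriting it. The sketch below is a framework-free toy, not the CNN itself: it minimises a one-dimensional quadratic with plain SGD (no momentum), and `run_sgd`, the target value and the learning rate are all made up for illustration. Whether this accounts for the collapsed predictions here is an assumption to test, not an established diagnosis.

```python
def run_sgd(zero_grad, steps=100, lr=0.1, target=3.0):
    """Minimise (w - target)**2 with plain SGD; return |w - target| after each step."""
    w = 0.0
    grad = 0.0  # stands in for a parameter's .grad buffer
    errors = []
    for _ in range(steps):
        if zero_grad:
            grad = 0.0              # what optimizer.zero_grad() would do
        grad += 2.0 * (w - target)  # backward() adds to the buffer
        w -= lr * grad              # optimizer.step()
        errors.append(abs(w - target))
    return errors


with_reset = run_sgd(zero_grad=True)
without_reset = run_sgd(zero_grad=False)
print(f"final error with reset:        {with_reset[-1]:.2e}")
print(f"largest late error, no reset:  {max(without_reset[-20:]):.2f}")
```

With the reset, the error shrinks by a constant factor each step. Without it, every step is driven by the sum of all past gradients, and in this toy the parameter keeps swinging around the target instead of settling.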
Even more puzzling, the output from the model (the pred variable in the training loop) always looks something like this:
tensor([[-0.2310, 0.1805, 0.7584, -0.7285, -0.7594],
        [-0.2310, 0.1806, 0.7585, -0.7286, -0.7592],
        [-0.2313, 0.1806, 0.7586, -0.7286, -0.7593]],
       grad_fn=<AddmmBackward0>)
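These values are raw logits, so they are easier to interpret after a softmax. The sketch below applies a plain-Python softmax to the three rows printed above; the `softmax` helper is written out here for illustration and simply mirrors what `torch.softmax(pred, dim=1)` would compute.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


rows = [
    [-0.2310, 0.1805, 0.7584, -0.7285, -0.7594],
    [-0.2310, 0.1806, 0.7585, -0.7286, -0.7592],
    [-0.2313, 0.1806, 0.7586, -0.7286, -0.7593],
]
probs = [softmax(r) for r in rows]
for p in probs:
    print([round(v, 3) for v in p])
```

Every row gives class 2 a probability of roughly 0.42, and the rows agree to about three decimal places. In other words, the output barely depends on the input image, and the distribution looks more like a fixed guess weighted towards the most common label than a per-image prediction.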
Any input is appreciated.