Classifier makes the wrong prediction on single images

Hello,

I am very new to PyTorch and DNNs in general.
I've created a classifier on top of a pretrained densenet161 to classify images of flowers into the groups: daisy (0), dandelion (1), rose (2), sunflower (3), and tulip (4).

The training process works fine, and in order to test the model I implemented the example from chapter 5 of the tutorial on the PyTorch site.

All this seems to work: the classifier predicts a defined set of images with acceptable performance. The next thing I wanted to do was to test it with a single image.
The first step was to simply set the batch size to one, but at this point the output of my network is wrong.

[Screenshot of the output tensor from the working example; the class indices are mapped as shown above.]

After this worked, I tried to load a single image like this:

from PIL import Image
from torchvision import transforms

test_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],
                         [0.229, 0.224, 0.225])])

def make_image(image):
    image_tensor = test_transforms(image)
    image_tensor.unsqueeze_(0)  # add the batch dimension
    return image_tensor

img = make_image(Image.open("path-to-a-sunflower"))
output = torch.exp(model(img))  # exp because of LogSoftmax as output layer
print(output)
val, pos = output.topk(1, dim=1)

Via the topk I wanted to get the position of the highest value in order to display the class.
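
To then display the class, I map the topk index back to a name roughly like this (class_names is just the five classes in the index order listed above):

class_names = ["daisy", "dandelion", "rose", "sunflower", "tulip"]  # assumed index order
print(class_names[pos.item()], val.item())  # class name for the top-1 index and its score
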
The output of the model was:
tensor([[0.1126, 0.2525, 0.1020, 0.0674, 0.4655]], grad_fn=)
which means it has classified the picture (a sunflower) as a tulip.
The strange thing is that if I run the same images through a DataLoader with identical transformations, the results are just right.
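
For comparison, the DataLoader path looks roughly like this (the folder path is a placeholder):

from torchvision import datasets
from torch.utils.data import DataLoader

test_data = datasets.ImageFolder("path-to-test-images", transform=test_transforms)
testloader = DataLoader(test_data, batch_size=1, shuffle=False)

for images, labels in testloader:
    output = torch.exp(model(images))  # same LogSoftmax model as above
    print(output)
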

I have no idea where to start with this problem, so I wonder if there is a reason for this behavior and, ideally, a solution.

To provide more information, I will post some of the code here; I hope it helps.

Classifier, Criterion and Optimizer:

model.classifier = nn.Sequential(nn.Linear(model.classifier.in_features, 512),
                                 nn.ReLU(),
                                 nn.Dropout(0.2),
                                 nn.Linear(512, 265),
                                 nn.ReLU(),
                                 nn.Linear(265, 5),
                                 nn.LogSoftmax(dim=1))
criterion = nn.NLLLoss()
optimizer = optim.Adam(model.classifier.parameters(), lr=0.003)

The Epoch Loop

for epoch in range(epochs):
    for inputs, labels in tqdm_notebook(trainloader, desc="Training batches"):
        steps += 1
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        logps = model(inputs)
        loss = criterion(logps, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()

        if steps % print_every == 0:
            test_loss = 0
            accuracy = 0
            model.eval()
            with torch.no_grad():
                for inputs, labels in tqdm_notebook(testloader, desc="Testing batches", leave=False):
                    inputs, labels = inputs.to(device), labels.to(device)
                    logps = model(inputs)
                    batch_loss = criterion(logps, labels)
                    test_loss += batch_loss.item()

                    ps = torch.exp(logps)
                    top_p, top_class = ps.topk(1, dim=1)
                    equals = top_class == labels.view(*top_class.shape)
                    accuracy += torch.mean(equals.type(torch.FloatTensor)).item()
            # plotting
            train_losses.append(running_loss / len(trainloader))
            test_losses.append(test_loss / len(testloader))
            print(f"Epoch {epoch + 1}/{epochs}.. "
                  f"Train loss: {running_loss / print_every:.3f}.. "
                  f"Test loss: {test_loss / len(testloader):.3f}.. "
                  f"Test accuracy: {accuracy / len(testloader):.3f}")
            running_loss = 0
            model.train()

I hope this information describes my problem, and I appreciate any ideas on this topic. Thank you very much in advance.

Did you also call model.eval() before testing the single image?


That does the trick!
In my first model I didn't do that, but maybe it didn't have as much of an impact as in this model, since I use a dropout layer here.

Thank you very much for your answer!


Many newcomers to PyTorch don't know about model.train() and model.eval(). Before training the model, call model.train(); before testing the model, you must call model.eval().
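
As a rough sketch (model, trainloader, and single_image are placeholders):

model.train()                       # enables dropout and batchnorm updates
for inputs, labels in trainloader:
    ...                             # training step as usual

model.eval()                        # disables dropout, uses running batchnorm statistics
with torch.no_grad():
    output = model(single_image)    # single_image shaped [1, C, H, W]
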


Hey, I'm having a similar problem, even while using model.eval().

The inference results are good, with 90% testing accuracy, but when I try to predict with a single input it always outputs 0.
I have two classes, and I tried printing the raw outputs for each prediction; it is kind of weird, because the class-0 output is always positive and the class-1 output is negative:

tensor([[ 687.3718, -580.9390]], device='cuda:0')
tensor([[ 578.3836, -362.3758]], device='cuda:0')
tensor([[ 507.2051, -494.3643]], device='cuda:0')
tensor([[ 623.5096, -523.1122]], device='cuda:0')
tensor([[ 610.4017, -603.3063]], device='cuda:0')
tensor([[ 711.8994, -506.7224]], device='cuda:0')
tensor([[ 766.9221, -555.6006]], device='cuda:0')
tensor([[ 1092.4576, -1050.7422]], device='cuda:0')
tensor([[1106.5111, -977.1776]], device='cuda:0')
tensor([[1168.5472, -978.1863]], device='cuda:0')

My inference function is:

import torchmetrics
from torchmetrics.classification import BinaryAccuracy, BinaryPrecision, BinarySpecificity, BinaryF1Score
from sklearn import metrics

def inference(model, val_dl):
    confmat_fn = torchmetrics.ConfusionMatrix(task="binary", num_classes=N_CLASSES).to(device)
    acc_fn = BinaryAccuracy().to(device)
    precision_fn = BinaryPrecision().to(device)
    speci_fn = BinarySpecificity().to(device)
    f1_fn = BinaryF1Score().to(device)

    model.eval()
    with torch.inference_mode():
        for data in val_dl:
            inputs, labels = data[0].to(device), data[1].to(device)
            inputs_m, inputs_s = inputs.mean(), inputs.std()
            inputs = (inputs - inputs_m) / inputs_s
            outputs = model(inputs)
            _, prediction = torch.max(outputs, 1)

            acc_fn.update(prediction, labels)
            precision_fn.update(prediction, labels)
            speci_fn.update(prediction, labels)
            f1_fn.update(prediction, labels)
            confmat_fn.update(prediction, labels)
        
        acc = acc_fn.compute().item()
        precision = precision_fn.compute().item()
        speci = speci_fn.compute().item()
        f1 = f1_fn.compute().item()
        confmat = confmat_fn.compute()
        print(f'Accuracy: {acc:.2f}, Precision: {precision:.2f}, Specificity: {speci:.2f}, F1 Score: {f1:.2f}')
        cm_display = metrics.ConfusionMatrixDisplay(confmat.cpu().numpy(), display_labels=['Normal', 'Abnormal'])
        cm_display.plot()

And my prediction function for one signal is as follows (basically I have a group of inputs that belong to the same signal, and I calculate the predictions for all of them to take a majority decision, as sketched after the function):

def predict(model, group):
    preds = []
    model.eval()
    with torch.inference_mode():
        for i in range(len(group)):
            audio_file = group.loc[i, 'relative_path']
            data = torch.load(audio_file)
            data = data.to(device)  # note: Tensor.to() is not in-place
            outputs = model(data.unsqueeze(dim=0))
            print(outputs)
            _, prediction = torch.max(outputs, 1)
            preds.append(prediction.item())
    return preds
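
The majority decision over these per-segment predictions is then taken roughly like this (preds is the list returned by predict above):

from collections import Counter

preds = predict(myModel, group)
majority = Counter(preds).most_common(1)[0][0]  # most frequent class id wins
print(majority)
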

Could you describe the class distribution and check if class 0 is the majority class with approx. 90% of all samples?
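
You could check it e.g. with something like this (train_labels is a placeholder for a tensor containing all training targets):

# train_labels: 1D LongTensor with all targets (0 = Normal, 1 = Abnormal)
counts = torch.bincount(train_labels, minlength=2)
print(counts, counts.float() / counts.sum())  # absolute counts and relative frequencies
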

Hey, thanks for answering. The distribution of the classes for the different data splits is shown in the attached chart (Normal = 0 / Abnormal = 1).
I understand that the class distribution is imbalanced, but I have also reviewed the confusion matrix from testing:
[Confusion matrix screenshot]

Did you create this confusion matrix with a large batch size or the single samples? If both, could you show the difference between these matrices?

I generated the confusion matrix using the same batch size that was used during training. When testing the model on individual samples, all of the samples were predicted to be on the left side of the matrix (Normal) and there were no predictions on the right side (Abnormal). Since the testing set is relatively small, I even printed out each prediction, and they were all 0s.

Could you post the model definition which would reproduce the issue of changing the output during evaluation based on the batch size?

Sure, here's my model definition; it is a hybrid CNN-LSTM model.

import torch
import torch.nn as nn


class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Sequential(
            nn.Conv2d(1, 16, 3), nn.ReLU(), nn.MaxPool2d(2), nn.BatchNorm2d(16)
        )
        self.conv2 = nn.Sequential(
            nn.Conv2d(16, 32, 3), nn.ReLU(), nn.MaxPool2d(2), nn.BatchNorm2d(32)
        )
        self.conv3 = nn.Sequential(
            nn.Conv2d(32, 64, 3), nn.ReLU(), nn.MaxPool2d(4), nn.BatchNorm2d(64)
        )
        self.flatten = nn.Flatten()

    def forward(self, x):
        out = self.conv1(x)
        out = self.conv2(out)
        out = self.conv3(out)
        out = self.flatten(out)
        return out


class RNN(nn.Module):
    def __init__(self, input_size, hidden_size=64, n_layers=1, device="cuda:0"):
        super(RNN, self).__init__()
        self.device = device
        self.n_layers = n_layers
        self.hidden_size = hidden_size

        self.lstm = nn.LSTM(input_size, hidden_size, n_layers, batch_first=True)
        self.flatten = nn.Flatten()

    def forward(self, x):
        # [batch_size, 1, n_mfcc, seq_length]
        out = x.squeeze(dim=1)
        # [batch_size, n_mfcc, seq_length]
        out = out.permute(0, 2, 1)
        hidden_states = torch.zeros(self.n_layers, out.size(0), self.hidden_size).to(
            self.device
        )
        cell_states = torch.zeros(self.n_layers, out.size(0), self.hidden_size).to(
            self.device
        )
        # [batch_size, seq_length, n_mfcc]
        out, _ = self.lstm(out, (hidden_states, cell_states))
        out = self.flatten(out[:, -1, :])
        return out

class CRNN(nn.Module):
    def __init__(
        self,
        input_size,
        n_classes,
        n_layers_rnn=64,
        fc_in=8576,
        device="cuda:0",
    ):
        super(CRNN, self).__init__()
        self.cnn = CNN()
        self.rnn = RNN(input_size, 64, n_layers_rnn, device=device)
        self.fc1 = nn.Linear(fc_in, 32)
        self.relu1 = nn.ReLU()
        self.dropout = nn.Dropout(0.5)
        self.fc2 = nn.Linear(32, n_classes)

    def forward(self, x):
        cnn_out = self.cnn(x)
        rnn_out = self.rnn(x)
        out = torch.cat([cnn_out, rnn_out], dim=1)
        out = self.fc1(out)
        out = self.relu1(out)
        out = self.dropout(out)
        out = self.fc2(out)
        return out

The arguments of the model are:

myModel = CRNN(
    input_size=32, n_classes=2, n_layers_rnn=64, fc_in=576, device=device
)

Thanks for the code!
I cannot reproduce the issue and get the same outputs, with the expected small numerical mismatch caused by the limited floating point precision:

x = torch.randn(100, 1, 32, 144)
myModel.eval()

# all samples
ref = myModel(x)

# single sample
out = []
for x_ in x:
    x_.unsqueeze_(0)
    out.append(myModel(x_))
out = torch.cat(out)

print((ref - out).abs().max())
# tensor(8.1956e-08, grad_fn=<MaxBackward1>)

Hey, thanks for the reply!
Did you train the model before running this code? I tried it on an untrained model and it showed the same results you had. But when I printed out the ref and out tensors, all the outputs were very similar, which is maybe why the difference was small.

tensor([[-0.1333, -0.1921],
        [-0.1357, -0.1879],
        [-0.1344, -0.1888],
        [-0.1349, -0.1890],
        [-0.1334, -0.1884],
        [-0.1332, -0.1926],
        [-0.1319, -0.1924],
        [-0.1320, -0.1903],
        [-0.1320, -0.1872],
        [-0.1311, -0.1926],
        [-0.1340, -0.1892],
        [-0.1347, -0.1902],
        [-0.1340, -0.1900],
        [-0.1336, -0.1910],
        [-0.1346, -0.1910],
        [-0.1348, -0.1894],
        [-0.1335, -0.1908],
        [-0.1360, -0.1912],
        [-0.1299, -0.1925],
        [-0.1349, -0.1893],
        [-0.1320, -0.1918],
        [-0.1334, -0.1886],
        [-0.1339, -0.1885],
        [-0.1354, -0.1866],
        [-0.1326, -0.1932],
        [-0.1321, -0.1884],
        [-0.1351, -0.1886],
        [-0.1353, -0.1907],
        [-0.1329, -0.1912],
        [-0.1330, -0.1915],
        [-0.1330, -0.1889],
        [-0.1332, -0.1888],
        [-0.1329, -0.1922],
        [-0.1341, -0.1918],
        [-0.1329, -0.1900],
        [-0.1323, -0.1910],
        [-0.1326, -0.1912],
        [-0.1320, -0.1899],
        [-0.1324, -0.1878],
        [-0.1328, -0.1899],
        [-0.1329, -0.1934],
        [-0.1335, -0.1895],
        [-0.1323, -0.1928],
        [-0.1339, -0.1885],
        [-0.1327, -0.1900],
        [-0.1326, -0.1897],
        [-0.1337, -0.1865],
        [-0.1331, -0.1888],
        [-0.1294, -0.1939],
        [-0.1330, -0.1866],
        [-0.1331, -0.1900],
        [-0.1329, -0.1887],
        [-0.1340, -0.1917],
        [-0.1359, -0.1861],
        [-0.1325, -0.1917],
        [-0.1344, -0.1857],
        [-0.1348, -0.1911],
        [-0.1327, -0.1914],
        [-0.1327, -0.1890],
        [-0.1337, -0.1905],
        [-0.1359, -0.1886],
        [-0.1348, -0.1908],
        [-0.1317, -0.1883],
        [-0.1339, -0.1901]], grad_fn=<AddmmBackward0>)

However, when I tried it on my trained model, it resulted in

tensor(0.4208, device='cuda:0', grad_fn=<MaxBackward1>)

which I think is pretty high.
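
To check whether this difference actually changes the predicted classes, I could compare the argmax of both outputs (reusing ref and out from the snippet above):

match = (ref.argmax(dim=1) == out.argmax(dim=1)).float().mean()
print(match)  # fraction of samples where batched and single-sample predictions agree
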

No, I didn't train the model, but it also doesn't change the output if I do:

device = "cuda"
myModel = CRNN(
    input_size=32, n_classes=2, n_layers_rnn=64, fc_in=576, device=device
)
myModel.to(device)


x = torch.randn(100, 1, 32, 144).to(device)
optimizer = torch.optim.Adam(myModel.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
target = torch.randint(0, 2,(100,)).to(device)

# train
for epoch in range(100):
    optimizer.zero_grad()
    output = myModel(x)
    loss = criterion(output, target)
    loss.backward()
    optimizer.step()
    print(f"epoch: {epoch}; loss: {loss:.3f}")

# epoch: 96; loss: 0.000
# epoch: 97; loss: 0.001
# epoch: 98; loss: 0.000
# epoch: 99; loss: 0.000

myModel.eval()

# all samples
ref = myModel(x)

# single sample
out = []
for x_ in x:
    x_.unsqueeze_(0)
    out.append(myModel(x_))
out = torch.cat(out)

print((ref - out).abs().max())
# tensor(6.6757e-06, device='cuda:0', grad_fn=<MaxBackward1>)

What could be the issue then?

I don’t know, but the model doesn’t seem to be the issue, so I would still need a minimal and executable code snippet to reproduce the issue.

You can find the full code in my GitHub repository.
Scroll down to section 5 (Evaluation) in the main_preprocessed.ipynb file; there is a comparison between batched signal predictions and unbatched predictions.