Confusing and unpredictable behavior while calculating accuracy

I have built a model and overfit it on a small batch of 16 samples from the training set. When I calculate the accuracy on this same batch, I expect 100% accuracy. However, I have used two different methods to calculate the accuracy, and they give very different results.

My forward pass function:

def fwd_pass(X, y, train=False):
    if not train:
        # Evaluation path: no gradients are tracked or stored.
        with torch.no_grad():
            outputs = model(X)
            matches = (torch.argmax(outputs, dim=1) == y).sum()
            acc = matches / len(y)
            loss = loss_fn(outputs, y)

        return acc, loss

    # Training path: forward pass, backward pass and optimizer step.
    outputs = model(X)
    matches = (torch.argmax(outputs, dim=1) == y).sum()
    acc = matches / len(y)
    loss = loss_fn(outputs, y)

    loss.backward()
    opt.step()
    model.zero_grad()

    return acc, loss

My train function:

def train(net, epochs, batch_size, X, y, val_X=None, val_y=None):
    accuracies = []
    losses = []
    val_accuracies = []
    val_losses = []
    
    for ep in tqdm(range(epochs)):
        for i in tqdm(range(0, len(X), batch_size)):
            batch_X = X[i:i+batch_size].to(device)
            batch_y = y[i:i+batch_size].to(device)
            
            acc, loss = fwd_pass(batch_X.float(), batch_y, train=True)
            
            torch.cuda.empty_cache()
            
        if val_X is not None and val_y is not None:
            val_acc, val_loss = fwd_pass(val_X.to(device), val_y.to(device))
        
        print(f'Ep: {ep+1} Acc: {round(float(acc), 5)} Loss: {round(float(loss), 5)}')
        
        if val_X is not None and val_y is not None:
            print(f'Val Acc: {round(float(val_acc), 5)} Val Loss: {round(float(val_loss), 5)}')
        
        accuracies.append(acc)
        losses.append(loss)
        
        if val_X is not None and val_y is not None:
            val_accuracies.append(val_acc)
            val_losses.append(val_loss)
            
    return accuracies, losses, val_accuracies, val_losses

Now here, I calculate the accuracies:

corr = 0
tot = 0


with torch.no_grad():
    for i in range(len(batch_1_X)):
        op = model(batch_1_X[i].view(1, 100, 8).transpose(1, 2).to(device).float())
        pred = torch.argmax(op)
        real = batch_1_y[i]
        
        if pred == real:
            corr += 1
        
        tot += 1
        
print(corr)
print(tot)

The above gives me the following output:

3
16

suggesting that only 3 out of 16 have been predicted correctly, even though the model has been overfit.

Now, this method:

corr = 0
tot = 0

with torch.no_grad():
    op = model(batch_1_X.transpose(1, 2).to(device).float())
    preds = torch.argmax(op, dim=1)
    for p, r in zip(preds, batch_1_y):
        if p == r:
            corr += 1
            
        tot += 1

print(corr)
print(tot)

gives me the output

16
16

suggesting that all predictions are correct.

The most confusing part of all was this:

corr = 0
tot = 0
num = 16

with torch.no_grad():
    op = model(batch_1_X[0:num].transpose(1, 2).to(device).float())
    preds = torch.argmax(op, dim=1)
    for p, r in zip(preds, batch_1_y[0:num]):
        if p == r:
            corr += 1
            
        tot += 1

print(corr)
print(tot)

When num is 16, corr and tot are both 16. When num is something like 1, 3, 4, 5, 6, 7, 8, 9, …, corr and tot are equal. But when num is 2, corr is 1 and tot is 2, suggesting that the model got only 1 of the 2 predictions right.
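
To narrow down where the two methods diverge, a sanity check along these lines (assuming batch_1_X has shape (16, 100, 8), matching the code above) would compare the batched logits against the logits of each sample fed individually:

with torch.no_grad():
    batched = model(batch_1_X.transpose(1, 2).to(device).float())
    for i in range(len(batch_1_X)):
        single = model(batch_1_X[i].view(1, 100, 8).transpose(1, 2).to(device).float())
        # If the model treats samples independently, these should agree up to numerical noise.
        print(i, torch.allclose(batched[i], single[0], atol=1e-4))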

What is the mistake I am making that is giving me this unpredictable behavior?

Did you make sure to call model.eval() before calculating the accuracy?
Also, I would double check the view and transpose operation and make sure all different batch sizes use the correct logic.

Instead of model.eval(), I have wrapped all of the code inside a torch.no_grad() block. With regards to the shapes of the input tensors, I checked that they were correct beforehand and they matched. Are you suggesting that there might be a problem with the values themselves when I transpose the tensor?

Note that these calls do not perform the same operations.
model.eval() changes the behavior of some layers (e.g. batchnorm layers will use their running stats and dropout will be disabled), while torch.no_grad() disables gradient calculation and saves memory by not storing intermediate tensors, so you cannot use one instead of the other.
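
For illustration, a tiny toy module with batchnorm and dropout (not the model in this thread) makes the difference visible:

import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(8, 8), nn.BatchNorm1d(8), nn.Dropout(p=0.5))
x = torch.randn(4, 8)

net.train()
with torch.no_grad():
    out1 = net(x)  # no graph is built, but dropout and batchnorm still behave as in training
    out2 = net(x)
print(torch.allclose(out1, out2))  # usually False: dropout samples a new mask on every call

net.eval()
with torch.no_grad():
    out1 = net(x)  # dropout is disabled, batchnorm uses its running stats
    out2 = net(x)
print(torch.allclose(out1, out2))  # True: the forward pass is now deterministic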

Here is my code with model.train():

corr = 0
tot = 0

model.train()

with torch.no_grad():
    for i in range(len(batch_1_X)):
        op = model(batch_1_X[i].view(-1, 100, 8).transpose(1, 2).to(device).float())
        pred = torch.argmax(op)
        real = batch_1_y[i]
        
        if pred == real:
            corr += 1
        
        tot += 1
        
print(corr)
print(tot)


print("-"*20)

corr = 0
tot = 0

with torch.no_grad():
    op = model(batch_1_X.transpose(1, 2).to(device).float())
    preds = torch.argmax(op, dim=1)
#     print(batch_1_y)
#     print(preds)
    for p, r in zip(preds, batch_1_y):
        if p == r:
            corr += 1
            
        tot += 1

print(corr)
print(tot)

This gives me the output:

3
16
--------------------
16
16

Switching it to model.eval():

corr = 0
tot = 0

model.eval()

with torch.no_grad():
    for i in range(len(batch_1_X)):
        op = model(batch_1_X[i].view(-1, 100, 8).transpose(1, 2).to(device).float())
        pred = torch.argmax(op)
        real = batch_1_y[i]
        
        if pred == real:
            corr += 1
        
        tot += 1
        
print(corr)
print(tot)


print("-"*20)

corr = 0
tot = 0

with torch.no_grad():
    op = model(batch_1_X.transpose(1, 2).to(device).float())
    preds = torch.argmax(op, dim=1)
#     print(batch_1_y)
#     print(preds)
    for p, r in zip(preds, batch_1_y):
        if p == r:
            corr += 1
            
        tot += 1

print(corr)
print(tot)

This gives the output:

8
16
--------------------
11
16

You were right @ptrblck, model.eval() does produce different results! But I did overfit the model to 100% accuracy on the training data, and when evaluating on that same training data I am not getting 100% accuracy. Even when I measure the accuracy using the two different methods, they give me differing results. Any suggestions would help!

Hey @ptrblck! A follow-up to this: it was a really silly mistake! In the model definition, I had 2 LSTM layers. For the first, I had specified batch_first=True, but for the second, I had forgotten it. I have had this code for months and never thought to revisit the model definition. Here is the corrected model with batch_first=True for the second layer as well:

class TFModel(nn.Module):
    def __init__(self):
        super().__init__()
        
        self.conv1 = nn.Conv1d(8, 16, kernel_size=8)
        self.conv2 = nn.Conv1d(16, 32, kernel_size=8)
        self.conv3 = nn.Conv1d(32, 64, kernel_size=8)
        
        self.bn1 = nn.BatchNorm1d(64)  # after pooling of 2
        
        self.conv4 = nn.Conv1d(64, 64, kernel_size=8)
        self.conv5 = nn.Conv1d(64, 128, kernel_size=8)
        
        self.bn2 = nn.BatchNorm1d(128)  # after pooling of 2
        
#         self.flat = nn.Flatten()

        self.lstm1 = nn.LSTM(128, 100, batch_first=True)
        self.lstm2 = nn.LSTM(100, 128, batch_first=True)
        
        self.fc1 = nn.Linear(128, 64)
        self.fc2 = nn.Linear(64, 32)
        self.fc3 = nn.Linear(32, classes)
        
    def exec_conv_block(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = F.relu(self.conv3(x))
        
        x = F.max_pool1d(x, 2)
#         x = self.bn1(x)
        
        x = F.relu(self.conv4(x))
        x = F.relu(self.conv5(x))
        
        x = F.max_pool1d(x, 2)
#         x = self.bn2(x)
               
        return x
    
    def forward(self, x):
        x = self.exec_conv_block(x)
        
        x, _ = self.lstm1(x.transpose(1, 2))
        x, _ = self.lstm2(x)
        
        x = x[:, -1, :]
        
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        
        return x
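
For anyone hitting the same issue: a minimal sketch with a toy LSTM (not the model above) shows why the missing batch_first on the second LSTM made the predictions depend on which other samples were in the batch. With batch_first=False the module interprets the first dimension as time, so the recurrence runs across the batch and samples leak into each other:

import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(4, 10, 32)        # (batch, seq, features), as produced by the first LSTM
x2 = x.clone()
x2[0] = torch.randn(10, 32)       # change only sample 0

good = nn.LSTM(32, 16, batch_first=True)   # dim 0 is the batch, as intended
bad = nn.LSTM(32, 16)                      # batch_first left at False: dim 0 is treated as time

with torch.no_grad():
    good_a, _ = good(x)
    good_b, _ = good(x2)
    bad_a, _ = bad(x)
    bad_b, _ = bad(x2)

# With batch_first=True, changing sample 0 leaves sample 1's output untouched.
print(torch.allclose(good_a[1], good_b[1]))   # True
# With batch_first missing, the recurrence runs across the batch dimension,
# so sample 1's output now depends on sample 0 (and on the batch size).
print(torch.allclose(bad_a[1], bad_b[1]))     # False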