Accuracy reaches 100% after one batch

I am currently working on a transfer learning problem with a ResNet-50. Below is my training code. It seems to be working, but the accuracy goes from 0 to 1 in the second batch and then stays at 1 for the remaining batches and epochs.

import time

import torch
import torch.nn as nn

epochs = 3
device = torch.device("cuda:0")
# Define Optimizer and Loss Function
loss_func = nn.NLLLoss()
optimizer = torch.optim.Adam(res50.parameters())

for epoch in range(epochs):
    epoch_start = time.time()
    print("Epoch: {}/{}".format(epoch+1, epochs))
     
    # Set to training mode
    res50.train()
     
    # Loss and Accuracy within the epoch
    train_loss = 0.0
    train_acc = 0.0
     
    valid_loss = 0.0
    valid_acc = 0.0
 
    for i, (inputs, labels) in enumerate(train_data):
 
        inputs = inputs.to(device)
        labels = labels.to(device)
         
        # Clean existing gradients
        optimizer.zero_grad()
         
        # Forward pass - compute outputs on input data using the model
        outputs = res50(inputs)
         
        # Compute loss
        loss = loss_func(outputs, labels)
         
        # Backpropagate the gradients
        loss.backward()
         
        # Update the parameters
        optimizer.step()
         
        # Compute the total loss for the batch and add it to train_loss
        train_loss += loss.item() * inputs.size(0)
         
        # Compute the accuracy
        ret, predictions = torch.max(outputs.data, 1)
        correct_counts = predictions.eq(labels.data.view_as(predictions))
         
        # Convert correct_counts to float and then compute the mean
        acc = torch.mean(correct_counts.type(torch.FloatTensor))
         
        # Compute total accuracy in the whole batch and add to train_acc
        train_acc += acc.item() * inputs.size(0)
         
        print("Batch number: {:03d}, Training: Loss: {:.4f}, Accuracy: {:.4f}".format(i, loss.item(), acc.item()))

Below is an example output for the first five batches. Am I calculating the accuracy incorrectly?

Epoch: 1/3
Batch number: 000, Training: Loss: 2.2015, Accuracy: 0.0000
Batch number: 001, Training: Loss: 0.1964, Accuracy: 1.0000
Batch number: 002, Training: Loss: 0.0162, Accuracy: 1.0000
Batch number: 003, Training: Loss: 0.0013, Accuracy: 1.0000
Batch number: 004, Training: Loss: 0.0001, Accuracy: 1.0000

I do have a theory that it has to do with classes vs. targets. I had to set the dataset targets since I am working with a custom dataset. When I do this, train_data.dataset.targets outputs [2,1,2,3,…,5] according to the class, but train_data.dataset.classes outputs ['.ipynb_checkpoints', '1']. I think this is the problem, because when I print out the labels in the nested for loop they are all 1.

Thanks in advance for any help!

If that’s the case, your model would only have to output the highest logit for class 1 to achieve perfect classification.
Since you are using a custom dataset, I would recommend checking its implementation again to make sure the real targets are returned.
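
As a quick sanity check along these lines, the labels a loader actually yields can be counted before training. This is only a minimal sketch: `label_histogram` and `fake_batches` are hypothetical names used for illustration, and the helper assumes each batch is an (inputs, labels) pair like the ones produced in the training loop above.

```python
from collections import Counter


def label_histogram(batches):
    """Count how often each label appears across (inputs, labels) batches."""
    counts = Counter()
    for _, labels in batches:
        # Tensors and NumPy arrays expose .tolist(); plain sequences are used as-is.
        values = labels.tolist() if hasattr(labels, "tolist") else list(labels)
        counts.update(values)
    return counts


# Stand-in batches for illustration; with the code above one would pass train_data.
fake_batches = [(None, [1, 1, 1, 1]), (None, [1, 1, 1, 1])]
print(label_histogram(fake_batches))
```

If the histogram contains a single key, the model can reach perfect accuracy by always predicting that class, which would match the loss and accuracy pattern in the output above.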

Feel free to post the Dataset implementation in case you get stuck.

Hi ptrblck, thanks for the quick reply!

Here is my dataset implementation. In it, you can see that I attempt to change the classes of data['train'], data['valid'], and data['test'] to the correct values.

# Load the Data

# Set train and valid directory paths
train_directory = 'datasets/train'
valid_directory = 'datasets/valid'
test_directory = 'datasets/test'
 
# Batch size
bs = 4
 
# Number of classes
num_classes = 8
 
# Load Data from folders
data = {
    'train': datasets.ImageFolder(root=train_directory, transform=image_transforms['train']),
    'valid': datasets.ImageFolder(root=valid_directory, transform=image_transforms['valid']),
    'test': datasets.ImageFolder(root=test_directory, transform=image_transforms['test'])
}
 
# Size of Data, to be used for calculating Average Loss and Accuracy
train_data_size = len(data['train'])
valid_data_size = len(data['valid'])
test_data_size = len(data['test'])

# I have tried setting both .classes and .labels, with the same result
data['train'].classes = train_dic
data['valid'].classes = valid_dic
data['test'].classes = test_dic
 
# Create iterators for the Data loaded using DataLoader module
train_data = torch.utils.data.DataLoader(data['train'], batch_size=bs, shuffle=True)
valid_data = torch.utils.data.DataLoader(data['valid'], batch_size=bs, shuffle=True)
test_data = torch.utils.data.DataLoader(data['test'], batch_size=bs, shuffle=True)
 
# Print the train, validation and test set data sizes
train_data_size, valid_data_size, test_data_size

Let me note that train_dic, valid_dic, and test_dic are lists of integers 0-7 according to the specific classes. When I run the following snippet of the training code:

epochs = 3
device = torch.device("cuda:0")
# Define Optimizer and Loss Function
loss_func = nn.NLLLoss()
optimizer = torch.optim.Adam(res50.parameters())

for j, (inputs, labels) in enumerate(train_data):
        inputs = inputs.to(device)
        labels = labels.to(device)
        print(inputs)
        print(labels)

The inputs are the image tensors, as expected. I expect the labels to be numbers 0-7 as mentioned, but they are all 1.

INPUT:
tensor([[[[-2.1179, -2.1179, -2.1179,  ..., -2.1179, -2.1179, -2.1179],
          [-2.1179, -2.1179, -2.1179,  ..., -2.1179, -2.1179, -2.1179],
          [-2.1179, -2.1179, -2.1179,  ..., -2.1179, -2.1179, -2.1179],
          ...,
          [-2.1179, -2.1179, -2.1179,  ..., -2.1179, -2.1179, -2.1179],
          [-2.1179, -2.1179, -2.1179,  ..., -2.1179, -2.1179, -2.1179],
          [-2.1179, -2.1179, -2.1179,  ..., -2.1179, -2.1179, -2.1179]],
...
LABELS:
tensor([1, 1, 1, 1], device='cuda:0')

Also, I just read that ImageFolder creates its own labeling, so I ran

data['train'].class_to_idx

and got the following result which I think is the problem.

{'.ipynb_checkpoints': 0, '1': 1}

Can I change the classes after the fact, like I tried above?

There seem to be some issues:

  • Could you move the data to another folder, without the hidden .ipynb_checkpoints folder? Currently this folder seems to be recognized as class 0, which is most likely wrong.
  • It also seems that you are dealing with a single class folder called 1. If that’s the case, even moving this data folder to another location would only yield a single class. ImageFolder expects a root folder containing one subfolder of images for each class.
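
To illustrate both points, here is a rough standard-library sketch that approximates how class indices follow from the subfolder names of the root, taken in sorted order. `discover_classes` and its `skip_hidden` flag are hypothetical helpers for illustration, not part of torchvision, and the temporary directory only mimics the layout reported above.

```python
import os
import tempfile


def discover_classes(root, skip_hidden=False):
    """Map each subdirectory of root to an index, in sorted name order."""
    names = sorted(
        entry.name
        for entry in os.scandir(root)
        if entry.is_dir() and not (skip_hidden and entry.name.startswith("."))
    )
    return {name: idx for idx, name in enumerate(names)}


# Recreate the reported layout: a hidden checkpoint folder next to a folder "1".
with tempfile.TemporaryDirectory() as root:
    for name in (".ipynb_checkpoints", "1"):
        os.makedirs(os.path.join(root, name))
    as_found = discover_classes(root)
    without_hidden = discover_classes(root, skip_hidden=True)
    print(as_found)
    print(without_hidden)
```

Even with the hidden folder ignored, only one class remains here, so eight classes would need eight subfolders under the root, each holding the images of one class.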

I figured it out. I was misunderstanding the way ImageFolder reads in the folder structure. I fixed it accordingly, and had to remove the hidden notebook checkpoint. Thanks!
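
The exact fix is not shown in this thread. One possible way to clear out hidden notebook checkpoint folders before building the datasets is sketched below, assuming they are always named .ipynb_checkpoints. `remove_checkpoint_dirs` and the `class_a` folder are hypothetical, and since the helper deletes directories it is worth trying on a copy of the data first.

```python
import os
import shutil
import tempfile


def remove_checkpoint_dirs(root, name=".ipynb_checkpoints"):
    """Delete every directory called `name` under root; return the removed paths."""
    removed = []
    for dirpath, dirnames, _ in os.walk(root):
        if name in dirnames:
            target = os.path.join(dirpath, name)
            shutil.rmtree(target)
            dirnames.remove(name)  # do not descend into the deleted directory
            removed.append(target)
    return removed


# Demonstration on a throwaway directory tree rather than real data.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, ".ipynb_checkpoints"))
    os.makedirs(os.path.join(root, "class_a", ".ipynb_checkpoints"))
    removed = remove_checkpoint_dirs(root)
    remaining = sorted(os.listdir(root))
    print(len(removed), remaining)
```

Jupyter may recreate these folders when a notebook is opened inside the data directory, so checking data['train'].class_to_idx again after loading is a cheap confirmation.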