Accuracy reaches 100% after one batch

natelang · July 15, 2020, 4:32am

I am currently working on a transfer learning problem with a ResNet-50. Below is my training code. It seems to be working, but the accuracy jumps from 0 to 1 in the second batch and then stays at 1 for the remaining batches and epochs.

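import time

import torch
import torch.nn as nn

# res50 is assumed to be the ResNet-50 model defined earlier in the notebook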

epochs = 3
device = torch.device("cuda:0")
# Define Optimizer and Loss Function
loss_func = nn.NLLLoss()
optimizer = torch.optim.Adam(res50.parameters())

for epoch in range(epochs):
    epoch_start = time.time()
    print("Epoch: {}/{}".format(epoch+1, epochs))
     
    # Set to training mode
    res50.train()
     
    # Loss and Accuracy within the epoch
    train_loss = 0.0
    train_acc = 0.0
     
    valid_loss = 0.0
    valid_acc = 0.0
 
    for i, (inputs, labels) in enumerate(train_data):
 
        inputs = inputs.to(device)
        labels = labels.to(device)
         
        # Clean existing gradients
        optimizer.zero_grad()
         
        # Forward pass - compute outputs on input data using the model
        outputs = res50(inputs)
         
        # Compute loss
        loss = loss_func(outputs, labels)
         
        # Backpropagate the gradients
        loss.backward()
         
        # Update the parameters
        optimizer.step()
         
        # Compute the total loss for the batch and add it to train_loss
        train_loss += loss.item() * inputs.size(0)
         
        # Compute the accuracy
        ret, predictions = torch.max(outputs.data, 1)
        correct_counts = predictions.eq(labels.data.view_as(predictions))
         
        # Convert correct_counts to float and then compute the mean
        acc = torch.mean(correct_counts.type(torch.FloatTensor))
         
        # Compute total accuracy in the whole batch and add to train_acc
        train_acc += acc.item() * inputs.size(0)
         
        print("Batch number: {:03d}, Training: Loss: {:.4f}, Accuracy: {:.4f}".format(i, loss.item(), acc.item()))

Below is example output for the first five batches. Am I calculating the accuracy incorrectly?

Epoch: 1/3
Batch number: 000, Training: Loss: 2.2015, Accuracy: 0.0000
Batch number: 001, Training: Loss: 0.1964, Accuracy: 1.0000
Batch number: 002, Training: Loss: 0.0162, Accuracy: 1.0000
Batch number: 003, Training: Loss: 0.0013, Accuracy: 1.0000
Batch number: 004, Training: Loss: 0.0001, Accuracy: 1.0000
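The accuracy logic in the loop above (argmax over the class dimension, compare with the labels, average) can be checked in isolation. The following is a minimal plain-Python sketch of that same computation; `batch_accuracy` is a hypothetical helper, not a PyTorch function, and the scores are made-up log-probabilities for a batch of 4 samples and 3 classes.

```python
def batch_accuracy(scores, labels):
    """Fraction of rows whose highest score sits at the labelled class index.

    Mirrors torch.max(outputs, 1) followed by predictions.eq(labels) and a mean.
    """
    predictions = [max(range(len(row)), key=row.__getitem__) for row in scores]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)


# Made-up log-probabilities: 4 samples, 3 classes
scores = [
    [-2.0, -0.1, -3.0],
    [-0.2, -1.5, -2.5],
    [-1.9, -0.3, -2.2],
    [-3.1, -2.8, -0.1],
]
print(batch_accuracy(scores, [1, 0, 1, 2]))  # 1.0
print(batch_accuracy(scores, [1, 1, 1, 1]))  # 0.5
```

The metric itself behaves as expected; if every label in every batch is 1, a model that always puts its highest score on class 1 reaches 1.0, which is consistent with the log above.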

I have a theory that it has to do with classes vs. targets. Since I am working with a custom dataset, I had to set the dataset targets myself. When I do this, train_data.dataset.targets outputs [2,1,2,3,…,5] according to the class, but train_data.dataset.classes outputs ['.ipynb_checkpoints', '1']. I think this is the problem, because when I print the labels in the nested for loop, they are all 1.

Thanks in advance for any help!

ptrblck · July 15, 2020, 10:36am

If that's the case, your model would only have to output the highest logit for class 1 to achieve perfect classification.
Since you are using a custom dataset, I would recommend checking its implementation again and making sure the real targets are returned.

Feel free to post the Dataset implementation in case you get stuck.
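One quick way to verify the returned targets is to count how often each class index occurs. The sketch below assumes you pass it a list of integer labels; with an ImageFolder dataset, `data['train'].targets` would be one such list. `label_histogram` is a hypothetical helper, and the example labels are made up to mimic this thread.

```python
from collections import Counter


def label_histogram(targets):
    """Return {class_index: count}, sorted by class index."""
    return dict(sorted(Counter(int(t) for t in targets).items()))


# Made-up labels mimicking the situation in this thread;
# with ImageFolder you could pass data['train'].targets instead.
targets = [1, 1, 1, 1, 1, 1]
hist = label_histogram(targets)
print(hist)  # {1: 6}
if len(hist) < 2:
    print("Only one class present: a constant prediction gets 100% accuracy.")
```

With the expected 8 classes, the histogram should show 8 keys with plausible counts.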

natelang · July 15, 2020, 11:02pm

Hi ptrblck, thanks for the quick reply!

Here is my dataset implementation. You can see that I try to change the classes of data['train'], data['valid'], and data['test'] to the correct values.

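import torch
from torchvision import datasets

# Load the Data
# (image_transforms, train_dic, valid_dic and test_dic are defined elsewhere in the notebook)

# Set train, valid and test directory paths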
train_directory = 'datasets/train'
valid_directory = 'datasets/valid'
test_directory = 'datasets/test'
 
# Batch size
bs = 4
 
# Number of classes
num_classes = 8
 
# Load Data from folders
data = {
    'train': datasets.ImageFolder(root=train_directory, transform=image_transforms['train']),
    'valid': datasets.ImageFolder(root=valid_directory, transform=image_transforms['valid']),
    'test': datasets.ImageFolder(root=test_directory, transform=image_transforms['test'])
}
 
# Size of Data, to be used for calculating Average Loss and Accuracy
train_data_size = len(data['train'])
valid_data_size = len(data['valid'])
test_data_size = len(data['test'])

# I have tried setting both .classes and .labels, with the same result
data['train'].classes = train_dic
data['valid'].classes = valid_dic
data['test'].classes = test_dic
 
# Create iterators for the Data loaded using DataLoader module
train_data = torch.utils.data.DataLoader(data['train'], batch_size=bs, shuffle=True)
valid_data = torch.utils.data.DataLoader(data['valid'], batch_size=bs, shuffle=True)
test_data = torch.utils.data.DataLoader(data['test'], batch_size=bs, shuffle=True)
 
# Print the train, validation and test set data sizes
print(train_data_size, valid_data_size, test_data_size)

Note that train_dic, valid_dic, and test_dic are lists of integers from 0 to 7 corresponding to the specific classes. When I run the following snippet from the training code:

epochs = 3
device = torch.device("cuda:0")
# Define Optimizer and Loss Function
loss_func = nn.NLLLoss()
optimizer = torch.optim.Adam(res50.parameters())

for j, (inputs, labels) in enumerate(train_data):
        inputs = inputs.to(device)
        labels = labels.to(device)
        print(inputs)
        print(labels)

The inputs are the image tensors, as expected, and I expect the labels to be numbers from 0 to 7 as mentioned, but they are all 1.

INPUT:
tensor([[[[-2.1179, -2.1179, -2.1179,  ..., -2.1179, -2.1179, -2.1179],
          [-2.1179, -2.1179, -2.1179,  ..., -2.1179, -2.1179, -2.1179],
          [-2.1179, -2.1179, -2.1179,  ..., -2.1179, -2.1179, -2.1179],
          ...,
          [-2.1179, -2.1179, -2.1179,  ..., -2.1179, -2.1179, -2.1179],
          [-2.1179, -2.1179, -2.1179,  ..., -2.1179, -2.1179, -2.1179],
          [-2.1179, -2.1179, -2.1179,  ..., -2.1179, -2.1179, -2.1179]],
...
LABELS:
tensor([1, 1, 1, 1], device='cuda:0')

natelang · July 15, 2020, 11:06pm

Also, I just read that ImageFolder creates its own labels, so I ran

data['train'].class_to_idx

and got the following result, which I think is the problem:

{'.ipynb_checkpoints': 0, '1': 1}

Can I change the classes after the fact, like I tried above?
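As far as I know, ImageFolder (through torchvision's DatasetFolder) builds class_to_idx once, in its constructor, from the sorted names of every subdirectory of root, and stores each sample's label in samples/targets at that point. Assigning a new .classes afterwards therefore does not change the labels returned when iterating the dataset. The stand-alone sketch below reimplements that discovery step with the standard library only (it is not torchvision's code; `find_classes_like_imagefolder` is a hypothetical name) to show why a hidden .ipynb_checkpoints folder ends up as class 0.

```python
import os
import tempfile


def find_classes_like_imagefolder(root):
    """Sketch of ImageFolder-style class discovery: every subdirectory of
    `root` (hidden ones included) is a class, indexed in sorted name order."""
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    class_to_idx = {name: idx for idx, name in enumerate(classes)}
    return classes, class_to_idx


# Recreate the problematic layout from this thread in a temporary directory
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, ".ipynb_checkpoints"))
    os.makedirs(os.path.join(root, "1"))
    classes, class_to_idx = find_classes_like_imagefolder(root)
    print(class_to_idx)  # {'.ipynb_checkpoints': 0, '1': 1}
```

Because "." sorts before any digit or letter, a hidden folder always takes index 0 under this scheme.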

ptrblck · July 16, 2020, 1:08am

There seem to be some issues:

Could you move the data to another folder, without the hidden .ipynb_checkpoints folder? Currently this folder seems to be recognized as class 0, which is most likely wrong.
It also seems that you are dealing with a single class folder called 1. If that's the case, even moving this data folder to another location would only yield a single class. ImageFolder expects a root folder containing one subfolder of images for each class.
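ImageFolder expects a layout like root/<class_name>/<image files>, with one subfolder per class directly under root. The sketch below is a hypothetical pre-flight check (`check_imagefolder_root` is not a torchvision function) that flags hidden folders and roots with fewer than two class folders before you build the dataset. The class names in the layout comment are made up.

```python
import os
import tempfile


def check_imagefolder_root(root):
    """Return human-readable problems with an ImageFolder-style root.

    Assumes the layout root/<class_name>/<images>, one subfolder per class.
    """
    subdirs = sorted(e.name for e in os.scandir(root) if e.is_dir())
    problems = []
    hidden = [d for d in subdirs if d.startswith(".")]
    if hidden:
        problems.append(f"hidden folders would become classes: {hidden}")
    visible = [d for d in subdirs if not d.startswith(".")]
    if len(visible) < 2:
        problems.append(f"only {len(visible)} class folder(s) found: {visible}")
    return problems


# Expected layout (made-up class names):
#   datasets/train/cat/001.jpg
#   datasets/train/dog/002.jpg
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, ".ipynb_checkpoints"))
    os.makedirs(os.path.join(root, "1"))
    for problem in check_imagefolder_root(root):
        print(problem)
```

On the layout from this thread it reports both issues: the hidden checkpoint folder and the single class folder.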

natelang · July 16, 2020, 6:30pm

I figured it out. I was misunderstanding how ImageFolder reads the folder structure. I fixed the layout accordingly and also had to remove the hidden notebook checkpoint folder. Thanks!