Different test accuracy results

Hi,
I have a pre-trained model and whenever I try to test my model on the same test data, I get different accuracy and loss every time.
For example, for my first try, I get 98% in test accuracy, but then when I re-test the model on the exact same test data loader, I now get 91%.
Since the model weights are fixed during inference, there should be no such thing as randomness.
What could have gone wrong here?
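For reference, a minimal evaluation loop that should give identical numbers on every run looks roughly like this (just a sketch, not my exact code; it assumes `model`, `test_loader`, and `device` are set up as in the snippets further below):

import torch

def evaluate(model, loader, device):
    model.eval()                    # disable dropout and freeze batch-norm statistics
    correct, total = 0, 0
    with torch.no_grad():           # no gradients are needed for inference
        for batch in loader:
            imgs = batch["img"].to(device)
            labels = batch["labels"].to(device)
            preds = model(imgs).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    return correct / total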

Here is an example of what I see when I evaluate the same loader repeatedly (the accuracy from the first run differs from the subsequent ones):

It seems as if the very first iteration differs from the subsequent ones.
Could you post the model definition or an executable code snippet that reproduces this issue, please?
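To narrow this down, it could help to evaluate the same loader a few times in a row and print each result (a sketch reusing the evaluate helper above). If only the first value differs, something stateful is most likely changing during the first pass, e.g. batch-norm running statistics being updated because the model is still in training mode:

# With fixed weights and a deterministic loader, every printed accuracy should match.
for run in range(3):
    acc = evaluate(model, test_loader, device)
    print(f"run {run}: accuracy = {acc:.4f}")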

Below is how I loaded my pre-trained model and created the test DataLoader.

import random

import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms
from PIL import Image
from collections import Counter, OrderedDict

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the pre-trained checkpoint (a state dict)
x = torch.load("pretrained_att_resnet50.pth")

torch.manual_seed(42)
torch.cuda.manual_seed(42)
random.seed(42)
np.random.seed(42)

# TwoAttresnet50 is my custom attention ResNet-50 (defined elsewhere)
model = TwoAttresnet50()
model.fc = nn.Linear(512 * 4, 2)
model.load_state_dict(x)
model = model.to(device)

class CustomDataset_test(Dataset):
    def __init__(self, files, labels, augmentation, transforms2, valid):
        self.files = files
        self.labels = labels
        self.aug = augmentation
        self.transforms2 = transforms2
        self.valid = valid
        self.data = []

        # Open every image up front and keep the PIL handles in memory
        for i in range(len(self.files)):
            sample = {}
            sample['img'] = Image.open(self.files[i])
            sample['label'] = labels[i]
            sample['fname'] = self.files[i]
            self.data.append(sample)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        x = self.data[idx]['img']
        y = self.data[idx]['label']

        # Apply the normalizing transform and keep an untransformed copy
        x1 = self.aug(x)
        x2 = self.transforms2(x)
        return {"img": np.array(x1, dtype='float32'),
                "labels": y,
                "original": np.array(x2, dtype='float32'),
                "fname": self.data[idx]['fname']}
    
test_transforms = transforms.Compose([
                    #transforms.Resize(224),
                    transforms.ToTensor(),
                    transforms.Normalize((0.2880, 0.2441, 0.2705),(0.2033, 0.1889, 0.1822))
                    #transforms.Normalize((0.2206, 0.1899, 0.2003), (0.1690, 0.1592, 0.1524))
                    ])

original_transforms = transforms.Compose([
    #transforms.Resize(224)
])

torch.manual_seed(42)
torch.cuda.manual_seed(42)
random.seed(42)
np.random.seed(42)

batch_size_test = 1
test_dataset = CustomDataset_test(test_files, test_labels, augmentation=test_transforms, transforms2=original_transforms, valid=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size_test, shuffle=False, drop_last=True, num_workers=1)

print('######### Dataset class created #########')
print('Number of images: ', len(test_files))
print("test size in number of batches: ", len(test_loader))
print("test size: ", len(test_loader) * batch_size_test)

@ptrblck
I just solved the problem by changing the custom dataset class, but I don’t know why…

Below is my new dataset class:

class CustomDataset_test(Dataset):
    def __init__(self, files, labels, augmentation, valid):
        self.files = files
        self.labels = labels
        self.aug = augmentation
        self.valid = valid
        self.data = []

        # Store only the file path; the image is opened lazily in __getitem__
        for i in range(len(self.files)):
            sample = {}
            sample['img'] = self.files[i]  # Image.open(self.files[i])
            sample['label'] = labels[i]
            sample['fname'] = self.files[i]
            self.data.append(sample)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        x = self.data[idx]['img']
        y = self.data[idx]['label']

        # Open and transform the image on demand
        x = Image.open(x)
        x1 = self.aug(x)

        return {"img": np.array(x1, dtype='float32'),
                "labels": y,
                "original": np.array(x, dtype='uint8'),
                "fname": str(self.data[idx]['fname'])}


test_transforms = transforms.Compose([
                    #transforms.Resize(224),
                    transforms.ToTensor(),
                    transforms.Normalize((0.2880, 0.2441, 0.2705), (0.2033, 0.1889, 0.1822))
                    #transforms.Normalize((0.2206, 0.1899, 0.2003), (0.1690, 0.1592, 0.1524))
                    ])

The only difference is that I no longer open the image file in __init__ but instead open it in __getitem__. However, I don't see why that should matter. Do you know what the difference is between opening an image file with PIL in __init__ versus __getitem__?

Thank you!

There shouldn’t be any difference besides the memory usage, which would be higher when you preload all images.
Besides that, your code is unfortunately not executable, so I cannot run or debug it, but it’s good to hear it’s working now.
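For completeness, one practical difference (not necessarily related to the accuracy issue) is that Image.open is lazy and keeps a file handle open, so opening every image in __init__ holds many handles for the lifetime of the dataset, while opening in __getitem__ is the more common pattern. A small sketch of the lazy version that also closes each file explicitly; the convert('RGB') call and the context manager are my additions, not part of your original code:

from PIL import Image

def load_image(path):
    # Image.open is lazy; decoding inside the context manager ensures the
    # file handle is closed once the pixel data has been read.
    with Image.open(path) as img:
        return img.convert('RGB')   # force a full decode and a consistent mode

In __getitem__ this would replace the bare Image.open(x) call.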


@ptrblck
I see, then that’s very strange. Anyway, thank you very much for the help!