Hi there,
Thanks in advance for the help.
I'm doing transfer learning with several pre-trained torchvision models on an ImageFolder dataset. When I run my evaluation loop, the accuracy drops below what you would expect from random chance the longer the model trains. But if I throw in a little hack and reverse the labels, the accuracy shoots up to a level comparable to the training accuracy. I printed out the label dictionaries for my evaluation and training datasets, and the same numbers correspond to the same labels.
i.e. running this:
trainDataset = torchvision.datasets.ImageFolder(trainDataPath,
                                                transform=torchvision.transforms.ToTensor())
trainLoader = DataLoader(trainDataset, batch_size=batchSize, shuffle=True)
numCategories = len(trainDataset.classes)

valDataset = torchvision.datasets.ImageFolder(valDataPath,
                                              transform=torchvision.transforms.ToTensor())
valLoader = DataLoader(valDataset, batch_size=batchSize, shuffle=True)

print(trainDataset.class_to_idx)
print(valDataset.class_to_idx)
returns identical dictionaries:
{'LeftFractures': 0, 'LeftNon-Fractures': 1, 'RightFractures': 2, 'RightNon-Fractures': 3}
{'LeftFractures': 0, 'LeftNon-Fractures': 1, 'RightFractures': 2, 'RightNon-Fractures': 3}
My evaluation loop runs as a separate function:
def evaluateModel(model, valLoader, lossFunction, sensitivityFunction, specificityFunction, cuda, numCategories, labelSmoothing=False):
    print(valLoader.dataset.class_to_idx)
    model = model.eval()
    lossList = []
    accuracyList = []
    sensitivityList = []
    specificityList = []
    predictions = torch.LongTensor([]).cuda()
    ys = torch.LongTensor([]).cuda()
    for batch, (x, y) in enumerate(valLoader):
        #x = torch.randn_like(x)  # FIXME: remove later
        if cuda:
            y = y.cuda()
            x = x.cuda()
        yHat = model(x)
        predictedCategory = torch.argmax(yHat, dim=1)
        predictedCategory = (predictedCategory * -1) + numCategories - 1  # **hack because the model's predictions seem to be reversed somehow**
        predictions = torch.cat((predictions, predictedCategory), dim=0)
        ys = torch.cat((ys, y), dim=0)
        if labelSmoothing:
            # note that if labelSmoothing = True and the loss is something like BCE, it will throw an error
            smoothY = smoothLabel(y, numCategories, alpha=0.2, cuda=cuda)
        else:
            smoothY = y
        loss = lossFunction(yHat, smoothY)
        lossList.append(loss.item())
    model = model.train()
    gc.collect()
    sensitivities = sensitivityFunction(predictions, ys, numCategories)
    specificities = specificityFunction(predictions, ys, numCategories)
    accuracy = torch.sum(predictions == ys).item() / predictions.shape[0]
    return np.mean(lossList), sensitivities, specificities, accuracy
Note that it prints the same dictionary as before:
{'LeftFractures': 0, 'LeftNon-Fractures': 1, 'RightFractures': 2, 'RightNon-Fractures': 3}
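In case it matters, smoothLabel just converts the integer labels into smoothed one-hot targets, roughly along these lines (a minimal sketch, not the exact code):

def smoothLabel(y, numCategories, alpha=0.2, cuda=False):
    # start from alpha spread evenly over the non-target classes
    smooth = torch.full((y.shape[0], numCategories), alpha / (numCategories - 1))
    if cuda:
        smooth = smooth.cuda()
    # put the remaining 1 - alpha probability mass on the true class
    smooth.scatter_(1, y.unsqueeze(1), 1.0 - alpha)
    return smooth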
For comparison, this is my training loop:
for batchNum, (x, y) in enumerate(trainLoader):
    if cuda:
        x = x.cuda()
        y = y.cuda()
    yHat = model(x)
    predCat = torch.argmax(yHat, dim=1)
    print(y)
    print(predCat)
    acc = torch.sum(predCat == y).item() / y.shape[0]
    if labelSmooth:
        # note that if labelSmoothing = True and the loss is something like BCE, it will throw an error
        smoothY = smoothLabel(y, numCategories, alpha=0.2, cuda=cuda)
    else:
        smoothY = y
    loss = lossFunction(yHat, smoothY)
    lossList.append(loss.item())
    loss.backward()
    optimizer.step()
    model.zero_grad()
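And the sensitivityFunction and specificityFunction used in the evaluation function are just per-class rates computed from the concatenated predictions, roughly like this (simplified sketches):

def sensitivityFunction(predictions, ys, numCategories):
    # per-class sensitivity (recall): TP / (TP + FN)
    sens = []
    for c in range(numCategories):
        isClass = (ys == c)
        tp = torch.sum((predictions == c) & isClass).item()
        fn = torch.sum((predictions != c) & isClass).item()
        sens.append(tp / (tp + fn) if (tp + fn) > 0 else 0.0)
    return sens

def specificityFunction(predictions, ys, numCategories):
    # per-class specificity: TN / (TN + FP)
    spec = []
    for c in range(numCategories):
        notClass = (ys != c)
        tn = torch.sum((predictions != c) & notClass).item()
        fp = torch.sum((predictions == c) & notClass).item()
        spec.append(tn / (tn + fp) if (tn + fp) > 0 else 0.0)
    return spec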
Brownie points and all manner of gratitude if anyone can help me understand what I am doing wrong to make my model’s predictions come out reversed during evaluation time!
Also note that the phenomenon is consistent across several models, from AlexNet to GoogLeNet. In each case I replace the final linear layer so that its input matches the incoming feature size and its output matches the number of categories (4).
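Concretely, the head replacement looks roughly like this (exact attribute names differ per architecture; AlexNet and GoogLeNet shown as examples):

import torch.nn as nn

# AlexNet: the last layer of the classifier block
model = torchvision.models.alexnet(pretrained=True)
model.classifier[6] = nn.Linear(model.classifier[6].in_features, numCategories)

# GoogLeNet: the final fully connected layer is model.fc
model = torchvision.models.googlenet(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, numCategories)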
Thanks again!