I’m new to the joys of PyTorch and am using someone else’s code, so forgive what feels like a naive question. I am training an image recognition model (based on Inception v3) using the code in macaodha/inat_comp_2018 on GitHub (CNN training code for the iNaturalist 2018 image classification competition). I’ve only run a few epochs while I get a sense of how the code works.
As the code trains the model, it checks it against a validation set of images. Once training is complete, you can run a test set of images against the model to get predictions.
My expectation was that if I used the validation data set as “test” data, I would get the same set of predictions as the last round of training generates (because the “test” and validation data are the same). However, the test results score very poorly and predict identities for the images that do not exist in the training data.
I suspect that the code is not correctly loading the saved model when it does the testing. But can anyone confirm that my assumption (validation results for the last training round = test results if test data = validation data) is correct?
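One way to sanity-check that assumption in isolation is a toy experiment. The sketch below is not the inat_comp_2018 code: `make_model` is a hypothetical stand-in network, and the weights are copied in memory rather than through a checkpoint file. It only illustrates that a model restored from the same `state_dict` and run in `eval()` mode should reproduce the original outputs, while leaving it in `train()` mode (here with dropout active) would not.

```python
import torch
import torch.nn as nn


def make_model():
    # Hypothetical stand-in for the real network; dropout makes
    # train-mode and eval-mode outputs differ.
    return nn.Sequential(
        nn.Linear(8, 16), nn.ReLU(), nn.Dropout(p=0.5), nn.Linear(16, 4)
    )


torch.manual_seed(0)
trained = make_model()
images = torch.randn(5, 8)  # stand-in for a batch of validation images

# "Validation" pass on the model that was just trained.
trained.eval()
with torch.no_grad():
    val_out = trained(images)

# "Test" pass on a fresh model restored from the same weights.
restored = make_model()
restored.load_state_dict(trained.state_dict())  # strict=True by default
restored.eval()
with torch.no_grad():
    test_out = restored(images)

same_in_eval = torch.allclose(val_out, test_out)

# Same weights, but left in training mode: dropout is now active.
restored.train()
with torch.no_grad():
    train_mode_out = restored(images)

same_in_train = torch.allclose(val_out, train_mode_out)
print("eval mode matches:", same_in_eval)
print("train mode matches:", same_in_train)
```

If the same check fails on the real model, the likely suspects are the loading step, the mode the model is in at test time, or differences in how the test images are preprocessed and how class indices are mapped.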
To add some detail, this is the code to load the model:
def build_model_and_optim():
    global device, args, resume
    # load pretrained model
    print("Using pre-trained inception_v3")
    # use this line instead if you want to train another model
    #model = models.__dict__[args.arch](pretrained=True)
    model = inception_v3(pretrained=True)
    model.fc = nn.Linear(2048, args.num_classes)
    model.aux_logits = False
    model = model.to(device)
    optimizer = SGD(model.parameters(), args.lr,
                    momentum=args.momentum,
                    weight_decay=args.weight_decay)
    # optionally resume from a checkpoint
    if args.resume:
        if os.path.isfile(args.resume):
            print("=> loading checkpoint '{}' for inaturalist-inception".format(
                args.resume))
            checkpoint = torch.load(args.resume)
            args.start_epoch = checkpoint['epoch']
            best_prec3 = checkpoint['best_prec3']
            model.load_state_dict(checkpoint['state_dict'], strict=False) # https://stackoverflow.com/questions/63057468/how-to-ignore-and-initialize-missing-keys-in-state-dict
            optimizer.load_state_dict(checkpoint['optimizer'])
            print("=> loaded checkpoint '{}' (epoch {})".format(
                args.resume, checkpoint['epoch']))
        else:
            print("=> no checkpoint found at '{}'".format(args.resume))
    return model, optimizer
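Because the loader above passes `strict=False`, any keys that do not line up between the checkpoint and the model are skipped silently. `load_state_dict` returns an object with `missing_keys` and `unexpected_keys`, so printing it shows what was actually loaded. The sketch below uses a hypothetical two-layer network (`make_net`) and simulates a mismatch by adding a `module.` prefix to every key; that prefix is only an illustration, not a diagnosis of this checkpoint.

```python
from collections import OrderedDict

import torch.nn as nn


def make_net(num_classes):
    # Hypothetical toy network with named layers, standing in for inception_v3.
    return nn.Sequential(OrderedDict([
        ("features", nn.Linear(8, 16)),
        ("fc", nn.Linear(16, num_classes)),
    ]))


saved = make_net(4).state_dict()

# Simulate a checkpoint whose key names do not match the model
# (assumed example: every key carries an extra "module." prefix).
prefixed = OrderedDict(("module." + k, v) for k, v in saved.items())

model = make_net(4)
result = model.load_state_dict(prefixed, strict=False)

# With strict=False no error is raised, but nothing was loaded either:
# every model parameter is "missing" and every checkpoint key is "unexpected".
print("missing:", result.missing_keys)
print("unexpected:", result.unexpected_keys)
```

If both lists are empty (or contain only keys you expect to skip, such as auxiliary-classifier weights), the weights were restored. A long `missing_keys` list would mean the model is still running with its initial weights for those layers.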
If training, the following loop is invoked:
for epoch in range(args.start_epoch, args.epochs):
    adjust_learning_rate(optimizer, epoch)
    # train for one epoch
    train(train_loader, model, criterion, optimizer, epoch)
    # evaluate on validation set
    if 1:
        prec3, preds, im_ids = validate(val_loader, model, criterion, True)
        # epoch is an int, so convert it before building the file name
        with open('predictions-epoch-' + str(epoch) + '.csv', 'w') as opfile:
            opfile.write('id,predicted\n')
            for ii in range(len(im_ids)):
                opfile.write(str(im_ids[ii]) + ',' + ' '.join(str(x) for x in preds[ii,:])+'\n')
    else:
        prec3 = validate(val_loader, model, criterion, False)
    # remember best prec@3 and save checkpoint
    is_best = prec3 > best_prec3
    best_prec3 = max(prec3, best_prec3)
    save_checkpoint({
        'epoch': epoch + 1,
        #'arch': args.arch,
        'state_dict': model.state_dict(),
        'best_prec3': best_prec3,
        'optimizer' : optimizer.state_dict(),
    }, is_best)
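The prediction-writing step in the loop could also be pulled into a small helper, which makes it easier to produce files in the same format from both the validation and test runs and then compare them. This is a minimal sketch: `write_predictions` is a hypothetical name, and it assumes the predictions are available as rows of top-k class indices (plain lists here rather than an array).

```python
import os
import tempfile


def write_predictions(out_dir, epoch, im_ids, preds):
    """Write one 'id,predicted' row per image; preds[i] holds the top-k class indices."""
    # The f-string converts the integer epoch to text for the file name.
    path = os.path.join(out_dir, f"predictions-epoch-{epoch}.csv")
    with open(path, "w") as opfile:
        opfile.write("id,predicted\n")
        for im_id, row in zip(im_ids, preds):
            opfile.write(f"{im_id}," + " ".join(str(x) for x in row) + "\n")
    return path


# Usage with made-up image ids and top-3 predictions.
out_dir = tempfile.mkdtemp()
path = write_predictions(out_dir, 2, [101, 102], [[5, 1, 9], [3, 3, 7]])
with open(path) as f:
    contents = f.read()
print(contents)
```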
This seems to work: the model gets better over time, and the checkpoints are saved. But testing fails (even though validation works).