Can we perform the training and evaluation of a model separately?

Hi, I have some questions:

1- Can I run model.train() and model.eval() separately, each in an independent script, or will this affect the model's performance so that they have to be called in sequence?

2- If only the training can be run separately: after saving a checkpoint for each epoch during training, can we load these checkpoints from the checkpoint files to run the evaluation and plot the evaluation performance?

  1. Yes, you can write your own scripts for the training and evaluation of the model. Make sure to load the state_dict properly before starting the evaluation.

  2. Yes, that should be possible.
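To make the two-script setup concrete, here is a minimal sketch (the `nn.Linear` model, optimizer, and checkpoint path are placeholders, not from the thread): the "training" part saves a checkpoint per epoch, and the "evaluation" part rebuilds the model and restores the state_dict before calling eval().

```python
import torch
import torch.nn as nn

# Toy model standing in for whatever architecture is being trained
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# --- training script: save a checkpoint after each epoch ---
epoch = 0
torch.save({
    'epoch': epoch + 1,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
}, 'checkpoint_0.pth')

# --- separate evaluation script: recreate the model, then restore it ---
eval_model = nn.Linear(4, 2)
checkpoint = torch.load('checkpoint_0.pth')
eval_model.load_state_dict(checkpoint['model_state_dict'])
eval_model.eval()  # switches layers such as dropout/batchnorm to eval behavior
```

Looping over the saved files (one per epoch) and recording the metric from each restored model is then enough to plot the evaluation performance over epochs.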

Hi @ptrblck, thanks for confirming that for me. So, after training and saving the checkpoint as follows:

## Save checkpoint ##
        PATH = 'checkpoints/' + str(index) + '.pth'
        torch.save({
            'epoch': index + 1,
            'valid_loss': valid_loss,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            }, PATH)

I should add optimizer.zero_grad() and optimizer.step() before loading the checkpoints, is that right?

I would assume your validation script wouldn’t have an optimizer, so I’m unsure where you would call zero_grad() and step(). Also, if you are calling these operations right after one another without a backward pass, step() would only change the parameters if the optimizer uses internal running stats, so I’m also unsure why these operations should be executed.
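To illustrate this point, a minimal sketch (assuming plain SGD; optimizers with momentum or weight decay can behave differently): calling zero_grad() and step() without a backward pass in between leaves the parameters untouched, because there are no gradients to apply.

```python
import torch
import torch.nn as nn

model = nn.Linear(2, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

before = model.weight.clone()
optimizer.zero_grad()   # gradients are zeroed (or set to None)
optimizer.step()        # no backward pass happened, so there is nothing to apply
after = model.weight.clone()
```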

When I load the checkpoint to evaluate the model and without using these operations, I got this error:

UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`

before printing the performance. So, I tried to execute them as in the training process.

test_pred = model(test_data)

test_loss = model_fit(test_pred, test_label)

Doing this removes the warning, but the results seem to be similar. So, should I ignore the warning message or keep using these operations?

Using model_fit on the validation dataset seems wrong, and I assume that you might be optimizing the model internally.
If so, make sure to run only the forward passes to get the predictions and to compute the accuracy of the model. No learning rate scheduler or optimizer is needed to do so.
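A forward-pass-only evaluation could look like the following sketch (the model and the test batch are placeholders). Wrapping the pass in torch.no_grad() makes explicit that no graph is built, so no optimizer or scheduler is involved:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(4, 3)           # placeholder for the trained/restored model
test_data = torch.randn(8, 4)     # placeholder test batch
test_label = torch.randint(0, 3, (8,))

model.eval()                      # eval mode: no dropout, batchnorm uses running stats
with torch.no_grad():             # forward pass only, no gradients are tracked
    test_pred = model(test_data)
    test_loss = F.cross_entropy(test_pred, test_label)
    accuracy = (test_pred.argmax(dim=1) == test_label).float().mean()
```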

Sorry, model_fit is only a custom function that computes the following: loss = F.nll_loss(x_pred, x_output, ignore_index). So, what I understand is that I do not need to call scheduler.step() during the evaluation process.
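One aside on that call (an observation about the API, not something raised in the thread): the third positional argument of F.nll_loss is weight, so ignore_index needs to be passed by keyword. A sketch with placeholder tensors, using the default ignore value of -100:

```python
import torch
import torch.nn.functional as F

x_pred = torch.log_softmax(torch.randn(5, 3), dim=1)  # log-probabilities, shape (N, C)
x_output = torch.tensor([0, 2, 1, -100, 2])           # targets; -100 entries are skipped

# ignore_index must be a keyword argument: passing it positionally
# would be interpreted as the `weight` tensor instead.
loss = F.nll_loss(x_pred, x_output, ignore_index=-100)
```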

Yes, you don’t even need to create a learning rate scheduler and an optimizer in your evaluation script since you are not training the model at all.

Thanks. It really helps.