Efficient train/dev sets evaluation?

Hello everybody,

What is the most efficient way to log the train set and dev set accuracy and loss? Currently I use the same function for both sets after training, i.e.:

for epoch in range(1, N):

I thought that there should be a more concise method.


This workflow looks good if you really need the resubstitution error for a whole epoch.
Usually it’s sufficient to calculate a running error during training and print it every X batches.
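A minimal sketch of the "running error every X batches" idea, written in plain Python so it runs without a model. The `RunningAverage` class, the list of batch losses, the batch size of 32 and the logging interval are all illustrative assumptions; in a real loop the values would come from `loss.item()`.

```python
class RunningAverage:
    """Accumulates a sample-weighted mean that can be reset between reports."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.total = 0.0
        self.count = 0

    def update(self, value, n=1):
        # value is a per-batch mean, n is the number of samples in that batch
        self.total += value * n
        self.count += n

    @property
    def mean(self):
        return self.total / self.count if self.count else 0.0


# Hypothetical per-batch losses standing in for loss.item() in a training loop.
batch_losses = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
log_every = 3  # the "X" in "every X batches"; value chosen for illustration
meter = RunningAverage()
reports = []

for step, loss_value in enumerate(batch_losses, start=1):
    meter.update(loss_value, n=32)  # 32 is an assumed batch size
    if step % log_every == 0:
        reports.append(meter.mean)
        print(f"step {step}: running loss {meter.mean:.4f}")
        meter.reset()  # start a fresh window for the next report
```

Resetting the meter after each report keeps every printed value tied to a recent window of batches rather than to the whole epoch.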

Thanks for the reply.

And what about the model.train() and model.eval() modes? Is it correct to print the loss and calculate the accuracy during training with model.train() mode on? So far I have used it only to track a running loss.

Your loss estimate will most likely be a bit too high if you use Dropout.
Also, the estimate is skewed, since you are summing up the running loss while the model is being updated.

Nevertheless, I think it’s useful for model insights without spending too much time on evaluating the training dataset.
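To illustrate why metrics computed in model.train() mode differ from those in model.eval() mode, here is a small sketch using a standalone `nn.Dropout` layer. The input tensor and the drop probability of 0.5 are assumptions chosen for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

drop.train()
train_out = drop(x)  # random elements are zeroed, survivors are scaled by 1 / (1 - p)

drop.eval()
eval_out = drop(x)   # Dropout is a no-op in eval mode

print("train mode:", train_out)
print("eval mode: ", eval_out)
```

Because the train-mode forward pass runs on a randomly thinned network, a loss or accuracy logged during training describes a noisier model than the one used at evaluation time.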

I used BatchNorm1d() in my 4-layer net on the MNIST dataset. I added an accuracy estimate to the training function (model.train() mode), and it shows a loss/accuracy of 0.49/0.48 after a couple of epochs. The evaluate function on the dev set shows 4.26/0.0019. Is that the “a bit too high” estimate during training, or does it look more like an error in my architecture?

That looks like an error.
Usually in a vanilla neural network, your training accuracy might be a bit higher than the validation accuracy.
However, since Dropout prunes the model, the training accuracy might be lower than the evaluation accuracy.

In your example it looks like the eval accuracy is even worse than random chance (accuracy ~ 0.1).
Could you post your model and training routine?
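For reference, the chance level mentioned above can be worked out directly. This sketch assumes the 10 MNIST classes are roughly balanced and that the loss is a cross-entropy averaged over samples, which the thread does not state explicitly.

```python
import math

num_classes = 10  # MNIST digits 0-9

# A classifier that guesses uniformly at random is right 1 time in num_classes.
chance_accuracy = 1 / num_classes

# Cross-entropy of a uniform prediction: -log(1 / num_classes) = log(num_classes).
chance_loss = math.log(num_classes)

print(f"chance accuracy: {chance_accuracy:.2f}")
print(f"chance cross-entropy: {chance_loss:.4f}")
```

Under these assumptions, a dev loss well above log(10) together with a near-zero accuracy points to a bug in the evaluation path rather than to ordinary overfitting.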

I’m not sure about posting the model, but the evaluate function is:

def evaluate(epoch):
        nonlocal global_step
        num_correct = total_loss = 0
        for batch_step, (X_valid, y_valid) in enumerate(valid_set):
            X_valid, y_valid = torch.tensor(X_valid, requires_grad=False).to(DEVICE), torch.tensor(y_valid, requires_grad=False).to(DEVICE)
            logit, _ = model(X=X_valid)
            y_pred = logit.data.max(1)[1]
            loss = criterion(input=logit, target=y_valid).to(DEVICE)
            total_loss += loss.item()
            num_correct += y_pred.eq(y_valid).long().sum().item()
        loss = total_loss / len(valid_set)
        accuracy = num_correct / len(valid_set.dataset) * 100
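        # Both add_scalar calls were cut off in the original post; global_step=global_step is an assumed completion.
        WRITER.add_scalar(tag='Valid Loss', scalar_value=loss, global_step=global_step)
        WRITER.add_scalar(tag='Valid Accuracy', scalar_value=accuracy, global_step=global_step)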

        print('\nValidation : Loss: {:.6f} Accuracy: {}/{} ({:.4f}%)\n'.format(loss, num_correct, len(valid_set.dataset), accuracy))

        return loss, accuracy

The code looks good to me.

Okay, thanks for your replies. I’ll try changing my model and will get back with feedback.

I found the problem in my code. I added with torch.no_grad(): before the for-loop in the evaluate function, and now it works fine. I also calculate my training accuracy with the evaluate function, because I didn’t find a more accurate or better solution than calling the evaluate function on train_loader after the train function:

for epoch in range(1, N):
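A minimal sketch of one evaluation function reused for both the train and dev loaders. It assumes a classifier that returns plain logits (the poster's model returns a tuple), and the tiny model, random data, loader names and the `evaluate(model, loader, criterion, device)` signature are hypothetical stand-ins rather than the poster's code. torch.no_grad() turns off gradient tracking, and model.eval() switches layers such as Dropout and BatchNorm1d to their inference behaviour.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, loader, criterion, device="cpu"):
    """Return (mean loss per sample, accuracy in [0, 1]) over a loader."""
    was_training = model.training
    model.eval()  # Dropout/BatchNorm use inference behaviour
    total_loss, num_correct, num_seen = 0.0, 0, 0
    with torch.no_grad():  # no autograd graph is built inside this block
        for X, y in loader:
            X, y = X.to(device), y.to(device)
            logits = model(X)
            # criterion returns the batch mean, so weight it by the batch size
            total_loss += criterion(logits, y).item() * y.size(0)
            num_correct += (logits.argmax(dim=1) == y).sum().item()
            num_seen += y.size(0)
    model.train(was_training)  # restore whatever mode the model was in
    return total_loss / num_seen, num_correct / num_seen


# Hypothetical stand-ins for the model and data (not from the thread).
torch.manual_seed(0)
model = nn.Sequential(
    nn.Linear(8, 16), nn.BatchNorm1d(16), nn.ReLU(), nn.Linear(16, 3)
)
data = TensorDataset(torch.randn(32, 8), torch.randint(0, 3, (32,)))
train_loader = DataLoader(data, batch_size=8)
dev_loader = DataLoader(data, batch_size=8)
criterion = nn.CrossEntropyLoss()

train_loss, train_acc = evaluate(model, train_loader, criterion)
dev_loss, dev_acc = evaluate(model, dev_loader, criterion)
print(f"train: loss {train_loss:.4f}, accuracy {train_acc:.4f}")
print(f"dev:   loss {dev_loss:.4f}, accuracy {dev_acc:.4f}")
```

Weighting each batch loss by its size gives a per-sample mean that stays correct when the last batch is smaller than the others.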

Oh yes, I overlooked the missing torch.no_grad() context manager.
Did this line solve the accuracy problem in the validation run?

Yes, it looks like it did :)