Why use validation during training if we are using no_grad?

Hassan_Imani · December 6, 2020, 9:33pm

I have a basic question. If we use torch.no_grad() during the validation process, why do validation at all? How does it help model training? Couldn't we just do the validation at test time?

Also, I converted my code from Keras to PyTorch. During the first epoch it behaves like the Keras version, but from the second epoch on my loss stops decreasing and only fluctuates slightly around its mean value. I am doing regression with 3D convolutions. What can I do?

What might be the cause?

Thank you

ptrblck · December 10, 2020, 7:31am

I’m not sure I understand the question correctly, but are you wondering why the validation step is used at all or why torch.no_grad() is used?

The validation step is used to get a proxy signal for the model's performance on the test data (which cannot be used during training, as that would be a data leak) and for hyperparameter tuning.

It’s wrapped in a torch.no_grad() block so that autograd does not store the intermediate tensors that would be needed for the gradient calculation. No backward pass is run during the validation loop, so these tensors are unnecessary.
This saves memory, so you could potentially increase the batch size to speed up the validation step.
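
As a rough illustration of the points above, here is a minimal sketch of a validation loop. The `validate` helper, the tiny `nn.Linear` model, and the random `val_loader` batches are all assumptions made up for this example and are not taken from the original poster's code. It only shows that under torch.no_grad() the forward pass produces outputs without an autograd graph, and that validation reads the model without changing its weights.

```python
import torch
import torch.nn as nn

def validate(model, loader, criterion):
    """Hypothetical helper: mean per-sample loss over `loader`, no autograd graph built."""
    model.eval()  # put dropout / batchnorm layers into inference behavior
    total_loss, n_samples = 0.0, 0
    with torch.no_grad():  # intermediate tensors are not kept for a backward pass
        for inputs, targets in loader:
            outputs = model(inputs)
            loss = criterion(outputs, targets)
            # MSELoss averages over the batch, so weight by batch size
            total_loss += loss.item() * inputs.size(0)
            n_samples += inputs.size(0)
    model.train()  # restore training mode for the next epoch
    return total_loss / n_samples

torch.manual_seed(0)
model = nn.Linear(4, 1)  # stand-in for a real network (assumption)
criterion = nn.MSELoss()
# Stand-in validation data: three random batches of 8 samples (assumption)
val_loader = [(torch.randn(8, 4), torch.randn(8, 1)) for _ in range(3)]

weights_before = model.weight.detach().clone()
val_loss = validate(model, val_loader, criterion)

# Compare a forward pass with and without gradient tracking
with torch.no_grad():
    out_no_grad = model(torch.randn(2, 4))
out_grad = model(torch.randn(2, 4))

print("validation loss:", val_loss)
print("graph under no_grad:", out_no_grad.grad_fn)   # no graph attached
print("graph without no_grad:", out_grad.grad_fn)    # a backward node is recorded
```

The returned validation loss is only a monitoring signal: it is typically logged per epoch and used for decisions such as early stopping or comparing hyperparameter settings, while the weights are updated only in the training loop.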