`model.train()` and `model.eval()` do not change the behavior of gradient calculations. They switch specific layers, such as dropout and batchnorm, between training and evaluation mode: in eval mode, dropout won't drop activations, and batchnorm will use its running estimates instead of the current batch statistics.
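A minimal sketch of both points: in eval mode dropout becomes an identity op, yet autograd still tracks gradients as usual.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(8)

# In training mode, dropout zeroes activations at random
# (and scales the survivors by 1 / (1 - p)).
drop.train()
out_train = drop(x)

# In eval mode, dropout passes activations through unchanged.
drop.eval()
out_eval = drop(x)
print(torch.equal(out_eval, x))  # True

# Gradients are still computed in eval mode:
w = torch.ones(8, requires_grad=True)
loss = drop(w).sum()
loss.backward()
print(w.grad)  # tensor of ones: autograd ran normally
```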
Once a `with torch.no_grad()` block has been exited, gradient behavior is the same as it was before entering the block.
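This can be verified by checking `requires_grad` on results computed inside and after the block:

```python
import torch

w = torch.randn(3, requires_grad=True)

# Inside the block, autograd is disabled:
with torch.no_grad():
    y = (w * 2).sum()
print(y.requires_grad)  # False

# After leaving the block, gradient tracking is restored:
z = (w * 2).sum()
print(z.requires_grad)  # True
```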