I have noticed this too. In my own code, even when I leave out the eval() and train() calls, it has no effect on the result.
But how we go about it does matter. We call eval() because we are not interested in updating the weights of the network, whereas in train mode the weights do get updated. If we can turn off the gradients with torch.no_grad(), as in your example, so be it. It's another way of approaching it, I think.
My understanding is that .eval() tells the network to switch layers such as dropout and batchnorm to their inference behaviour (dropout is turned off and batchnorm uses its running statistics instead of batch statistics), whereas the torch.no_grad() context disables gradient calculations. They are different concepts that happen to be used together during inference.
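To make the difference concrete, here is a minimal sketch. The toy model, its layer sizes, and the variable names are my own assumptions, picked only so that the network contains a dropout layer; it is not the code from the original post.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy model, chosen only because it contains a dropout layer.
model = nn.Sequential(nn.Linear(4, 8), nn.Dropout(p=0.5), nn.Linear(8, 2))
x = torch.randn(3, 4)

# train mode + no_grad: dropout is still active, so two forward passes
# will generally give different outputs, but no autograd graph is recorded.
model.train()
with torch.no_grad():
    train_a = model(x)
    train_b = model(x)

# eval mode without no_grad: dropout is switched off, so repeated passes
# agree, but autograd still tracks the output.
model.eval()
eval_a = model(x)
eval_b = model(x)

print(train_a.requires_grad)          # False: no_grad disabled gradient tracking
print(eval_a.requires_grad)           # True: eval() does not touch autograd
print(torch.equal(eval_a, eval_b))    # True: dropout is inactive in eval mode
print(torch.equal(train_a, train_b))  # dropout masks differ, so this is normally False
```

So neither call replaces the other: for inference you would normally want both model.eval() and the torch.no_grad() context.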
One possible reason .eval() is missing is that the model has no dropout or batchnorm layers?
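A quick way to check that guess, as a rough sketch: the model below is a made-up example with only Linear and ReLU layers (an assumption on my part, not the network from the thread). With no dropout or batchnorm, switching between train() and eval() should not change the output.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical model with no dropout or batchnorm layers.
plain = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
x = torch.randn(3, 4)

with torch.no_grad():
    plain.train()
    out_train = plain(x)   # forward pass in train mode
    plain.eval()
    out_eval = plain(x)    # same forward pass in eval mode

# No layer here behaves differently between the two modes.
print(torch.equal(out_train, out_eval))  # True
```

Calling .eval() anyway is still a reasonable habit, since it keeps the inference code correct if such layers are added later.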