Quick question, just to make sure: are you using torch.no_grad() or optimizer.zero_grad() at validation/test time when you remove model.eval()?
Also, you may want to take a look at these two discussions on this topic (1 and 2) in case you haven’t seen them.
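
To make the distinction behind my question concrete, here is a minimal sketch (the toy model and input shapes are made up for illustration, not taken from your code). torch.no_grad() only turns off gradient tracking, while model.eval() is what switches layers like Dropout and BatchNorm to inference behaviour. optimizer.zero_grad() is a separate thing again: it only clears accumulated gradients before a training step.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy model; the Dropout layer makes the train/eval difference visible.
model = nn.Sequential(nn.Linear(4, 8), nn.Dropout(p=0.5), nn.Linear(8, 2))
x = torch.randn(16, 4)

# Train mode + no_grad: autograd is off, but dropout is still active,
# so two forward passes on the same input generally disagree.
model.train()
with torch.no_grad():
    out_a = model(x)
    out_b = model(x)
train_mode_deterministic = torch.equal(out_a, out_b)

# Eval mode + no_grad: dropout is disabled, so repeated passes match.
model.eval()
with torch.no_grad():
    out_c = model(x)
    out_d = model(x)
eval_mode_deterministic = torch.equal(out_c, out_d)

# Outputs computed under no_grad are not attached to the autograd graph.
tracks_grad = out_c.requires_grad

print(train_mode_deterministic, eval_mode_deterministic, tracks_grad)
```

So if you drop model.eval() and keep only torch.no_grad(), the validation numbers are still computed with training-time layer behaviour, which might explain a gap between the two setups.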