Worse performance when using model.eval() with dropout layers on a regression task

I am training a model to predict eigenfrequencies and quality factors for a membrane resonator, given certain design parameters. The architecture is a deep neural network with multiple layers interspersed with dropout layers. Batch normalisation is not used.
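For context, a minimal sketch of the architecture is below; the layer widths, depth, dropout probability, and input/output dimensions are placeholders rather than my exact values.

```python
import torch.nn as nn

class ResonatorNet(nn.Module):
    """Maps design parameters to eigenfrequencies and quality factors."""

    def __init__(self, n_in=8, n_out=4, hidden=256, p_drop=0.2):
        super().__init__()
        # Plain fully connected stack with dropout, no batch normalisation.
        self.net = nn.Sequential(
            nn.Linear(n_in, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, n_out),
        )

    def forward(self, x):
        return self.net(x)
```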

The problem occurs when I load a trained model and call model.eval(): performance drops significantly, from roughly 0.1 percent error on the training set to 20-30 percent error on the test set. If I skip model.eval(), the performance on the test set is comparable to the performance on the training set.
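For reference, inference looks roughly like this (the checkpoint path, test inputs, and model class are placeholders based on the sketch above):

```python
import torch

model = ResonatorNet()
model.load_state_dict(torch.load("resonator_net.pt"))  # placeholder path

# With this line the test error jumps to roughly 20-30%;
# commenting it out keeps the error close to the training-set level.
model.eval()

x_test = torch.randn(100, 8)  # stand-in for the real design parameters

with torch.no_grad():
    preds = model(x_test)
```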

I have experimented a bit, and it seems to be related to how the dropout layers rescale the activations. If I shift the p-value of the dropout layers just a tiny bit on the trained model (without setting model.eval()), the increase in error is small. If I turn p all the way down to zero, the error is similar to when I set model.eval(), roughly 20-30 percent.
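Concretely, the experiment looks roughly like this (the p values are examples only; the model is left in training mode, i.e. model.eval() is never called):

```python
import torch.nn as nn

# Tweak the dropout probability of every Dropout module on the trained model.
for module in model.modules():
    if isinstance(module, nn.Dropout):
        module.p = 0.19   # small shift from the training value -> small error increase
        # module.p = 0.0  # disabling dropout entirely -> ~20-30% error,
        #                 # the same as calling model.eval()
```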

Does anyone have an idea of how to force the model to normalise properly when using model.eval()? Since my metrics are good without setting model.eval(), the model is clearly well trained. Is it even an acceptable strategy to predict without first setting model.eval()?

At this stage, the dropout layers are not crucial, but my data is continuously generated, and at some point I suspect dropout layers may actually be beneficial or even necessary in order to succeed.

Thanks in advance