Does torch.jit.script require no_grad and eval modes?

My goal is to export a model for fast inference.
When I trace a model with jit.trace I usually do it this way:

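import torch

# `model` is the nn.Module being exported (the name is assumed here)
with torch.no_grad():
    example_input = torch.rand(size=(1, 3, 500, 500))
    traced_cell = torch.jit.trace(model.to("cpu"), (example_input,))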

It is not clear to me if model.eval() and with torch.no_grad() are required (or still suggested) when exporting a model with jit.script.
Thank you!

model.eval() will change the behavior of some modules (e.g. dropout layers will be disabled and batchnorm layers will use their running stats to normalize the data). torch.jit.trace does not capture any data-dependent control flow, i.e. only the code path taken by the example input is recorded, and other inputs won’t take a different path based on e.g. if statements. Given that, it sounds right to call model.eval() before tracing (otherwise the dropout layer would be used with the same mask in each forward pass). I don’t know if disabling the gradient calculation is needed during tracing or could also be added later during inference. In any case, you might also want to check torch.autograd.inference_mode() for your model deployment.
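To make the point about data-dependent control flow concrete, here is a minimal sketch. The function `shift` and its inputs are made up for illustration and are not from the original post; it only shows that a trace keeps the branch taken by the example input.

```python
import warnings

import torch


def shift(x):
    # Data-dependent branch: which path runs depends on the values in x.
    if x.sum() > 0:
        return x + 1
    return x - 1


positive = torch.ones(3)
negative = -torch.ones(3)

with warnings.catch_warnings():
    # Tracing warns about the tensor-to-bool conversion in the `if`.
    warnings.simplefilter("ignore")
    traced = torch.jit.trace(shift, positive)

eager_out = shift(negative)    # eager Python follows the `x - 1` branch
traced_out = traced(negative)  # the trace replays the recorded `x + 1` path
print(eager_out, traced_out)
```

The two results differ for the negative input, which is the behavior described above: the `if` itself is not part of the traced graph.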

Thank you @ptrblck! How about jit.script?

torch.jit.script is able to capture data-dependent control flow (e.g. the dropout masks would be randomly sampled in each step if you leave the model in .train() mode). However, the common use case during inference is to use the .eval() mode, so you would still call it even when scripting the model. The advantage of scripting is that your model can use other data-dependent control flow, such as conditions based on the shape of the input.
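A small sketch of what this looks like in practice, assuming a toy module `Net` (hypothetical, not from the thread) with one branch and one dropout layer. The scripted module keeps both branches and still reacts to .train() / .eval().

```python
import torch
import torch.nn as nn


class Net(nn.Module):  # hypothetical toy module for illustration
    def __init__(self):
        super().__init__()
        self.drop = nn.Dropout(p=0.5)

    def forward(self, x):
        # Data-dependent branch; scripting compiles both paths.
        if x.sum() > 0:
            x = x + 1
        else:
            x = x - 1
        return self.drop(x)


scripted = torch.jit.script(Net())

scripted.eval()                  # dropout becomes a no-op
pos = scripted(torch.ones(4))    # takes the `x + 1` branch
neg = scripted(-torch.ones(4))   # takes the `x - 1` branch

scripted.train()                 # dropout is active again
noisy = scripted(torch.ones(1000))  # entries are zeroed or scaled by 1 / (1 - p)
```

In train mode each call samples a fresh mask, so repeated calls on the same input generally give different outputs.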

Thank you. Should I call .eval() before jit.script or after it? What about torch.autograd.inference_mode? I don’t understand if the results of these operations “are saved” by jit.script.

jit.script should not capture training/eval mode or the no_grad() context, so you should be able to script the model and call .eval() as well as inference_mode/no_grad during the deployment.
However, if you are seeing any issue with this, please let us know.
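The workflow described here could look roughly like the sketch below. The model is a hypothetical stand-in and an in-memory buffer replaces a real file path; torch.no_grad() is used for the deployment step, with inference_mode being the alternative mentioned above.

```python
import io

import torch
import torch.nn as nn

# Hypothetical stand-in for the real model.
model = nn.Sequential(nn.Linear(8, 4), nn.Dropout(p=0.5))

# Script the model without worrying about its current mode.
scripted = torch.jit.script(model)

# Save and reload; a BytesIO buffer stands in for a file on disk.
buffer = io.BytesIO()
torch.jit.save(scripted, buffer)
buffer.seek(0)
loaded = torch.jit.load(buffer)

# At deployment time: pick eval behavior and disable gradient tracking.
loaded.eval()
x = torch.rand(2, 8)
with torch.no_grad():
    out_a = loaded(x)
    out_b = loaded(x)

print(loaded.training, out_a.requires_grad)
```

With dropout disabled by .eval(), the two calls return the same values, and the outputs carry no autograd history.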

Thank you @ptrblck. My question is actually the following: is there any benefit to calling .eval() and enabling no_grad on a TorchScript model? Is it something I should do after I load a file in order to gain performance, or is it useless?

Yes, if you are deploying the model, use inference_mode/no_grad for performance gains.
model.eval() will change the behavior of some layers, such as disabling dropout and using the running stats of batchnorm layers, so it’s not a performance (speed) improvement but a behavior change used during the evaluation/test phase of the model.
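The distinction between the two can be checked directly. This is a minimal sketch with a hypothetical one-layer model: .eval() alone still records the autograd graph, while no_grad skips it.

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 4)  # hypothetical stand-in model
model.eval()             # changes layer behavior, not gradient tracking
x = torch.rand(2, 8)

tracked = model(x)       # autograd graph is still recorded in eval mode

with torch.no_grad():
    untracked = model(x)  # no graph is recorded here

print(tracked.requires_grad, untracked.requires_grad)
```

The values of the two outputs match; only the bookkeeping for the backward pass differs, which is where the savings come from.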

Hello @ptrblck! I have a follow-up question: I have recently fine-tuned an efficientnet_b0 model. If, after training the model, I do inference using the model without calling model.eval() first, the output of the model is completely wrong. Is this expected? Thank you!

If you are not calling model.eval(), all batchnorm layers will, for example, use the input batch stats to normalize the activations as described before, which might be noisy depending on the batch size. Also, dropout would be enabled, but I wouldn’t expect to see a huge difference compared to the training run (besides the noisy updates, of course).
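The batchnorm effect is easy to reproduce in isolation. In this sketch the running statistics are made-up values standing in for what training would have produced; it is an illustration, not a reproduction of the efficientnet_b0 case.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

bn = nn.BatchNorm1d(3)
# Made-up running statistics, standing in for values learned during training.
bn.running_mean.fill_(5.0)
bn.running_var.fill_(4.0)

x = torch.randn(4, 3)  # a small inference batch

bn.eval()
eval_out = bn(x)       # normalized with the stored running stats

bn.train()
train_out = bn(x)      # normalized with this batch's own mean and variance

print(eval_out)
print(train_out)
```

In train mode every feature of the output is centered on zero regardless of what the layer saw during training, so downstream layers receive activations they were not fitted to, and the effect grows noisier as the batch gets smaller.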