Is JIT tracing modules containing dropout or batch normalization submodules a bad idea?

The torch.jit documentation states that one of the limitations of tracing is that calls which differ based on whether the model is in train or eval mode will only ever use whatever mode the module was in at trace time. Specifically:

  • In the returned ScriptModule, operations that have different behaviors in training and eval modes will always behave as if it is in the mode it was in during tracing, no matter which mode the ScriptModule is in.

I know that some commonly used layer types behave differently in train and eval mode; BatchNorm2d and Dropout come to mind. Does this mean that tracing is A Bad Idea for modules that use these two layers (and others like them)?

It might be a bad idea if you would like to switch the behavior of the traced model between train and eval mode after tracing.
In that case you could try to script your model.
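
As a rough illustration of the difference, here is a minimal sketch, assuming a recent PyTorch release. The `Net` module, the `is_deterministic` helper, and the tensor shapes are made up for this example. It traces a model in eval mode, then checks whether calling `.train()` on the result turns dropout back on, and compares that with a scripted copy of the same model.

```python
import torch
import torch.nn as nn


class Net(nn.Module):
    """Toy module (hypothetical) with a single Dropout layer."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(64, 64)
        self.dropout = nn.Dropout(p=0.5)

    def forward(self, x):
        return self.dropout(self.linear(x))


def is_deterministic(module, x):
    # If dropout is active, two forward passes on the same input
    # will (almost surely) produce different outputs.
    with torch.no_grad():
        return torch.equal(module(x), module(x))


x = torch.randn(8, 64)

# Trace in eval mode: dropout's "training" flag is recorded as it was here.
model = Net().eval()
traced = torch.jit.trace(model, x)

# Try to switch the traced module back to train mode.
traced.train()
traced_deterministic_in_train = is_deterministic(traced, x)

# Script the same architecture: the mode is read at call time.
scripted = torch.jit.script(Net())
scripted.train()
scripted_deterministic_in_train = is_deterministic(scripted, x)
scripted.eval()
scripted_deterministic_in_eval = is_deterministic(scripted, x)

print("traced, after .train():  ", traced_deterministic_in_train)
print("scripted, after .train():", scripted_deterministic_in_train)
print("scripted, after .eval(): ", scripted_deterministic_in_eval)
```

If the traced module still gives identical outputs after `.train()`, dropout stayed in the mode it was traced in, while the scripted module should follow `.train()` / `.eval()` as usual. If you only ever need inference, tracing in eval mode is therefore usually fine; scripting matters when the same exported module has to run in both modes.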