The `torch.jit` documentation states that one of the limitations of tracing is that calls which differ based on whether the model is in `train` or `eval` mode will only ever use whatever mode the module was in at trace time. Specifically:
- In the returned `ScriptModule`, operations that have different behaviors in `training` and `eval` modes will always behave as if it is in the mode it was in during tracing, no matter which mode the `ScriptModule` is in.
I know that some commonly used layer types have `train` / `eval` behavioral differences; `BatchNorm2d` and `Dropout` come to mind. Does this mean that tracing is A Bad Idea for modules that use these two layers (and others like them)?
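
To make the concern concrete, here is a minimal sketch of how the quoted limitation could be checked for `Dropout`. The module `Net`, the input size, and the variable names are made up for illustration; the only assumption is that `torch.jit.trace` behaves as the documentation quoted above describes.

```python
import torch
import torch.nn as nn


class Net(nn.Module):
    """Hypothetical toy module: a single Dropout layer, nothing else."""

    def __init__(self):
        super().__init__()
        self.drop = nn.Dropout(p=0.5)

    def forward(self, x):
        return self.drop(x)


x = torch.ones(1000)

# Trace while the module is in eval mode, where Dropout is a no-op.
net = Net().eval()
traced = torch.jit.trace(net, x)

# Switch the traced module to train mode and run it again.
traced.train()
traced_out = traced(x)

# For contrast, run the original eager module in train mode.
net.train()
eager_out = net(x)

# If the trace-time mode is baked in, the traced output should still
# match the input even though traced.train() was called, while the
# eager module should zero (and rescale) roughly half of the entries.
print("traced output unchanged:", torch.equal(traced_out, x))
print("eager output unchanged: ", torch.equal(eager_out, x))
```

If the traced output matches the input while the eager output does not, the traced module is ignoring the `train()` call, which is exactly the situation I'm worried about.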