I need to back-propagate some gradients at eval time (to perform sensitivity analysis for explanation generation). However, CuDNN's RNN implementation does not support back-prop except in train mode.
This creates an awkward dilemma: if I stick with running the model in eval mode then I can only do the explanation generation on CPU. Conversely, if I run the model in train mode I have to worry about the other semantic impacts that has - in particular the fact that it will be interpreted by the dropout layers as a signal to actually apply dropout.
For now I have (not very nice) work-arounds such as:
- Hack my model loader to set all dropout probabilities to 0 on load, since I (happen to) know I'm only going to be using the model for prediction
- In the master `Module` class for my model, add a method (say `allow_back_prop`) that calls `train()` on the RNN sub-modules only, so that CuDNN sees those components running in train mode even though the overall model is in eval mode (I think this will work, right?)
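To make the two work-arounds concrete, here is a minimal sketch of what I mean. The model, its layer sizes, and the method names `allow_back_prop`/`zero_dropout` are all hypothetical stand-ins for my real code:

```python
import torch
import torch.nn as nn

class MyModel(nn.Module):
    """Toy stand-in for the real model; all names/sizes are hypothetical."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.LSTM(input_size=8, hidden_size=16, num_layers=2,
                           dropout=0.5, batch_first=True)
        self.drop = nn.Dropout(p=0.5)
        self.head = nn.Linear(16, 4)

    def allow_back_prop(self):
        # Work-around 2: flip only the RNN sub-modules back to train mode
        # so CuDNN permits backward(); everything else stays in eval mode.
        for m in self.modules():
            if isinstance(m, nn.RNNBase):  # covers nn.RNN / nn.LSTM / nn.GRU
                m.train()
        return self

    def zero_dropout(self):
        # Work-around 1: neutralize dropout, since the RNN sub-modules now
        # report training=True and would otherwise apply their inter-layer
        # dropout during the forward pass.
        for m in self.modules():
            if isinstance(m, nn.Dropout):
                m.p = 0.0
            if isinstance(m, nn.RNNBase):
                m.dropout = 0.0
        return self

    def forward(self, x):
        out, _ = self.rnn(x)
        return self.head(self.drop(out[:, -1]))

model = MyModel().eval().allow_back_prop().zero_dropout()

# Gradients w.r.t. the input now flow even though the model as a whole
# is in eval mode (model.training is False, model.rnn.training is True).
x = torch.randn(2, 5, 8, requires_grad=True)
model(x).sum().backward()
```

On CPU this runs either way; the point of `allow_back_prop` is that the same call should also satisfy CuDNN's train-mode check on GPU.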
However, what I really need (absent a relaxation of the constraint CuDNN is imposing) is for PyTorch to separate the different aspects of eval semantics, with either separate methods or extra optional parameters to the `train()`/`eval()` methods.
If anyone has any other thoughts/recommendations on this I’d love to discuss.