I need to back-propagate some gradients at eval time (to perform sensitivity analysis for explanation generation). However, CuDNN's RNN implementation does not support back-prop except in train mode.
This creates an awkward dilemma: if I stick with running the model in eval mode then I can only do the explanation generation on CPU. Conversely, if I run the model in train mode I have to worry about the other semantic impacts that has - in particular the fact that it will be interpreted by the dropout layers as a signal to actually apply dropout.
For now I have (not very nice) work-arounds such as:
- Hack my model loader to set all dropout probabilities to 0 on load, since I (happen to) know I'm only going to be using the model for prediction
- In the master `Module` class for my model, add a method (say `allow_back_prop`) that calls `train()` on the RNN sub-modules only, so that CuDNN sees those components running in train mode even though the overall model is in eval mode (I think this will work, right?)
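To make the two work-arounds concrete, here is a minimal sketch of what I mean. The model, its layer sizes, and the method names `allow_back_prop`/`zero_dropout` are all hypothetical stand-ins for my real code:

```python
import torch
import torch.nn as nn

class MyModel(nn.Module):
    """Toy stand-in for the real model; all names/sizes are hypothetical."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.LSTM(input_size=8, hidden_size=16, num_layers=2,
                           dropout=0.5, batch_first=True)
        self.drop = nn.Dropout(p=0.5)
        self.head = nn.Linear(16, 4)

    def allow_back_prop(self):
        # Work-around 2: flip only the RNN sub-modules back to train mode
        # so CuDNN permits backward(); everything else stays in eval mode.
        for m in self.modules():
            if isinstance(m, nn.RNNBase):  # covers nn.RNN / nn.LSTM / nn.GRU
                m.train()
        return self

    def zero_dropout(self):
        # Work-around 1: neutralize dropout, since the RNN sub-modules now
        # report training=True and would otherwise apply their inter-layer
        # dropout during the forward pass.
        for m in self.modules():
            if isinstance(m, nn.Dropout):
                m.p = 0.0
            if isinstance(m, nn.RNNBase):
                m.dropout = 0.0
        return self

    def forward(self, x):
        out, _ = self.rnn(x)
        return self.head(self.drop(out[:, -1]))

model = MyModel().eval().allow_back_prop().zero_dropout()

# Gradients w.r.t. the input now flow even though the model as a whole
# is in eval mode (model.training is False, model.rnn.training is True).
x = torch.randn(2, 5, 8, requires_grad=True)
model(x).sum().backward()
```

On CPU this runs either way; the point of `allow_back_prop` is that the same call should also satisfy CuDNN's train-mode check on GPU.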
However, what I really need (absent a relaxation of the constraint CuDNN is imposing) is for PyTorch to separate the different aspects of eval semantics, with either separate methods or extra optional parameters to the `train()`/`eval()` methods.
If anyone has any other thoughts/recommendations on this I’d love to discuss.