I’m running into the following error:
Cudnn RNN backward can only be called in training mode
The first epoch trains fine, and the error only occurs on the 2nd epoch. I’ve made sure to call model.train() prior to generating the hidden layer inputs, calculating loss, and before
model.training displays as True when I check it. If it’s useful, I have to use
What other reasons might be causing this error? Are there other items I need to make sure I run
Chances are high that there may be a model.eval() call for computing validation loss running between epochs (perhaps triggered by the number of updates rather than by epochs). Could that be doing this?
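If validation is triggered every N updates rather than once per epoch, a single model.train() at the start of the epoch is not enough: the model stays in eval mode for every update after the first validation. A minimal sketch of that schedule, assuming a hypothetical TinyRNN LSTM model, a hypothetical eval_every parameter, and random synthetic data (none of these come from the thread):

```python
import torch
import torch.nn as nn

class TinyRNN(nn.Module):
    """Hypothetical LSTM model used only for this illustration."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.LSTM(4, 8, batch_first=True)
        self.head = nn.Linear(8, 1)

    def forward(self, x):
        out, _ = self.rnn(x)              # out: (batch, seq_len, hidden)
        return self.head(out[:, -1])      # predict from the last time step

def train_with_periodic_eval(model, train_batches, val_batches, optimizer,
                             loss_fn, eval_every=2):
    """One pass over the data, validating every `eval_every` updates (hypothetical schedule)."""
    val_losses = []
    model.train()
    for step, (x, y) in enumerate(train_batches, start=1):
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()
        if step % eval_every == 0:
            model.eval()
            with torch.no_grad():         # no autograd graph from the eval-mode pass
                val = sum(loss_fn(model(vx), vy).item() for vx, vy in val_batches)
            val_losses.append(val / len(val_batches))
            model.train()                 # restore training mode before the next update
    return val_losses

torch.manual_seed(0)
model = TinyRNN()
opt = torch.optim.SGD(model.parameters(), lr=0.01)
batches = [(torch.randn(8, 5, 4), torch.randn(8, 1)) for _ in range(4)]
val_losses = train_with_periodic_eval(model, batches, batches[:2], opt, nn.MSELoss())
```

The key line is the model.train() directly after the validation block; without it, every later update in the same epoch would run the RNN forward pass with training=False.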
In my epoch loop I have train and validate as separate function calls. In the train function I call model.train() at the start, and at the start of the validate function I call model.eval(). I thought this would automatically make the model trainable again… however, when I comment out the validate function (so the model only trains), it actually seems to work.
Do I need to run model.train() somewhere again?
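For reference, a minimal sketch of the epoch-level train/validate split described above, assuming a hypothetical TinyRNN model and random synthetic data. Because train_one_epoch calls model.train() at the start of every epoch, the model.eval() in validate is undone automatically. Wrapping validation in torch.no_grad() (an addition, not something from the thread) also ensures that no autograd graph built during the eval-mode forward pass can leak into the next epoch.

```python
import torch
import torch.nn as nn

class TinyRNN(nn.Module):
    """Hypothetical stand-in model: one LSTM layer and a linear head."""
    def __init__(self, n_in=4, n_hidden=8, n_out=1):
        super().__init__()
        self.rnn = nn.LSTM(n_in, n_hidden, batch_first=True)
        self.head = nn.Linear(n_hidden, n_out)

    def forward(self, x):
        out, _ = self.rnn(x)              # out: (batch, seq_len, n_hidden)
        return self.head(out[:, -1])      # predict from the last time step

def train_one_epoch(model, batches, optimizer, loss_fn):
    model.train()                         # called at the start of every epoch
    total = 0.0
    for x, y in batches:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
        total += loss.item()              # plain float: no autograd graph is kept
    return total / len(batches)

@torch.no_grad()                          # validation records no autograd graph
def validate(model, batches, loss_fn):
    model.eval()                          # RNN forward now runs with training=False
    total = 0.0
    for x, y in batches:
        total += loss_fn(model(x), y).item()
    return total / len(batches)

def make_batches(n_batches=3, batch=8, seq_len=5, n_in=4, device="cpu"):
    # Random synthetic data, only so the sketch runs end to end.
    return [(torch.randn(batch, seq_len, n_in, device=device),
             torch.randn(batch, 1, device=device)) for _ in range(n_batches)]

torch.manual_seed(0)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = TinyRNN().to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()
train_batches = make_batches(device=device)
val_batches = make_batches(device=device)

history = []
for epoch in range(2):
    train_loss = train_one_epoch(model, train_batches, optimizer, loss_fn)
    val_loss = validate(model, val_batches, loss_fn)
    history.append((train_loss, val_loss))
    print(f"epoch {epoch}: train {train_loss:.4f}  val {val_loss:.4f}")
```

With this structure no extra model.train() call is needed elsewhere, which matches the thread’s conclusion that the mode switching itself was not the problem.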
Thanks yuanzhoulvpi and jerinphilip for your input.
I was using a custom loss function, and as it turns out this was the issue: data I was tracking was not being properly detached, and this somehow prevented the model from being fully put back into training mode.
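The original custom loss isn’t shown, so the following is only an illustrative sketch with a hypothetical RunningNormLoss that reuses its own history. One plausible mechanism, consistent with the error message: the cuDNN RNN backward appears to check whether the forward pass that built the graph ran in training mode. If a loss tensor produced during validation (eval mode, gradients enabled) is stored without .detach() and later feeds into a training loss, loss.backward() in the next epoch walks back into that eval-mode graph. That would also explain why the first epoch succeeds.

```python
import torch
import torch.nn as nn

class RunningNormLoss(nn.Module):
    """Hypothetical custom loss: MSE divided by the mean of previously seen MSE values.

    Only an illustration of the detach issue; not the original poster's loss.
    """
    def __init__(self):
        super().__init__()
        self.history = []                 # past loss values, kept across epochs

    def forward(self, pred, target):
        mse = torch.mean((pred - target) ** 2)
        if self.history:
            scale = torch.stack(self.history).mean()
        else:
            scale = torch.tensor(1.0, device=mse.device)
        loss = mse / scale
        # Store a detached copy. Appending `mse` itself would keep its graph
        # alive, so a later backward() would also try to flow into every
        # earlier forward pass, including ones run in eval mode.
        self.history.append(mse.detach())
        return loss

torch.manual_seed(0)
loss_fn = RunningNormLoss()
for _ in range(3):
    pred = torch.randn(5, requires_grad=True)
    target = torch.randn(5)
    loss_fn(pred, target).backward()      # each backward stays within its own graph
```

Storing loss.item() instead of loss.detach() works just as well when only a Python number is needed for logging.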
All fixed now!