Judging by the error report, this happens in a GRU model, specifically in the GRU layer's handling of the hidden input.
The GRU layer takes two inputs: the data (output of the embedding layer, of size [sequence length, batch size, embedding features]) and the hidden state from the previous step, of size [1, batch size, hidden features].
output, hidden = self.gru(x, hidden)
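For reference, the call above can be reproduced in a minimal standalone sketch; the concrete sizes (sequence length 21, batch 3, 256 hidden features, and an assumed embedding size of 128) are taken from the shapes in the error below, the rest are illustrative:

```python
import torch
import torch.nn as nn

# Hypothetical sizes matching the shapes described above
seq_len, batch_size = 21, 3
embedding_dim, hidden_size = 128, 256

gru = nn.GRU(embedding_dim, hidden_size)  # single layer, unidirectional

x = torch.randn(seq_len, batch_size, embedding_dim)  # [seq len, batch, emb features]
hidden = torch.zeros(1, batch_size, hidden_size)     # [1, batch, hidden features]

output, hidden = gru(x, hidden)
# output has shape [seq_len, batch_size, hidden_size],
# hidden has shape [1, batch_size, hidden_size]
```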
The stack trace is:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [21, 3, 256]], which is output 0 of CudnnRnnBackward, is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
The size of the float tensor points to the hidden states of the GRU (256 hidden features). I definitely didn't use an in-place operation on it anywhere, and the other tensors and variables seem fine.
So I don’t quite understand what to do here.
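For context, one pattern I know can trigger this exact error (I am not sure it applies here) is carrying the hidden state across training iterations without detaching it: `optimizer.step()` updates the weights in place, and the next `backward()` then walks back through the previous iteration's graph, where those weights are already at a newer version. Detaching the hidden state between batches cuts that link. A hedged sketch, with all names and sizes illustrative:

```python
import torch
import torch.nn as nn

gru = nn.GRU(8, 16)  # toy sizes, not the real model
opt = torch.optim.SGD(gru.parameters(), lr=0.1)
hidden = torch.zeros(1, 3, 16)

for step in range(2):
    x = torch.randn(5, 3, 8)
    # Cut the autograd graph from the previous iteration; without this,
    # backward() would reach into the old graph whose weights were
    # modified in place by opt.step().
    hidden = hidden.detach()
    output, hidden = gru(x, hidden)
    loss = output.pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```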