Judging by the error report, this happens in a GRU model, specifically in the GRU layer as it processes the hidden input.
The GRU layer takes two inputs: the data (the output of an embedding layer, of size [sequence length, batch size, embedding features]) and the hidden state from the previous step, of size [1, batch size, hidden features].
output, hidden = self.gru(x, hidden)
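For context, the relevant part of the encoder looks roughly like this (simplified; the layer names and sizes here are illustrative, not my exact code):

```python
import torch.nn as nn

class EncoderRNN(nn.Module):
    def __init__(self, vocab_size, emb_size=128, hidden_size=256):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_size)
        self.gru = nn.GRU(emb_size, hidden_size)  # batch_first=False: [seq_len, batch, features]

    def forward(self, tokens, hidden):
        # tokens: [seq_len, batch]; hidden: [1, batch, hidden_size]
        x = self.embedding(tokens)            # [seq_len, batch, emb_size]
        output, hidden = self.gru(x, hidden)  # the line the error points at
        return output, hidden
```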
The error stack trace is:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [21, 3, 256]], which is output 0 of CudnnRnnBackward, is at version 1;
expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
The size of the float tensor points to the hidden state of the GRU. I definitely didn't apply an in-place operation to it anywhere; the other tensors and variables seem fine.
The error refers to the third line in the forward method, because [torch.cuda.FloatTensor [21, 3, 256]] refers to the hidden state (sequence length: 21, batch size: 3, number of features: 256), not the embedding vector. Where do I modify hidden in place?
Also, looking at the error report, I’m not sure I understand what sort of ‘version’ the exception refers to: ‘is at version 1, expected version 0’. What does this mean?
Versions are how autograd keeps track of in-place operations: every time a tensor is modified in place, its version counter is incremented by 1. When a tensor is saved for the backward pass, autograd records its version at that moment, and backward() raises this error if the version has changed since then, which is exactly the "is at version 1; expected version 0" in your message.
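You can watch the counter through the internal _version attribute (not part of the stable API, so only useful for poking around):

```python
import torch

t = torch.zeros(2, 3)
print(t._version)  # 0

t.add_(1)          # in-place op: bumps the version counter
print(t._version)  # 1

u = t + 1          # out-of-place op: returns a fresh tensor at version 0
print(u._version)  # 0
```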
You might be modifying output or hidden in place somewhere later in your forward pass, after they are returned by the encoder GRU.
Right! It was modified (unsqueeze) before being fed into the decoder, and I forgot to remove the trailing _ that marks the in-place variant! So given all these problems autograd has with in-place operations, when can I actually use them? Just for dataset manipulation?
Well, unsqueeze_ is definitely one of the more useful in-place operations, and if it is not allowed here, I'm not sure when I could use it. The tensor I had a problem with was not the hidden state, as it seemed from the error stack trace: it was the output of the encoder, the full history of the forward pass, of size sequence_length x batch_size x hidden_state_size.
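For anyone who hits the same thing, here is a minimal sketch of the broken pattern and the fix (the names and sizes are made up to match the shapes from the error, not my actual code):

```python
import torch
import torch.nn as nn

gru = nn.GRU(128, 256)
x = torch.randn(21, 3, 128, requires_grad=True)
encoder_output, hidden = gru(x)   # encoder_output: [21, 3, 256], saved for the backward pass

# Broken (what I had): unsqueeze_ mutates encoder_output in place, bumping its
# version counter, and backward() then fails because the RNN backward saved
# that tensor at version 0.
# decoder_input = encoder_output.unsqueeze_(0)

# Fixed: the out-of-place unsqueeze returns a new view and leaves the saved
# activation at version 0.
decoder_input = encoder_output.unsqueeze(0)

loss = decoder_input.sum()
loss.backward()                   # works with the out-of-place version
```

In general, in-place operations are fine on tensors that autograd has not saved for the backward pass: preprocessing data before it enters the graph, code under torch.no_grad() (this is how optimizers update weights in place), and similar.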