I built a custom LSTM model (it’s quite large), and somewhere in it there is an in-place operation that raises an error during training.
error:
File "c:\Users\jobei\Desktop\scriptie msc\code\models\train_model.py", line 46, in closure
loss.backward()
File "C:\Users\jobei\anaconda3\envs\machinelearning\lib\site-packages\torch\_tensor.py", line 396, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "C:\Users\jobei\anaconda3\envs\machinelearning\lib\site-packages\torch\autograd\__init__.py", line 173, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [16, 16]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
I’ve already set torch.autograd.set_detect_anomaly(True), but it doesn’t really give me any useful information. Right now I’m testing every small part of the model separately, but that takes a long time. I hope there is something better I can do.
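To make sure I understand what the error actually means, here is a minimal example (not my model code) that reproduces the same "modified by an inplace operation" / version-counter failure, together with the out-of-place fix:

```python
import torch

# Minimal repro: an intermediate tensor is saved for backward,
# then modified in place before backward() runs.
a = torch.randn(16, 16, requires_grad=True)
b = a * 2                 # non-leaf intermediate
out = (b ** 2).sum()      # pow() saves b for its backward pass
b.add_(1.0)               # in-place edit bumps b's version counter

try:
    out.backward()
except RuntimeError as e:
    print(e)              # "... modified by an inplace operation ..."

# Fix: make the modification out of place, so the saved tensor survives.
a2 = torch.randn(16, 16, requires_grad=True)
b2 = a2 * 2
out2 = (b2 ** 2).sum()
b2 = b2 + 1.0             # out-of-place: b2 now names a new tensor
out2.backward()           # works; a2.grad is populated
```

So I know I’m looking for something like +=, *=, a method ending in an underscore (add_, relu_, etc.), inplace=True, or slice assignment, applied to a tensor autograd still needs.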
If I’m correct, the only 16x16 tensors in the model are the weights of the gate networks, but as far as I’m aware I never modify those myself.
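Since the tensor in the error is a weight matrix "at version 2", and the traceback goes through a closure, one thing I’m double-checking is the closure pattern itself: as I understand it, optimizers like LBFGS update the weights in place between closure calls, so the whole forward pass has to be recomputed inside the closure; calling backward() on a loss computed outside it would give exactly this version-counter error. A simplified sketch of the pattern (not my exact training loop):

```python
import torch

# Simplified closure-based training step (hypothetical, not my code).
# Key point: the *entire* forward pass runs inside the closure, because
# LBFGS updates the weights in place between closure calls.
model = torch.nn.Linear(16, 16)
optimizer = torch.optim.LBFGS(model.parameters(), max_iter=5)
x, y = torch.randn(8, 16), torch.randn(8, 16)

def closure():
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)  # forward inside closure
    loss.backward()
    return loss

loss = optimizer.step(closure)  # LBFGS may call the closure several times
```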
This is the GitHub link to the files if anyone is interested:
training is where I put the training loops.
simple_linear_networks contains the LSTM gate networks, etc.
hawkes loss is a custom loss function I need for this network.
sd_PNHP is the network itself. It is a stacked continuous-time LSTM with two cell states per cell. It consists of a cell-state class, a class that stacks 4 cell states on top of each other, and finally a layer class that loops through these stacked cells.
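One pattern I’m now auditing in the layer class’s loop over the stacked cells: writing each step’s output into a preallocated tensor with slice assignment, which I understand also counts as an in-place operation. A hypothetical sketch (not my actual code) of the pattern and the list-plus-torch.stack alternative:

```python
import torch

# Hypothetical cell loop (not my actual code). Writing each step into a
# preallocated buffer uses slice assignment, an in-place operation that
# can invalidate tensors autograd saved earlier in the loop:
#
#   out = torch.empty(seq_len, 16)
#   for t in range(seq_len):
#       out[t] = step_result      # in-place write
#
# Safer pattern: collect the per-step results in a list, stack once.
w = torch.randn(16, 16, requires_grad=True)
xs = torch.randn(4, 16)

steps = []
for t in range(4):
    steps.append(torch.tanh(xs[t] @ w))  # out-of-place per-step result
out = torch.stack(steps)                 # shape (4, 16), differentiable

out.sum().backward()                     # works; w.grad is populated
```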