Hi all,

I’m trying to implement a custom LSTM based on the code from piEsposito/pytorch-lstm-by-hand on GitHub (a small and simple tutorial on how to craft an LSTM nn.Module by hand in PyTorch).

As part of training, I want to feed the model's own predictions back in as inputs partway through the sequence. So there is first a warm-up phase that uses the ground-truth inputs to update the hidden states, and after that the inputs are replaced by the predictions.

Another characteristic of the setup is that the input size is 8 while the output size is 5. When I feed the predictions back in during training, I only replace the last 5 columns of the input and keep the first 3 columns from the original input.
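
To make that concrete, this is how the input is assembled at a prediction step (an illustrative sketch; `out` is the previous step's prediction, exactly as in the full code below):

```
# illustrative sketch: keep the 3 measured columns, overwrite the 5 predicted ones
xt = x[:, t, :]                   # (batch, 8): input taken from the data
xt[:, 3:] = out.clone().detach()  # columns 0-2 stay, columns 3-7 come from the model
```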

The code raises the following error:

```
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [4, 8]] is at version 4; expected version 3 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
```
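
Following the hint in the message, I enabled anomaly detection at the top of the script to localize the failing operation (minimal sketch):

```
import torch

# report the forward-pass op responsible for the failing gradient computation
torch.autograd.set_detect_anomaly(True)
```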

Here is my Python code to reproduce the error:

```
import torch
import torch.nn as nn
from torch.nn.functional import l1_loss


class Net(nn.Module):
    def __init__(self, input_sz, hidden_sz):
        super().__init__()
        self.W = nn.Linear(input_sz, hidden_sz * 4, bias=False)
        self.U = nn.Linear(hidden_sz, hidden_sz * 4, bias=False)
        self.bias = nn.Parameter(torch.Tensor(hidden_sz * 4))
        self.hidden_size = hidden_sz

    def forward(self, x, h_t=None, c_t=None):
        HS = self.hidden_size
        gates = self.W(x) + self.U(h_t) + self.bias
        o_t = torch.sigmoid(gates[:, HS * 3:])  # output gate (cell simplified for the repro)
        h_t = torch.mul(o_t, c_t)
        return h_t, c_t


hidden_size = 3
batch_size = 4
input_size = 8
output_size = 5
seqlen = 10
predtmp = 4  # number of closed-loop (prediction-fed) steps at the end of the sequence

net = Net(input_size, hidden_size)
fc = nn.Linear(hidden_size, output_size, bias=True)
h_t, c_t = (torch.zeros(batch_size, hidden_size),
            torch.zeros(batch_size, hidden_size))
x = torch.rand(batch_size, seqlen, input_size)
y = torch.rand(batch_size, output_size)

for t in range(seqlen):
    xt = x[:, t, :].squeeze()
    # this works
    # h_t, c_t = net(xt, h_t, c_t)
    # out = fc(h_t)
    # this does not work
    if t < seqlen - predtmp:
        # warm-up: drive the cell with the original inputs
        h_t, c_t = net(xt, h_t, c_t)
        out = fc(h_t)
    else:
        # closed loop: keep the first 3 columns, replace the last 5 with the prediction
        xt[:, 3:] = out.clone().detach()
        h_t, c_t = net(xt, h_t, c_t)
        out = fc(h_t)

loss = l1_loss(out, y)
loss.backward()
```

I do not understand what is happening here.

Any help will be greatly appreciated!!

Pablo