Changing the input during recurrent training fails due to an in-place operation

Hi all,
I’m trying to implement a custom LSTM based on the code from the GitHub repo piEsposito/pytorch-lstm-by-hand (a small and simple tutorial on how to craft an LSTM nn.Module by hand in PyTorch).
As part of the training, I want to use the predictions as inputs in the middle of the sequence: first there is a warm-up phase that updates the hidden states from the real data, and then the input switches to the predictions.
Another characteristic of the setup is that the input size is 8 while the output size is 5. When I feed the predictions back in during training, I only replace the last 5 columns of the input and keep the first 3 columns from the original data.

The code raises the following error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [4, 8]] is at version 4; expected version 3 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
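
For reference, the anomaly detection mentioned in the hint is enabled like this (the standard torch.autograd switch); it makes the backward error point at the forward operation whose saved tensor was modified:

import torch

# make the backward error report the forward op whose saved tensor
# was later modified in-place (adds overhead, debugging only)
torch.autograd.set_detect_anomaly(True)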

My Python code to reproduce this error is:

import torch
from torch.nn.functional import l1_loss
import torch.nn as nn

class Net(nn.Module):
    def __init__(self, input_sz, hidden_sz):
        super().__init__()
        self.W = nn.Linear(input_sz, hidden_sz * 4, bias=False)
        self.U = nn.Linear(hidden_sz, hidden_sz * 4, bias=False)
        self.bias = nn.Parameter(torch.zeros(hidden_sz * 4))  # zero-init; torch.Tensor() would be uninitialised
        self.hidden_size = hidden_sz
    
    def forward(self, x, h_t=None, c_t=None):
        # cut-down cell: only the output gate of the LSTM is kept,
        # which is enough to reproduce the error
        HS = self.hidden_size
        gates = self.W(x) + self.U(h_t) + self.bias
        o_t = torch.sigmoid(gates[:, HS * 3:])
        h_t = torch.mul(o_t, c_t)
        return h_t, c_t

hidden_size = 3
batch_size = 4
input_size = 8
output_size = 5
seqlen = 10
predtmp = 4   # number of final steps that are fed with the predictions

net = Net(input_size, hidden_size)
fc = nn.Linear(hidden_size, output_size, bias=True)

h_t, c_t = (torch.zeros(batch_size, hidden_size),
            torch.zeros(batch_size, hidden_size))

x = torch.rand(batch_size, seqlen, input_size)
y = torch.rand(batch_size, output_size)

for t in range(seqlen):
    xt = x[:, t, :].squeeze()
    # this works
    # h_t, c_t = net(xt, h_t, c_t)
    # out = fc(h_t)
    # this does not work
    if t < seqlen - predtmp:
        # warm-up: feed the real data
        h_t, c_t = net(xt, h_t, c_t)
        out = fc(h_t)
    else:
        # feed back the previous prediction in the last 5 columns
        xt[:, 3:] = out.clone().detach()
        h_t, c_t = net(xt, h_t, c_t)
        out = fc(h_t)

loss = l1_loss(out, y)
loss.backward()

I do not understand what is happening here.

Any help will be greatly appreciated!!

Pablo

Hi all,
Well, I propose this solution, which is not as “elegant” as I would like.

In evaluation mode, I copy the outputs of the recurrent iterations into a temporary tensor. Then, I use this tensor as the input for training…

# copy the outputs of the recurrent evaluation into _x
_x = x.clone().detach()
net.eval()
with torch.no_grad():
    for t in range(seqlen):
        xt = x[:, t, :].clone().detach().squeeze()
        if t < seqlen - predtmp:
            # warm-up: feed the real data
            h_t, c_t = net(xt, h_t, c_t)
            out = fc(h_t)
        else:
            # store the detached prediction in _x and feed it back
            dout = out.clone().detach()
            _x[:, t, 3:] = dout
            xt[:, 3:] = dout
            h_t, c_t = net(xt, h_t, c_t)
            out = fc(h_t)

# reset the states and run the training pass on _x
h_t, c_t = (torch.zeros(batch_size, hidden_size),
            torch.zeros(batch_size, hidden_size))
net.train()
for t in range(seqlen):
    # the prediction-fed columns are already baked into _x, so no
    # in-place modification happens inside the training loop
    xt = _x[:, t, :].clone().detach().squeeze()
    h_t, c_t = net(xt, h_t, c_t)
    out = fc(h_t)

loss = l1_loss(out, y)
loss.backward()
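
For completeness, a variant I have not fully tested (just a sketch, reusing the same setup as above) would be to build the step input out-of-place with torch.cat instead of writing into a view of x, so the tensors saved for the backward pass are never modified:

# sketch: avoid assigning into xt (a view of x); build the input out-of-place
h_t, c_t = (torch.zeros(batch_size, hidden_size),
            torch.zeros(batch_size, hidden_size))

for t in range(seqlen):
    if t < seqlen - predtmp:
        # warm-up steps: feed the real data
        xt = x[:, t, :]
    else:
        # keep the first 3 columns from the data and take the last 5
        # from the detached previous prediction, without touching x
        xt = torch.cat([x[:, t, :3], out.detach()], dim=1)
    h_t, c_t = net(xt, h_t, c_t)
    out = fc(h_t)

loss = l1_loss(out, y)
loss.backward()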

I hope this helps someone in the same situation.
Bye!!