I am trying to write a biLSTM by hand. I create a number of tensors and store every timestep's values in preallocated tensor matrices. When I run
loss.backward()
I get an error like
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [50, 1]], which is output 0 of SelectBackward, is at version 21; expected version 20 instead.
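As far as I can tell, the error can be reproduced with nothing more than an indexed write into a preallocated buffer inside a loop; here is a toy example (the names `w` and `buf` are placeholders, not my real variables):

```python
import torch

torch.manual_seed(0)
w = torch.randn(1, requires_grad=True)
buf = torch.randn(3, 1)           # preallocated buffer, like h_forward / C_forward
for k in range(3):
    buf[k] = w * buf[k - 1]       # index assignment is an in-place write: it bumps buf's version counter

loss = buf.sum()
err = None
try:
    loss.backward()
except RuntimeError as e:
    err = e                       # "...modified by an inplace operation..."
```

The multiplication saves a view of `buf` for the backward pass; the next indexed assignment modifies `buf` in place, so the saved view's version no longer matches and `backward()` raises the same `RuntimeError`.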
def lstm(index):
    global Wf_forward, Wi_forward, Wc_forward, Wo_forward
    global bf_forward, bi_forward, bc_forward, bo_forward
    global Wf_backward, Wi_backward, Wc_backward, Wo_backward
    global bf_backward, bi_backward, bc_backward, bo_backward
    global Ws_backward, Ws_forward
    h_forward = torch.randn(lstm_len + 2, Dh, 1)  # I set lstm_len = 20, Dh = 50
    f_forward = torch.randn(lstm_len + 2, Dh, 1)
    i_forward = torch.randn(lstm_len + 2, Dh, 1)
    c_forward = torch.randn(lstm_len + 2, Dh, 1)
    C_forward = torch.randn(lstm_len + 2, Dh, 1)
    o_forward = torch.randn(lstm_len + 2, Dh, 1)
    for i in range(lstm_len + 1):
        k_index = index - lstm_len + i
        k = i + 1
        index_k = np.array([k_index])
        _x = get_x(exercise_list, index_k, 1)
        f_forward[k] = torch.sigmoid(torch.mm(Wf_forward, torch.cat((h_forward[k - 1], _x), 0)) + bf_forward)
        i_forward[k] = torch.sigmoid(torch.mm(Wi_forward, torch.cat((h_forward[k - 1], _x), 0)) + bi_forward)
        c_forward[k] = torch.tanh(torch.mm(Wc_forward, torch.cat((h_forward[k - 1], _x), 0)) + bc_forward)
        C_forward[k] = f_forward[k] * C_forward[k - 1] + i_forward[k] * c_forward[k]
        o_forward[k] = torch.sigmoid(torch.mm(Wo_forward, torch.cat((h_forward[k - 1], _x), 0)) + bo_forward)
        h_forward[k] = o_forward[k] * torch.tanh(C_forward[k])
    h_backward = torch.randn(lstm_len + 2, Dh, 1)
    f_backward = torch.randn(lstm_len + 2, Dh, 1)
    i_backward = torch.randn(lstm_len + 2, Dh, 1)
    c_backward = torch.randn(lstm_len + 2, Dh, 1)
    C_backward = torch.randn(lstm_len + 2, Dh, 1)
    o_backward = torch.randn(lstm_len + 2, Dh, 1)
    for i in range(lstm_len + 1):
        k_index = index + lstm_len - i
        k = i + 1
        index_k = np.array([k_index])
        _x = get_x(exercise_list, index_k, 1)
        f_backward[k] = torch.sigmoid(torch.mm(Wf_backward, torch.cat((h_backward[k - 1], _x), 0)) + bf_backward)
        i_backward[k] = torch.sigmoid(torch.mm(Wi_backward, torch.cat((h_backward[k - 1], _x), 0)) + bi_backward)
        c_backward[k] = torch.tanh(torch.mm(Wc_backward, torch.cat((h_backward[k - 1], _x), 0)) + bc_backward)
        C_backward[k] = f_backward[k] * C_backward[k - 1] + i_backward[k] * c_backward[k]
        o_backward[k] = torch.sigmoid(torch.mm(Wo_backward, torch.cat((h_backward[k - 1], _x), 0)) + bo_backward)
        h_backward[k] = o_backward[k] * torch.tanh(C_backward[k])
    yt = torch.softmax(torch.mm(Ws_forward, h_forward[lstm_len + 1]) + torch.mm(Ws_backward, h_backward[lstm_len + 1]), dim=0, dtype=torch.float32)
    loss = torch.mean(Y * torch.log(yt))
    Loss.append(loss)
    loss.backward()
I understand that I am overwriting the values of several of these tensor matrices, so in the end autograd can't compute the gradient. However, I only use the matrices to store the per-timestep results; I never intend to change values that are needed later. How can I solve this problem?
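The standard workaround, as I understand it, is to collect each timestep's output in a plain Python list and `torch.stack` the list at the end, so autograd never sees an in-place write into a tensor it has saved. A minimal sketch (the shapes and the toy recurrence are placeholders, not my actual model):

```python
import torch

torch.manual_seed(0)
Dh, steps = 4, 5
W = torch.randn(Dh, Dh, requires_grad=True)
h = torch.zeros(Dh, 1)            # initial hidden state
hs = []                           # Python list instead of a preallocated tensor
for _ in range(steps):
    h = torch.tanh(W @ h + 0.1)   # a brand-new tensor each step, no indexed assignment
    hs.append(h)

h_all = torch.stack(hs)           # shape (steps, Dh, 1), fully differentiable
loss = h_all.sum()
loss.backward()                   # no in-place RuntimeError
```

Each iteration builds a new tensor instead of writing into a shared buffer, so every value autograd saved stays at the version it was saved at.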
Thanks for any response.