Can PyTorch track gradients like this?

In my forward function, I reshape and concatenate the outputs of two LSTMs, then feed the concatenated tensor into a fully connected network with a ReLU and a Sigmoid. I am wondering whether PyTorch is able to track gradients after I reshape and concatenate the LSTM outputs, because even though the loss decreases during backpropagation, the accuracy on the test set is not improving.

        output_allhiden_1, (final_hidden_state_1, final_cell_state_1) = self.LSTM_1(input_1)
        output_allhiden_2, (final_hidden_state_2, final_cell_state_2) = self.LSTM_2(input_2)
        final_hidden_state_1 = final_hidden_state_1.view(num_layers, batchsize, -1)
        final_hidden_state_2 = final_hidden_state_2.view(num_layers, batchsize, -1)
        # last layer's hidden state from each LSTM, concatenated along the feature dimension
        concat = torch.cat((final_hidden_state_1[num_layers - 1], final_hidden_state_2[num_layers - 1]), dim=1)
        output_p = self.output1(concat)
        relu_output = self.relu(output_p)
        output_p2 = self.output2(relu_output)
        sig_output = self.sig(output_p2)
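
A quick sanity check (a minimal sketch, assuming the forward pass above): a tensor that is still attached to the autograd graph has a non-None grad_fn, so printing it for an intermediate tensor shows whether tracking survived the view and cat.

        # A non-None grad_fn means autograd recorded the op that produced the tensor.
        print(concat.grad_fn)        # e.g. a CatBackward node
        print(concat.requires_grad)  # True as long as the LSTM parameters require grad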

torch.cat and index_select definitely do not block your gradients; whatever happens in the methods not shown could, however.
By the way, the last time step of output_allhiden_1 is identical to the final hidden state of the last layer, and the layout of the hidden state should rather be treated as an implementation detail of the LSTM.

You can easily test this yourself:

import torch
from torch import nn

# Ad-hoc nn.Module instance, so the LSTMs get registered as submodules
# and named_parameters() works below.
self = type("Net", (torch.nn.Module,), {})()
self.LSTM_1 = nn.LSTM(3, 40, 2)  # input_size=3, hidden_size=40, num_layers=2
self.LSTM_2 = nn.LSTM(3, 40, 2)
input_1 = torch.randn(5, 1, 3)   # (seq_len, batch, input_size)
input_2 = torch.randn(5, 1, 3)
output_allhiden_1, (final_hidden_state_1, final_cell_state_1) = self.LSTM_1(input_1)
output_allhiden_2, (final_hidden_state_2, final_cell_state_2) = self.LSTM_2(input_2)
# No need to reshape, the shape is already (num_layers, 1, hidden_size)
# final_hidden_state_1 = final_hidden_state_1.view(num_layers, 1, -1)
# final_hidden_state_2 = final_hidden_state_2.view(num_layers, 1, -1)
# The last time step of the output equals the last layer's final hidden state.
assert (output_allhiden_1[-1] == final_hidden_state_1[-1]).all().item()
concat = torch.cat((output_allhiden_1[-1], output_allhiden_2[-1]), dim=1)
concat.sum().backward()
# Every LSTM parameter receives a gradient, so cat/indexing did not break the graph.
for n, t in self.named_parameters():
    if t.grad is not None:
        print(n, ": ", t.grad.var().item())