Hello! I am learning about LSTM networks and I am currently training a model with a many-to-one approach. Here is what my forward function looks like:
def forward(self, x):
    out0 = None
    hidden = None
    for i in range(x.shape[1]):
        inp = x[:, i, :]                      # one time step: [batch, features]
        out0 = torch.relu(self.inp_fc(inp))
        # assuming self.lstm is nn.LSTM: it returns (output, (h_n, c_n)),
        # so add a length-1 sequence dim and carry the hidden state between steps
        out0, hidden = self.lstm(out0.unsqueeze(1), hidden)
        out0 = out0.squeeze(1)
    out = torch.tanh(self.dense(out0))        # many-to-one: only the last step's output is used
    return out
x is [1, samples_count, features_count], so I am processing each sample one at a time, sequentially, using the for loop. Once the forward function is done, I compute a loss and do backpropagation from it.
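For context, my training step looks roughly like this (the loss function and optimizer here are just placeholders, and model, x, and target are assumed to already exist):

    import torch

    criterion = torch.nn.MSELoss()                      # placeholder loss
    optimizer = torch.optim.Adam(model.parameters())    # placeholder optimizer

    optimizer.zero_grad()
    out = model(x)                 # the forward() shown above
    loss = criterion(out, target)
    loss.backward()                # backpropagation through the whole unrolled loop
    optimizer.step()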
So my question is: does the fact that I do not store each out0 from the for loop (I only keep the last one, since the variable is overwritten every iteration) affect backward propagation in any way? It might be a stupid question, but I am still struggling to understand where exactly the gradients are stored.
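To make the question concrete, here is a minimal, self-contained sketch of the pattern I mean (not my actual model, just the overwrite-in-a-loop part): the Python variable h is overwritten on every iteration, just like out0 above, and I am unsure whether the earlier iterations still take part in the backward pass.

    import torch

    w = torch.tensor(2.0, requires_grad=True)
    h = torch.tensor(1.0)
    for _ in range(3):
        h = w * h          # h is overwritten each iteration, like out0 in my loop
    h.backward()
    print(w.grad)          # expect 3 * w**2 = 12 if all iterations are tracked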