Hello! I am learning about LSTM networks and I am now training with a many-to-one approach. Here is what my forward function looks like:
```python
def forward(self, x):
    out0 = None
    for i in range(x.shape[1]):              # loop over the samples (time steps)
        inp = x[:, i, :]                     # one step, shape [1, features_count]
        out0 = torch.relu(self.inp_fc(inp))  # out0 is overwritten on every iteration
        out0 = self.lstm(out0)
    out = torch.tanh(self.dense(out0))       # many-to-one: only the last out0 is used
    return out
```
The input x has shape [1, samples_count, features_count], so I am processing the sequence one sample at a time, sequentially, using a for loop. Once the forward function is done, I do backward propagation using a loss function.
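Roughly, my training step looks like the sketch below (the loss function, optimizer, and the names model / samples_count / features_count are just placeholders for illustration, not my exact setup):

```python
import torch

criterion = torch.nn.MSELoss()                      # placeholder loss
optimizer = torch.optim.Adam(model.parameters())    # placeholder optimizer

x = torch.randn(1, samples_count, features_count)   # one sequence, shaped as described above
y = torch.randn(1, 1)                                # dummy target; real shape depends on self.dense

optimizer.zero_grad()
out = model(x)                   # forward() above runs the for loop and builds the graph
loss = criterion(out, y)
loss.backward()                  # backward propagation through everything recorded in forward()
optimizer.step()
```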
So my question is: does the fact that I do not store out0 from every iteration of the for loop (I just overwrite it each step) impact backward propagation in any way? I think this might be a stupid question, but I am still struggling to understand where exactly the gradients are stored.
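To show what I mean by "where the gradients are stored", these are the kinds of things I have been printing after the sketch above (layer names refer to my model; I am not sure what I should expect to see):

```python
out = model(x)
loss = criterion(out, y)
loss.backward()

# things I print while trying to understand where the gradients end up
print(out.grad_fn)                  # the output seems to remember how it was computed
print(model.inp_fc.weight.grad)     # is the accumulated gradient for this layer here?
print(model.dense.weight.grad)      # and here for the final layer?
```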