Hello! I am learning about LSTM networks and I am currently training a model with a many-to-one approach. Here is what my forward function looks like:
def forward(self, x):
    out0 = None
    hidden = None
    for i in range(x.shape[1]):
        inp = x[:, i, :]                      # one time step: [batch, features]
        out0 = torch.relu(self.inp_fc(inp))
        # assuming self.lstm is nn.LSTM: it returns (output, (h_n, c_n)),
        # so add a length-1 sequence dim and carry the hidden state between steps
        out0, hidden = self.lstm(out0.unsqueeze(1), hidden)
        out0 = out0.squeeze(1)
    out = torch.tanh(self.dense(out0))        # many-to-one: only the last step's output is used
    return out
x is [1, samples_count, features_count], so I am processing each sample one at a time, sequentially, using the for loop. Once the forward function is done, I compute a loss and do backpropagation from it.
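For context, my training step looks roughly like this (the loss function and optimizer here are just placeholders, and model, x, and target are assumed to already exist):

    import torch

    criterion = torch.nn.MSELoss()                      # placeholder loss
    optimizer = torch.optim.Adam(model.parameters())    # placeholder optimizer

    optimizer.zero_grad()
    out = model(x)                 # the forward() shown above
    loss = criterion(out, target)
    loss.backward()                # backpropagation through the whole unrolled loop
    optimizer.step()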
So my question is: does the fact that I do not store each out0 from the for loop (I only keep the last one, since the variable is overwritten every iteration) affect backward propagation in any way? It might be a stupid question, but I am still struggling to understand where exactly the gradients are stored.
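To make the question concrete, here is a minimal, self-contained sketch of the pattern I mean (not my actual model, just the overwrite-in-a-loop part): the Python variable h is overwritten on every iteration, just like out0 above, and I am unsure whether the earlier iterations still take part in the backward pass.

    import torch

    w = torch.tensor(2.0, requires_grad=True)
    h = torch.tensor(1.0)
    for _ in range(3):
        h = w * h          # h is overwritten each iteration, like out0 in my loop
    h.backward()
    print(w.grad)          # expect 3 * w**2 = 12 if all iterations are tracked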