I am developing a time series classification (TSC) model based on an LSTM followed by a fully connected layer; below is a portion of the code:
```python
def forward(self, x):
    if not self.get_is_stateful():
        # stateless: reset the hidden state at each forward pass
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(self.get_device())
        c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(self.get_device())
        out_lstm, _ = self.lstm(x, (h0, c0))
    else:
        # stateful: keep the hidden state between forward passes
        h0 = self.hidden_state.detach().clone()
        c0 = self.cell_state.detach().clone()
        out_lstm, (self.hidden_state, self.cell_state) = self.lstm(x, (h0, c0))
    # classify using the LSTM output at the last timestep
    out = self.fc(out_lstm[:, -1, :]) if self.get_batch_first() else self.fc(out_lstm[-1, :, :])
    return out
```
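For reference on the shapes involved, here is a minimal sketch with a plain `nn.LSTM` and arbitrary sizes (the layer sizes and number of classes are made up, not taken from my class), showing what `out_lstm[:, -1, :]` selects in the stateless, `batch_first=True` case:

```python
# Minimal shape check with assumed sizes (batch=4, seq_len=10, 3 input features,
# 8 hidden units, 5 classes); these numbers are illustrative only.
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=3, hidden_size=8, num_layers=2, batch_first=True)
fc = nn.Linear(8, 5)

x = torch.randn(4, 10, 3)          # whole sequence: (batch, seq_len, input_size)
out_lstm, _ = lstm(x)              # (4, 10, 8): one hidden vector per timestep
logits = fc(out_lstm[:, -1, :])    # (4, 5): only the last timestep feeds the classifier
print(out_lstm.shape, logits.shape)
```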
What happens if I train the model on the whole sequence at once instead of passing the input one timestep at a time? From the documentation, I understand that the output will have one dimension equal to the length of the sequence, but in this case:
1) Would the forward method be called for each timestep sequentially (so that the hidden state becomes the input of the next timestep), or in parallel?
2) The only thing I am fairly sure of is that there is only one model, not length(sequence) models in parallel.
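To make question 1) concrete, this is the comparison I have in mind, again with a plain `nn.LSTM` and arbitrary sizes rather than my actual model: one call on the whole sequence versus feeding the timesteps one by one while carrying (h, c) forward.

```python
# Sketch with assumed sizes: compare one call on the whole sequence against a manual
# loop over timesteps that carries the hidden/cell state forward.
import torch
import torch.nn as nn

torch.manual_seed(0)
batch, seq_len, input_size, hidden_size, num_layers = 4, 10, 3, 8, 2
lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
x = torch.randn(batch, seq_len, input_size)

# One call on the whole sequence: the output keeps a time dimension of length seq_len
out_full, (h_full, c_full) = lstm(x)
print(out_full.shape)  # torch.Size([4, 10, 8])

# The same sequence fed one timestep at a time, reusing the returned state each step
h = torch.zeros(num_layers, batch, hidden_size)
c = torch.zeros(num_layers, batch, hidden_size)
steps = []
for t in range(seq_len):
    out_t, (h, c) = lstm(x[:, t:t + 1, :], (h, c))
    steps.append(out_t)
out_stepwise = torch.cat(steps, dim=1)

print(torch.allclose(out_full, out_stepwise, atol=1e-6))  # expected: True
print(torch.allclose(h_full, h, atol=1e-6))               # expected: True
```

In other words, I am asking whether the whole-sequence call is equivalent to this manual loop, with the recurrence running sequentially inside a single forward call.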