I have the following training loop:

```python
for epoch in range(num_epochs):
    hidden = model.init_hidden()
    for startidx in range(0, num_batches, batch_size):
        endidx = startidx + batch_size
        step = startidx // batch_size
        xbatch = xhot_seq[startidx:endidx]
        ybatch = yhot_seq[startidx:endidx]

        # Forward pass
        # Clear stored gradient
        model.zero_grad()
        y_pred, hidden = model(xbatch, hidden)
        target = torch.argmax(ybatch.long(), dim=1)
        loss = loss_fn(y_pred, target)
        loss_hist[epoch] = loss.item()

        # Backward pass
        loss.backward()

        # Update parameters
        optimiser.step()
```
and here is my model's `forward` method:

```python
def forward(self, input, hiddenState):
    (h_state, c_state) = hiddenState
    lstm_out, (h_state, c_state) = self.lstm(input, (h_state, c_state))
    out = self.linear(lstm_out[:, -1, :])
    y_pred = self.softmax(out)
    return y_pred, (h_state, c_state)
```
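I haven't shown `init_hidden` above, but it is roughly the following (a minimal sketch, assuming the model stores `num_layers`, `batch_size`, and `hidden_dim` as attributes):

```python
def init_hidden(self):
    # Fresh zero-valued (h, c) tuple for the LSTM. The hidden state has
    # shape (num_layers, batch_size, hidden_dim) regardless of
    # batch_first, which only affects the input/output tensors.
    h0 = torch.zeros(self.num_layers, self.batch_size, self.hidden_dim)
    c0 = torch.zeros(self.num_layers, self.batch_size, self.hidden_dim)
    return (h0, c0)
```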
My input has the shape `[batch_size, seq_length, num_features]`.
Now, does backpropagation consider the hidden state passed from sequence to sequence inside the mini-batch? That is, if my batch size is 16, will the model backpropagate through the last input sequence's length only (in my case seq_length = 40), or through that sequence plus the previous 15 sequences in the batch?

Note that I'm resetting the hidden state at the start of each epoch. Also, what should I do to make my model stateful? Should I never reset the hidden states?
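For context on what I mean by "stateful": I understand one common pattern is to keep the hidden state's values across batches but detach it from the graph, so `backward()` stops at the current batch (truncated BPTT). This is a sketch of what I'm considering, under the same assumptions as my loop above:

```python
for epoch in range(num_epochs):
    hidden = model.init_hidden()
    for startidx in range(0, num_batches, batch_size):
        endidx = startidx + batch_size
        xbatch = xhot_seq[startidx:endidx]
        ybatch = yhot_seq[startidx:endidx]

        # Detach: keep the hidden *values* across batches (stateful),
        # but cut the autograd graph so backward() does not reach into
        # previous batches.
        hidden = tuple(h.detach() for h in hidden)

        model.zero_grad()
        y_pred, hidden = model(xbatch, hidden)
        target = torch.argmax(ybatch.long(), dim=1)
        loss = loss_fn(y_pred, target)
        loss.backward()
        optimiser.step()
```

Would this give me a stateful model, or am I misunderstanding how the graph is built across batches?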