Hello All,

I have the following training loop:

```
for epoch in range(num_epochs):
    hidden = model.init_hidden()
    for startidx in range(0, num_batches, batch_size):
        endidx = startidx + batch_size
        step = startidx // batch_size
        xbatch = xhot_seq[startidx:endidx]
        ybatch = yhot_seq[startidx:endidx]
        # Clear stored gradients
        model.zero_grad()
        # Forward pass
        y_pred, hidden = model(xbatch, hidden)
        target = torch.argmax(ybatch.long(), dim=1)
        loss = loss_fn(y_pred, target)
        loss_hist[epoch] = loss.item()
        # Backward pass
        loss.backward()
        # Update parameters
        optimiser.step()
```

```
def forward(self, input, hiddenState):
    (h_state, c_state) = hiddenState
    # Run the full sequence through the LSTM
    lstm_out, (h_state, c_state) = self.lstm(input, (h_state, c_state))
    # Keep only the output of the last time step
    out = self.linear(lstm_out[:, -1, :])
    y_pred = self.softmax(out)
    return y_pred, (h_state, c_state)
```

My input has the shape [batch_size, seq_length, num_features].
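
For completeness, here is roughly how the shapes line up in my setup (a sketch with made-up sizes; the layer is assumed to be an `nn.LSTM` with `batch_first=True`, since I index the sequence dimension second in `forward`):

```
import torch
import torch.nn as nn

batch_size, seq_length, num_features, hidden_size = 16, 40, 10, 32

# batch_first=True so the input is [batch, seq, features]
lstm = nn.LSTM(num_features, hidden_size, batch_first=True)

x = torch.randn(batch_size, seq_length, num_features)
h0 = torch.zeros(1, batch_size, hidden_size)  # [num_layers, batch, hidden]
c0 = torch.zeros(1, batch_size, hidden_size)

out, (hn, cn) = lstm(x, (h0, c0))
print(out.shape)  # torch.Size([16, 40, 32]) -> out[:, -1, :] is the last step
print(hn.shape)   # torch.Size([1, 16, 32])
```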

Now, does backpropagation consider the hidden state passed from sequence to sequence inside the mini-batch? I.e. if my batch size is 16, for the last input sequence will the model backpropagate through the sequence length only (in my case I set seq_length = 40), or through my sequence length and the previous 15 sequences?

Note that I'm resetting the hidden state each epoch. Also, what should I do to make my model stateful?

Should I never reset the hidden states?
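
For reference, the approach I have come across for stateful training is to carry the hidden state across batches but detach it at each batch boundary, so that backpropagation is truncated to the current batch (truncated BPTT). A minimal sketch adapted from my loop above, not something I have verified is the recommended pattern:

```
for epoch in range(num_epochs):
    hidden = model.init_hidden()
    for startidx in range(0, num_batches, batch_size):
        endidx = startidx + batch_size
        xbatch = xhot_seq[startidx:endidx]
        ybatch = yhot_seq[startidx:endidx]

        # Detach both LSTM state tensors from the previous graph;
        # without this, backward() would try to reach back into
        # earlier batches (and PyTorch raises an error on graph reuse).
        hidden = tuple(h.detach() for h in hidden)

        model.zero_grad()
        y_pred, hidden = model(xbatch, hidden)
        target = torch.argmax(ybatch.long(), dim=1)
        loss = loss_fn(y_pred, target)
        loss.backward()
        optimiser.step()
```

Is detaching like this the right way to get statefulness, or should I be handling the state differently?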