I have the following training loop:

```python
for epoch in range(num_epochs):
    hidden = model.init_hidden()
    for startidx in range(0, num_batches, batch_size):
        endidx = startidx + batch_size
        step = startidx // batch_size
        xbatch = xhot_seq[startidx:endidx]
        ybatch = yhot_seq[startidx:endidx]

        # Forward pass
        # Clear stored gradient
        model.zero_grad()
        y_pred, hidden = model(xbatch, hidden)
        target = torch.argmax(ybatch.long(), dim=1)
        loss = loss_fn(y_pred, target)
        loss_hist[epoch] = loss.item()

        # Backward pass
        loss.backward()

        # Update parameters
        optimiser.step()
```
and here is my model's `forward` method:

```python
def forward(self, input, hiddenState):
    (h_state, c_state) = hiddenState
    lstm_out, (h_state, c_state) = self.lstm(input, (h_state, c_state))
    out = self.linear(lstm_out[:, -1, :])
    y_pred = self.softmax(out)
    return y_pred, (h_state, c_state)
```
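I haven't shown `init_hidden` above, but it is roughly the following (a minimal sketch, assuming the model stores `num_layers`, `batch_size`, and `hidden_dim` as attributes):

```python
def init_hidden(self):
    # Fresh zero-valued (h, c) tuple for the LSTM. The hidden state has
    # shape (num_layers, batch_size, hidden_dim) regardless of
    # batch_first, which only affects the input/output tensors.
    h0 = torch.zeros(self.num_layers, self.batch_size, self.hidden_dim)
    c0 = torch.zeros(self.num_layers, self.batch_size, self.hidden_dim)
    return (h0, c0)
```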
My input has the shape `[batch_size, seq_length, num_features]`.
Now, does backpropagation consider the hidden state passed from sequence to sequence inside the mini-batch? That is, if my batch size is 16, will the model backpropagate through the last input sequence's length only (in my case seq_length = 40), or through that sequence plus the previous 15 sequences in the batch?

Note that I'm resetting the hidden state at the start of each epoch. Also, what should I do to make my model stateful? Should I never reset the hidden states?
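For context on what I mean by "stateful": I understand one common pattern is to keep the hidden state's values across batches but detach it from the graph, so `backward()` stops at the current batch (truncated BPTT). This is a sketch of what I'm considering, under the same assumptions as my loop above:

```python
for epoch in range(num_epochs):
    hidden = model.init_hidden()
    for startidx in range(0, num_batches, batch_size):
        endidx = startidx + batch_size
        xbatch = xhot_seq[startidx:endidx]
        ybatch = yhot_seq[startidx:endidx]

        # Detach: keep the hidden *values* across batches (stateful),
        # but cut the autograd graph so backward() does not reach into
        # previous batches.
        hidden = tuple(h.detach() for h in hidden)

        model.zero_grad()
        y_pred, hidden = model(xbatch, hidden)
        target = torch.argmax(ybatch.long(), dim=1)
        loss = loss_fn(y_pred, target)
        loss.backward()
        optimiser.step()
```

Would this give me a stateful model, or am I misunderstanding how the graph is built across batches?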