Iteratively passing elements of different sequences through an LSTM

Right, so here is what I am doing:

I made a custom dataset for my DataLoader; inside the dataset's load_file function I incrementally load batches of a sequence using a second DataLoader (without shuffling). For example, while the outer DataLoader is loading data-i, inside the dataset I load [batch-i.1, …, batch-i.m] in a loop. Inside the dataset I also initialize two tensors of zeros, b and c, which serve as the LSTM's initial hidden and cell states. The pseudocode for the loop looks as follows:

Init: j = 1

Loop:

  1. _, (b, c) = LSTM(batch-i.j, (b,c))
  2. j ← j+1

while j < m

data-i, _ = LSTM(batch-i.m, (b, c))
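
For concreteness, here is a minimal runnable sketch of this setup; all the names here (ChunkedSequenceDataset, the chunk shapes, the sizes) are placeholders I made up for this post, not my actual code:

    # Minimal runnable sketch of the setup above; all names and sizes
    # are placeholders for illustration.
    import torch
    import torch.nn as nn
    from torch.utils.data import Dataset, DataLoader

    input_size, hidden_size, num_layers = 8, 16, 1
    lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)

    class ChunkedSequenceDataset(Dataset):
        def __init__(self, num_sequences, m):
            self.num_sequences = num_sequences
            self.m = m  # number of chunks per sequence

        def load_file(self, i):
            # Stand-in for the inner, unshuffled DataLoader that yields
            # [batch-i.1, ..., batch-i.m] for sequence i.
            return [torch.randn(1, 10, input_size) for _ in range(self.m)]

        def __getitem__(self, i):
            chunks = self.load_file(i)
            # Zero-initialized hidden and cell states b and c,
            # shape (num_layers, batch, hidden_size).
            b = torch.zeros(num_layers, 1, hidden_size)
            c = torch.zeros(num_layers, 1, hidden_size)
            # Chunks 1..m-1 only update the state; the output of the
            # final chunk is what gets kept.
            for j in range(self.m - 1):
                _, (b, c) = lstm(chunks[j], (b, c))
            data_i, _ = lstm(chunks[-1], (b, c))
            return data_i

        def __len__(self):
            return self.num_sequences

    # One worker, no automatic batching: each item is returned as-is.
    loader = DataLoader(ChunkedSequenceDataset(num_sequences=3, m=4),
                        batch_size=None, num_workers=0)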

So the hidden states get passed iteratively back into the LSTM, and all of this is handled by a single worker. What do you think?

About this loop, however, I did see the post Training Stateful LSTM in Pytorch cause runtime error, where the accepted answer says “I think you need to detach both hiddens because the hiddens that are output from the LSTM will require grad.” Wouldn’t you say this is counterintuitive? If we remove the gradient history from b and c from previous iterations using .detach(), how does gradient information get propagated through the entire sequence?
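
For reference, if I understand that answer correctly, the suggested pattern would look something like the sketch below (continuing with the placeholder lstm and shapes from the sketch above; loss_fn and the zero target are likewise made up):

    # Sketch of the pattern the accepted answer seems to suggest, with
    # a backward pass per chunk; loss_fn and the target are placeholders.
    loss_fn = nn.MSELoss()
    chunks = [torch.randn(1, 10, input_size) for _ in range(4)]

    b = torch.zeros(num_layers, 1, hidden_size)
    c = torch.zeros(num_layers, 1, hidden_size)

    for chunk in chunks:
        lstm.zero_grad()
        out, (b, c) = lstm(chunk, (b, c))
        loss = loss_fn(out, torch.zeros_like(out))  # placeholder target
        # Without the detach below, backward() on the second chunk raises
        # the runtime error from the linked post, because the first
        # chunk's graph has already been freed.
        loss.backward()
        b, c = b.detach(), c.detach()  # keep the values, drop the graph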