LSTM learning with batches seems to ignore hidden state


After playing around with LSTMs, I realized the following:

If I train the network with a freshly initialized hidden state for each batch, I cannot feed it single timesteps at test time while passing in the hidden state from the previous timestep: the test results get worse in that case. But if I feed whole sequences with a fresh hidden state during testing, the results are as expected.

Does that mean the LSTM just learned to ignore ALL hidden states, and will it also fail to learn the hidden state sequence “internally”?

Is there a good way to fix this “issue” while still using batches? Organizing the batches in their natural order (kind of) seems to help, but I am not sure if this is the correct approach.
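To make “natural order” concrete, here is a minimal sketch of one possible batch layout, assuming the data is a single long sequence. The function name `make_stateful_batches` and its parameters are hypothetical, not from any library. The idea is that row i of each batch continues exactly where row i of the previous batch stopped, so a hidden state kept per row would stay meaningful from one batch to the next.

```python
def make_stateful_batches(sequence, batch_size, steps):
    """Split one long sequence into batches whose rows continue across batches.

    Hypothetical helper for illustration only.
    """
    # Cut the sequence into `batch_size` contiguous streams of equal length
    # (leftover items at the end of the sequence are dropped).
    stream_len = len(sequence) // batch_size
    streams = [sequence[i * stream_len:(i + 1) * stream_len]
               for i in range(batch_size)]

    # Each batch takes the next `steps` items from every stream, so
    # row i of batch k+1 directly follows row i of batch k.
    n_batches = stream_len // steps
    return [[s[k * steps:(k + 1) * steps] for s in streams]
            for k in range(n_batches)]


batches = make_stateful_batches(list(range(24)), batch_size=2, steps=3)
for batch in batches[:2]:
    print(batch)
```

With this layout, row 0 reads 0..2 in the first batch and 3..5 in the second, while row 1 reads 12..14 and then 15..17, so each row is an unbroken stream across batches.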



OK, so I guess my assumption is correct: initializing the hidden state for each batch causes the network to become stateless across batches.

So to make the network stateful, I either need to order the batches appropriately or “get rid” of batches (e.g. batch size 1). If somebody has an alternative solution, I would much appreciate it.
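The difference between resetting and carrying the state can be shown with a toy recurrence. This is only a sketch: `toy_cell` is a hypothetical stand-in for a recurrent step, not an LSTM, and nothing is trained. It shows that carrying the state across consecutive chunks reproduces the result of processing the whole stream at once, while resetting at every chunk does not.

```python
def toy_cell(x, h):
    # Hypothetical stand-in for one recurrent step:
    # the new state mixes the old state with the current input.
    return 0.5 * h + x


def run_chunk(chunk, h):
    # Process one chunk of timesteps, starting from state h.
    for x in chunk:
        h = toy_cell(x, h)
    return h


stream = [1.0, 2.0, 3.0, 4.0]
chunks = [stream[:2], stream[2:]]  # two consecutive "batches" of one stream

# Reference: process the whole stream in one go.
full = run_chunk(stream, 0.0)

# Stateful: initialize once, then hand the state to the next chunk.
h = 0.0
for chunk in chunks:
    h = run_chunk(chunk, h)
stateful = h

# Stateless: start from a fresh state for every chunk.
for chunk in chunks:
    h = run_chunk(chunk, 0.0)
stateless = h

print(full, stateful, stateless)
```

In a real framework the carried state is typically detached from the previous batch's computation graph, so gradients stop at the batch boundary (truncated backpropagation through time) while the state values themselves are still passed on.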

Here are some informative resources I found on this topic: