After playing around with LSTMs, I noticed the following:
If I train the network with a freshly initialized hidden state for each batch, I can't then feed the network single timesteps at test time while carrying over the hidden state from the previous timestep; the test results get worse in that case. But if I feed full sequences with a fresh hidden state during testing, the results are as expected.
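For what it's worth, the unrolling itself can't be the cause: mechanically, feeding a full sequence with a zero state and feeding one timestep at a time while carrying `(h, c)` forward produce identical outputs from `nn.LSTM`. A small sanity check (toy sizes, untrained weights, just to illustrate the equivalence):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=4, hidden_size=8, batch_first=True)
x = torch.randn(1, 10, 4)  # (batch, seq_len, features)

# Mode A: full sequence, fresh (zero-initialized) hidden state.
out_full, _ = lstm(x)

# Mode B: one timestep at a time, carrying (h, c) forward.
state = None  # None means zero-initialized state
steps = []
for t in range(x.size(1)):
    out_t, state = lstm(x[:, t:t+1, :], state)
    steps.append(out_t)
out_steps = torch.cat(steps, dim=1)

# The two modes match, so any test-time discrepancy comes from
# what the network learned, not from how it is unrolled.
print(torch.allclose(out_full, out_steps, atol=1e-6))  # True
```

So if stepwise evaluation with a carried state is worse, it's because the trained weights only behave well from the zero-state starting point they always saw during training.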
Does that mean the LSTM simply learned to ignore the hidden state entirely, and that it also won't model the sequence internally through the hidden state?
Is there a good way to fix this "issue" while still training in batches? Organizing the batches in (roughly) natural order seems to help, but I'm not sure this is the correct approach.
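One common approach, assuming your batches really are consecutive chunks of one long sequence, is "stateful" training: carry `(h, c)` from one batch into the next, but detach it so gradients don't flow across chunk boundaries (truncated BPTT). A minimal sketch with placeholder data and a hypothetical linear head:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=4, hidden_size=8, batch_first=True)
head = nn.Linear(8, 1)  # hypothetical output layer
opt = torch.optim.Adam(list(lstm.parameters()) + list(head.parameters()))
loss_fn = nn.MSELoss()

data = torch.randn(1, 100, 4)    # one long, contiguous sequence (toy data)
target = torch.randn(1, 100, 1)
chunk = 20                       # batch = consecutive chunk of the sequence

state = None
for start in range(0, data.size(1), chunk):
    x = data[:, start:start + chunk, :]
    y = target[:, start:start + chunk, :]

    out, state = lstm(x, state)
    # Detach so backprop stops at the chunk boundary (truncated BPTT);
    # the state *values* still carry over to the next chunk.
    state = tuple(s.detach() for s in state)

    loss = loss_fn(head(out), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

With this setup the network actually sees non-zero initial states during training, so stepping through single timesteps with a carried state at test time matches the training condition. The detach is the important part: without it, the graph grows across batches and `backward()` fails after the first step.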