Batch Training RNNs

Thanks guys, that did it. I think I just overthought it. Part of the problem was that batch isn't the first dimension of the input by default. I will definitely use batch_first=True since that feels way more natural to me. Shame I have to wait 10 more hours until I can implement it; I would love to do it now. Thanks again, I will post some loss curves here if it works out. :slight_smile:

One extra tip. You can use my_var.transpose(0,1) to swap the batch and time dimensions easily.
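
For anyone following along, here is a minimal sketch of both layouts (the sizes and variable names below are just placeholders, not from the thread):

import torch
import torch.nn as nn

seq_len, batch_size, input_size, hidden_size = 5, 10, 3, 8

# Default layout: input is (seq_len, batch, input_size)
rnn = nn.LSTM(input_size, hidden_size)
x = torch.randn(seq_len, batch_size, input_size)
out, (h, c) = rnn(x)              # out: (seq_len, batch, hidden_size)

# batch_first=True expects (batch, seq_len, input_size) instead
rnn_bf = nn.LSTM(input_size, hidden_size, batch_first=True)
x_bf = x.transpose(0, 1)          # swap the batch and time dimensions
out_bf, _ = rnn_bf(x_bf)          # out_bf: (batch, seq_len, hidden_size)

# Note: h and c stay (num_layers, batch, hidden_size) either way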

Hey, I got it running and trained quite a bit tonight. The problem for me atm is that I have a fixed sequence length, and with every new sequence the first couple of predicted values are very meh. I've got some ideas on how to fix this. Right now I'm training with, let's say, sequences [1,2] and [3,4] during the first training iteration and then [5,6] and [7,8] during the second. It would probably be smarter to train with [1,2] and [5,6] at first and then with [3,4] and [7,8] in the second iteration, so I can save the hidden states and reuse them between iterations. At least it's worth a try, I suppose.
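
For what it's worth, here is a tiny sketch of that reordering (the chunk sizes are just the toy numbers from the post): split the series into parallel streams, then walk through them chunk by chunk, so that row i of iteration 2 directly continues row i of iteration 1.

import torch

series = torch.arange(1, 9, dtype=torch.float)   # [1, 2, ..., 8]
n_streams, chunk_len = 2, 2

# Row 0 holds [1, 2, 3, 4], row 1 holds [5, 6, 7, 8]
streams = series.view(n_streams, -1)

for t in range(streams.size(1) // chunk_len):
    batch = streams[:, t * chunk_len:(t + 1) * chunk_len]
    # iteration 0: [[1, 2], [5, 6]]   iteration 1: [[3, 4], [7, 8]]
    # -> the hidden state after iteration 0 can seed iteration 1
    print(batch)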

Hi @jpeg729 and all.
I had trouble understanding whether the hidden state is being transferred between batches, and I am too lazy to do the calculation myself, so I hope one of you can answer (you all must be experts by now!).

To correspond with your code: I have a (100, f) sequence and I split it into 20 subsequences of length 5; I then take 10 of these as a batch.
So my input tensor is of size (seq_len=5, batch_size=10, input_len=f), right? Right.
My question is this:
Does the 2nd subsequence (i.e. the one starting at element 6 of the original series) get, as its initial hidden and cell states, the output hidden and cell states of the 1st subsequence (i.e. the one ending at element 5)?
If not, how can I get it to do that? I see no point in re-initializing the hidden state for every subsequence…

And once batch training preserves the hidden state between subsequences as just described, then when moving on to the next batch I need to detach the hidden state and plug it in as the initial h in model(input, initial_h), right?
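
In case a concrete sketch helps, here is one way to get exactly that behaviour, under one important assumption: the 10 subsequences in a batch must come from 10 parallel streams of the series, so that batch 2 starts where batch 1 left off, row by row. You then pass the previous hidden state back in and detach it between batches (truncated backprop through time). The sizes below mirror the post; everything else is mine:

import torch
import torch.nn as nn

f, hidden_size = 4, 16
series = torch.randn(100, f)             # the (100, f) sequence

# 10 parallel streams of 10 consecutive steps each
streams = series.view(10, 10, f)         # (stream, step, feature)

rnn = nn.LSTM(f, hidden_size)
hidden = None                            # zero state on the first chunk

for t in range(2):                       # two chunks of 5 steps
    chunk = streams[:, t * 5:(t + 1) * 5, :]      # (10, 5, f)
    x = chunk.transpose(0, 1).contiguous()        # (seq_len=5, batch=10, f)
    out, hidden = rnn(x, hidden)
    # Detach so the next backward() stops at this batch boundary
    hidden = tuple(h.detach() for h in hidden)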

Thank you for reading this far, and may your belief system reward you for helping.
<3

@jpeg729
Hi,

I tried to implement your suggestion.

self.hidden is initialized to None.

In the forward pass I set it to the last hidden state:

def forward(self, in_pkd):
    out_pkd, (h, c) = self.rnn(in_pkd, self.hidden)
    # detach so the previous batch's graph can be freed
    self.hidden = (h.detach(), c.detach())
    return out_pkd

But hidden seems to be reinitialized to None for every batch. :thinking:
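
For comparison, here is a self-contained version of that pattern that does keep its state between calls (the module name and sizes are mine, and it uses plain tensors instead of packed sequences for brevity). If self.hidden persists in this sketch but not in your code, the reset is probably happening somewhere in the surrounding training code (e.g. the model being re-created, or self.hidden being cleared each epoch) rather than in forward itself:

import torch
import torch.nn as nn

class StatefulLSTM(nn.Module):
    def __init__(self, input_size=4, hidden_size=16):
        super().__init__()
        self.rnn = nn.LSTM(input_size, hidden_size)
        self.hidden = None                # zero state on the very first call

    def forward(self, x):
        out, (h, c) = self.rnn(x, self.hidden)
        # Keep the state for the next batch, detached from the old graph
        self.hidden = (h.detach(), c.detach())
        return out

model = StatefulLSTM()
for step in range(3):
    out = model(torch.randn(5, 10, 4))
    print(model.hidden is None)           # False after every call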