When to call init_hidden() for RNN

I’m doing NLP sentence classification. For each epoch we have a batch of sentences, and I call

hidden = repackage_hidden(hidden)

after each batch to clear the variable history.
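
(For reference, repackage_hidden here is essentially the helper from the PyTorch word_language_model example, roughly:)

import torch

def repackage_hidden(h):
    # Wrap hidden states in new tensors detached from their history,
    # so backprop stops at the batch boundary.
    if isinstance(h, torch.Tensor):
        return h.detach()
    return tuple(repackage_hidden(v) for v in h)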

My question is should I also call

hidden = net.init_hidden(batch_size)

after every batch? Meaning every batch of sentences will see a zero hidden state each time, or let the hidden that was learned from the previous batch be used as an input on the next one?

I’m not sure what you mean by “epoch”, but you should call init_hidden() at the beginning of each sentence.

One epoch is just one pass over all the sentences in the training set. Each epoch is composed of batches, not one batch; I didn’t write that clearly. Thanks for the answer, I get it!


I could be wrong, but I think that only applies when there is no ordering between subsequent sequences, so there’s no good reason to preserve the state.

In a language model you call init_hidden at the start of the epoch only, and instead just detach the graph (repackage_hidden) at the start of each sequence to make sure you don’t backprop between batches. In Keras I think this would be equivalent to passing stateful=True as a parameter to the LSTM (can someone please confirm this?).
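
Roughly what I mean, following the word_language_model example (a sketch; model, get_batch, train_data, bptt, criterion and optimizer are placeholders):

hidden = model.init_hidden(batch_size)              # once, at the start of the epoch
for i in range(0, train_data.size(0) - 1, bptt):    # walk one long sequence in bptt-sized chunks
    data, targets = get_batch(train_data, i)
    hidden = repackage_hidden(hidden)                # keep the state, cut the graph
    optimizer.zero_grad()
    output, hidden = model(data, hidden)
    loss = criterion(output, targets)
    loss.backward()
    optimizer.step()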

You are correct! I should have been clearer and said init_hidden at the beginning of each training sequence.


According to my understanding, calling init_hidden() once every training epoch should do the trick; however, it (the hidden weights) must be updated for every sentence, so that the updated weights are used and aren’t reset to zero for every sentence, since init_hidden() initializes the weights to zero.
It should also be kept in mind that only the latest set of weights is retained, not all of them; we wouldn’t want to keep all of the training weights.
So, according to me, repackage_hidden(hidden) can be used after every batch, but init_hidden() should only be used once per epoch.

I think you should call

hidden = net.init_hidden(batch_size)

for every batch, because the hidden state after a batch pass contains information about the whole previous batch. At test time you’d only have a new hidden state for every sentence, so you probably want to train for that.


Okay, but using init_hidden(), wouldn’t the weights be 0 at the start of every batch? Does it give more accuracy to use init_hidden() for every batch?

I’ll try it out and see.

The hidden state is not the weights, it’s just an input.
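
For example, a typical init_hidden just allocates zero tensors shaped like the state; the learned weights live separately in the module’s parameters. A minimal sketch, assuming a one-layer LSTM:

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.lstm = nn.LSTM(input_size, hidden_size)   # the learned weights are parameters in here

    def init_hidden(self, batch_size):
        # Just zero-filled inputs (h_0, c_0); nothing here is learned.
        return (torch.zeros(1, batch_size, self.hidden_size),
                torch.zeros(1, batch_size, self.hidden_size))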

Okay, I get what you mean, it’s about the input and not about the weights. But init_hidden() would initialize the input state to all zeros, wouldn’t it?
So, shouldn’t hidden rather be equal to the previous batch’s hidden state?

No, because the previous hidden state contains information about the sentences in that batch. At test time you would have a hidden state of zero for each new sentence, so you want your network to learn under that condition.
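
At test time it looks something like this, where each new sentence gets its own zero state (a sketch; net and test_sentences are placeholders, with each sentence already converted to a tensor):

import torch

net.eval()
with torch.no_grad():
    for sentence in test_sentences:
        hidden = net.init_hidden(1)                          # fresh zero state, nothing carried over
        output, hidden = net(sentence.unsqueeze(1), hidden)  # add a batch dimension of 1
        prediction = output.argmax(dim=-1)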