I’m doing NLP sentence classification. For each epoch we have batches of sentences, and I call
hidden = repackage_hidden(hidden)
after each batch to clear the variable history.
My question is: should I also call
hidden = net.init_hidden(batch_size)
after every batch? That would mean every batch of sentences sees a zero hidden state each time. Or should I let the hidden state carried over from the previous batch be used as an input for the next one?
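For context, the helper I mean is along the lines of the one in the PyTorch word_language_model example (a minimal sketch; it assumes the hidden state is a single tensor, or a tuple of tensors as for an LSTM):

```python
import torch

def repackage_hidden(h):
    """Detach hidden states from their history so backprop stops at the batch boundary."""
    if isinstance(h, torch.Tensor):
        return h.detach()
    # an LSTM's hidden state is a (h_n, c_n) tuple, so recurse into it
    return tuple(repackage_hidden(v) for v in h)
```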
One epoch is just one pass over all the sentences in the training set. Each epoch consists of batches, not one batch; I didn’t write that clearly. Thanks for the answer, I get it!
I could be wrong, but I think that’s only the case when there is no ordering between subsequent sequences, so there’s no good reason to preserve the state.
In a language model you call init_hidden at the start of the epoch only, and instead just detach the graph (repackage_hidden) at the start of each sequence to make sure you don’t backprop between batches. In Keras I think this would be equivalent to passing stateful=True as a parameter to the LSTM (can someone please confirm this?).
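As a rough sketch of what I mean (hypothetical net, train_batches, optimizer, and criterion names; it assumes the batches are consecutive chunks of one long text, as in the word_language_model example):

```python
# Language-model style: the state carries across batches, the graph does not.
hidden = net.init_hidden(batch_size)      # zero state once, at the start of the epoch
for inputs, targets in train_batches:     # consecutive chunks of the corpus
    hidden = repackage_hidden(hidden)     # keep the values, drop the autograd history
    optimizer.zero_grad()
    output, hidden = net(inputs, hidden)
    loss = criterion(output, targets)
    loss.backward()
    optimizer.step()
```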
According to my understanding, calling init_hidden() once every training epoch should do the trick; however, the hidden state must be updated for every sentence, so that the updated state is used as input rather than being reset to zero for every sentence (init_hidden() initializes the hidden state to zeros).
Also keep in mind that only the latest hidden state should be retained, not its whole history; we wouldn’t want to keep the graph from all of the previous batches.
So, in my opinion, you should do it for every batch, because the hidden state after a batch pass contains information about the whole previous batch. At test time you’d only have a new hidden state for every sentence, so you probably want to train for that.
No, don’t carry it over, because the previous hidden state contains information about the sentences in that batch. At test time you would have a zero hidden state for every new sentence, so you want your network to learn under those conditions.
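In loop form that would look roughly like this (same hypothetical names as the sketch above; the only difference is where init_hidden is called):

```python
# Sentence classification: a fresh zero state for every batch, matching test time.
for sentences, labels in train_batches:   # each batch holds independent sentences
    hidden = net.init_hidden(batch_size)  # reset the hidden state to zeros for this batch
    optimizer.zero_grad()
    output, hidden = net(sentences, hidden)
    loss = criterion(output, labels)
    loss.backward()
    optimizer.step()
```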