Hi everyone,
I’ve started using PyTorch and I really love it. However, I was wondering how to correctly use hidden states in LSTM or GRU networks.
From what I understood from the tutorial, we should reinitialize the hidden state before each sample (as well as the cell state in an LSTM).
Let’s suppose I have:
if self.mode == 'GRU':
    self.document_rnn = nn.GRU(embedding_size, embedding_size, num_layers=self.nb_layers,
                               bias=True, dropout=self.dropout, bidirectional=False,
                               batch_first=True)
elif self.mode == 'LSTM':
    self.document_rnn = nn.LSTM(embedding_size, embedding_size, num_layers=self.nb_layers,
                                bias=True, dropout=self.dropout, bidirectional=False,
                                batch_first=True)
self.document_rnn_hidden = self.init_hidden()
and
def init_hidden(self):
    # learnable initial hidden state, shape (nb_layers, batch_size, embedding_size)
    document_rnn_init_h = nn.Parameter(
        nn.init.xavier_uniform_(torch.empty(self.nb_layers, self.batch_size, self.embedding_size)),
        requires_grad=True)
    if self.mode == 'GRU':
        return document_rnn_init_h
    elif self.mode == 'LSTM':
        # the LSTM additionally needs an initial cell state
        document_rnn_init_c = nn.Parameter(
            nn.init.xavier_uniform_(torch.empty(self.nb_layers, self.batch_size, self.embedding_size)),
            requires_grad=True)
        return (document_rnn_init_h, document_rnn_init_c)
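For reference, since I use batch_first=True, the shapes I am working with are the following (as far as I understand, the hidden state stays layers-first even with batch_first=True):

    # input to document_rnn:       (batch_size, seq_len, embedding_size)
    # h_0 / h_n (and c_0 / c_n):   (nb_layers, batch_size, embedding_size)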
Is it correct to do something like this?
for epoch in range(nb_epochs):
    for sample in samples():
        model.train(mode=True)
        optimizer.zero_grad()
        # reinitialize the hidden (and cell) state before each sample
        model.document_rnn_hidden = model.init_hidden()
        .... = model(xxx)
        loss = ...
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), args.gradient_clipping)
        optimizer.step()
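As a side note, if I only wanted a zero initialization instead of a learnable one, I believe I could drop init_hidden() entirely, since PyTorch defaults the initial state to zeros when none is passed:

    # h_0 (and c_0 for an LSTM) default to zeros if no state is given,
    # so this is already a per-call zero reinitialization:
    output, h_n = self.document_rnn(inputs)           # GRU
    output, (h_n, c_n) = self.document_rnn(inputs)    # LSTM
    # ('inputs' is just a placeholder name for the (batch, seq_len, embedding_size) tensor)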
I’ve seen this here: http://pytorch.org/tutorials/beginner/nlp/sequence_models_tutorial.html
But I’m confused because they don’t reinitialize the hidden states after training. Why?
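My current guess is that they simply keep whatever state is left over from training. If one really wanted to carry the hidden state across samples on purpose, my understanding is that it would have to be detached between iterations so that backpropagation does not reach back through previous samples. A rough sketch of what I mean (the model(sample, hidden) signature is just my assumption here):

    hidden = model.init_hidden()
    for sample in samples():
        # cut the graph so gradients do not flow across sample boundaries
        if model.mode == 'GRU':
            hidden = hidden.detach()
        else:  # LSTM: (h, c) tuple
            hidden = tuple(h.detach() for h in hidden)
        output, hidden = model(sample, hidden)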
Thank you very much for your help!