Best practice for RNN hidden units

laoreja · February 23, 2018, 11:21am

I want the RNN hidden units to follow the model when I call model.cuda() or model.double(), i.e. to be moved to a CUDA tensor or cast to another tensor type together with the rest of the model. What's the best practice for initializing the RNN hidden units? Currently I'm using:

    def __init__(self, args):
        …
        self.hidden = self.init_hidden()

    def init_hidden(self):
        h0 = torch.zeros(self.num_RNN_layers * self.times, self.batch_size, self.hidden_dim)
        h0 = h0.cuda()
        return autograd.Variable(h0)

Should I use nn.Module.register_buffer() or nn.Module.register_parameter(), or is there another good practice?
(To me they are not parameters. The example given in the register_buffer() docs is running_mean in batch norm layers, which leaves me confused.)

Since I have read that people sometimes make the hidden units trainable, I want to keep the option of deciding at runtime whether these hidden units require grad.

tom · February 23, 2018, 12:35pm

If you want them to be trainable, maybe using nn.Parameter is a natural idea.
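
Something along these lines might work as a starting point. It is only a rough sketch under a few assumptions: the class name RNNWithLearnedInit, the train_h0 flag, and the choice of nn.GRU are made up for illustration, the RNN is unidirectional, and the initial state is stored with a batch dimension of 1 and expanded in forward() rather than tied to a fixed batch_size.

```python
import torch
import torch.nn as nn


class RNNWithLearnedInit(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, input_dim, hidden_dim, num_layers=1, train_h0=False):
        super().__init__()
        self.rnn = nn.GRU(input_dim, hidden_dim, num_layers)
        # Registered as a parameter, so model.cuda() / model.double() also
        # move or cast it. The batch dimension is 1 and is expanded in
        # forward(), so no fixed batch size is baked into the module.
        self.h0 = nn.Parameter(
            torch.zeros(num_layers, 1, hidden_dim),
            requires_grad=train_h0,  # assumed flag: train the initial state or not
        )

    def forward(self, x):
        # x has shape (seq_len, batch, input_dim), nn.GRU's default layout.
        batch_size = x.size(1)
        h0 = self.h0.expand(-1, batch_size, -1).contiguous()
        return self.rnn(x, h0)


# The initial state follows the model's dtype without any explicit cast.
model = RNNWithLearnedInit(input_dim=8, hidden_dim=16, num_layers=2).double()
out, h_n = model(torch.randn(5, 3, 8, dtype=torch.double))

# Decide later, at runtime, that the initial state should be trained.
model.h0.requires_grad_(True)
```

If the initial state is never going to be trained, self.register_buffer("h0", ...) would also follow model.cuda() and model.double(), but a buffer is not returned by model.parameters().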

Best regards

Thomas