I want the RNN hidden units to follow the model when I call model.cuda() or model.double(), so that they are moved to the GPU or cast to another tensor type together with the rest of the model. What is the best practice for initializing the RNN hidden units? Currently I'm using:
def __init__(self, args):
    self.hidden = self.init_hidden()

def init_hidden(self):
    h0 = torch.zeros(self.num_RNN_layers * self.times, self.batch_size, self.hidden_dim)
    h0 = h0.cuda()  # hard-coded move to the GPU
    return h0
Should I use nn.Module.register_buffer() or nn.Module.register_parameter(), or is there another good practice?
(They are not parameters to me. The example given in the docs for register_buffer() is running_mean in BatchNorm layers, so I'm very confused.)
Since I have read that people sometimes make the hidden units trainable, I want to reserve the right to decide at runtime whether these hidden units require grad.
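To make the question concrete, here is a minimal sketch of the two options I'm weighing. It is only an illustration under my own assumptions, not a confirmed best practice: the class name RNNWithInitialState and the trainable_h0 flag are hypothetical, it uses a plain nn.RNN, and it stores the initial state with a batch dimension of 1 that is expanded in forward() instead of fixing self.batch_size.

```python
import torch
import torch.nn as nn


class RNNWithInitialState(nn.Module):
    """Hypothetical wrapper: the initial hidden state lives on the module,
    so model.cuda() / model.double() move or cast it with the weights."""

    def __init__(self, input_dim, hidden_dim, num_layers=1, trainable_h0=False):
        super().__init__()
        self.rnn = nn.RNN(input_dim, hidden_dim, num_layers)
        # Batch dimension of 1; expanded to the real batch size in forward().
        h0 = torch.zeros(num_layers, 1, hidden_dim)
        if trainable_h0:
            # Registered as a parameter: appears in model.parameters()
            # and receives gradients.
            self.h0 = nn.Parameter(h0)
        else:
            # Registered as a buffer: part of the module state, follows
            # .cuda() / .double(), but is not returned by model.parameters().
            self.register_buffer("h0", h0)

    def forward(self, x):
        # x has shape (seq_len, batch, input_dim), nn.RNN's default layout.
        batch_size = x.size(1)
        h0 = self.h0.expand(-1, batch_size, -1).contiguous()
        return self.rnn(x, h0)


# The buffer variant is cast together with the model.
model = RNNWithInitialState(4, 8, num_layers=2).double()
out, hn = model(torch.randn(5, 3, 4, dtype=torch.float64))
```

In the parameter variant, gradient tracking can also be switched later with model.h0.requires_grad_(False) or model.h0.requires_grad_(True), which may be enough for deciding at runtime. Whether that is preferable to a buffer is exactly what I'm unsure about.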