def init_hidden(self, batch_size):
    ''' Initializes hidden state '''
    # Create two new tensors with sizes n_layers x batch_size x n_hidden,
    # initialized to zero, for hidden state and cell state of LSTM
    weight = next(self.parameters()).data
    if (train_on_gpu):
        hidden = (weight.new(self.n_layers, batch_size, self.n_hidden).zero_().cuda(),
                  weight.new(self.n_layers, batch_size, self.n_hidden).zero_().cuda())
    else:
        hidden = (weight.new(self.n_layers, batch_size, self.n_hidden).zero_(),
                  weight.new(self.n_layers, batch_size, self.n_hidden).zero_())
    return hidden
The code snippet shows different initializations for CPU and CUDA tensors, not only for the GPU.
Instead of using this condition, you could also write device-agnostic code and call .to(device)
on the newly created tensors.
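A minimal sketch of what the device-agnostic version might look like. The class name CharRNN, its constructor arguments, and the layer sizes are made up for illustration and are not from the original post. Here the zero tensors are created directly on the device of the model's parameters, which has the same effect as creating them and then calling .to(device):

```python
import torch
import torch.nn as nn


class CharRNN(nn.Module):  # hypothetical model, only for illustration
    def __init__(self, n_input=8, n_hidden=16, n_layers=2):
        super().__init__()
        self.n_layers = n_layers
        self.n_hidden = n_hidden
        self.lstm = nn.LSTM(n_input, n_hidden, n_layers, batch_first=True)

    def init_hidden(self, batch_size):
        ''' Initializes hidden state on whatever device the model lives on '''
        weight = next(self.parameters())
        shape = (self.n_layers, batch_size, self.n_hidden)
        # Same dtype and device as the model weights, so no if/else is needed
        h0 = torch.zeros(shape, dtype=weight.dtype, device=weight.device)
        c0 = torch.zeros(shape, dtype=weight.dtype, device=weight.device)
        return (h0, c0)


# Pick the device once; everything else follows the model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = CharRNN().to(device)
h0, c0 = model.init_hidden(batch_size=4)
```

With this approach the train_on_gpu flag is no longer needed inside init_hidden, since the tensors follow the model wherever it was moved.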
Okay, that makes sense.
But why do I need to create a tuple with 2 elements?
hidden = (weight.new(self.n_layers, batch_size, self.n_hidden).zero_().cuda(),
          weight.new(self.n_layers, batch_size, self.n_hidden).zero_().cuda())
You are creating one tuple containing two tensors and are most likely passing them to the nn.LSTM module as h0 and c0 (hidden and cell state), as described in the docs.
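To make this concrete, here is a small sketch of how such a tuple is typically consumed by nn.LSTM. The sizes are arbitrary values chosen for the example, and batch_first=True is an assumption about how the input is laid out:

```python
import torch
import torch.nn as nn

# Arbitrary sizes, chosen only for this example
n_layers, batch_size, n_input, n_hidden, seq_len = 2, 4, 8, 16, 5

lstm = nn.LSTM(n_input, n_hidden, n_layers, batch_first=True)
x = torch.randn(batch_size, seq_len, n_input)

# One tuple, two tensors: initial hidden state and initial cell state
h0 = torch.zeros(n_layers, batch_size, n_hidden)
c0 = torch.zeros(n_layers, batch_size, n_hidden)

# The LSTM returns the per-step outputs plus a tuple of the final states
output, (hn, cn) = lstm(x, (h0, c0))
```

The returned (hn, cn) has the same layout as the tuple passed in, so it can be fed back in as the initial state for the next batch.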