This initialization confuses me. Why do I need to call weight.new twice for the GPU branch and twice again for the CPU branch?

    def init_hidden(self, batch_size):
        ''' Initializes hidden state '''
        # Create two new tensors with sizes n_layers x batch_size x n_hidden,
        # initialized to zero, for hidden state and cell state of LSTM
        weight = next(self.parameters()).data

        if (train_on_gpu):
            hidden = (weight.new(self.n_layers, batch_size, self.n_hidden).zero_().cuda(),
                      weight.new(self.n_layers, batch_size, self.n_hidden).zero_().cuda())
        else:
            hidden = (weight.new(self.n_layers, batch_size, self.n_hidden).zero_(),
                      weight.new(self.n_layers, batch_size, self.n_hidden).zero_())

        return hidden

@ptrblck
I hope you can help. Thanks!

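The code snippet contains two branches: one creates the tensors on the GPU (CUDA) and the other creates them on the CPU. Only one of them runs, depending on train_on_gpu, so you are not creating the tensors four times.
Instead of using this condition, you could also write device-agnostic code and call .to(device) on the newly created tensors.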
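A minimal sketch of the device-agnostic version, assuming a hypothetical `CharRNN` model (the class name, its sizes, and the extra `device` argument to `init_hidden` are made up for illustration). A single code path creates the zero tensors and moves them with `.to(device)`, so the `if/else` disappears:

```python
import torch
import torch.nn as nn


class CharRNN(nn.Module):
    """Hypothetical minimal LSTM model; only init_hidden matters here."""

    def __init__(self, n_tokens=10, n_hidden=32, n_layers=2):
        super().__init__()
        self.n_hidden = n_hidden
        self.n_layers = n_layers
        self.lstm = nn.LSTM(n_tokens, n_hidden, n_layers, batch_first=True)

    def init_hidden(self, batch_size, device):
        """Return zero-initialized (h0, c0) on the given device."""
        shape = (self.n_layers, batch_size, self.n_hidden)
        # Same code for CPU and GPU: create the tensors, then move them.
        h0 = torch.zeros(shape).to(device)
        c0 = torch.zeros(shape).to(device)
        return (h0, c0)


# Pick the device once and reuse it for the model and the hidden state.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = CharRNN().to(device)
hidden = model.init_hidden(batch_size=4, device=device)
print(hidden[0].shape, hidden[0].device)
```

You could also pass `device=device` directly to `torch.zeros` to skip the intermediate CPU tensor; either way the model and the hidden state end up on the same device.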

Okay, that makes sense.

But why do I need to create a tuple with two elements?

    hidden = (weight.new(self.n_layers, batch_size, self.n_hidden).zero_().cuda(),
              weight.new(self.n_layers, batch_size, self.n_hidden).zero_().cuda())

You are creating one tuple containing two tensors, which you are most likely passing to the nn.LSTM module as h_0 and c_0 (the initial hidden state and cell state), as described in the docs.
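A small sketch of how such a tuple is consumed, assuming arbitrary example sizes (`n_layers=2`, `batch_size=3`, `n_hidden=16`, etc. are placeholders). With the default `batch_first=False`, `nn.LSTM` expects input of shape `(seq_len, batch, input_size)` and takes the two initial states packed together as `(h_0, c_0)`; it also returns the final states as a tuple:

```python
import torch
import torch.nn as nn

# Placeholder sizes for illustration only.
n_layers, batch_size, n_hidden, input_size, seq_len = 2, 3, 16, 8, 5

lstm = nn.LSTM(input_size, n_hidden, num_layers=n_layers)

x = torch.randn(seq_len, batch_size, input_size)
h0 = torch.zeros(n_layers, batch_size, n_hidden)  # initial hidden state
c0 = torch.zeros(n_layers, batch_size, n_hidden)  # initial cell state

# nn.LSTM takes both initial states as a single tuple argument.
output, (h_n, c_n) = lstm(x, (h0, c0))

print(output.shape)          # torch.Size([5, 3, 16])
print(h_n.shape, c_n.shape)  # torch.Size([2, 3, 16]) torch.Size([2, 3, 16])
```

If you don't pass the tuple at all, nn.LSTM initializes both states to zeros. By contrast, nn.GRU and nn.RNN have no cell state and take a single h_0 tensor.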
