Proper way to initialize h_0 and c_0

I realize that, by default, nn.LSTM will automatically initialize the h_0 and c_0 states to zeros if they aren't passed in. I am using zeros below just as an example. I often see people initialize them like so:

    def init_hidden(self):
        self.h_0 = torch.zeros(self.num_layers, self.batch_size, self.hidden_size).to(device)
        self.c_0 = torch.zeros(self.num_layers, self.batch_size, self.hidden_size).to(device)

They then call lstm.init_hidden() before they start training.

The issue I have seen with this is that if you then go to do a prediction (eval) and pass the model something smaller than batch_size, say a single sample, it errors. For example, the LSTM was trained with batch_size = 20 and now I try to pass it a single sample for prediction:

RuntimeError: Expected hidden[0] size (3, 1, 128), got (3, 20, 128)
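
(For what it's worth, this is easy to reproduce in isolation. The input_size and sequence length below are made up; num_layers = 3, hidden_size = 128, and the batch of 20 match the shapes in the error message:)

    import torch
    import torch.nn as nn

    lstm = nn.LSTM(input_size=10, hidden_size=128, num_layers=3, batch_first=True)

    # hidden state built once for the training batch size of 20
    h_0 = torch.zeros(3, 20, 128)
    c_0 = torch.zeros(3, 20, 128)

    x = torch.randn(1, 5, 10)      # a single sample at prediction time
    out, _ = lstm(x, (h_0, c_0))   # errors: expected hidden[0] size (3, 1, 128), got (3, 20, 128)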

I have also seen people initialize these in the forward function like so:

    def forward(self, x):
        self.h_0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(device)
        self.c_0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(device)

So this is called on every forward pass, and instead of batch_size it uses x.size(0). But which way is correct? How do you make sure that if you train with, say, batch_size = 20, you can still pass a single sample to get a prediction?
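
Here is a self-contained sketch of that second pattern as I understand it (input_size, output_size, the layer sizes, and batch_first=True are just assumptions for the example). It does seem to accept different batch sizes:

    import torch
    import torch.nn as nn

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    class LSTMModel(nn.Module):
        def __init__(self, input_size=10, hidden_size=128, num_layers=3, output_size=1):
            super().__init__()
            self.hidden_size = hidden_size
            self.num_layers = num_layers
            self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
            self.fc = nn.Linear(hidden_size, output_size)

        def forward(self, x):
            # hidden state sized from the incoming batch, not a fixed batch_size
            self.h_0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(device)
            self.c_0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(device)
            out, _ = self.lstm(x, (self.h_0, self.c_0))
            return self.fc(out[:, -1, :])

    model = LSTMModel().to(device)
    model(torch.randn(20, 5, 10).to(device))  # batch of 20 during training: works
    model(torch.randn(1, 5, 10).to(device))   # single sample at prediction time: also works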
