I realize that nn.LSTM will automatically initialize the h_0 and c_0 states with zeros if none are provided. I am using zeros below just as an example. I often see people initialize them like so:
def init_hidden(self):
    self.h_0 = torch.zeros(self.num_layers, self.batch_size, self.hidden_size).to(device)
    self.c_0 = torch.zeros(self.num_layers, self.batch_size, self.hidden_size).to(device)
and then call lstm.init_hidden() before training starts.
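For concreteness, here is a minimal sketch of that first pattern wired into a full module (the class name, input size, and layer sizes are illustrative, not from the original code):

```python
import torch
import torch.nn as nn

class FixedBatchLSTM(nn.Module):
    """Sketch of the pattern where the hidden state is sized once,
    up front, using the training batch_size."""

    def __init__(self, input_size=10, hidden_size=128, num_layers=3, batch_size=20):
        super().__init__()
        self.num_layers = num_layers
        self.batch_size = batch_size
        self.hidden_size = hidden_size
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)

    def init_hidden(self):
        # Zero states shaped for the *training* batch size only.
        self.h_0 = torch.zeros(self.num_layers, self.batch_size, self.hidden_size)
        self.c_0 = torch.zeros(self.num_layers, self.batch_size, self.hidden_size)

    def forward(self, x):
        # Any input whose batch dimension differs from self.batch_size
        # will make nn.LSTM raise a RuntimeError here.
        out, _ = self.lstm(x, (self.h_0, self.c_0))
        return out
```

With batch_size=20, calling the model on a (20, seq_len, input_size) tensor works, but a single-sample (1, seq_len, input_size) tensor triggers exactly the hidden-state size error shown below.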
The issue I have seen with this is that if you then go to do a prediction (eval) and pass it fewer samples than batch_size, say a single sample, it errors. For example, if the LSTM was trained with batch_size = 20 and you now pass it a single sample for prediction:
RuntimeError: Expected hidden[0] size (3, 1, 128), got (3, 20, 128)
I have also seen people initialize these in the forward function like so:
def forward(self, x):
    self.h_0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(device)
    self.c_0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(device)
So this is called on every forward pass, and instead of batch_size it uses x.size(0).
But which way is correct? How do you make sure that if you train with, say, batch_size=20, you can still pass a single sample to get a prediction?
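To make the comparison concrete, the second pattern from above can be sketched as a complete module (class name and sizes are illustrative). Sizing the zero state from x.size(0) accepts any batch size, and it matches what nn.LSTM does internally when no hidden state is passed at all:

```python
import torch
import torch.nn as nn

class FlexibleLSTM(nn.Module):
    """Sketch of the pattern where the hidden state is rebuilt
    every forward pass from the incoming batch size."""

    def __init__(self, input_size=10, hidden_size=128, num_layers=3):
        super().__init__()
        self.num_layers = num_layers
        self.hidden_size = hidden_size
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)

    def forward(self, x):
        # Size the zero state from the batch actually received; this is
        # equivalent to calling self.lstm(x) with no hidden state, since
        # nn.LSTM defaults to zeros of exactly this shape.
        h_0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device)
        c_0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device)
        out, _ = self.lstm(x, (h_0, c_0))
        return out

model = FlexibleLSTM()
big = model(torch.zeros(20, 5, 10))  # a training-sized batch of 20
one = model(torch.zeros(1, 5, 10))   # a single sample also works
```

Here both calls succeed, because the hidden-state batch dimension always matches the input's.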