Batch prediction for a model

I have an LSTM model trained with a batch size of 512. This means that 512 hidden states are initialized for each sample in the batch. During prediction, if I pass a batch of 100 samples, it throws an inconsistent size error, probably because it cannot initialize the hidden states for the 100 samples, since the size has already been set in init_hidden. It does not throw an error if I pass a single sample to the model during prediction; in that case the output is broadcast to batch_size. Is my understanding correct? Can't I do batch prediction with a model?



Can you post the code for your model?

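import torch
import torch.nn.functional as F
import torch.utils.data as data_utils
from torch.autograd import Variable

class CharLevelLanguageModel(torch.nn.Module):
    def __init__(self,vocab_size,emb_dim,hidden_dim,batch_size):
        # nn.Module must be initialised before submodules are assigned
        super(CharLevelLanguageModel,self).__init__()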
        self.embedddings = torch.nn.Embedding(vocab_size,emb_dim,padding_idx=0)
        self.lstm = torch.nn.LSTM(emb_dim,hidden_dim,1,batch_first=True)
        self.linear = torch.nn.Linear(hidden_dim,vocab_size)
        self.batch_size = batch_size
        self.hidden_dim = hidden_dim
        self.hidden_state = self.init_hidden()

    def init_hidden(self):
        return (Variable(torch.zeros(1,self.batch_size,self.hidden_dim)),Variable(torch.zeros(1,self.batch_size,self.hidden_dim)))
    def forward(self,x):
        embeds = self.embedddings(x)
        output,self.hidden_state = self.lstm(embeds,self.hidden_state)
        return F.log_softmax(self.linear(F.tanh(output[:,-1,:])))
train = data_utils.TensorDataset(torch.LongTensor(dataX),torch.LongTensor(dataY))
train_loader = data_utils.DataLoader(train,batch_size=100,drop_last=True)
model = CharLevelLanguageModel(len(char_to_id)+1,100,50,100)
criterion = torch.nn.NLLLoss()
optimizer = torch.optim.Adam(model.parameters())
losses = []
for i in range(3):
    total_loss = torch.FloatTensor([0])
    for batch_idx,train in enumerate(train_loader):
        model.hidden_state = model.init_hidden()
        x,y = Variable(train[0]),Variable(train[1])
        y_pred = model(x)
        loss = criterion(y_pred,y)

Above is the training code.

testX = dataX[0:2]
testVar = Variable(torch.LongTensor(testX))

The above code throws an inconsistent size error.

But if I run the below code:

testX = dataX[0]
testVar = Variable(torch.LongTensor(testX))

It executes successfully, but it returns an output of size (batch_size*output_labels): each row is repeated batch_size times.

Did you reset the hidden state after training? You might need to do this before eval.

model.hidden_state = model.init_hidden()

Oh, I have not set that. I will do it and check. Was that the reason?

I am still getting the same error, but thanks. I will keep in mind to reset the hidden state before eval.

My mistake, you also have to set the correct new batch size.

model.batch_size = test_batch_size
model.hidden_state = model.init_hidden()

If you don’t know why you need to do that then you know little about how an LSTM works.

Wrong: it does not mean that 512 hidden states are initialised for each sample. Using a batch size of 512 simply means that you feed the model 512 samples in parallel. It is faster to do 512 calculations in one go than to do 512 calculations one after the other. There are hidden_dim hidden-state values initialised per sample.

Once you have trained the model using a batch size of 512, the resulting trained model is valid for any batch size. You just have to reinitialise the hidden state for the new batch size.

The hidden state is the model’s “short-term memory”. Each time you give your model a batch of data, you should reset the hidden state, unless the new batch contains the continuation of the sequences in the previous batch.
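One way to avoid setting model.batch_size by hand is to size the hidden state from the input itself. The following is only a minimal sketch, assuming a recent PyTorch version where plain tensors replace Variable. The class name BatchAgnosticCharLM, the vocabulary size of 30 and the simplified output head are hypothetical and not taken from the code posted above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BatchAgnosticCharLM(nn.Module):
    """Hypothetical variant of the posted model that reads the batch size from its input."""

    def __init__(self, vocab_size, emb_dim, hidden_dim):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, 1, batch_first=True)
        self.linear = nn.Linear(hidden_dim, vocab_size)
        self.hidden_dim = hidden_dim

    def init_hidden(self, batch_size):
        # (h_0, c_0), each of shape (num_layers, batch_size, hidden_dim)
        zeros = torch.zeros(1, batch_size, self.hidden_dim)
        return (zeros, zeros.clone())

    def forward(self, x):
        # A fresh hidden state per call, sized to whatever batch arrives
        hidden = self.init_hidden(x.size(0))
        output, _ = self.lstm(self.embeddings(x), hidden)
        return F.log_softmax(self.linear(output[:, -1, :]), dim=1)


# Illustrative sizes only: vocabulary of 30 symbols, sequences of 100 steps
model = BatchAgnosticCharLM(vocab_size=30, emb_dim=100, hidden_dim=50)
model.eval()
with torch.no_grad():
    out_train_sized = model(torch.randint(1, 30, (512, 100)))
    out_small = model(torch.randint(1, 30, (2, 100)))
print(out_train_sized.shape, out_small.shape)
```

The same weights serve both calls; only the zero-initialised state changes shape. If consecutive batches did continue the same sequences, the state would have to be carried over instead of being rebuilt in forward.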


Thank you very much. I get it now. In parallel, I am training a Keras model with almost the same hyperparameters, and I observe that the Keras model runs faster and gives me better results. The PyTorch model outputs the same character again and again. Both models have been trained for the same number of epochs, which is why I doubt the correctness of my PyTorch model. I will incorporate all the changes you suggested and retrain. Does the model definition look good otherwise, or am I still missing something? BTW, here's the Keras code:

seq_len = len(dataX[0]) 
input_layer = Input(shape=(seq_len,))
embedding_layer = Embedding(len(char_to_id)+1,100,input_length=seq_len)
embeds = embedding_layer(input_layer)
lstm_layer = LSTM(50,input_shape=(seq_len,100),return_sequences=False)
lstm_out = lstm_layer(embeds)
dense_layer = Dense(len(char_to_id),activation='softmax')
dense_out = dense_layer(lstm_out)
model = Model(input_layer,dense_out)

Did the retraining help?

I can see nothing obviously wrong with your code, but I am not especially familiar with embeddings and NLLLosses.

That said, it might help to have some idea what shape dataX and dataY have and how the data is arranged inside those arrays.

No, I am still getting bad results. dataX has shape (163717,100), i.e. (samples,timesteps), and each element is a word_index. dataY has shape (163717,). So the arrangement is Keras-style, i.e. batches,timesteps,embedding_size, which is why I set batch_first=True.


I would like to challenge that assumption, since you use the pytorch DataLoader to produce the batches. So you ought to check whether it outputs batches,timesteps,embedding_size or timesteps,batches,embedding_size.

The output shape of dataloader is (512,100)

I’m not sure you understood what I meant. You use the dataloader like this…

train_loader = data_utils.DataLoader(train,batch_size=100,drop_last=True)
for batch_idx,train in enumerate(train_loader):
    ...  # do training stuff

What shape is train?

train = data_utils.TensorDataset(torch.LongTensor(dataX),torch.LongTensor(dataY))
train_loader = data_utils.DataLoader(train,batch_size=512,drop_last=True)


< at 0x131880550>

train.data_tensor is LongTensor of size : (163717x100) which I feed into a DataLoader to get batches of size 512.

Not what I wanted.

train = data_utils.TensorDataset(torch.LongTensor(dataX),torch.LongTensor(dataY))
train_loader = data_utils.DataLoader(train,batch_size=512,drop_last=True)
for batch_idx,train in enumerate(train_loader):

Since train here is a tuple, the shape of train[0] is torch.Size([512, 100]) and the shape of train[1] is torch.Size([512]).

I am definitely missing something here.
I was expecting the input to the model to have 3 dimensions, but it only has 2.

So train[0] has shape (batch_size, timesteps)
embeds has shape (batch_size, timesteps, emb_dim) which looks OK.
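These shapes can be checked with a quick experiment. This is only a sketch on random stand-in data: the sample count of 2000 and the vocabulary size of 30 are made up for illustration, and only the layout (samples, timesteps) mirrors dataX and dataY.

```python
import torch
import torch.utils.data as data_utils

# Random stand-in data with the same layout as dataX/dataY (sizes are illustrative)
num_samples, seq_len, vocab_size, emb_dim = 2000, 100, 30, 100
dataX = torch.randint(1, vocab_size, (num_samples, seq_len))
dataY = torch.randint(1, vocab_size, (num_samples,))

train = data_utils.TensorDataset(dataX, dataY)
train_loader = data_utils.DataLoader(train, batch_size=512, drop_last=True)

embeddings = torch.nn.Embedding(vocab_size, emb_dim, padding_idx=0)

# Take one batch and push it through the embedding layer
x, y = next(iter(train_loader))
embeds = embeddings(x)
print(x.shape, y.shape, embeds.shape)
```

The embedding layer adds the third dimension, so an LSTM built with batch_first=True receives (batch_size, timesteps, emb_dim) as it expects.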

I really can’t see any reason why this would perform less well than the keras version.

Finally figured it out.
The problem was in this line: return F.log_softmax(self.linear(F.tanh(output[:,-1,:]))). The F.tanh was not necessary, since the LSTM cell already applies a tanh internally when it computes its output. I think that is where I was going wrong. After removing it, I get results equivalent to the Keras model. Do correct me if I am wrong.
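A small experiment is consistent with this. In the LSTM equations documented by PyTorch, the output is h_t = o_t * tanh(c_t), so it already lies between -1 and 1, and a second tanh can only compress that range further. The sketch below uses random weights and random inputs, so it illustrates the value ranges only and says nothing about training quality.

```python
import math
import torch

torch.manual_seed(0)
lstm = torch.nn.LSTM(input_size=100, hidden_size=50, num_layers=1, batch_first=True)

with torch.no_grad():
    # Deliberately large inputs, to push the gates toward saturation
    embeds = 10 * torch.randn(4, 20, 100)
    output, _ = lstm(embeds)
    last = output[:, -1, :]       # what the model feeds to the linear layer
    squashed = torch.tanh(last)   # what the extra F.tanh produced

max_raw = last.abs().max().item()
max_squashed = squashed.abs().max().item()
print(max_raw, max_squashed)
```

Because the raw output never exceeds 1 in magnitude, the extra tanh caps the features seen by the linear layer at tanh(1), roughly 0.76.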

Thank you very much for your help.
