Batch prediction for a model

I have an LSTM model trained with a batch size of 512. My understanding is that this means 512 hidden states are initialized, one for each sample in the batch. Now, during prediction, if I give it a batch_size of 100 it throws an inconsistent size error, probably because it cannot initialize the hidden states for the 100 samples, since we have already set that in init_hidden. It does not throw an error if I pass each sample of the batch to the model one at a time during prediction, but then the output is broadcast to the batch size. So my question is: is my understanding correct? Can't I do batch prediction with this model?

Thanks,
Kaushal

Can you post the code for your model?

import torch
import torch.nn.functional as F
import torch.utils.data as data_utils
from torch.autograd import Variable


class CharLevelLanguageModel(torch.nn.Module):
    def __init__(self, vocab_size, emb_dim, hidden_dim, batch_size):
        super(CharLevelLanguageModel, self).__init__()
        self.embeddings = torch.nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = torch.nn.LSTM(emb_dim, hidden_dim, 1, batch_first=True)
        self.linear = torch.nn.Linear(hidden_dim, vocab_size)
        self.batch_size = batch_size
        self.hidden_dim = hidden_dim
        self.hidden_state = self.init_hidden()

    def init_hidden(self):
        # (h_0, c_0), each of shape (num_layers, batch_size, hidden_dim)
        return (Variable(torch.zeros(1, self.batch_size, self.hidden_dim)),
                Variable(torch.zeros(1, self.batch_size, self.hidden_dim)))

    def forward(self, x):
        embeds = self.embeddings(x)                      # (batch, timesteps, emb_dim)
        output, self.hidden_state = self.lstm(embeds, self.hidden_state)
        return F.log_softmax(self.linear(F.tanh(output[:,-1,:])))
        
train = data_utils.TensorDataset(torch.LongTensor(dataX),torch.LongTensor(dataY))
train_loader = data_utils.DataLoader(train,batch_size=100,drop_last=True)
model = CharLevelLanguageModel(len(char_to_id)+1,100,50,100)
criterion = torch.nn.NLLLoss()
optimizer = torch.optim.Adam(model.parameters())
losses = []
for i in range(3):
    
    total_loss = torch.FloatTensor([0])
    for batch_idx,train in enumerate(train_loader):
        model.hidden_state = model.init_hidden()
        x,y = Variable(train[0]),Variable(train[1])
        y_pred = model(x)
        loss = criterion(y_pred,y)
        total_loss += loss.data
        optimizer.zero_grad()   # clear old gradients before backprop
        loss.backward()
        optimizer.step()
    losses.append(total_loss)

Above is the training code.

model.eval()
testX = dataX[0:2]
testVar = Variable(torch.LongTensor(testX))
y_pred = model(testVar)

The above code throws an inconsistent size error.

But if I run the code below:

testX = dataX[0]
testVar = Variable(torch.LongTensor(testX))
y_pred = model(testVar)

It executes successfully, but it returns an output of shape (batch_size, output_labels), where each row is repeated batch_size times.

Did you reset the hidden state after training? You might need to do this before eval.

model.hidden_state = model.init_hidden()

Oh I have not set that. I will do it and check. Was that the reason?

Still getting the same error, but thanks, I will keep in mind to reset the hidden state before eval.

My mistake, you also have to set the correct new batch size.

model.batch_size = test_batch_size
model.hidden_state = model.init_hidden()
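
Putting that together with your earlier eval snippet, the prediction step could look roughly like this (a sketch reusing model and dataX from above):

model.eval()

testX = dataX[0:2]                         # any test batch size is fine
test_batch_size = len(testX)

model.batch_size = test_batch_size         # tell the model about the new batch size
model.hidden_state = model.init_hidden()   # hidden state re-created for that size

testVar = Variable(torch.LongTensor(testX))
y_pred = model(testVar)                    # (test_batch_size, vocab_size) log-probabilities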

If you don’t know why you need to do that then you know little about how an LSTM works.

Wrong. Using a batch size of 512 simply means that you feed the model 512 samples in parallel; it is faster to do 512 calculations in one go than to do them one after the other. Each sample gets its own hidden state of hidden_dim units.

Once you have trained the model using a batch size of 512, the resulting trained model is valid for any batch size. You just have to reinitialise the hidden state for the new batch size.

The hidden state is the model’s “short-term memory”, so each time you give your model a new batch of data you should reset the hidden state, unless the new batch contains the continuation of the sequences in the previous batch.
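
As a quick standalone illustration (not your model, just a bare nn.LSTM with the same sizes): the same LSTM weights accept any batch size, as long as the initial hidden state is shaped to match.

import torch
from torch.autograd import Variable

# Standalone sketch: one LSTM, three different batch sizes.
lstm = torch.nn.LSTM(input_size=100, hidden_size=50, num_layers=1, batch_first=True)

for batch_size in (512, 100, 1):
    x = Variable(torch.randn(batch_size, 20, 100))    # (batch, timesteps, emb_dim); 20 timesteps chosen arbitrarily
    h0 = Variable(torch.zeros(1, batch_size, 50))     # (num_layers, batch, hidden_dim)
    c0 = Variable(torch.zeros(1, batch_size, 50))
    output, (hn, cn) = lstm(x, (h0, c0))
    print(output.size())                              # (batch_size, 20, 50)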

Thank you very much. I get it now. In parallel, I am training a Keras model with almost the same hyperparameters, but I observe that the Keras model runs faster and gives me better results, whereas the PyTorch model outputs the same character again and again. Both models have been trained for the same number of epochs, which is why I am doubting the correctness of my PyTorch model. I will incorporate all the changes you suggested and retrain. Does the model build look good otherwise, or am I still missing something? BTW, here's the Keras code:

# imports added for completeness (assuming the standalone Keras package)
from keras.layers import Input, Embedding, LSTM, Dense
from keras.models import Model

seq_len = len(dataX[0])
input_layer = Input(shape=(seq_len,))
embedding_layer = Embedding(len(char_to_id)+1, 100, input_length=seq_len)
embeds = embedding_layer(input_layer)
lstm_layer = LSTM(50, input_shape=(seq_len,100), return_sequences=False)
lstm_out = lstm_layer(embeds)
dense_layer = Dense(len(char_to_id), activation='softmax')
dense_out = dense_layer(lstm_out)
model = Model(input_layer, dense_out)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['acc'])
model.fit(dataX, dataY, batch_size=100, epochs=3)

Did the retraining help?

I can see nothing obviously wrong with your code, but I am not especially familiar with embeddings or NLLLoss.

That said, it might help to have some idea what shape dataX and dataY have and how the data is arranged inside those arrays.

No, still getting bad results. dataX has shape (163717, 100), i.e. (samples, timesteps), and each element is a character index. dataY has shape (163717,). So the arrangement is Keras-style, i.e. (batch, timesteps, embedding_size), which is why I set batch_first=True.

Thanks

I would like to challenge that assumption, since you use the PyTorch DataLoader to produce the batches. So you ought to check whether it outputs (batch, timesteps, embedding_size) or (timesteps, batch, embedding_size).

The output shape from the DataLoader is (512, 100).

I’m not sure you understood what I meant. You use the dataloader like this…

train_loader = data_utils.DataLoader(train,batch_size=100,drop_last=True)
for batch_idx,train in enumerate(train_loader):
    # do training stuff

What shape is train?

train = data_utils.TensorDataset(torch.LongTensor(dataX),torch.LongTensor(dataY))
train_loader = data_utils.DataLoader(train,batch_size=512,drop_last=True)
print(train)

Output:

<torch.utils.data.dataset.TensorDataset at 0x131880550>

train.data_tensor is a LongTensor of size (163717, 100), which I feed into a DataLoader to get batches of size 512.

Not what I wanted.

train = data_utils.TensorDataset(torch.LongTensor(dataX),torch.LongTensor(dataY))
train_loader = data_utils.DataLoader(train,batch_size=512,drop_last=True)
for batch_idx,train in enumerate(train_loader):
    print(train.shape)

Since train here is a tuple, the shape of train[0] is torch.Size([512, 100]) and the shape of train[1] is torch.Size([512]).

I am definitely missing something here.
I was expecting the input to the model to have 3 dimensions, but it only has 2.
Hmmm.

So train[0] has shape (batch_size, timesteps),
and embeds has shape (batch_size, timesteps, emb_dim), which looks OK.

I really can’t see any reason why this would perform less well than the keras version.
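
To make that concrete, here is roughly how the shapes flow through the model for one batch (a sketch using the sizes reported above: batch of 512, 100 timesteps, emb_dim=100, hidden_dim=50; x stands for one batch of indices from the DataLoader, wrapped in a Variable):

# Sketch: shape trace for one batch of character indices, x of shape (512, 100)
embeds = model.embeddings(x)              # (512, 100, 100) = (batch, timesteps, emb_dim)
h0 = (Variable(torch.zeros(1, 512, 50)),  # (num_layers, batch, hidden_dim)
      Variable(torch.zeros(1, 512, 50)))
output, hn = model.lstm(embeds, h0)       # output: (512, 100, 50) = (batch, timesteps, hidden_dim)
last_step = output[:, -1, :]              # (512, 50): hidden state at the final timestep
scores = model.linear(last_step)          # (512, vocab_size)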

Finally figured it out.
return F.log_softmax(self.linear(F.tanh(output[:,-1,:])))

In this snippet, the F.tanh was not necessary, since the LSTM cell already applies a tanh internally when producing its output. I think that is where I was going wrong. After removing it, I get results equivalent to the Keras model. Do correct me if I am wrong.
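
For reference, the corrected forward then becomes (same as before, minus the extra tanh):

def forward(self, x):
    embeds = self.embeddings(x)
    output, self.hidden_state = self.lstm(embeds, self.hidden_state)
    # The LSTM output already went through a tanh inside the cell,
    # so pass the last timestep straight to the linear layer.
    return F.log_softmax(self.linear(output[:, -1, :]))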

Thank you very much for your help.
