Will the optimizer handle batch size properly when training an LSTM?

BigBorg · March 9, 2018, 7:40am

I’m training a CNN + LSTM model to do captcha recognition. The PyTorch LSTM tutorial only gives an example with batch size 1. I wonder: if I feed forward a batch with batch size larger than 1, will the optimizer handle the batch size properly? Should I do anything additional to give the optimizer the batch size information?

BigBorg · March 9, 2018, 8:05am

The model will accumulate gradients for examples in a batch. The optimizer does not need to know batch size. Simply use a smaller learning rate for a larger batch size. Is this correct?

jpeg729 · March 10, 2018, 11:15am

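The built-in PyTorch loss functions average the loss over the samples in a batch by default, so the gradient scale doesn’t grow with the batch size and you don’t have to adjust the learning rate just to compensate for a different batch size. The optimizer itself never needs to be told the batch size.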
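To illustrate the point about averaging, here is a minimal sketch. The tiny nn.Linear model, the random data, and the helper name weight_grad are all made up for the example, and it uses the reduction argument from current PyTorch releases (older releases used size_average instead). It compares the gradient for a batch against the gradient for the same batch duplicated:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy model and data, only for illustration.
model = nn.Linear(4, 3)
x = torch.randn(8, 4)
y = torch.randint(0, 3, (8,))

# Same samples repeated twice: a batch that is twice as large.
x2 = torch.cat([x, x])
y2 = torch.cat([y, y])

def weight_grad(inputs, targets, reduction):
    """Return the gradient of the loss w.r.t. model.weight."""
    model.zero_grad()
    loss = nn.CrossEntropyLoss(reduction=reduction)(model(inputs), targets)
    loss.backward()
    return model.weight.grad.clone()

# Default behaviour ("mean"): gradient is the same for both batch sizes.
mean_matches = torch.allclose(
    weight_grad(x, y, "mean"), weight_grad(x2, y2, "mean"), atol=1e-6
)

# With "sum" the gradient scales with batch size, so it doubles here.
sum_doubles = torch.allclose(
    2 * weight_grad(x, y, "sum"), weight_grad(x2, y2, "sum"), atol=1e-5
)

print(mean_matches, sum_doubles)
```

With the default averaging the gradient scale is independent of batch size, which is why the optimizer can be used unchanged. Only if you switch to a summed loss would you need to rescale the learning rate yourself.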

One thing to note is that nn.LSTM by default expects input of shape (timesteps, batch_size, features). If you want to give it input of shape (batch_size, timesteps, features), you will need to pass the batch_first=True argument to nn.LSTM, or alternatively use input.transpose(0,1) to swap the first two dimensions.
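A short sketch of both options, with made-up sizes (5 timesteps, batch of 3, 10 features, 16 hidden units) chosen only for illustration:

```python
import torch
import torch.nn as nn

timesteps, batch_size, features, hidden = 5, 3, 10, 16

# Input laid out as (batch_size, timesteps, features).
x = torch.randn(batch_size, timesteps, features)

# Option 1: default nn.LSTM, so swap the first two dimensions first.
lstm_default = nn.LSTM(input_size=features, hidden_size=hidden)
out_default, _ = lstm_default(x.transpose(0, 1))
print(out_default.shape)  # (timesteps, batch_size, hidden)

# Option 2: tell nn.LSTM that the batch dimension comes first.
lstm_bf = nn.LSTM(input_size=features, hidden_size=hidden, batch_first=True)
out_bf, (h_n, c_n) = lstm_bf(x)
print(out_bf.shape)  # (batch_size, timesteps, hidden)

# Note: batch_first only affects the input and output tensors.
# The hidden and cell states stay (num_layers, batch_size, hidden).
print(h_n.shape)
```

Either way the batch dimension is handled by the LSTM itself, so no extra batch size information has to be passed anywhere else.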

BigBorg · March 15, 2018, 1:57am

Thanks. I already got the model working. It performs well for fixed-length captchas, but for variable-length captchas the model isn’t good enough. I’m trying to implement CTC loss now.