I’m training a CNN + LSTM model to do captcha recognition. The PyTorch LSTM tutorial only gives an example with batch size 1. If I feed forward a batch with batch size larger than 1, will the optimizer handle the batch size properly? Do I need to do anything additional to give the optimizer the batch size information?
The model will accumulate gradients over the examples in a batch, so the optimizer does not need to know the batch size. Simply use a smaller learning rate for a larger batch size. Is this correct?
The built-in PyTorch loss functions average the loss over the samples in a batch by default (reduction='mean'), so the gradient scale does not grow with batch size and you don’t have to adjust the learning rate for a different batch size.
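To illustrate the averaging, here is a minimal sketch. It assumes nn.CrossEntropyLoss as the loss and uses made-up tensor sizes (8 samples, 10 classes); neither comes from the original question. It compares the default mean reduction against reduction="sum" divided by the batch size.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes chosen only for illustration.
batch_size, num_classes = 8, 10
logits = torch.randn(batch_size, num_classes)
targets = torch.randint(0, num_classes, (batch_size,))

mean_criterion = nn.CrossEntropyLoss()                 # default: reduction="mean"
sum_criterion = nn.CrossEntropyLoss(reduction="sum")   # sums per-sample losses

mean_loss = mean_criterion(logits, targets)
sum_loss = sum_criterion(logits, targets)

# The default loss is the summed loss divided by the batch size,
# so its scale does not depend on how many samples are in the batch.
is_mean = torch.allclose(mean_loss, sum_loss / batch_size)
print(is_mean)
```

If you switch to reduction="sum", the loss (and the gradients) scale with the batch size, and that is the case where the learning rate would need rescaling.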
One thing to note is that nn.LSTM expects input of shape (timesteps, batch_size, features). If you want to give input of shape (batch_size, timesteps, features), then you will need to pass the batch_first=True argument to nn.LSTM, or alternatively you can use input.transpose(0, 1) to switch the first two dimensions.
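A small sketch of both options, with made-up sizes (batch of 4, 7 timesteps, 16 features, hidden size 32) that are assumptions for illustration only:

```python
import torch
import torch.nn as nn

# Hypothetical sizes, not taken from the original model.
batch_size, timesteps, features, hidden = 4, 7, 16, 32

# Input laid out as (batch_size, timesteps, features).
x = torch.randn(batch_size, timesteps, features)

# Option 1: tell the LSTM that the batch dimension comes first.
lstm_bf = nn.LSTM(features, hidden, batch_first=True)
out_bf, (h_n, c_n) = lstm_bf(x)
print(out_bf.shape)  # (batch_size, timesteps, hidden)

# Option 2: keep the default layout and swap the first two dimensions.
lstm = nn.LSTM(features, hidden)
out, _ = lstm(x.transpose(0, 1))
print(out.shape)     # (timesteps, batch_size, hidden)
```

Note that batch_first only affects the input and output tensors; the hidden and cell states keep the shape (num_layers, batch_size, hidden) in both cases.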
Thanks. I already got the model working. It performs well for fixed-length captchas, but for variable-length captchas the model isn’t good enough. I’m trying to implement CTC loss now.