Dataloader and BatchSize

Hi, I’d recommend starting with a batch size close to what others have used for similar problems, perhaps on the higher end of that range (say, 64 or 128). In practice, training with too large a batch size is often reported not to generalize well to new data, and I think very large batches can also run into floating-point precision issues. Here’s an experiment showing that, at least in their setup, large batch sizes do poorly.
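
For what it’s worth, the batch size is just an argument to the `DataLoader`, so it’s cheap to try a couple of values and compare validation results. A minimal sketch (the toy tensors below are just placeholders for your own `Dataset`):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy data standing in for your own Dataset (shapes/names are placeholders).
features = torch.randn(1024, 10)
targets = torch.randint(0, 2, (1024,))
dataset = TensorDataset(features, targets)

# batch_size is just a DataLoader argument, so comparing 64 vs. 128 is easy.
loader = DataLoader(dataset, batch_size=64, shuffle=True)

for batch_features, batch_targets in loader:
    pass  # forward/backward pass goes here
```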

Regarding the stateful LSTM, I’m not sure there’s a direct way to do that in PyTorch, but there are other threads, such as this one, that discuss how to accomplish the same thing; see the sketch below.
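
The workaround usually discussed is to manage the hidden state yourself: keep it between forward calls and detach it so gradients don’t propagate through the entire history. A minimal sketch (layer sizes and chunk shapes here are made up for illustration):

```python
import torch
import torch.nn as nn

# Hypothetical sizes, just for illustration.
lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
hidden = None  # PyTorch initializes the state to zeros when None is passed

for step in range(5):  # pretend these are consecutive chunks of one long sequence
    x = torch.randn(4, 20, 8)  # (batch, seq_len, input_size)
    out, hidden = lstm(x, hidden)
    # Detach the (h, c) state so the next chunk doesn't backpropagate through
    # earlier chunks and the autograd graph doesn't grow across the whole history.
    hidden = tuple(h.detach() for h in hidden)
```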