I built a model for time series forecasting, and it works very well. I tried several batch sizes: 32, 50, 64, 100, 128 and 256.
I got the best result with a batch size of 50. With a batch size of 32 the model still works, but the error takes too long to converge.
I ran multiple experiments with a batch size of 64. With that batch size the model is not able to learn. The MAE is very high; the training error decreases, but the validation error remains very high. It seems that the model is overfitting at a batch size of 64.
So I ran further experiments with batch sizes of 100, 128 and 256.
100 works OK, but not as well as 50. 128 shows the same results as 64. 256 shows good results in training and validation error, but not in the MAE.
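
One way to check whether a result for a given batch size is reproducible, rather than the effect of one unlucky initialization or shuffle, is to repeat each batch size with several random seeds and compare the mean and spread of the validation MAE. The following is only a minimal sketch of such a sweep. It assumes a toy linear model trained with mini-batch SGD on synthetic data, not the actual forecasting model; the names `make_data`, `train_and_eval` and `sweep`, as well as the learning rate, epoch count and data size, are all hypothetical.

```python
import random
import statistics

BATCH_SIZES = [32, 50, 64, 100, 128, 256]  # the batch sizes tried above


def make_data(n, rng):
    # Hypothetical toy data (not a real time series): y = 2x + 1 + noise.
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.1)) for x in xs]


def train_and_eval(batch_size, seed, epochs=20, lr=0.1):
    """Train a 1-D linear model with mini-batch SGD; return validation MAE."""
    rng = random.Random(seed)  # same seed -> same data and shuffling order
    data = make_data(1000, rng)
    train, val = data[:800], data[800:]
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(train)
        for i in range(0, len(train), batch_size):
            batch = train[i:i + batch_size]
            # Gradients of the mean squared error over the current batch.
            gw = sum(2.0 * (w * x + b - y) * x for x, y in batch) / len(batch)
            gb = sum(2.0 * (w * x + b - y) for x, y in batch) / len(batch)
            w -= lr * gw
            b -= lr * gb
    return sum(abs(w * x + b - y) for x, y in val) / len(val)


def sweep(seeds=(0, 1, 2)):
    """Return {batch_size: (mean MAE, std of MAE)} across the given seeds."""
    results = {}
    for bs in BATCH_SIZES:
        maes = [train_and_eval(bs, s) for s in seeds]
        results[bs] = (statistics.mean(maes), statistics.stdev(maes))
    return results


if __name__ == "__main__":
    for bs, (mean, std) in sweep().items():
        print(f"batch_size={bs:>3}  val MAE = {mean:.4f} +/- {std:.4f}")
```

If one batch size shows a large spread across seeds while its neighbours do not, that points towards instability in training (for example the learning rate or the initialization) rather than the batch size itself.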
Do you have any ideas why 64 and 128 do not work?
I know that models tend to overfit when the batch size is too large, but that explanation makes no sense to me in this case, since 100 works while 64 does not.
Thanks for your help!