Larger batch size in HF Trainer vs PyTorch

Using the ‘bert-base-uncased’ model, I was able to train with a batch size of 64 using the code from the NLP with Transformers book by the HF team (GitHub - nlp-with-transformers/notebooks: Jupyter notebooks for the Natural Language Processing with Transformers book). No special steps like gradient accumulation or mixed precision were used.
However, with my custom PyTorch training loop using the same model and dataset, I could only train with a batch size of 16; anything larger resulted in an OOM error. Has anyone faced a similar situation?

Never mind, I figured it out after ‘extensive’ debugging. It turns out my custom loop was padding every example to the model's max_length, while the book's code (via HF Datasets) only padded up to the longest sequence in the dataset. Once I tweaked that, both could be trained with a batch size of 64.
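For anyone hitting the same thing, here is a minimal sketch of the difference (the example texts are made up; the real data comes from the book's notebook):

```python
from transformers import AutoTokenizer

# Hypothetical example texts, just to show the shapes.
texts = [
    "a short sentence",
    "another example that is a bit longer than the first one",
]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# What my custom loop was doing: pad every example to the model's max length (512 tokens),
# so every batch carries 512-token tensors regardless of the actual text lengths.
fixed = tokenizer(texts, padding="max_length", truncation=True, return_tensors="pt")
print(fixed["input_ids"].shape)    # torch.Size([2, 512])

# Padding only to the longest sequence keeps the tensors much smaller,
# which is why the same batch size fits in memory.
dynamic = tokenizer(texts, padding="longest", truncation=True, return_tensors="pt")
print(dynamic["input_ids"].shape)  # e.g. torch.Size([2, 13])
```

In a custom loop you can also let transformers.DataCollatorWithPadding pad per batch instead of pre-padding the whole dataset.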