Hi! I’d like to highlight a feature request made on the GitHub repo for automatic tuning of batch_size and num_workers, and start some discussion around this topic.
Much like TensorFlow introduced the tf.data.experimental.AUTOTUNE flag to automatically tune these parameters, I think this feature would be very relevant for PyTorch users as well.
I have a couple of questions for the community to start building consensus:
Have you previously thought about this autotuning flag?
If you have thought about it before, what was the blocker to implementing it?
If this feature was introduced, would you use it?
What parameters do you use for batch_size and num_workers right now, and how do you set them?
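For context on the last question: a common manual heuristic (not an official PyTorch recommendation, just a sketch of what many users seem to do) is to derive num_workers from the CPU count and cap it to avoid oversubscription:

```python
import os

def heuristic_num_workers(max_workers: int = 8) -> int:
    """A common rule of thumb: one DataLoader worker per CPU core, capped.

    This is a hypothetical helper, not a PyTorch API; the default cap
    of 8 is an arbitrary assumption.
    """
    cpus = os.cpu_count() or 1  # os.cpu_count() can return None
    return min(cpus, max_workers)

print(heuristic_num_workers())
```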
It would be interesting to know how many users choose the batch size based on computational performance versus methodological considerations (e.g. poor training behavior at tiny or huge batch sizes).
For example, if hypothetically a very small batch size yielded the best speedup, wouldn't it also make some architectural changes necessary (replacing batch norm layers with group norm, etc.)?
I think a flag like this might be most useful for inference jobs, where we care exclusively about performance without regard for training behavior. You're right: automatically adjusting the batch size will have side effects on training, but we can avoid this issue by narrowing the scope to inference.
Additionally, tuning num_workers would improve performance for both training and inference, especially for large scale (GB/TB) inference jobs.
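As a very rough illustration of what tuning num_workers could look like under the hood, a search loop could time a fixed number of batches for each candidate worker count and keep the fastest. Both `pick_num_workers` and `benchmark` below are hypothetical names, and in practice `benchmark(n)` would iterate a `torch.utils.data.DataLoader` constructed with `num_workers=n`; this is only a sketch of the search, not a proposed implementation:

```python
from typing import Callable, Iterable

def pick_num_workers(candidates: Iterable[int],
                     benchmark: Callable[[int], float]) -> int:
    """Return the candidate worker count with the lowest measured time.

    `benchmark(n)` is a user-supplied callable that reports the time
    taken to load a fixed number of batches with `n` workers.
    """
    timings = {n: benchmark(n) for n in candidates}
    return min(timings, key=timings.get)

# Toy stand-in benchmark (fabricated timings, just to exercise the loop):
fake_times = {0: 2.0, 2: 1.2, 4: 0.8, 8: 1.0}
print(pick_num_workers(fake_times, fake_times.get))  # 4
```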
Also, experimentally, it seems that large batch sizes tend to yield the best speedups, up until data can no longer fit in memory.
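On that point, one simple way to find the largest batch size that fits is to double until an out-of-memory failure, then keep the last size that succeeded (roughly what some training frameworks' auto-scaling utilities do). `try_batch` here is a hypothetical stand-in that would run one forward pass and raise on OOM; in PyTorch a CUDA OOM surfaces as a RuntimeError:

```python
def largest_fitting_batch(try_batch, start: int = 1, limit: int = 4096) -> int:
    """Double the batch size until `try_batch(bs)` raises, then return
    the last size that succeeded. `try_batch` is a hypothetical callable
    that runs one step at batch size `bs` and raises MemoryError (or a
    CUDA OOM RuntimeError) when the batch does not fit.
    """
    bs, best = start, 0
    while bs <= limit:
        try:
            try_batch(bs)
            best = bs
            bs *= 2
        except (MemoryError, RuntimeError):
            break
    return best

# Toy stand-in: pretend anything above 256 runs out of memory.
def fake_try(bs):
    if bs > 256:
        raise MemoryError

print(largest_fitting_batch(fake_try))  # 256
```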
@Sean_O_Bannon I think it makes sense to have an optimized implementation of ImageFolder and transforms for PyTorch. The current API is nice, but the implementation is inefficient, and it slows down the entire system when it needs to feed 8xV100s running a small network (e.g. mobilenet_v2).