So here’s what I’ve been running into.
Imagine you are trying to scale up your batch size by some factor N, and you can increase the number of CPUs you have but not, to the same extent, your RAM.
The default behaviour of DataLoader is to spin up num_workers processes, each of which loads a full batch before returning it. The issue is that, to benefit from the extra CPUs and scale my batch size up efficiently, I basically need to go from a setup with batch size B and W workers to one with batch size N x B and N x W workers.
This will ensure that I properly utilize the extra CPUs I can get access to. However, in doing so I am also increasing the memory I need from B x W samples to (N x B) x (N x W) = N^2 x B x W samples, which is a problem since I have access to more CPUs but not more RAM. Even in the common case, such as on cloud compute, where RAM grows by the same factor N as the CPU count, memory use that grows by N^2 could still be very problematic.
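To make the arithmetic concrete, here is a rough sketch of the memory model I have in mind. It assumes every worker holds exactly one full batch at a time and ignores any extra prefetching; the function name and the numbers are purely illustrative.

```python
def samples_in_flight(batch_size, num_workers):
    """Samples held in memory if every worker holds one full batch."""
    return batch_size * num_workers


B, W, N = 32, 4, 4  # illustrative values only

baseline = samples_in_flight(B, W)         # B x W
scaled = samples_in_flight(N * B, N * W)   # (N x B) x (N x W)

# Memory grows by N^2 even though the CPU count only grew by N.
growth = scaled // baseline
print(baseline, scaled, growth)
```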
I was wondering if there is a way to change the default behaviour of DataLoader so that, instead of each worker loading a full batch, a pool of workers each loads single samples, which are then assembled into a batch once enough of them have arrived. That way I could scale up as described above without needing all that extra memory, and potentially even use less memory than I do now.
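For reference, here is a minimal sketch of the behaviour I am after, not a confirmed recommended approach. It relies on the fact that passing batch_size=None disables automatic batching in DataLoader, so each worker task loads one sample, and it then collates the samples in the main process with default_collate (importable from torch.utils.data in recent PyTorch versions). ToyDataset and sample_level_batches are hypothetical names made up for this example.

```python
import torch
from torch.utils.data import DataLoader, Dataset, default_collate


class ToyDataset(Dataset):
    """Hypothetical stand-in for a dataset with expensive per-sample loading."""

    def __init__(self, size=32, dim=4):
        self.size = size
        self.dim = dim

    def __len__(self):
        return self.size

    def __getitem__(self, idx):
        # Each sample is a vector filled with its own index.
        return torch.full((self.dim,), float(idx))


def sample_level_batches(dataset, batch_size, num_workers=0, prefetch_factor=2):
    """Yield batches built in the main process from individually loaded samples."""
    kwargs = {}
    if num_workers > 0:
        # prefetch_factor may only be set when worker processes are used.
        kwargs["prefetch_factor"] = prefetch_factor
    # batch_size=None turns off automatic batching: the loader yields one
    # sample per index instead of one collated batch per worker task.
    loader = DataLoader(dataset, batch_size=None, shuffle=False,
                        num_workers=num_workers, **kwargs)
    buffer = []
    for sample in loader:
        buffer.append(sample)
        if len(buffer) == batch_size:
            yield default_collate(buffer)
            buffer = []
    if buffer:
        # Final partial batch (the equivalent of drop_last=False).
        yield default_collate(buffer)
```

With this setup the workers prefetch individual samples rather than whole batches, so the number of samples held in flight should scale with the number of workers and not with workers times batch size. The likely cost is more inter-process traffic, since every sample is sent back separately and collation happens in the main process, so I am not sure how well it performs for small, cheap samples.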
Please let me know your thoughts.