I have 1.5M samples, but my GPU utilization barely increases at all. It stays low, which suggests I can probably load more samples per batch. Currently I use a batch_size of 8192, but if I increase it to 16384 I get CUDA errors. So I’m wondering what I can do to speed training up.
Your data loader looks OK; do you suspect that it is slow at loading the data?
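One quick way to check is to time iterating over the loader alone, with no GPU work at all. Here's a minimal sketch; the `TensorDataset` is just a synthetic stand-in, so swap in your real dataset:

```python
import time
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset; replace with your real one.
dataset = TensorDataset(
    torch.randn(100_000, 64),
    torch.randint(0, 10, (100_000,)),
)

loader = DataLoader(
    dataset,
    batch_size=8192,
    num_workers=4,    # try 0 vs. >0 and compare
    pin_memory=True,  # speeds up host-to-GPU copies later
)

n_batches = 0
start = time.time()
for x, y in loader:  # no model, no GPU: pure data-loading time
    n_batches += 1
elapsed = time.time() - start
print(f"{elapsed / n_batches:.4f} s per batch (loading only)")
```

If this per-batch time is close to the time of a full training step, the loader rather than the GPU is your bottleneck, and tuning `num_workers` will buy you more than a bigger batch.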
Increasing the batch size might raise GPU utilization, but it also affects the learning process: mini-batches play an important part in training, since the gradient noise from smaller batches tends to help generalization (some would even argue that batch_size=1 is best, but there is no need to go to that extreme).
So keep an eye on the accuracy
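For example, something like this minimal accuracy check, run after each epoch while you experiment with batch sizes (it assumes a classification model returning logits and a validation loader; both names are placeholders here):

```python
import torch

@torch.no_grad()
def val_accuracy(model, loader, device="cuda"):
    # Fraction of correctly classified samples over the whole loader.
    model.eval()
    correct = total = 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        preds = model(x).argmax(dim=1)
        correct += (preds == y).sum().item()
        total += y.numel()
    return correct / total
```

If accuracy drops at the larger batch size, the throughput gain may not be worth it.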
I’m only using 2313MiB of GPU RAM (out of 24GB) with a batch_size of 8192. If I increase my batch_size to 16384, it sometimes crashes randomly.
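For the random crashes at 16384: CUDA errors are reported asynchronously, so the traceback often points at the wrong line. A sketch of how you could localize the failure and check whether memory is actually the problem (assuming you run one training step at the larger batch size where the comment indicates):

```python
import os
# Must be set before CUDA is initialized; forces synchronous kernel
# launches so the Python traceback points at the real failing op.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch

# ... run one training step at batch_size=16384 here ...

# Peak memory allocated by tensors vs. reserved by the caching
# allocator; nvidia-smi only shows the reserved pool.
print(torch.cuda.max_memory_allocated() / 2**20, "MiB allocated (peak)")
print(torch.cuda.max_memory_reserved() / 2**20, "MiB reserved (peak)")
```

If the peak allocated memory at 16384 is nowhere near 24GB, the crash is probably not a plain OOM, and the synchronous traceback should tell you which op is really failing.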