I was trying to use 3 GPUs for training, but the training time barely changes. When training on one GPU I set the batch size to 200; when training with 3 GPUs I set it to 1000.
The GPU utilization is distributed across the 3 GPUs, yet the time for one epoch stays roughly the same.
What are the possible causes?
You could time the execution speed of data loading, model execution, etc. to see what the real bottleneck is. Are you sure it's the model and not disk I/O?
Any pointers on how to time the data loading?
If you are using a DataLoader in your training loop, you can measure the data-loading time simply like below.
import time

loader_time, st = 0.0, time.time()
for i, data in enumerate(loader):
    loader_time += time.time() - st  # time spent waiting for this batch
    # ... training step (forward/backward/optimizer) goes here ...
    st = time.time()
# end of the loop
Then you can compute the proportion of data-loading time relative to the whole training time.
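To make that measurement concrete, here is a minimal self-contained sketch; `slow_loader` and the lambda passed as `train_step` are hypothetical stand-ins for a real DataLoader and a real training step:

```python
import time

def measure_loader_fraction(loader, train_step):
    """Return the fraction of total wall time spent fetching batches."""
    loader_time = 0.0
    total_start = time.time()
    st = time.time()
    for data in loader:
        loader_time += time.time() - st  # time waiting on the loader
        train_step(data)                 # stand-in for forward/backward
        st = time.time()
    total_time = time.time() - total_start
    return loader_time / total_time

# Usage with stand-ins: a generator that simulates slow disk/preprocessing,
# and a "training step" that is much faster than the loader.
def slow_loader():
    for i in range(5):
        time.sleep(0.01)  # simulate I/O + preprocessing latency
        yield i

frac = measure_loader_fraction(slow_loader(), lambda batch: time.sleep(0.002))
print(f"data loading took {frac:.0%} of wall time")
```

If the fraction is high, the GPUs are starved by the input pipeline, which would explain why adding GPUs does not reduce epoch time.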
Or, better, do proper profiling, like this:
(Link preview) Ben Levy and Jacob Gildenblat, SagivTech — an article on profiling multi-GPU PyTorch code. Excerpt: "PyTorch is an incredible Deep Learning Python framework. It makes prototyping and debugging deep learning algorithms easier, and has great support for multi gpu training."
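As a minimal, framework-agnostic sketch of such profiling, this uses Python's standard-library cProfile rather than a PyTorch-specific profiler; `fake_train_step` is a hypothetical stand-in for a real training step:

```python
import cProfile
import io
import pstats
import time

def fake_train_step():
    time.sleep(0.005)  # stand-in for forward/backward work

profiler = cProfile.Profile()
profiler.enable()
for _ in range(10):
    fake_train_step()
profiler.disable()

# Print the 5 most expensive calls by cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

In a real run you would wrap an actual training loop this way (or use a PyTorch-specific profiler) and look for whether data loading, host-to-device copies, or the model itself dominates the cumulative time.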