No appreciable difference when using multiple GPUs

(Arka Sadhu) #1

I am trying to train on 3 GPUs, but the training time barely changes. On one GPU I set the batch size to 200; with 3 GPUs I set it to 1000.
The GPU usage is split across the 3 GPUs, but one epoch still takes the same time.

What could be causing this?

(Arul) #2

You could time data loading, model execution, etc. separately to see where the real bottleneck is. Are you sure it's the model and not disk I/O?
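One way to act on this is to wrap each phase of the loop in a small timer and compare the totals. The sketch below uses only the standard library; `SectionTimer`, `slow_loader` and `fake_step` are hypothetical names, and `time.sleep` stands in for real data loading and model work. When timing GPU code (assuming PyTorch), you would pass `sync=torch.cuda.synchronize`, because CUDA calls return before their kernels finish and the host clock would otherwise under-count model time.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class SectionTimer:
    """Accumulates wall-clock time per named section of a training loop.

    `sync` is an optional callable run before each clock read (e.g.
    torch.cuda.synchronize on GPU) so asynchronous work is included.
    """

    def __init__(self, sync=None):
        self.totals = defaultdict(float)
        self.sync = sync

    @contextmanager
    def section(self, name):
        if self.sync is not None:
            self.sync()
        start = time.perf_counter()
        try:
            yield
        finally:
            if self.sync is not None:
                self.sync()
            self.totals[name] += time.perf_counter() - start

    def report(self):
        total = sum(self.totals.values())
        # Print sections from slowest to fastest with their share of the total.
        for name, secs in sorted(self.totals.items(), key=lambda kv: -kv[1]):
            share = secs / total if total else 0.0
            print(f"{name:>8}: {secs:7.3f} s ({share:6.1%})")


def slow_loader(n_batches, delay):
    """Hypothetical stand-in for a DataLoader whose batches take `delay` s each."""
    for i in range(n_batches):
        time.sleep(delay)
        yield i


def fake_step(batch, delay):
    """Hypothetical stand-in for forward/backward/optimizer step."""
    time.sleep(delay)


_DONE = object()  # sentinel so a batch that happens to be None is not mistaken for the end
timer = SectionTimer()
batches = iter(slow_loader(5, 0.02))
while True:
    # Fetching the next batch is timed separately from the training step.
    with timer.section("data"):
        batch = next(batches, _DONE)
    if batch is _DONE:
        break
    with timer.section("model"):
        fake_step(batch, 0.005)
timer.report()
```

An explicit `next()` call is used instead of a plain `for` loop because the fetch inside a `for` statement cannot be wrapped in a `with` block.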

(Arka Sadhu) #3

Any pointers on how to time data loading?

(Jeong TaeYeong) #4

If you use a DataLoader in your training loop, you can measure the data loading time simply, like this:

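import time

loader_time, st = 0.0, time.perf_counter()
for i, data in enumerate(loader):
    loader_time += time.perf_counter() - st  # time spent waiting for this batch
    # ... training step on `data` ...
    st = time.perf_counter()  # restart the clock before the next batch is fetched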

You can then compare loader_time with the total training time to see what fraction of it is spent on data loading.
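To get that fraction directly, one option is to also record the wall-clock time of the whole pass. A minimal sketch, assuming `loader` is any iterable (a DataLoader behaves the same way here) and `train_step` is a hypothetical callable that performs one training step; `measure_loader_share` and `fake_loader` are made-up names, and `time.sleep` simulates the work.

```python
import time


def measure_loader_share(loader, train_step):
    """Run one pass over `loader`; return (loader_time, total_time) in seconds."""
    total_start = time.perf_counter()
    loader_time, st = 0.0, time.perf_counter()
    for data in loader:
        loader_time += time.perf_counter() - st  # waiting for this batch
        train_step(data)
        st = time.perf_counter()  # restart before fetching the next batch
    total_time = time.perf_counter() - total_start
    return loader_time, total_time


def fake_loader(n, delay):
    """Hypothetical loader that takes `delay` seconds to produce each batch."""
    for i in range(n):
        time.sleep(delay)
        yield i


loader_time, total_time = measure_loader_share(
    fake_loader(4, 0.03), lambda batch: time.sleep(0.01)
)
print(f"data loading: {loader_time:.3f} s of {total_time:.3f} s "
      f"({loader_time / total_time:.0%})")
```

If the data loading share is large, the GPUs are probably waiting on input, and adding more GPUs would not be expected to shorten the epoch. On GPU, make sure `train_step` synchronizes (for example via `loss.item()` or `torch.cuda.synchronize()`); otherwise queued kernels can overlap the next fetch and blur the split.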

Or, better still, profile it like this: