Computation time increases with batch size

I have written code using PyTorch functions such as torch.mul, torch.sub, etc.
When I increase the batch size (in my case from 32 to 64), the runtime of the code increases (I am running on Titan XP GPUs). I thought PyTorch performed the computation in parallel.

Is this because I am bottlenecked by pipelining on the GPU?


The increase will be sublinear at first, but once all of the GPU's cores are fully occupied there is no more parallelism to exploit, so beyond that point the runtime is expected to grow roughly linearly with batch size.
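
You can check this yourself with a small timing sketch (the feature dimension and iteration counts below are arbitrary assumptions, just for illustration). One caveat when benchmarking: CUDA kernel launches are asynchronous, so you need `torch.cuda.synchronize()` before reading the clock, otherwise the measured times are meaningless.

```python
import time
import torch

def time_batch(batch_size, feature_dim=4096, iters=50):
    """Average per-iteration time for a few elementwise ops at a given batch size."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    a = torch.randn(batch_size, feature_dim, device=device)
    b = torch.randn(batch_size, feature_dim, device=device)
    # Warm-up so one-time allocation costs don't pollute the measurement.
    for _ in range(5):
        torch.sub(torch.mul(a, b), b)
    if device == "cuda":
        torch.cuda.synchronize()  # wait for pending kernels before starting the clock
    start = time.perf_counter()
    for _ in range(iters):
        torch.sub(torch.mul(a, b), b)
    if device == "cuda":
        torch.cuda.synchronize()  # wait for the timed kernels to actually finish
    return (time.perf_counter() - start) / iters

for bs in (32, 64, 128, 256):
    print(f"batch {bs}: {time_batch(bs) * 1e6:.1f} us/iter")
```

For small batch sizes the per-iteration time should stay nearly flat (the GPU is underutilized), and only start scaling with batch size once the device is saturated.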

Yeah I kind of observed that and it does make sense. Thanks!