Why does the inference time suddenly increase beyond a certain batch size?

I ran a model with different batch sizes and found that the inference time increases roughly linearly above a certain batch size.
Can a bottleneck occur when the batch size gets too large? I don't think CPU utilization matters, since I measure the time only for the inference line (I might be wrong).
Below are the inference times I measured; a simplified sketch of the timing code follows them.

batch size 5:   0.015910625457763672 s
batch size 30:  0.015015363693237305 s
batch size 50:  0.017632007598876953 s
batch size 100: 0.033460140228271484 s
batch size 200: 0.06149935722351074 s
batch size 400: 0.11984658241271973 s
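
Roughly, the timing code looks like this (a simplified sketch; the linear layer and input shapes are just dummy stand-ins for my actual model and data):

```python
import time
import torch

# Dummy model and input as stand-ins for my actual setup;
# only the timing pattern around the inference line matters here.
model = torch.nn.Linear(1024, 1024).cuda().eval()

for batch_size in (5, 30, 50, 100, 200, 400):
    x = torch.randn(batch_size, 1024, device="cuda")
    with torch.no_grad():
        start = time.time()
        out = model(x)                    # only this inference line is timed
        elapsed = time.time() - start
    print(f"batch size {batch_size}: {elapsed} s")
```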

I'm not sure what exactly you are measuring or how you are doing it.
If you are trying to profile a GPU operation, you have to synchronize the code before starting and stopping the timers, since CUDA kernels are launched asynchronously and the Python call can return before the GPU work has actually finished. If that's already done, you might be seeing e.g. quantization effects, but this depends on your actual use case, the kernels used, etc.
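
A minimal sketch of synchronized timing (the linear layer and input shape below are just placeholders for your model and batch):

```python
import time
import torch

# Placeholder model and input; swap in your own model and batch.
model = torch.nn.Linear(1024, 1024).cuda().eval()
x = torch.randn(400, 1024, device="cuda")

with torch.no_grad():
    # Warm-up iterations, so one-time costs (CUDA context setup,
    # kernel selection) don't end up in the measurement.
    for _ in range(10):
        model(x)

    torch.cuda.synchronize()   # wait for all pending GPU work before timing
    start = time.time()
    out = model(x)
    torch.cuda.synchronize()   # wait for the forward pass to actually finish
    print(f"forward time: {time.time() - start:.6f} s")
```

Alternatively, `torch.cuda.Event(enable_timing=True)` objects can record the start/stop timestamps on the GPU itself, and `start_event.elapsed_time(end_event)` returns the elapsed time in milliseconds.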