When I ran a model with different batch sizes, I found that the inference time increases linearly above a certain batch size.
Can some bottleneck occur when the batch size gets too large? I don't think CPU utilization matters, since I measure the time only for the inference line (but I might be wrong).
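For reference, this is roughly how I take the measurement (a minimal sketch assuming PyTorch and wall-clock timing with time.time(); the model and input below are placeholders, not my actual network):

```python
import time
import torch

# Placeholder model and batch, standing in for my actual setup
model = torch.nn.Linear(128, 10).eval()
batch = torch.randn(400, 128)

with torch.no_grad():
    start = time.time()
    out = model(batch)          # the "inference line" is the only thing timed
    elapsed = time.time() - start

print(elapsed)
```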
Below are the inference times I measured:
batch size 5:   0.015910625457763672
batch size 30:  0.015015363693237305
batch size 50:  0.017632007598876953
batch size 100: 0.033460140228271484
batch size 200: 0.06149935722351074
batch size 400: 0.11984658241271973