I ran a model with different batch sizes and found that the inference time increases linearly above a certain batch size.
Can some bottleneck occur when the batch size gets too large? I don't think CPU utilization matters, since I measure the time only around the inference line. (I might be wrong.)
Below are the inference times I measured (in seconds).
batch size 5:   0.015910625457763672
batch size 30:  0.015015363693237305
batch size 50:  0.017632007598876953
batch size 100: 0.033460140228271484
batch size 200: 0.06149935722351074
batch size 400: 0.11984658241271973
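For context, here is a minimal sketch of the kind of timing loop I mean. The model, feature size, and device are placeholders, not my actual setup; the point is that the timer wraps only the inference call, and on CUDA the measurement should only stop after `torch.cuda.synchronize()`, since kernel launches are asynchronous:

```python
import time
import torch

# Placeholder model and input size, just to illustrate the measurement.
model = torch.nn.Linear(512, 10).cuda().eval()

for batch_size in (5, 30, 50, 100, 200, 400):
    x = torch.randn(batch_size, 512, device="cuda")

    # Warm-up so the measured call doesn't include CUDA init overhead.
    with torch.no_grad():
        model(x)
    torch.cuda.synchronize()

    start = time.time()
    with torch.no_grad():
        y = model(x)
    # CUDA launches are asynchronous; without this the timer can stop
    # before the GPU has actually finished processing the batch.
    torch.cuda.synchronize()
    print(f"batchsize:{batch_size} {time.time() - start}")
```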