Increasing the batch size does not speed up inference time. Is there a better way?

I need to speed up inference time on a GPU. My test set contains 100 images. I use a very deep network, and it takes about 10 seconds to run inference on all 100 images with a batch size of 1. I then switched to a batch size of 10, so each forward pass sends 10 images through the network. However, the total time barely improves: it drops from 10 seconds to only 9.5 seconds. Is there any way to speed up inference?
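
For reference, my batched loop looks roughly like the sketch below. This is PyTorch-style pseudocode for illustration only; `MyDeepNet`, the input size, and the dummy data are placeholders, not my actual model or dataset:

```python
import torch
import torch.nn as nn

class MyDeepNet(nn.Module):
    """Stand-in for the very deep network; replace with the real model."""
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 10),
        )

    def forward(self, x):
        return self.layers(x)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = MyDeepNet().to(device).eval()

# 100 test images (dummy tensors here), processed 10 at a time.
images = torch.randn(100, 3, 224, 224)
batch_size = 10

predictions = []
with torch.no_grad():  # no gradients needed for inference
    for start in range(0, images.size(0), batch_size):
        batch = images[start:start + batch_size].to(device)
        predictions.append(model(batch).cpu())
predictions = torch.cat(predictions)
```

With batch_size = 1 this takes about 10 seconds in total; with batch_size = 10 it still takes about 9.5 seconds.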