Trainer Class Predict Func == Decreasing Prediction Time

Hi everyone,

I have a dataloader with 100 images.
If I set batch size = 100 in the “InferenceDataset” func, total prediction time = 23.6 s (one image = 236 ms).
If I set batch size = 2 in the “InferenceDataset” func, total prediction time = 36.7 s (one image = 367 ms).

I want to improve the prediction time. What would you recommend for this process?
The code is:

import time
from torch.utils.data import DataLoader

inference_dataset = InferenceDataset(path=images_path, image_size=(256, 256))
test_dataloader = DataLoader(dataset=inference_dataset, batch_size=100, pin_memory=True)

start = time.time()
predictions = trainer.predict(model=model, dataloaders=test_dataloader)[0]
end = time.time()
print(f'Total prediction time: {round((end - start) * 1000, 2)} ms')

The inference time should not increase if you lower the batch size, so something looks wrong.
Are you using your GPU? If so, note that CUDA kernels are executed asynchronously, so you would need to synchronize the device before starting and stopping host timers via torch.cuda.synchronize().

First of all, thank you for your prompt response. I understood what you said. Using torch.cuda.synchronize() doesn’t help with speeding things up. My GPU appears to be active:

import torch
print(torch.cuda.is_available())
print(torch.cuda.current_device())
print(torch.cuda.device(0))
print(torch.cuda.device_count())
print(torch.cuda.get_device_name(0)) 

True
0
<torch.cuda.device object at 0x0000013E3BE594B0>
1
NVIDIA RTX A4000

I didn’t mean to claim that synchronizations would speed up your code, but that they should be used if you want to profile the actual GPU execution.
I.e., you should add a synchronization via torch.cuda.synchronize() before using host timers such as time.perf_counter().
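
A minimal sketch of that timing pattern (the model, input shapes, and names here are illustrative placeholders, not the original InferenceDataset pipeline):

import time
import torch
import torch.nn as nn

# Toy model and random input, purely to demonstrate the timing pattern
device = torch.device("cuda")
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU()).to(device).eval()
batch = torch.randn(100, 3, 256, 256, device=device)

with torch.no_grad():
    torch.cuda.synchronize()   # wait for pending GPU work before starting the host timer
    start = time.perf_counter()
    output = model(batch)
    torch.cuda.synchronize()   # wait for the forward pass to finish before stopping the timer
    end = time.perf_counter()

print(f"GPU forward time: {(end - start) * 1000:.2f} ms")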


I am a little bit confused. Here it says “batch size affects inference time”.

Is the situation different with Torch?

No, it’s the same in PyTorch.
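
To see the effect directly, here is a rough sketch (again with a toy model and random tensors rather than the original InferenceDataset) that times the same 100 images at batch sizes 2 and 100:

import time
import torch
import torch.nn as nn

# Toy setup, only for illustrating the batch-size effect
device = torch.device("cuda")
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU()).to(device).eval()
images = torch.randn(100, 3, 256, 256)

for batch_size in (2, 100):
    torch.cuda.synchronize()           # start from an idle GPU
    start = time.perf_counter()
    with torch.no_grad():
        for i in range(0, len(images), batch_size):
            batch = images[i:i + batch_size].to(device)
            _ = model(batch)
    torch.cuda.synchronize()           # wait for all batches to finish
    total = time.perf_counter() - start
    print(f"batch_size={batch_size}: total {total:.2f} s, per image {total / len(images) * 1000:.1f} ms")

Smaller batches typically mean more kernel launches and host-to-device copies for the same 100 images, which is a common reason the total time can grow when the batch size shrinks.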
