Will running torch.cuda.synchronize slow down inference performance?

I’m doing inference in a loop. Think:

for batch in dl:
    outputs = model(batch)

I’m trying to benchmark how long it takes for my code to do inference on a single batch.

If I add a torch.cuda.synchronize() before the np.savez line, will this slow down my inference code at all?

No, you should not see any meaningful slowdown from adding torch.cuda.synchronize(), because moving the CUDA tensor to the CPU via outputs.cpu() already forces a synchronization: the copy blocks until every queued GPU kernel that produces outputs has finished. An explicit synchronize() before that point only makes the wait happen earlier, which is exactly what you want when timing a single batch, since CUDA kernel launches are asynchronous and the clock would otherwise stop before the GPU work is done.
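As a concrete illustration, here is a minimal per-batch timing sketch. The model and dataloader are hypothetical stand-ins (a small nn.Linear and a list of random tensors, not your actual setup); the synchronize() calls are guarded so the snippet also runs on a CPU-only machine:

    import time

    import torch
    import torch.nn as nn

    # Hypothetical stand-ins for the model and dataloader from the question.
    model = nn.Linear(16, 4)
    dl = [torch.randn(8, 16) for _ in range(3)]

    timings = []
    with torch.no_grad():
        for batch in dl:
            if torch.cuda.is_available():
                torch.cuda.synchronize()  # drain any pending GPU work before starting the clock
            start = time.perf_counter()
            outputs = model(batch)
            if torch.cuda.is_available():
                torch.cuda.synchronize()  # wait for the forward pass to actually finish
            timings.append(time.perf_counter() - start)

    print(f"mean per-batch time: {sum(timings) / len(timings):.6f}s")

Without the second synchronize() (or an outputs.cpu() call inside the timed region), perf_counter would only measure the time to enqueue the kernels, not to execute them.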