I’m doing inference in a loop. Think:
```python
with torch.no_grad():                       # inference only, no autograd graph
    for i, batch in enumerate(dl):
        outputs = model(batch)
        np.savez(f"outputs_{i}.npz", outputs.cpu().numpy())
```
I’m trying to benchmark how long inference takes on a single batch.
If I add a `torch.cuda.synchronize()` before the `np.savez` line, will it slow down my inference code at all?
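Concretely, here is where I’d put the sync, with the timing scaffolding I have in mind (a minimal sketch; the `times` list and the output file names are placeholders I made up):

```python
import time

import numpy as np
import torch

times = []
with torch.no_grad():                        # no autograd graph during inference
    for i, batch in enumerate(dl):
        torch.cuda.synchronize()             # drain GPU work queued earlier
        start = time.perf_counter()
        outputs = model(batch)               # CUDA kernels launch asynchronously
        torch.cuda.synchronize()             # wait for the forward pass to finish
        times.append(time.perf_counter() - start)
        np.savez(f"outputs_{i}.npz", outputs.cpu().numpy())

print(f"mean batch latency: {sum(times) / len(times):.4f} s")
```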