I’m doing inference in a loop. Think:
```python
with torch.no_grad():                       # inference only, no autograd graph
    for i, batch in enumerate(dl):
        outputs = model(batch)
        np.savez(f"outputs_{i}.npz", outputs.cpu().numpy())
```
I’m trying to benchmark how long inference takes on a single batch.
If I add a `torch.cuda.synchronize()` before the `np.savez` line, will it slow down my inference code at all?
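Concretely, here is where I’d put the sync, with the timing scaffolding I have in mind (a minimal sketch; the `times` list and the output file names are placeholders I made up):

```python
import time

import numpy as np
import torch

times = []
with torch.no_grad():                        # no autograd graph during inference
    for i, batch in enumerate(dl):
        torch.cuda.synchronize()             # drain GPU work queued earlier
        start = time.perf_counter()
        outputs = model(batch)               # CUDA kernels launch asynchronously
        torch.cuda.synchronize()             # wait for the forward pass to finish
        times.append(time.perf_counter() - start)
        np.savez(f"outputs_{i}.npz", outputs.cpu().numpy())

print(f"mean batch latency: {sum(times) / len(times):.4f} s")
```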