Is this the right way to profile the prediction time of the model on a test sample?
import torch
import torch.autograd.profiler as profiler

x = x.to("cuda:0")  # move the test sample to the GPU; .to() is not in-place for tensors, so assign the result back
model.eval()

with profiler.profile(use_cuda=True) as prof:
    with torch.no_grad():
        pred = model(x)

print(prof)
I would recommend adding some warm-up iterations and computing the actual runtime as a mean over several iterations. This is done automatically for you by torch.utils.benchmark from the current master branch or the nightly binaries.
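For example, a minimal sketch using torch.utils.benchmark.Timer could look like the snippet below. The dummy model, the input shape, and the iteration counts are just placeholders; swap in your own model and test sample.

import torch
import torch.nn as nn
import torch.utils.benchmark as benchmark

# Dummy model and input only to keep the snippet self-contained;
# replace them with your own model and test sample.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10)).to(device)
model.eval()
x = torch.randn(64, 1024, device=device)

def run_inference(model, x):
    # inference only, so disable gradient tracking
    with torch.no_grad():
        return model(x)

timer = benchmark.Timer(
    stmt="run_inference(model, x)",
    globals={"run_inference": run_inference, "model": model, "x": x},
)

# Timer handles the warmup and CUDA synchronization and reports
# the time per iteration as a Measurement (mean, median, etc.).
print(timer.timeit(100))
print(timer.blocked_autorange(min_run_time=1))

Unlike timeit.Timer, the PyTorch Timer synchronizes CUDA and returns the time per run, so you don't have to insert torch.cuda.synchronize() calls or divide by the number of iterations yourself.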