I'm trying to measure the GPU computation time of an operation, doing something like:
a = torch.randn(10,10).cuda()
b = torch.randn(10,100).cuda()
with torch.autograd.profiler.profile(use_cuda=True) as prof:
    ret = a.mm(b)
perf = prof.total_average()
Here perf is a FunctionEventAvg object with attributes cuda_time and cuda_time_total. I've noticed that the ratios between these don't agree: for two different operations A and B, cuda_time_A / cuda_time_B != cuda_time_total_A / cuda_time_total_B.
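For what it's worth, a plausible explanation (my assumption, not confirmed from the docs) is that cuda_time is the *average* time per recorded event while cuda_time_total is the *sum* over all events, and total_average() merges every event in the trace (kernel launches, copies, etc.), so the two ratios only agree when both operations record the same event counts. A minimal sketch that makes the per-call vs. aggregate relationship visible, using the analogous cpu_time attributes so it also runs without a GPU:

```python
import torch

a = torch.randn(10, 10)
b = torch.randn(10, 100)
use_cuda = torch.cuda.is_available()
if use_cuda:
    a, b = a.cuda(), b.cuda()

with torch.autograd.profiler.profile(use_cuda=use_cuda) as prof:
    for _ in range(5):  # repeat the op so averaging over events is visible
        ret = a.mm(b)

perf = prof.total_average()

# perf.count is the number of merged events; the *_time attributes are
# per-event averages, the *_time_total attributes are sums over all events.
print(perf.count, perf.cpu_time, perf.cpu_time_total)

# average * count should recover the total (up to float rounding)
assert abs(perf.cpu_time * perf.count - perf.cpu_time_total) <= 1e-6 * perf.cpu_time_total
```

If that holds for your trace too, then cuda_time_total is probably the one to compare when you care about total wall time spent in an op, and cuda_time when you care about per-call cost.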
Does anyone know which ratio I should use to evaluate performance, and whether this is the right procedure for doing so?
Thanks.