torch.autograd.profiler.profile attributes: measuring operation performance on GPU

I'm trying to measure the GPU computation time of an operation, doing something like:

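import torch

# Two small matrices moved to the GPU
a = torch.randn(10, 10).cuda()
b = torch.randn(10, 100).cuda()
# Profile a single matrix multiply, recording CUDA timings
with torch.autograd.profiler.profile(use_cuda=True) as prof:
    ret = a.mm(b)

# Aggregate all recorded events into one averaged summary
perf = prof.total_average()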

Here perf is a FunctionEventAvg object with the attributes cuda_time and cuda_time_total. I've noticed that the ratios between these don't agree: for two different operations A and B, cuda_time_A/cuda_time_B != cuda_time_total_A/cuda_time_total_B.
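To make the mismatch concrete, here is a minimal sketch of one way the two ratios could diverge. It assumes that cuda_time is a per-event average (cuda_time_total divided by the number of events folded into the summary). That is an assumption about FunctionEventAvg, not something confirmed above. The AvgStats class and all the numbers are hypothetical stand-ins, not PyTorch objects or real measurements.

```python
from dataclasses import dataclass


@dataclass
class AvgStats:
    """Hypothetical stand-in for FunctionEventAvg (not the real PyTorch class)."""
    count: int              # number of recorded events folded into the summary
    cuda_time_total: float  # summed GPU time, in microseconds

    @property
    def cuda_time(self) -> float:
        # Assumption: the per-event average is total time divided by event count.
        return self.cuda_time_total / self.count if self.count else 0.0


# Made-up numbers: operation A records 1 event, operation B records 4.
op_a = AvgStats(count=1, cuda_time_total=40.0)
op_b = AvgStats(count=4, cuda_time_total=80.0)

ratio_avg = op_a.cuda_time / op_b.cuda_time                # 40 / 20
ratio_total = op_a.cuda_time_total / op_b.cuda_time_total  # 40 / 80

print(ratio_avg, ratio_total)
```

If that assumption holds, the two ratios only agree when both profiled operations record the same number of events, so comparing the event counts of the two profiles would be a quick first check.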

Does anyone know which ratio I should use to evaluate performance, and whether this is the right procedure in the first place?

Thanks.