I'm trying to measure the GPU computation time of an operation, doing something like:
```python
import torch

a = torch.randn(10, 10).cuda()
b = torch.randn(10, 100).cuda()
with torch.autograd.profiler.profile(use_cuda=True) as prof:
    ret = a.mm(b)
perf = prof.total_average()
```
`perf` is a `FunctionEventAvg` object that has the attributes `cuda_time` and `cuda_time_total`. I've noticed that the ratios between these don't agree: for two different operations A and B being evaluated,

`cuda_time_A / cuda_time_B != cuda_time_total_A / cuda_time_total_B`.
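In case it's useful, here is a CPU-only sketch (no GPU needed) of how I understand the two attributes to relate for a `FunctionEventAvg`: `cpu_time` / `cuda_time` look like per-call averages, i.e. the corresponding `*_time_total` divided by `count`, so operations that launch different numbers of events could produce different ratios. This is my reading of the behavior, not something from the docs:

```python
import torch
from torch.autograd import profiler

x = torch.randn(100, 100)

# Profile a few CPU matmuls so the aggregated event has count > 1
with profiler.profile() as prof:
    for _ in range(5):
        y = x.mm(x)

avg = prof.total_average()
# cpu_time appears to be the per-event average, cpu_time_total the sum
print(avg.count, avg.cpu_time, avg.cpu_time_total)
```

On my machine `avg.cpu_time * avg.count` matches `avg.cpu_time_total`, which is what makes me suspect the mismatch above comes from the two operations having different event counts.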
Does anyone know which ratio I should be using to evaluate performance, and whether this is the right procedure for doing so?