Below is example code showing what I am trying to measure. I am using time.perf_counter() to measure the elapsed time. Is this the correct way to measure execution time in this scenario? If not, what is the correct way? My concern is that GPU evaluations are asynchronous, so the GPU kernel might not have finished when ExecTime is computed below.
import torch
import torch.nn.functional as F
import time

Device = torch.device("cuda:0")
ProblemSize = 100
NumChannels = 5
NumFilters = 96
ClassType = torch.float32

X = torch.rand(1, NumChannels, ProblemSize, ProblemSize, dtype=ClassType).to(Device)
weights = torch.rand(NumFilters, NumChannels, 10, 10, dtype=ClassType).to(Device)

# warm up
Y = F.conv2d(X, weights)
Y = F.conv2d(X, weights)

# time
t = time.perf_counter()
Y = F.conv2d(X, weights)
ExecTime = time.perf_counter() - t
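For reference, here is a minimal sketch of how the timed region could be bracketed with torch.cuda.synchronize() so that the host clock measures the finished GPU work rather than just the kernel launch. It mirrors the setup above but falls back to CPU when no GPU is available (the fallback and the loop count are my additions, not part of the original question):

```python
import time

import torch
import torch.nn.functional as F

# Same shapes as in the question; run on CPU if CUDA is absent.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
X = torch.rand(1, 5, 100, 100, dtype=torch.float32, device=device)
weights = torch.rand(96, 5, 10, 10, dtype=torch.float32, device=device)

# warm up (two launches, as in the question)
for _ in range(2):
    F.conv2d(X, weights)

# Synchronize before starting the clock so no earlier work is pending,
# and again before stopping it so the conv2d kernel has finished.
if device.type == "cuda":
    torch.cuda.synchronize()
t = time.perf_counter()
Y = F.conv2d(X, weights)
if device.type == "cuda":
    torch.cuda.synchronize()
ExecTime = time.perf_counter() - t
print(f"conv2d took {ExecTime * 1e3:.3f} ms")
```

An alternative on CUDA devices is torch.cuda.Event(enable_timing=True), recording one event before and one after the op and calling elapsed_time() after a synchronize; that measures time on the GPU's own timeline.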