I want to measure the GPU inference time for multiple tasks running concurrently on a single GPU. I used the code below:
```python
with torch.no_grad():
    starter = torch.cuda.Event(enable_timing=True)
    ender = torch.cuda.Event(enable_timing=True)
    starter.record()
    output = model(input_batch)
    ender.record()
    torch.cuda.synchronize()
    curr_time = starter.elapsed_time(ender)
```
However, `torch.cuda.synchronize()` seems to synchronize the entire GPU, i.e. it waits for all outstanding work on the device, not just the work belonging to one task. I want to measure the GPU inference time for each task separately. How can I synchronize each task individually and measure its actual GPU inference time?
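For context, here is a minimal sketch of the per-task setup I have in mind: each task runs on its own CUDA stream, and instead of `torch.cuda.synchronize()` I wait on the end event itself with `Event.synchronize()`. The function and stream names here are placeholders I made up for illustration:

```python
import torch

def timed_inference(model, input_batch, stream):
    """Time one task's inference on its own CUDA stream, waiting only
    on that task's work rather than on the whole device."""
    starter = torch.cuda.Event(enable_timing=True)
    ender = torch.cuda.Event(enable_timing=True)
    with torch.no_grad(), torch.cuda.stream(stream):
        starter.record(stream)
        output = model(input_batch)
        ender.record(stream)
    # Event.synchronize() blocks only until this event has completed,
    # unlike torch.cuda.synchronize(), which waits for the entire device.
    ender.synchronize()
    return output, starter.elapsed_time(ender)  # milliseconds

if torch.cuda.is_available():
    # One stream per task, so the two tasks' kernels can be
    # launched and timed independently on the same GPU.
    stream_a = torch.cuda.Stream()
    stream_b = torch.cuda.Stream()
```

Is this the right approach, or does `elapsed_time` between events recorded on one stream still pick up work from the other stream?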
I'd appreciate any help.