Measuring inference performance for multi-tasking on a single GPU

Hi all,

I want to measure the GPU inference time for multi-tasking on a single GPU. I am using the code below:

with torch.no_grad():
    starter, ender = torch.cuda.Event(enable_timing=True), torch.cuda.Event(enable_timing=True)
    starter.record()
    output = model(input_batch)
    ender.record()

    torch.cuda.synchronize()
    curr_time = starter.elapsed_time(ender)

However, torch.cuda.synchronize seems to synchronize the whole GPU, while I want to measure the GPU inference time for each task separately. How can I synchronize each task individually and measure its actual GPU inference time?

Any help is appreciated.

torch.cuda.synchronize accepts a device argument as seen in the docs. Also the Event object provides a synchronize() method in case you want to use it.
PS: you can post code snippets by wrapping them into three backticks ```, which makes debugging easier.
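As a rough sketch of the Event.synchronize() approach (the model and input below are just placeholders for your own), you only block until the end event has completed rather than synchronizing the whole device:

```python
import torch
import torch.nn as nn

# Placeholder model and input; substitute your own model / input_batch.
model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU()).cuda().eval()
input_batch = torch.randn(8, 3, 224, 224, device="cuda")

with torch.no_grad():
    starter = torch.cuda.Event(enable_timing=True)
    ender = torch.cuda.Event(enable_timing=True)
    starter.record()
    output = model(input_batch)
    ender.record()
    # Wait only for the end event instead of synchronizing the entire device.
    ender.synchronize()
    curr_time = starter.elapsed_time(ender)  # in milliseconds
```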

Thanks for the reply. torch.cuda.synchronize accepts a device argument, but since I co-run multiple tasks on the same single GPU, there is only one device. I am not sure torch.cuda.synchronize helps under this co-running condition.

As for the Event object, it seems that torch.cuda.Event.synchronize() does not work here.

with torch.no_grad():
    starter, ender = torch.cuda.Event(enable_timing=True), torch.cuda.Event(enable_timing=True)
    starter.record()
    output = model(input_batch)
    ender.record()
    torch.cuda.synchronize()
    curr_time = starter.elapsed_time(ender)

I’m not sure how you are “co-running”, but in case you are using streams you might want to sync them.
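If that's the case, something along the lines of this sketch might work; model_a, model_b, and input_batch are placeholders for whatever tasks you co-run, and each task records its events on its own torch.cuda.Stream:

```python
import torch
import torch.nn as nn

# Placeholder models and input; substitute the tasks you actually co-run.
model_a = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU()).cuda().eval()
model_b = nn.Sequential(nn.Conv2d(3, 32, 3), nn.ReLU()).cuda().eval()
input_batch = torch.randn(8, 3, 224, 224, device="cuda")

stream_a = torch.cuda.Stream()
stream_b = torch.cuda.Stream()

timings = {}
with torch.no_grad():
    for name, model, stream in [("task_a", model_a, stream_a),
                                ("task_b", model_b, stream_b)]:
        starter = torch.cuda.Event(enable_timing=True)
        ender = torch.cuda.Event(enable_timing=True)
        # Launch each task on its own stream; record() uses the current stream.
        with torch.cuda.stream(stream):
            starter.record()
            model(input_batch)
            ender.record()
        timings[name] = (starter, ender)

# Wait only on each task's own end event, then read its elapsed time.
for name, (starter, ender) in timings.items():
    ender.synchronize()
    print(name, starter.elapsed_time(ender), "ms")
```

Note that kernels from the two streams still share the same GPU resources, so the measured times reflect the co-running (interfered) execution rather than each task in isolation.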

Thanks for your help!