Say I have Python process A and Python process B scheduled concurrently like this:
```bash
#!/bin/bash
python my_model_A.py &
python my_model_B.py &
```
I want to understand the inference time for each of the models while they run standalone and concurrently.
For a standalone PyTorch model I follow the guide here, which essentially tells me to synchronize the CUDA device before and after the timed call.
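For reference, the timing pattern from that guide looks roughly like this (a minimal sketch; `model`, `inputs`, and the warm-up count are placeholders for my actual setup):

```python
import time
import torch

model = model.cuda().eval()   # assumed: model and inputs already exist
inputs = inputs.cuda()

# Warm-up so lazy CUDA initialization doesn't skew the measurement
with torch.no_grad():
    for _ in range(10):
        model(inputs)

torch.cuda.synchronize()      # make sure all previously queued GPU work is done
start = time.perf_counter()
with torch.no_grad():
    model(inputs)
torch.cuda.synchronize()      # wait for the inference kernels to finish
end = time.perf_counter()

print(f"inference time: {(end - start) * 1000:.1f} ms")
```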
The same code still runs for two concurrent PyTorch models, but the inference time is no longer reported correctly: the call to `torch.cuda.synchronize()` also blocks while the other process's kernels run, so a lot of idle time gets included in the timing.
An example: say Model A takes 150 ms and so does Model B. Run standalone, both inference times are reported correctly using the code from the link above. If I schedule them concurrently, that same code reports 300 ms for both models, since it is essentially measuring Model A's and Model B's inference together due to the blocking of `torch.cuda.synchronize()`.
Is there a way to schedule these two processes concurrently and still measure each model's inference time separately?