Timing PyTorch inference for concurrent Python processes

Say I have Python process A and Python process B scheduled concurrently, like this:

#!/bin/bash

python my_model_A.py &
python my_model_B.py &

I want to understand the inference time for each of the models while they run standalone and concurrently.

For a standalone PyTorch model I follow the guides here, which essentially tell me to synchronize the CUDA device before stopping the timer.
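
For reference, a minimal sketch of that timing pattern (the Linear layer and random input below are just placeholders for whatever my_model_A.py actually runs):

import time
import torch

# hypothetical stand-in for the real model and input in my_model_A.py
model = torch.nn.Linear(1024, 1024).cuda().eval()
inp = torch.randn(64, 1024, device="cuda")

torch.cuda.synchronize()              # make sure no GPU work is pending before starting the clock
start = time.perf_counter()
with torch.no_grad():
    out = model(inp)
torch.cuda.synchronize()              # wait until the inference kernels have actually finished
elapsed_ms = (time.perf_counter() - start) * 1e3
print(f"inference time: {elapsed_ms:.1f} ms")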

The same code still runs for two concurrent PyTorch models. The issue is that the inference time isn't reported correctly, because the call to torch.cuda.synchronize() in one process also ends up waiting while the GPU is busy with the other process, so a lot of idling is included in the reported timing.

An example: say Model A takes 150 ms and so does Model B. When run standalone, both inference times are reported correctly using the code from the link above. If I schedule them concurrently, that same code reports 300 ms for both models, since each measurement essentially covers both Model A's and Model B's inference due to the blocking behaviour of torch.cuda.synchronize().

Is there a way to schedule these two processes concurrently and still measure each model's inference time separately?

I think it depends a bit on what you actually want to measure. If you want to measure the entire runtime of the script, you could run it via time python script.py and make sure the execution has indeed finished, e.g. by manually synchronizing or by printing a value of the result.
Manual syncs are needed if you want to measure the actual GPU runtime, since CUDA code is executed asynchronously.
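
As a rough sketch of the first suggestion (the model and input are again just placeholders for the real workload), the script would synchronize or print a result right before exiting, and the full run is then measured from the shell with time python my_model_A.py:

import torch

# hypothetical stand-in for the real workload in my_model_A.py
model = torch.nn.Linear(1024, 1024).cuda().eval()
inp = torch.randn(64, 1024, device="cuda")

with torch.no_grad():
    out = model(inp)

torch.cuda.synchronize()        # make sure all GPU work is done before the script exits
print(out.sum().item())         # printing a value also forces the result to be computed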