This is more on the inference side of things, but while I am passing an image through a network and waiting on the GPU, I would like to get a head start on performing CPU-bound tasks on the next image. Would Python's asyncio be a path to go down? If so, can someone help get me started? Here is the pseudocode.
async def main_loop():
    frame = next_frame()  # CPU bound
    while frame is not None:
        data = await run_inference(frame)  # GPU bound inference
        frame = next_frame()  # CPU bound
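One way to sketch the intended overlap with asyncio (a minimal, self-contained example; `next_frame` and `run_model` here are simulated stand-ins for the real CPU-bound decoding and GPU-bound inference, and `asyncio.to_thread` needs Python 3.9+):

```python
import asyncio

# Hypothetical placeholders for the real work in the question.
FRAMES = [1, 2, 3, None]  # None signals end of stream
_idx = 0

def next_frame():
    """CPU-bound: decode/preprocess the next frame (simulated)."""
    global _idx
    frame = FRAMES[_idx]
    _idx += 1
    return frame

def run_model(frame):
    """GPU-bound inference (simulated); with PyTorch this would launch
    CUDA kernels, which return control to Python almost immediately."""
    return frame * 10

async def pipeline():
    results = []
    # Decode the first frame in a worker thread so the event loop stays free.
    frame = await asyncio.to_thread(next_frame)
    while frame is not None:
        # Start decoding the NEXT frame while the current one is inferred.
        decode_task = asyncio.create_task(asyncio.to_thread(next_frame))
        infer_task = asyncio.create_task(asyncio.to_thread(run_model, frame))
        results.append(await infer_task)
        frame = await decode_task
    return results

results = asyncio.run(pipeline())
print(results)  # [10, 20, 30]
```

Since the heavy work runs in threads via `asyncio.to_thread`, the two tasks of each iteration can genuinely overlap (NumPy/PyTorch calls release the GIL while computing).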
CUDA calls are already asynchronously executed, so as long as you don't wait manually for a return value of this computation, your CPU program will continue execution until a synchronization point is met (e.g. calling torch.cuda.synchronize() or reading the result back to the CPU).
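A minimal sketch of this behavior, assuming PyTorch is installed (the launch/sync timing contrast only shows up on a CUDA device, so the code guards on availability):

```python
import time
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(2048, 2048, device=device)

start = time.perf_counter()
y = x @ x  # on CUDA this only launches the kernel and returns control immediately
launch_time = time.perf_counter() - start

if device == "cuda":
    torch.cuda.synchronize()  # synchronization point: block until the kernel finishes
total_time = time.perf_counter() - start
print(f"after launch: {launch_time:.6f}s, after sync: {total_time:.6f}s")
```

On a GPU, the time measured right after the launch is typically much smaller than the time measured after the synchronize, which is exactly the window in which the CPU can run ahead.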
Does this mean that training an NN should not block other functions in that single thread?
I tried it and, unfortunately, it did block.
Could you provide an example where training is executed with asyncio's await syntax and does not block other tasks?
Thanks a lot!
I haven’t used asyncio, since the CUDA calls are asynchronous by default.
You should see overlapping compute if your CPU is fast enough to schedule the kernels and run ahead. If that's not the case (e.g. if the GPU workload is too small), you will be CPU-bound and the kernel executions won't overlap.
How can I check that?
You can check it with a profiler such as Nsight Systems and look e.g. for gaps between the kernel executions on the timeline.
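If Nsight Systems isn't at hand, torch.profiler gives a quick first look from Python (a sketch, assuming PyTorch is installed; Nsight Systems itself is typically launched as `nsys profile python script.py`):

```python
import torch
from torch.profiler import ProfilerActivity, profile

x = torch.randn(512, 512)

# Record CPU (and, if present, CUDA) activity around a few matmuls.
activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    activities.append(ProfilerActivity.CUDA)
    x = x.cuda()

with profile(activities=activities) as prof:
    for _ in range(3):
        y = x @ x

# Gaps between kernels on the timeline would indicate a CPU-bound pipeline;
# here we just print the aggregated per-op table.
table = prof.key_averages().table(sort_by="cpu_time_total", row_limit=5)
print(table)
```

For a real timeline view you can also call `prof.export_chrome_trace("trace.json")` and open the file in a Chrome-trace viewer.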
Is there a way to use asyncio to return control to the event loop while waiting at synchronization points?
I’m not familiar enough with asyncio to give a clear answer.