Using asyncio while waiting on GPU

In our research we ended up wrapping PyTorch in a ThreadPool with a single thread, letting asyncio free the current MainThread to execute other asyncio tasks, such as data download, while PyTorch is computing.

import asyncio, functools, threading
from concurrent import futures

_POOL = futures.ThreadPoolExecutor(max_workers=1)
_LOCAL = threading.local()  # per-thread storage for the model

def compute_wrapper(data):
    # Find the model in the thread-local context, or create it on first use.
    if not hasattr(_LOCAL, "model"):
        _LOCAL.model = load_model()  # load_model() stands in for your own loading code
    return _LOCAL.model(data)

... inside the asyncio loop ...

async def compute_event(data_url):
    data = await http_client.get(data_url)
    loop = asyncio.get_running_loop()
    result = await loop.run_in_executor(_POOL, functools.partial(compute_wrapper, data))
    return result


... somewhere in the async code ...

async def main():
    coros = [compute_event(data_url) for data_url in data_list]
    results = await asyncio.gather(*coros)
    return results

This code performs len(data_list) concurrent downloads on the asyncio main thread and runs each forward pass on the single model, without blocking the main thread while PyTorch computes: the thread that waits on PyTorch's result is the one inside the ThreadPool, so the main thread stays free to download more data.
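For completeness, here is a minimal end-to-end sketch of the pattern. It assumes aiohttp as the HTTP client; the toy torch.nn.Linear model, the random input tensor, and the URLs are placeholders for your own model, preprocessing, and data.

import asyncio, functools, threading
from concurrent import futures

import aiohttp
import torch

_POOL = futures.ThreadPoolExecutor(max_workers=1)
_LOCAL = threading.local()

def compute_wrapper(raw_bytes):
    if not hasattr(_LOCAL, "model"):
        _LOCAL.model = torch.nn.Linear(8, 2)  # toy model standing in for a real one
    x = torch.rand(1, 8)  # placeholder: decode raw_bytes into a tensor here
    with torch.no_grad():
        return _LOCAL.model(x)

async def compute_event(session, data_url):
    async with session.get(data_url) as resp:
        raw = await resp.read()
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(_POOL, functools.partial(compute_wrapper, raw))

async def main():
    data_list = ["https://example.com/a.bin", "https://example.com/b.bin"]  # placeholders
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(compute_event(session, u) for u in data_list))

if __name__ == "__main__":
    print(asyncio.run(main()))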

If the download time is larger than the compute time, you will see a significant improvement in the speed of your script. If the compute time is much, much lower than the download time, the total runtime is about the same as just downloading: the PyTorch computation is fully hidden behind the downloads.
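You can convince yourself of this with sleeps standing in for both costs: asyncio.sleep plays the download and time.sleep (run inside the executor) plays the blocking PyTorch call. The durations below are illustrative, not measured.

import asyncio, functools, time
from concurrent import futures

_POOL = futures.ThreadPoolExecutor(max_workers=1)

async def job(download_s, compute_s):
    await asyncio.sleep(download_s)  # "download": yields the main thread
    loop = asyncio.get_running_loop()
    # "compute": blocks only the single worker thread, jobs serialize here
    await loop.run_in_executor(_POOL, functools.partial(time.sleep, compute_s))

async def main():
    start = time.perf_counter()
    await asyncio.gather(*(job(2.0, 0.5) for _ in range(4)))
    print(f"total: {time.perf_counter() - start:.1f}s")  # ~4s overlapped vs ~10s sequential

asyncio.run(main())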

Note: there is no way to guarantee that all PyTorch code is written in a thread-safe way, so we ended up creating a single thread that interacts with the model. If you can load a separate model for each thread, you can increase the thread pool size. Still, you won't see any speed improvement from loading one model per thread while sharing the same GPU; multiple GPUs can help.
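If your setup does allow one model per thread, a sketch of the multi-GPU variant could look like the following. The device assignment scheme is an assumption for illustration, load_model() is hypothetical, and it assumes at least one GPU is present.

import itertools, threading
from concurrent import futures

import torch

N_GPUS = torch.cuda.device_count()  # sketch assumes at least one GPU
_POOL = futures.ThreadPoolExecutor(max_workers=N_GPUS)
_LOCAL = threading.local()
_NEXT_ID = itertools.count()

def compute_wrapper(data):
    if not hasattr(_LOCAL, "model"):
        # First call on this thread: pin the thread to its own GPU and model copy.
        _LOCAL.device = torch.device(f"cuda:{next(_NEXT_ID) % N_GPUS}")
        _LOCAL.model = load_model().to(_LOCAL.device)  # load_model() is hypothetical
    with torch.no_grad():
        return _LOCAL.model(data.to(_LOCAL.device))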

This pattern is a good fit if you work with FastAPI, ASGI, aiopika (RabbitMQ), aiokafka, aiompq, or other async environments where the number of concurrent I/O operations is very high and must be interleaved with the CPU-bound work done by PyTorch. In a normal research, scripting, and/or development setting, however, it only complicates the work.
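As an example of the high-concurrency case, the same pattern drops straight into a FastAPI endpoint. The route name and payload shape are assumptions, and _POOL and compute_wrapper are the ones defined in the earlier snippets.

import asyncio, functools

from fastapi import FastAPI

app = FastAPI()

@app.post("/predict")
async def predict(payload: dict):
    # The event loop keeps serving other requests while the single
    # PyTorch thread in _POOL runs the forward pass.
    loop = asyncio.get_running_loop()
    result = await loop.run_in_executor(_POOL, functools.partial(compute_wrapper, payload))
    return {"result": result.tolist()}  # tensors must become plain Python for JSON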
