Does PyTorch have any asynchronous inference API?

I wonder whether PyTorch could cooperate with other coroutines and functions, like this:

import asyncio
async def infer():
    return await model(data)      # hypothetical: await the forward pass directly
async def wait():
    await asyncio.sleep(1)
async def main():
    await asyncio.gather(infer(), wait())
asyncio.run(main())

It does not, out of the box.
However, JITed models release the GIL for their processing, so launching the model in a background thread and waiting for completion is quite efficient.
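A minimal sketch of that pattern (my own example here, not the code from the book): load a TorchScript model, run its forward pass in a worker thread via run_in_executor, and await it from asyncio. The file name model.pt, the input shape, and the helper names are placeholders.

import asyncio
import torch

model = torch.jit.load("model.pt")    # JITed model; its forward releases the GIL
model.eval()

def _forward(batch):
    with torch.no_grad():             # grad mode is thread-local, so disable it in this thread
        return model(batch)

async def infer(batch):
    loop = asyncio.get_running_loop()
    # the GIL-releasing forward pass runs in the default thread pool
    return await loop.run_in_executor(None, _forward, batch)

async def main():
    out, _ = await asyncio.gather(infer(torch.randn(1, 3, 224, 224)), asyncio.sleep(1))
    print(out.shape)

asyncio.run(main())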

We do this in Chapter 15 of our book. While the free download on the PyTorch.org website has ended, the example code (request_batching_jit_server.py) is available on GitHub in the deep-learning-with-pytorch/dlwpt-code repository.

Best regards

Thomas

Sorry for bothering you again and for the late reply.
I have read and tested many cases and many documents recently, and I have a guess.
Does PyTorch's async inference work like asyncio.sleep()?
I ask because I can't tell that it is async from the main thread: you have to manually put it in an event loop, otherwise it will not yield.
So I designed two situations that correspond to this guess:

def cpu_task():
    2 * 5                      # trivial CPU-only work

def cpu(result):
    a = result.mean()          # CPU-side use of the model output

# 1: interleave CPU work, the forward pass, and the post-processing
for i in range(100):
    cpu_task()
    result = model(data[i])
    cpu(result)

# 2: all CPU work first, then all forward passes, then all post-processing
for i in range(100):
    cpu_task()
result = []
for i in range(100):
    result.append(model(data[i]))
for i in range(100):
    cpu(result[i])

If my supposition is true, situation 1 behaves as if it were blocking, while situation 2 behaves asynchronously.
When I tested this code, situation 2 was about 10x faster than situation 1.
Is asynchronous execution the reason why?
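For reference, here is a sketch of how the timing could be checked if the model runs on a CUDA device (my assumption); torch.cuda.synchronize() makes the measurement include kernels that are still queued:

import time
import torch

def timed(fn):
    torch.cuda.synchronize()              # wait for previously queued GPU work
    start = time.perf_counter()
    fn()
    torch.cuda.synchronize()              # include kernels that fn() only queued
    return time.perf_counter() - start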
Best wishes

And I have another supposition:

q = Queue()                  # some queue class (thread-safe or not?)

# thread 1 (producer):
q.put(model(data))

# thread 2 (consumer):
while not q.empty():
    task(q.get())

Will this code work appropriately?

I don’t know what queue is for you, but the async primitives can be touchy w.r.t. multithreading.
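If the goal is to hand results from a worker thread back into an asyncio program, one thread-safe pattern is to complete a Future via loop.call_soon_threadsafe. A minimal sketch with placeholder names (infer_in_thread, model, batch are assumptions, not code from the book):

import asyncio
import threading

async def infer_in_thread(model, batch):
    loop = asyncio.get_running_loop()
    fut = loop.create_future()

    def worker():
        out = model(batch)                             # forward pass in its own thread
        # asyncio objects are not thread-safe; hand the result back through the loop
        loop.call_soon_threadsafe(fut.set_result, out)

    threading.Thread(target=worker, daemon=True).start()
    return await fut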

So, is my first supposition correct?

I’m not entirely sure what you’re trying to achieve / what the semantics of that code should be.