Does PyTorch have any asynchronous inference API?

I wonder whether PyTorch could cooperate with other coroutines and functions, like this:

import asyncio
async def infer():
    return await model(data)      # hypothetical: await the forward pass directly
async def wait():
    await asyncio.sleep(1)
async def main():
    await asyncio.gather(infer(), wait())
asyncio.run(main())

It does not, out of the box.
However, JITed models release the GIL for their processing, so launching the model in a background thread and waiting for completion is quite efficient.
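A minimal sketch of that pattern (my own example here, not the code from the book): load a TorchScript model, run its forward pass in a worker thread via run_in_executor, and await it from asyncio. The file name model.pt, the input shape, and the helper names are placeholders.

import asyncio
import torch

model = torch.jit.load("model.pt")    # JITed model; its forward releases the GIL
model.eval()

def _forward(batch):
    with torch.no_grad():             # grad mode is thread-local, so disable it in this thread
        return model(batch)

async def infer(batch):
    loop = asyncio.get_running_loop()
    # the GIL-releasing forward pass runs in the default thread pool
    return await loop.run_in_executor(None, _forward, batch)

async def main():
    out, _ = await asyncio.gather(infer(torch.randn(1, 3, 224, 224)), asyncio.sleep(1))
    print(out.shape)

asyncio.run(main())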

We do this in Chapter 15 of our book. While the free download on the PyTorch.org website has ended, the example code (request_batching_jit_server.py) is available on GitHub in the deep-learning-with-pytorch/dlwpt-code repository.

Best regards

Thomas

Sorry for bothering you again and for the late reply.
I have read and tested many cases and many documents recently, and I have a guess.
Does PyTorch's async inference work like asyncio.sleep()?
I ask because I can't tell that it is async from the main thread: you have to manually put it in an event loop, otherwise it will not yield.
So I designed two situations that correspond to this guess:

def cpu_task():
    2 * 5                      # trivial CPU-only work

def cpu(result):
    a = result.mean()          # CPU-side use of the model output

# 1: interleave CPU work, the forward pass, and the post-processing
for i in range(100):
    cpu_task()
    result = model(data[i])
    cpu(result)

# 2: all CPU work first, then all forward passes, then all post-processing
for i in range(100):
    cpu_task()
result = []
for i in range(100):
    result.append(model(data[i]))
for i in range(100):
    cpu(result[i])

If my supposition is true, situation 1 behaves as if it were blocking, while situation 2 behaves asynchronously.
When I tested this code, situation 2 was about 10x faster than situation 1.
Is asynchronous execution the reason why?
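For reference, here is a sketch of how the timing could be checked if the model runs on a CUDA device (my assumption); torch.cuda.synchronize() makes the measurement include kernels that are still queued:

import time
import torch

def timed(fn):
    torch.cuda.synchronize()              # wait for previously queued GPU work
    start = time.perf_counter()
    fn()
    torch.cuda.synchronize()              # include kernels that fn() only queued
    return time.perf_counter() - start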
Best wishes

And I have another supposition:

q = Queue()                  # some queue class (thread-safe or not?)

# thread 1 (producer):
q.put(model(data))

# thread 2 (consumer):
while not q.empty():
    task(q.get())

Will this code work appropriately?

I don’t know what queue is for you, but the async primitives can be touchy w.r.t. multithreading.
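If the goal is to hand results from a worker thread back into an asyncio program, one thread-safe pattern is to complete a Future via loop.call_soon_threadsafe. A minimal sketch with placeholder names (infer_in_thread, model, batch are assumptions, not code from the book):

import asyncio
import threading

async def infer_in_thread(model, batch):
    loop = asyncio.get_running_loop()
    fut = loop.create_future()

    def worker():
        out = model(batch)                             # forward pass in its own thread
        # asyncio objects are not thread-safe; hand the result back through the loop
        loop.call_soon_threadsafe(fut.set_result, out)

    threading.Thread(target=worker, daemon=True).start()
    return await fut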

So, is my first supposition correct?

I’m not entirely sure what you’re trying to achieve / what the semantics of that code should be.