Threading/Pipelining support

I've seen the overview of this in the docs: ExecuTorch Runtime Overview — ExecuTorch documentation

But I'm curious whether more use cases are being considered for ExecuTorch. For example, I'm particularly interested in the case where the accelerator has an asynchronous interface:

  • enqueue execution for a given set of inputs (possibly taking a callback or returning an ID/handle)
  • either invokes the callback when execution completes, or provides some mechanism for checking the status of the ID/handle
  • the accelerator has multiple input buffers to overlap copying inputs with the previous execution
  • the accelerator has multiple output buffers to overlap copying outputs with the subsequent execution
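To make the interface I have in mind concrete, here is a minimal sketch of such an asynchronous accelerator API. Everything here is hypothetical (`AsyncAccelerator`, `enqueue`, `poll`, `wait` are illustrative names, with a worker thread standing in for the hardware); it is not an ExecuTorch API, just the handle/callback pattern described above:

```cpp
#include <condition_variable>
#include <cstdint>
#include <functional>
#include <mutex>
#include <thread>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical async accelerator interface: enqueue() returns a handle
// immediately; the "hardware" (simulated by a worker thread) marks the
// handle done and invokes the completion callback. poll()/wait() let the
// caller check or block on a handle's status.
class AsyncAccelerator {
 public:
  using Handle = std::uint64_t;
  using Callback = std::function<void(Handle)>;

  Handle enqueue(std::vector<float> inputs, Callback on_done) {
    Handle h;
    {
      std::lock_guard<std::mutex> lk(mu_);
      h = next_++;
      done_[h] = false;
    }
    // Simulate asynchronous execution on a worker thread.
    workers_.emplace_back([this, h, inputs = std::move(inputs),
                           on_done = std::move(on_done)] {
      float acc = 0.f;  // stand-in for the real computation
      for (float v : inputs) acc += v;
      (void)acc;
      {
        std::lock_guard<std::mutex> lk(mu_);
        done_[h] = true;
      }
      cv_.notify_all();
      if (on_done) on_done(h);
    });
    return h;
  }

  // Non-blocking status check for a handle.
  bool poll(Handle h) {
    std::lock_guard<std::mutex> lk(mu_);
    return done_.at(h);
  }

  // Block until the given handle's work has completed.
  void wait(Handle h) {
    std::unique_lock<std::mutex> lk(mu_);
    cv_.wait(lk, [&] { return done_.at(h); });
  }

  ~AsyncAccelerator() {
    for (auto& t : workers_) t.join();
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::uint64_t next_ = 0;
  std::unordered_map<Handle, bool> done_;
  std::vector<std::thread> workers_;
};
```

With an interface like this, the host could enqueue the copy for batch N+1 while batch N is still executing, which is exactly the double-buffering overlap the bullets describe.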

And how this might work if the model is entirely supported by the accelerator (a single blob), versus only partially supported, with just a subgraph executed by the accelerator.

Does ExecuTorch support overlapping compute and data movement while a subgraph is executing on the accelerator?

ExecuTorch does not make assumptions about the hardware (e.g., how the accelerator works or how data is moved). However, we provide entry points so that async execution is possible: users can make an async call to move data inside their own accelerator delegate, and add a custom wait operator later that blocks until the moved data is ready.
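The split-into-"async move" and "wait" pattern described above can be sketched independently of any real delegate API. This is only an illustration using `std::async`/`std::future`; the function names (`async_move_to_device`, `wait_for_data`) are hypothetical, and a real delegate would use its driver's DMA/queue primitives instead:

```cpp
#include <future>
#include <utility>
#include <vector>

// Kick off the data movement and return immediately. The future stands in
// for whatever handle the delegate's driver would give back; the lambda
// stands in for a DMA/driver copy into device memory.
std::future<std::vector<float>> async_move_to_device(std::vector<float> host) {
  return std::async(std::launch::async, [host = std::move(host)]() mutable {
    return host;  // pretend this is now device-resident
  });
}

// Custom "wait" op: blocks only at the point where downstream compute
// actually needs the data, so unrelated graph nodes can run in between.
std::vector<float> wait_for_data(std::future<std::vector<float>>& pending) {
  return pending.get();
}
```

In a partially delegated graph, the delegate would issue the async move when its subgraph is entered, let the portable-op portion of the graph keep running, and only block in the wait op at the boundary where the accelerator's outputs are consumed.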