I've seen the overview in the docs: ExecuTorch Runtime Overview — ExecuTorch documentation
But I'm curious whether more use cases are being considered for ExecuTorch. In particular, I'm interested in the case where the accelerator has an asynchronous interface:
- the host enqueues execution for a given set of inputs (the call maybe takes a callback, or returns an ID/handle)
- the accelerator either invokes the callback when execution completes, or provides some mechanism for checking the status of the ID/handle
- the accelerator has multiple input buffers, to overlap copying inputs with the previous execution
- the accelerator has multiple output buffers, to overlap copying outputs with the subsequent execution
I'm also wondering how this might work if the model is entirely supported by the accelerator (a single blob), versus only partially supported, with just a subgraph executed on the accelerator.
Does ExecuTorch support overlapping compute and data movement while a subgraph is executing on the accelerator?