Question about NativeRT ParallelGraphExecutor status and inter-op parallelism support

Hi everyone,

I’m evaluating NativeRT’s `ParallelGraphExecutor` for a production inference use case and would like to understand its current status.

Our current TensorFlow-based inference system relies heavily on automatic inter-op parallelism for DAG execution, where multiple independent subgraphs run concurrently. We are investigating whether PyTorch offers an equivalent or is moving in that direction.
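To make the requirement concrete, here is a minimal sketch of the scheduling behavior we mean, using only the Python standard library. It is an illustration, not NativeRT's or TensorFlow's implementation; `run_dag`, `max_parallel_ops`, and the toy graph are hypothetical names made up for this example.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_dag(nodes, deps, inputs, max_parallel_ops=4):
    """Run a DAG, executing independent nodes concurrently.

    nodes:  name -> callable that takes its dependencies' results, in order
    deps:   name -> list of names (other nodes or graph inputs) it depends on
    inputs: name -> value for graph inputs that are already available
    (All names here are hypothetical; this is not a NativeRT API.)
    """
    results = dict(inputs)
    pending = set(nodes)
    in_flight = {}  # future -> node name

    with ThreadPoolExecutor(max_workers=max_parallel_ops) as pool:
        while pending or in_flight:
            # Submit every node whose dependencies are all computed.
            ready = [n for n in pending
                     if all(d in results for d in deps.get(n, []))]
            for name in ready:
                pending.remove(name)
                args = [results[d] for d in deps.get(name, [])]
                in_flight[pool.submit(nodes[name], *args)] = name

            if not in_flight:
                # Nothing is running and nothing is ready: the graph is stuck.
                raise ValueError(
                    "cycle or missing dependency: " + ", ".join(sorted(pending)))

            # Block until at least one node finishes, then record its result.
            done, _ = wait(list(in_flight), return_when=FIRST_COMPLETED)
            for fut in done:
                results[in_flight.pop(fut)] = fut.result()

    return results


# Toy graph: "left" and "right" are independent and may run concurrently;
# "merge" runs only after both have finished.
nodes = {
    "left": lambda x: x + 1,
    "right": lambda x: x * 2,
    "merge": lambda a, b: a + b,
}
deps = {"left": ["x"], "right": ["x"], "merge": ["left", "right"]}

out = run_dag(nodes, deps, {"x": 3})
print(out["merge"])  # 10
```

In our real workload the node callables would be operators or whole subgraphs, and the scheduling would happen inside the runtime rather than in Python; the sketch only shows the dependency-driven dispatch we are hoping to get automatically.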

During our investigation, we found NativeRT’s `ParallelGraphExecutor`, which appears to be enabled by setting `maxParallelOps > 1`. However, the documentation seems to describe inter-op parallelism as experimental and mentions that it does not currently work with memory planning enabled.

I have a few questions:

1. Is `ParallelGraphExecutor` still an actively maintained direction in NativeRT?

2. Is NativeRT inter-op parallelism expected to become production-ready, or is it mainly experimental at this stage?

3. Are there known limitations beyond the memory planning incompatibility?

4. If this is not the recommended path, what is the current recommended approach for automatic inter-op parallelism in PyTorch inference?

5. Would contributions in this area be welcome if we identify issues during evaluation?

For context, our use case is CPU-bound online inference with DAG-structured models and multiple independent subgraphs.

I previously opened a GitHub issue, but the bot closed it as a usage question:

I also left a comment on the NativeRT RFC PR:

Thanks!