Tensor sharing across multiple venvs

Context

Hello! I’m one of the developers working on ComfyUI, a tool built for modular execution of AI workflows via a node-graph system. One category of issue we encounter frequently is “custom nodes” with conflicting dependencies (e.g. one node requiring numpy 1.x while another needs numpy 2.x).

One solution we’re exploring is allowing Python extensions to run in isolated virtual environments while sharing PyTorch tensors with the host process via PyTorch’s torch.multiprocessing support. This enables running extensions with conflicting dependencies while still allowing efficient tensor sharing for AI workloads.

To this end, we’ve developed a library named pyisolate.

Our approach:

  1. Create a separate venv for each distinct set of dependencies
  2. Use multiprocessing.set_executable() to spawn processes using each venv’s Python interpreter (it’s a little more complicated on Windows)
  3. Use torch.multiprocessing with the spawn start method to enable tensor sharing between the host and extension processes (sketched below)
  4. Ensure all processes have the same PyTorch version installed
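
For concreteness, here is a minimal sketch of steps 2 and 3. The venv path is illustrative (not pyisolate’s actual layout), and this assumes the venv’s interpreter can import this script and its dependencies:

```python
import multiprocessing
import torch
import torch.multiprocessing as mp

def extension_entry(queue):
    # Runs under the extension venv's interpreter, which must also have
    # the same PyTorch version installed (step 4).
    t = queue.get()   # arrives as a handle to the host's storage, not a copy
    t.add_(1)         # in-place writes are visible to the host

if __name__ == "__main__":
    # Step 2: point the spawn machinery at the extension venv's interpreter.
    # (On Windows the interpreter lives under Scripts\ rather than bin/.)
    multiprocessing.set_executable("./venvs/ext_a/bin/python")

    # Step 3: torch.multiprocessing with the spawn start method installs
    # reductions that share tensor storage instead of pickling by value.
    ctx = mp.get_context("spawn")
    queue = ctx.Queue()
    proc = ctx.Process(target=extension_entry, args=(queue,))
    proc.start()

    t = torch.zeros(4)
    queue.put(t)      # the CPU storage is moved into shared memory on send
    proc.join()
    print(t)          # tensor([1., 1., 1., 1.])
```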

Current Status

This approach currently works well: tensors (including CUDA tensors) are successfully shared between the host process and extension processes running under different Python executables. The underlying shared-memory mechanisms and file descriptors behave correctly across the different interpreters.
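
For anyone who wants to reproduce this, a minimal smoke test along these lines (single executable for brevity; requires a CUDA device) is enough to confirm the sharing is zero-copy: the child mutates the tensor in place, and the host only sees the change if both processes are mapping the same device memory:

```python
import torch
import torch.multiprocessing as mp

def child(q):
    t = q.get()               # received as a CUDA IPC handle, zero-copy
    t.mul_(2)                 # writes land directly in the host's device memory
    torch.cuda.synchronize()  # make sure the write completes before exiting

if __name__ == "__main__":
    ctx = mp.get_context("spawn")   # spawn is required for CUDA tensor sharing
    q = ctx.Queue()
    p = ctx.Process(target=child, args=(q,))
    p.start()
    t = torch.ones(4, device="cuda")
    q.put(t)   # sender must keep t alive until the child is done with it
    p.join()
    print(t)   # tensor([2., 2., 2., 2.], device='cuda:0')
```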

Question

Is this use case (tensor sharing between processes spawned with different Python executables via set_executable()) something PyTorch is likely to stop supporting in the future?

We want to ensure we’re not relying on behavior that is likely to break in future versions. The standard use case is sharing tensors between processes running the same Python executable; ours extends this to processes running different executables from isolated environments.

Thanks in advance for any thoughts!


Is this using our CUDA refcounted IPC stuff? We haven’t really touched it in forever so it’s “stable”, but if the setup there changes then yes it would break across versions. The most reliable thing to do is reimplement the sharing yourself (it’s not that complicated, and you might want to have a better refcounting scheme than what we do) and then use Python APIs to convert it into Tensors.
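
For reference, here is a rough sketch of that do-it-yourself route, using CUDA IPC through CuPy and DLPack for the tensor conversion. It is illustrative only: real code needs a refcounting scheme, cleanup via ipcCloseMemHandle, and must handle the fact that PyTorch’s caching allocator can return pointers offset inside a larger cudaMalloc block (which is why PyTorch’s own reductions track a base handle plus an offset):

```python
import math
import cupy as cp
import torch

def export_handle(t: torch.Tensor) -> bytes:
    # Producer side: export a cudaIpcMemHandle for the tensor's allocation.
    # Caveat: t.data_ptr() may point inside a larger caching-allocator
    # block, so production code should ship a base handle plus an offset.
    assert t.is_cuda and t.is_contiguous()
    return cp.cuda.runtime.ipcGetMemHandle(t.data_ptr())

def import_tensor(handle: bytes, shape, dtype=cp.float32) -> torch.Tensor:
    # Consumer side (another process): map the handle into this process's
    # address space and wrap it as a torch.Tensor without copying.
    ptr = cp.cuda.runtime.ipcOpenMemHandle(handle)
    nbytes = math.prod(shape) * cp.dtype(dtype).itemsize
    mem = cp.cuda.UnownedMemory(ptr, nbytes, owner=None)
    arr = cp.ndarray(shape, dtype=dtype,
                     memptr=cp.cuda.MemoryPointer(mem, 0))
    # Zero-copy handoff to PyTorch via the DLPack protocol. The caller is
    # responsible for eventually calling ipcCloseMemHandle(ptr) once no
    # tensor references the mapping (this is the refcounting part).
    return torch.from_dlpack(arr)
```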