How can I parallelize ModuleList computation in TorchScript when running from C++?

Hi all,

I would like to know how to run the modules in my ModuleList in parallel when the TorchScript model is executed from C++.

I read in this blog that loops over a ModuleList are completely unrolled:

  • Code that iterates over torch.nn.ModuleList or torch.nn.ModuleDict is completely unrolled so that elements of torch.nn.ModuleList or keys of torch.nn.ModuleDict can be of different subclasses of torch.nn.Module.

Does this mean we don't need to modify the C++ code to make a ModuleList run in parallel, because the ModuleList is unrolled automatically?
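To check what the unrolling means in practice, here is a minimal sketch (the `Branches` class, the layer sizes and the three branches are made up for illustration). My understanding is that unrolling is a compile-time transformation that lets each loop iteration call a different module type; the compiled graph still calls the branches one after another on the calling thread, so unrolling by itself should not introduce parallelism.

```python
from typing import List

import torch
import torch.nn as nn


class Branches(nn.Module):
    """Hypothetical example: every branch is applied to the same input."""

    def __init__(self) -> None:
        super().__init__()
        # Heterogeneous ModuleList: the reason TorchScript unrolls the loop
        self.branches = nn.ModuleList([
            nn.Linear(8, 8),
            nn.Sequential(nn.Linear(8, 8), nn.ReLU()),
            nn.Identity(),
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        outs: List[torch.Tensor] = []
        # Unrolled at compile time into one call per branch, run in order
        for branch in self.branches:
            outs.append(branch(x))
        return torch.stack(outs).sum(dim=0)


model = Branches().eval()
scripted = torch.jit.script(model)
print(scripted.graph)  # expect one call per branch and no prim::Loop node
```

If the graph looks as expected, the unrolled calls are plain sequential statements, which is why unrolling alone does not make the C++ side run the branches concurrently.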

On the other hand, I also read in Dynamic Parallelism in TorchScript (PyTorch Tutorials 1.12.0+cu102 documentation) that we can use torch.jit.fork and torch.jit.wait to parallelize ModuleList computations manually. Do fork and wait also take effect in C++ TorchScript code, so that the ModuleList is computed in parallel without further changes?

Finally, if neither of these approaches enables parallel computation of a ModuleList in C++, could we implement a new TorchScript custom class that accepts some torch.nn.Modules as parameters and runs them in parallel ourselves?

Thanks very much for any advice.