Is `[m(x) for m in self.sub_module_list]` run parallel or sequential?

My model has several sub-modules stored in an nn.ModuleList member. In forward() I call each sub-module like this:

def forward(self, x):
    # self.sub_module_list is an nn.ModuleList
    out = [m(x) for m in self.sub_module_list]
    return torch.cat(out)

Is the forward() of each submodule run in parallel or sequentially?

I suppose that if the module is JIT traced the answer is clear: it is parallel, since that is exactly what a static computation graph is meant to enable. But if it is not JIT traced, will this line of code run sequentially?

There is one input tensor and all of the submodules are on the same device, so I would expect them to run sequentially. However, since there are no dependencies between the submodules, some pipeline parallelism is possible (e.g. on a GPU, kernel launches are asynchronous, so a later submodule's work can be enqueued while an earlier one is still executing).
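One way to see that the eager-mode comprehension dispatches the calls one at a time on the Python side is to record a side effect from each forward(). A small sketch (the `Tagged` class and `call_order` list are names invented for this illustration):

```python
import torch
import torch.nn as nn

call_order = []


class Tagged(nn.Module):
    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.linear = nn.Linear(4, 4)

    def forward(self, x):
        call_order.append(self.tag)  # records when this submodule is dispatched
        return self.linear(x)


class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.sub_module_list = nn.ModuleList(Tagged(i) for i in range(3))

    def forward(self, x):
        out = [m(x) for m in self.sub_module_list]
        return torch.cat(out)


model = Model()
model(torch.randn(2, 4))
print(call_order)  # Python-side calls happen in order: [0, 1, 2]
```

This only shows that the Python-side dispatch is sequential; whether the underlying kernels overlap on the device is a separate question.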