When I use `lightning.Fabric.setup()` to set up a `torch.nn.Module` in a multi-process run, the program deadlocks and hangs in `lightning.fabric.strategies.launchers.subprocess_script`.
I suspect the problem comes from the `popen` start method of the processes, but I have no further evidence.
I tried to reproduce it in this demo but failed, so unfortunately I can't provide a minimal reproducible example yet.
I have also read this topic on the forum, but it didn't help me solve the problem.
Is there another way to solve the problem, or at least to reproduce it and gather enough information to debug it?
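One generic, Lightning-independent way to gather that information is Python's standard-library `faulthandler` module. The sketch below is meant to be called at the very top of the training script, so that every process started from that script (including any relaunched by the subprocess launcher) periodically writes the stack of every thread to its own log file while it is stuck. The helper name `enable_hang_diagnostics`, the per-PID file naming, and the 600-second default are illustrative assumptions, not part of Lightning.

```python
import faulthandler
import os
import signal

# Module-level reference so the log file stays open for faulthandler's lifetime.
_LOG_FILE = None


def enable_hang_diagnostics(timeout_s: float = 600.0, log_dir: str = ".") -> str:
    """Hypothetical helper: dump all thread stacks if this process hangs.

    Returns the path of the per-process log file.
    """
    global _LOG_FILE
    # One file per process (keyed by PID) so ranks do not interleave output.
    path = os.path.join(log_dir, f"hang_trace_pid{os.getpid()}.log")
    _LOG_FILE = open(path, "w")

    # After timeout_s seconds, write every thread's traceback to the file,
    # repeating every timeout_s seconds; the process is not killed.
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=_LOG_FILE)

    # On POSIX systems, `kill -USR1 <pid>` also dumps stacks on demand.
    if hasattr(faulthandler, "register") and hasattr(signal, "SIGUSR1"):
        faulthandler.register(signal.SIGUSR1, file=_LOG_FILE, all_threads=True)

    return path
```

Comparing the dumps from each process can show whether one of them is blocked inside `setup()` (for example waiting on a collective operation) while another is somewhere else, which narrows down where the deadlock starts.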
I look forward to your reply.