I am trying to use DDP for multi-GPU training of my model, but I am getting the following error:
ProcessExitedException: process 0 terminated with signal SIGSEGV
I am using PyTorch Lightning. My code works perfectly on a single-GPU machine.
My system environment is as follows:
I searched the internet and found people suggesting downgrading Python from 3.9 to 3.8. However, those posts are old, dating back to 2021, so I am wondering whether there is another solution to this problem, since downgrading Python, especially to 3.8, may not be an option for me.
Here is some more information about this error: