I don't know whether this behavior comes from the multiprocessing module or from PyTorch itself, but when I run a model with DDP, files are sometimes re-read from disk, seemingly at random, partway through the run. That means I can't safely change anything on disk while the process is running.
This is annoying when I want to start a run and then keep working on the codebase in the meantime: if any intermediate save contains a syntax error, the whole running program crashes.
Without DDP, I could start the code and, once it was loaded into memory, go on implementing whatever I needed without affecting the running process.
Does this behavior come from PyTorch, or is it part of mp.spawn?
Is there any way to avoid accidentally crashing the running program without being forced to work on a different machine?
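
For reference, here is a minimal sketch of one workaround that might sidestep the problem. It assumes the late disk reads are child processes importing the project's source files after launch, which is only a guess. The idea is to copy the code into a throwaway directory and launch the run from the copy, so later edits to the working tree cannot reach the running job. The helper name snapshot_source and the directory prefix are hypothetical, not part of PyTorch or the multiprocessing module.

```python
import shutil
import tempfile
from pathlib import Path


def snapshot_source(src_dir, prefix="run_snapshot_"):
    """Copy src_dir into a fresh temporary directory and return the copy's path.

    Hypothetical helper: it only isolates the code files. Whether this stops
    the crashes depends on the assumption that the late reads are imports of
    project source files.
    """
    src = Path(src_dir).resolve()
    # mkdtemp creates a new, uniquely named directory on every call.
    dst = Path(tempfile.mkdtemp(prefix=prefix)) / src.name
    # Skip caches and VCS metadata; everything else is copied as-is.
    shutil.copytree(
        src,
        dst,
        ignore=shutil.ignore_patterns("__pycache__", "*.pyc", ".git"),
    )
    return dst
```

The training script would then be started from the returned directory (for example with that directory as the working directory), while editing continues in the original tree. The snapshot is never cleaned up automatically, so old copies have to be removed by hand.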