Why does multiprocessing behavior differ between v1.12 and v1.9?

When I use torch==1.9.0, the following code runs fine.

import multiprocessing

import torch

def run():
    # Runs in a spawned child process.
    print('in proc', torch.cuda.is_initialized())
    print('in proc', torch.cuda._is_in_bad_fork())
    torch.zeros((5,)).cuda()

def fk_run():
    # Runs in a forked child process.
    print('in fork proc', torch.cuda.is_initialized())
    print('in fork proc', torch.cuda._is_in_bad_fork())
    torch.zeros((5,)).cuda()

if __name__ == "__main__":
    print('in main', torch.cuda.is_initialized())
    print('in main', torch.cuda._is_in_bad_fork())

    sp_ctx = multiprocessing.get_context('spawn')
    a = sp_ctx.Process(target=run)
    a.start()

    fk_ctx = multiprocessing.get_context('fork')
    b = fk_ctx.Process(target=fk_run)
    b.start()
    a.join()
    b.join()

However, when run with torch==1.12, I get a RuntimeError:

RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

I compared the source code of 1.9 and 1.12 and found no difference.

Besides, the demo code doesn’t initialize CUDA in the main process, so why does it report Cannot re-initialize CUDA in forked subprocess?

In PyTorch 2.0.0 I’m seeing:

in main False
in main False
in fork proc False
in fork proc False
in proc False
in proc False

OK. I’m still confused by v1.12.0’s RuntimeError, though.

I don’t know what might have caused this issue in this older version, but are you still seeing the same issue in 2.0.0 or a recent nightly?

No, 2.0.0 works fine.

OK, great! Thanks for confirming.