Hi,
I am using torch.compile to optimize some preprocessing components that run on the CPU, e.g. depth_to_3d_opt = torch.compile(kornia.geometry.depth_to_3d, mode="reduce-overhead").
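For context, the setup looks roughly like this (the surrounding preprocessing code is simplified, and the tensor shapes are just the documented ones for depth_to_3d):

```python
import torch
import kornia

# Compile the CPU-side preprocessing step once, up front
depth_to_3d_opt = torch.compile(kornia.geometry.depth_to_3d, mode="reduce-overhead")

def preprocess(depth: torch.Tensor, camera_matrix: torch.Tensor) -> torch.Tensor:
    # depth: (B, 1, H, W), camera_matrix: (B, 3, 3); both tensors live on the CPU
    return depth_to_3d_opt(depth, camera_matrix)
```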
Then I am trying to run multi-GPU training via transformers + accelerate.
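The training side is the usual transformers + accelerate pattern, roughly like this (the model and optimizer here are placeholders just to show the structure, not my actual code):

```python
import torch
from accelerate import Accelerator

# Placeholder model/optimizer; the real ones come from transformers
accelerator = Accelerator()
model = torch.nn.Linear(8, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Let accelerate handle device placement and multi-GPU wrapping
model, optimizer = accelerator.prepare(model, optimizer)
```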
However, torch.compile fails with a CUDA-related error, torch._dynamo.exc.InternalTorchDynamoError: Cannot re-initialize CUDA in forked subprocess:
```
[rank3]:   File "/opt/conda/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 172, in _fn
[rank3]:     cuda_rng_state = torch.cuda.get_rng_state()
[rank3]:   File "/opt/conda/lib/python3.10/site-packages/torch/cuda/random.py", line 31, in get_rng_state
[rank3]:     _lazy_init()
[rank3]:   File "/opt/conda/lib/python3.10/site-packages/torch/cuda/__init__.py", line 300, in _lazy_init
[rank3]:     raise RuntimeError(
[rank3]: torch._dynamo.exc.InternalTorchDynamoError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
[rank3]: You can suppress this exception and fall back to eager by setting:
[rank3]:     import torch._dynamo
[rank3]:     torch._dynamo.config.suppress_errors = True
```
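For reference, this is the fallback the error message itself suggests; I would rather avoid it, since it just silently drops back to eager instead of actually compiling for CPU:

```python
import torch._dynamo

# Fallback suggested by the error message: suppress Dynamo errors and run eagerly
torch._dynamo.config.suppress_errors = True
```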
Is there a way to force torch.compile to run in CPU-only mode?