I am using PyTorch 2.0.1 with Python 3.11.3 on a Mac.
Training my model works fine with num_workers=0. I have also iterated over the dataset manually, and everything is fine.
The problem appears as soon as I enable multiprocessing for parallel data loading. Even setting num_workers=2 results in the following error:
  File "/Users/luca/opt/anaconda3/envs/ideep/lib/python3.11/site-packages/lightning/__init__.py", line 25, in <module>
    from lightning.fabric.fabric import Fabric # noqa: E402
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/luca/opt/anaconda3/envs/ideep/lib/python3.11/site-packages/lightning/fabric/__init__.py", line 29, in <module>
    from lightning.fabric.fabric import Fabric # noqa: E402
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/luca/opt/anaconda3/envs/ideep/lib/python3.11/site-packages/lightning/fabric/fabric.py", line 21, in <module>
    import torch
  File "/Users/luca/opt/anaconda3/envs/ideep/lib/python3.11/site-packages/torch/__init__.py", line 457, in <module>
    for name in dir(_C):
                    ^^
NameError: name '_C' is not defined
It looks as though importing torch in several processes at once is causing the problem. Is this a known issue, and is there a fix? As I said, everything works with num_workers=0.
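
For reference, here is a minimal sketch of the kind of setup I mean. ToyDataset, build_loader and count_batches are hypothetical stand-ins for illustration, not my actual code, and I have not confirmed that this stripped-down version reproduces the error. The only thing that changes between the working and the failing run is the num_workers argument.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyDataset(Dataset):
    """Hypothetical stand-in for the real dataset: 8 samples of 3 floats each."""

    def __len__(self):
        return 8

    def __getitem__(self, idx):
        return torch.full((3,), float(idx))


def build_loader(num_workers):
    # num_workers=0 loads data in the main process;
    # num_workers>0 starts that many worker processes.
    return DataLoader(ToyDataset(), batch_size=4, num_workers=num_workers)


def count_batches(num_workers):
    # Iterate over the whole loader once and count the batches.
    return sum(1 for _ in build_loader(num_workers))


if __name__ == "__main__":
    # On macOS, worker processes are started with "spawn" by default, so each
    # worker re-imports this script. The guard keeps the code below from
    # running again inside the workers.
    print(count_batches(0))  # single-process loading
    print(count_batches(2))  # parallel loading with two workers
```

One difference I am aware of: in this sketch the entry point sits under an `if __name__ == "__main__":` guard. Because each spawned worker re-imports the main script, code left at module level would run again in every worker, so that may be worth ruling out as a factor.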