DataLoader worker (pid(s)) exited unexpectedly

When running my code I received this error message "RuntimeError: DataLoader worker (pid(s) 8992) exited unexpectedly" right after training starts.

I use PyTorch 1.12.1+cu113 with these settings:
NUM_Worker : 0
batch_size : 1
cuda : True
multi_gpu : True
image_size : 224
Can you help me?

Model : resnet_split0 Experience : cross_validation
self.dataset : None
videos_split : [['SM686-7' 'train']
['LYI1079-2' 'train']
['GA817-1-8' 'train']

['TA239-2' 'test']
['GM537-7' 'test']
['AM33-2' 'test']]

videos_split : [['SM686-7' 'train']
['LYI1079-2' 'train']
['GA817-1-8' 'train']

['TA239-2' 'test']
['GM537-7' 'test']
['AM33-2' 'test']]

Epoch_begin
Epoch 1 : train

im in bloc loop :)
0 / 259398
LGA881-1-2 9 0
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 1163, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "/usr/lib/python3.7/multiprocessing/queues.py", line 113, in get
    return _ForkingPickler.loads(res)
  File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/reductions.py", line 297, in rebuild_storage_fd
    fd = df.detach()
  File "/usr/lib/python3.7/multiprocessing/resource_sharer.py", line 57, in detach
    with _resource_sharer.get_connection(self._id) as conn:
  File "/usr/lib/python3.7/multiprocessing/resource_sharer.py", line 87, in get_connection
    c = Client(address, authkey=process.current_process().authkey)
  File "/usr/lib/python3.7/multiprocessing/connection.py", line 492, in Client
    c = SocketClient(address)
  File "/usr/lib/python3.7/multiprocessing/connection.py", line 620, in SocketClient
    s.connect(address)
FileNotFoundError: [Errno 2] No such file or directory

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/content/drive/MyDrive/memoire-SSD/code/trainVal.py", line 449, in <module>
    main()
  File "/content/drive/MyDrive/memoire-SSD/code/trainVal.py", line 446, in main
    run(args)
  File "/content/drive/MyDrive/memoire-SSD/code/trainVal.py", line 348, in run
    trainFunc(**kwargsTr)
  File "/content/drive/MyDrive/memoire-SSD/code/trainVal.py", line 30, in epochSeqTr
    for batch_idx,batch in enumerate(loader):
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 681, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 1359, in _next_data
    idx, data = self._get_data()
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 1325, in _get_data
    success, data = self._try_get_data()
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 1176, in _try_get_data
    raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
RuntimeError: DataLoader worker (pid(s) 8992) exited unexpectedly

The code is failing in:

So maybe an if-guard in your main script would help:

if __name__ == '__main__':
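
To make the suggestion concrete, here is a minimal sketch of the guard pattern. It uses only the standard library rather than your actual training code. DataLoader workers are started through Python's multiprocessing machinery, so the same rule applies: anything that launches worker processes should be reachable only through the guarded entry point of the main script. The function square is a hypothetical stand-in for per-sample work, and main only mirrors the shape of the main() call visible in your traceback. The guard matters mostly when workers are started with the spawn method (the default on Windows, and on macOS since Python 3.8). Under fork, the Linux default, it may not change anything.

```python
import multiprocessing as mp


def square(x):
    # Hypothetical work item; stands in for loading one sample in a worker.
    return x * x


def main():
    # Worker processes are created only inside this function,
    # never at import time.
    with mp.Pool(processes=2) as pool:
        results = pool.map(square, range(4))
    print(results)
    return results


if __name__ == "__main__":
    # With the spawn start method each worker re-imports this file;
    # the guard keeps that import from calling main() again.
    main()
```

In your case that would mean keeping the guarded main() call at the bottom of trainVal.py and leaving the files under /usr/lib/python3.7/multiprocessing untouched.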

I run my code in Google Colab, not on my desktop. How can I set the program package in Colab?

I can edit connection.py and add:
if __name__ == "__main__":
    main()

But the problem is not resolved.

Have you encountered this problem in your projects?
Any solutions?

No, I've never seen this issue in my setups. Also, you should not change any internal connection.py script from the multiprocessing package; add the guard to your main script instead.

ok :smiley:

After upgrading pymp with the command !pip install --upgrade pymp-pypi, the "No such file or directory" error is resolved, but I still can't train my model:

raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
RuntimeError: DataLoader worker (pid(s) 1833) exited unexpectedly

The error still persists.