  File "/app/train.py", line 48, in train
    for dataset in train_data_loader:
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 631, in __next__
    idx, batch = self._get_batch()
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 610, in _get_batch
    return self.data_queue.get()
  File "/opt/conda/lib/python3.6/multiprocessing/queues.py", line 94, in get
    res = self._recv_bytes()
  File "/opt/conda/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/opt/conda/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/opt/conda/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 274, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 92) is killed by signal: Killed.
User session exited
That isn't the backtrace or error message for the initial error; it appears to be the one raised on the other end of the queue after the initial error had already happened.
At any rate, the original message seems to come from a known shortcoming of Python < 3.8, which handles very large objects badly when sending them between processes.
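To make this concrete, here is a minimal sketch (not your code) of that shortcoming, assuming the culprit is the roughly 2 GiB message limit that, as far as I remember, existed in multiprocessing's pipe protocol before Python 3.8. On 3.6/3.7 the sender crashes and the receiver only sees the connection die; on 3.8+ the same script succeeds. Note that it allocates a bit over 2 GiB of memory:

```python
import multiprocessing as mp

def producer(conn):
    # A payload just over 2 GiB of zeros. Before Python 3.8 the pipe protocol
    # packed the message length into a signed 32-bit header, so sending this
    # raises struct.error inside the worker process.
    payload = bytes(2**31 + 16)
    conn.send_bytes(payload)
    conn.close()

if __name__ == "__main__":
    parent_conn, child_conn = mp.Pipe()
    p = mp.Process(target=producer, args=(child_conn,))
    p.start()
    child_conn.close()  # the parent only reads from its own end
    try:
        data = parent_conn.recv_bytes()
        print("received", len(data), "bytes")  # what you see on Python >= 3.8
    except EOFError:
        # On Python < 3.8 the worker died before sending anything, so all the
        # parent sees is the broken connection, not the original error.
        print("worker failed before sending anything")
    p.join()
```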
Of course, if you are sending things that large, chances are that something is amiss with what your program does for multiprocessing. Is something (e.g. another library) keeping you from using PyTorch's multiprocessing wrapper, torch.multiprocessing?
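For reference, the wrapper is a drop-in replacement for the standard module: torch.multiprocessing registers custom reducers so that a tensor put on a queue is moved into shared memory and only a small handle travels through the pipe, which avoids pushing gigabytes of pickled data between processes. A rough sketch (the function name and tensor size are just for illustration):

```python
import torch
import torch.multiprocessing as mp  # drop-in replacement for multiprocessing

def worker(q):
    # A moderately large CPU tensor (~64 MB; the size is purely illustrative).
    t = torch.randn(4, 2048, 2048)
    # torch.multiprocessing moves the tensor's storage into shared memory,
    # so only a small handle is pickled and sent through the queue.
    q.put(t)

if __name__ == "__main__":
    q = mp.Queue()
    p = mp.Process(target=worker, args=(q,))
    p.start()
    t = q.get()  # the parent maps the same shared-memory storage
    print(t.shape, t.dtype)
    p.join()
```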