Python multiprocessing struct.error: 'i' format requires -2147483648 <= number <= 2147483647

I have huge text data, and at a certain amount of data (usually more than 700MB) I keep getting this error message:

Python multiprocessing struct.error: 'i' format requires -2147483648 <= number <= 2147483647

But I don't use struct or multiprocessing directly, so it's probably caused by PyTorch features such as Dataset and DataLoader.

Any idea how to solve this?

Yeah, well, usually you should get a backtrace that helps pinpoint where that might be happening.

Best regards

Thomas

Thanks for the response. The message is like this:

File "/app/train.py", line 48, in train
    for dataset in train_data_loader:
File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 631, in __next__
    idx, batch = self._get_batch()
File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 610, in _get_batch
    return self.data_queue.get()
File "/opt/conda/lib/python3.6/multiprocessing/queues.py", line 94, in get
    res = self._recv_bytes()
File "/opt/conda/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
File "/opt/conda/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
File "/opt/conda/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 274, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 92) is killed by signal: Killed.
User session exited

Best regards
Ryan

That isn't the backtrace for the initial error, but one that appears on the other end of the queue after the initial error has already happened.

At any rate, the original message seems to be a bug in Python < 3.8, which handles large objects badly when communicating between processes: multiprocessing packs the length of the pickled payload as a signed 32-bit integer, so anything whose pickle exceeds 2**31 - 1 bytes (about 2GB) overflows the header and raises exactly this struct.error. Python 3.8 fixed this by using a larger header format for big payloads.
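
For reference, a minimal sketch of what goes wrong under the hood (this reproduces only the struct call, not the actual queue transfer):

import struct

# Before Python 3.8, multiprocessing wrote the length of the pickled
# payload as a signed 32-bit int, roughly like this:
payload_size = 3 * 1024**3  # pretend the pickled batch is ~3GB
header = struct.pack("!i", payload_size)
# -> struct.error: 'i' format requires -2147483648 <= number <= 2147483647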

Of course, if you send objects that large, chances are that something is amiss with what your program does for multiprocessing. Is something (e.g. another library) keeping you from using PyTorch's multiprocessing wrapper?
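
For example, here is a minimal sketch of using the wrapper as a drop-in replacement (the tensor and worker function are placeholders, not your training code):

import torch
import torch.multiprocessing as mp

def worker(q):
    # tensors arriving through a torch.multiprocessing queue are backed
    # by shared memory rather than one monolithic pickle
    t = q.get()
    print(t.shape)

if __name__ == "__main__":
    q = mp.Queue()
    p = mp.Process(target=worker, args=(q,))
    p.start()
    q.put(torch.zeros(1024, 1024))  # stand-in for the real data
    p.join()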

Best regards

Thomas

I think I found the issue based on your comment.

from torch.utils.data import DataLoader

train_data_loader = DataLoader(train_data_meta,
                               shuffle=False, collate_fn=lambda x: x[0],
                               num_workers=config.Req["num_workers"], drop_last=True)

Yes, it's about multiprocessing: if num_workers > 1 and the data is large, the program is killed or this error occurs.

I use Python 3.5~3.6, so I will probably upgrade to a higher Python version.
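
In the meantime, here is a sketch of the stopgap I can use until the upgrade (same names as my snippet above): with num_workers=0 everything is loaded in the main process, so nothing is pickled across a pipe and the 2GB limit never applies.

from torch.utils.data import DataLoader

train_data_loader = DataLoader(train_data_meta,
                               shuffle=False, collate_fn=lambda x: x[0],
                               num_workers=0,  # no worker processes, no IPC
                               drop_last=True)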