Errors occurred when using multiprocessing

Hi! I want to use torch.multiprocessing to speed up my training process. In short, the original training structure is as follows.

train_loader = DataLoader(train_dataset, batch_size=train_batch, shuffle=True)
model = Model(...)
optimizer = optim.SGD(model.parameters(), ...)
for i in range(epochs):
    for pos, neg in train_loader:
        pos = pos.to(device).to(torch.long)
        neg = neg.to(device).to(torch.long)
        optimizer.zero_grad()
        loss = model(pos, neg)
        loss.backward()
        optimizer.step()

I want to use multiprocessing in each epoch, so I changed my code structure into this:

def train(model, train_loader, device):
    optimizer = optim.SGD(model.parameters(), ...)
    for pos, neg in train_loader:
        pos = pos.to(device).to(torch.long)
        neg = neg.to(device).to(torch.long)
        optimizer.zero_grad()
        loss = model(pos, neg)
        loss.backward()
        optimizer.step()

train_loader = DataLoader(train_dataset, batch_size=train_batch, shuffle=True)
model = Model(...)
for i in range(epochs):
    processes = []
    for rank in range(num_processes):
        p = mp.Process(target=train, args=(model, train_loader, device))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()

But when I ran it, I got errors I don’t understand (my system is Windows 10):

THCudaCheck FAIL file=C:\w\1\s\windows\pytorch\torch/csrc/generic/StorageSharing.cpp line=245 error=71 : operation not supported
Traceback (most recent call last):
  File "D:/PyCharm/Projects/TransE/train.py", line 113, in <module>
    p.start()
  File "D:\Python3.7.3\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "D:\Python3.7.3\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "D:\Python3.7.3\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "D:\Python3.7.3\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
    reduction.dump(process_obj, to_child)
  File "D:\Python3.7.3\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "D:\Python3.7.3\lib\site-packages\torch\multiprocessing\reductions.py", line 232, in reduce_tensor
    event_sync_required) = storage._share_cuda_()
RuntimeError: cuda runtime error (71) : operation not supported at C:\w\1\s\windows\pytorch\torch/csrc/generic/StorageSharing.cpp:245
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "D:\Python3.7.3\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "D:\Python3.7.3\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)

Is my structure wrong, and how can I fix it? Thanks!

PS: There do not seem to be many tutorials on how to use torch.multiprocessing, and I am not familiar with multiprocessing :(

Hi,

The error is raised by THCudaCheck in StorageSharing.cpp, and it says that the operation you are doing is not supported.
So my guess is that sharing CUDA tensors is not supported on Windows.
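
To see why the failure shows up at p.start(): on Windows the child process is spawned, so the Process arguments are pickled (the reduction.dump frame in the traceback), and a tensor on a CUDA device is pickled through reduce_tensor, which calls storage._share_cuda_(). Below is a small sketch for checking, before starting any process, whether a model holds CUDA tensors. The helper name cuda_tensor_names and the nn.Linear stand-in are hypothetical, not taken from the code above.

```python
import torch
import torch.nn as nn


def cuda_tensor_names(module):
    """Return the names of parameters and buffers that live on a CUDA device."""
    named = list(module.named_parameters()) + list(module.named_buffers())
    return [name for name, tensor in named if tensor.is_cuda]


if __name__ == "__main__":
    model = nn.Linear(4, 1)  # hypothetical stand-in for Model(...)
    if torch.cuda.is_available():
        model = model.to("cuda")
    # A non-empty list means that passing `model` in Process args
    # would require sharing CUDA storage with the child process.
    print(cuda_tensor_names(model))
```

If the list is non-empty, the model passed in args=(model, ...) is what triggers the CUDA sharing call.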

Anyway, in general, multiprocessing is very unlikely to give you any performance benefit. What are the properties of your workload? Why do you want to use multiprocessing?

Hi! Thanks for your answer. In my current situation, without multiprocessing, each epoch takes about 8 seconds. I saw other implementations use multiprocessing to speed up the training process, and it really helped (but they are written with TensorFlow).
You said it’s likely that sharing CUDA tensors is not supported on Windows. To verify this, I tried running an official PyTorch example program (code link) which uses a multiprocessing structure very similar to mine, and the same error occurred when I used CUDA on Windows. I believe your guess is right!

But I still want to speed up my training. For my next try, I will use multiprocessing instead of torch.multiprocessing and see if it works!
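
For anyone hitting the same error, here is a minimal sketch of a CPU-only variant of the structure above, assuming the model can stay on the CPU so that no CUDA tensor is ever passed to a child process. The nn.Linear model, the toy data, and the names train_worker and main are placeholders, not the code from this thread. model.share_memory() moves the parameters into shared memory, and every process then updates them without locking.

```python
import torch
import torch.nn as nn
import torch.multiprocessing as mp
import torch.optim as optim


def train_worker(model, inputs, targets, steps=20, lr=0.1):
    # Each worker builds its own optimizer over the shared parameters.
    optimizer = optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(steps):
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        optimizer.step()  # in-place update of the shared parameters, no locking


def main(num_processes=2):
    torch.manual_seed(0)
    model = nn.Linear(4, 1)  # placeholder model, kept on the CPU on purpose
    model.share_memory()     # parameters now live in shared memory
    inputs = torch.randn(64, 4)                # toy data for illustration
    targets = inputs.sum(dim=1, keepdim=True)

    processes = []
    for _ in range(num_processes):
        p = mp.Process(target=train_worker, args=(model, inputs, targets))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
    return model


if __name__ == "__main__":  # needed when child processes are spawned (Windows)
    main()
```

Whether this is actually faster than a single process depends on the workload, so it is worth timing both versions before committing to it.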