Multiprocessing failed using single GPU

Lorentz.B · January 27, 2019, 10:31am

Hi, I am new to the machine learning community. I am trying to run inference in parallel using a multi-core CPU and a single GPU, but I get the following runtime error.

THCudaCheck FAIL file=c:\a\w\1\s\tmp_conda_3.6_091443\conda\conda-bld\pytorch_1544087948354\work\torch\csrc\generic\StorageSharing.cpp line=232 error=71 : operation not supported
File "C:\Users\Anaconda3\lib\site-packages\torch\multiprocessing\reductions.py", line 213, in reduce_tensor
    (device, handle, storage_size_bytes, storage_offset_bytes) = storage._share_cuda_()
RuntimeError: cuda runtime error (71) : operation not supported at c:\a\w\1\s\tmp_conda_3.6_091443\conda\conda-bld\pytorch_1544087948354\work\torch\csrc\generic\StorageSharing.cpp:232

The following is a simplified example that reproduces the error.

import torch
from torch import nn

# model used to do inference
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.fc1 = nn.Linear(100,1)
        
    def forward(self,x):
        return self.fc1(x)
    
# class running inference
class A(object):
    def __init__(self):
        pass

    def do_something(self, model):
        # build the input on the same device as the model, then run inference
        x = torch.randn(100, device='cuda')
        print(model(x))
    
    def run(self):
        mp = torch.multiprocessing.get_context('spawn')
        processes = []

        for i in range(2):
            p = mp.Process(target=self.do_something, args=(Model().cuda(),))
            processes.append(p)

        for p in processes:
            p.start()

        # wait for the child processes to finish
        for p in processes:
            p.join()

if __name__ == '__main__':
    a = A()
    a.run()

It would be greatly appreciated if anyone can help solve this problem. By the way, my PC runs on Windows 10 with one GTX 1070 GPU.

peterjc123 · January 27, 2019, 3:15pm

https://pytorch.org/docs/stable/notes/windows.html#cuda-ipc-operations

pietern · January 27, 2019, 5:56pm

In your example, you could instantiate your model in the subprocess instead. Then you won’t need to share CUDA tensors between the parent and the child process.
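A minimal sketch of that idea, reusing the `Model` class from the question. The names `run_inference`, `worker` and `main` are only illustrative, and the CPU fallback is an assumption added so the snippet also runs on a machine without a GPU. Only an integer is passed to each child, so no CUDA tensor has to be pickled:

```python
import torch
import torch.multiprocessing as mp
from torch import nn


class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(100, 1)

    def forward(self, x):
        return self.fc1(x)


def run_inference():
    # Illustrative helper: builds the model in whichever process calls it.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = Model().to(device).eval()
    # The input is created on the same device as the model.
    x = torch.randn(1, 100, device=device)
    with torch.no_grad():
        return model(x).cpu()


def worker(rank):
    # Runs in the child process; only the integer rank was sent from the parent.
    print(rank, run_inference())


def main(num_workers=2):
    ctx = mp.get_context("spawn")
    processes = [ctx.Process(target=worker, args=(i,)) for i in range(num_workers)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    # Exit code 0 means the child finished without an uncaught exception.
    return [p.exitcode for p in processes]


if __name__ == "__main__":
    main()
```

Each child builds its own copy of the model on the GPU, so memory use grows with the number of workers. If you need trained weights rather than a fresh `Model()`, one option is to have each child load them from a checkpoint file itself instead of receiving them from the parent.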

xxxxxi-gg · July 21, 2020, 12:55am

Hi, have you solved it? I have the same problem.