Parallel application of a function to the rows of a CUDA tensor

Hello, everyone!

I need to apply a user-defined function to every row of a CUDA tensor. Because the function takes very few GPU resources but a significant amount of time, I want to speed up the calculation by applying it to several rows simultaneously.

But when I try to execute this code:

ctx = torch.multiprocessing.get_context('spawn')
for ind in range(0, self.pop_size, th_cnt):
    processes = []
    for proc_n in range(th_cnt):
        p = ctx.Process(target=pw.forw, args=(ind + proc_n, wgts[ind + proc_n, :, :], errs))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()

I'm getting an error:

File "C:\ProgramData\Anaconda3\lib\multiprocessing\", line 105, in start
    self._popen = self._Popen(self)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\", line 322, in _Popen
    return Popen(process_obj)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\", line 65, in __init__
    reduction.dump(process_obj, to_child)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
File "C:\ProgramData\Anaconda3\lib\site-packages\torch\multiprocessing\", line 108, in reduce_storage
    metadata = storage._share_cuda_()
RuntimeError: cuda runtime error (71) : operation not supported at c:\users\administrator\downloads\new-builder\win-wheel\pytorch\torch\csrc\generic\StorageSharing.cpp:253
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\ProgramData\Anaconda3\lib\multiprocessing\", line 105, in spawn_main
    exitcode = _main(fd)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\", line 115, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input

Could you explain what I should fix in my code, or maybe advise a better way to achieve my goal?
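For context on what I've tried to understand so far: the traceback fails inside `reduce_storage` / `StorageSharing.cpp`, i.e. while pickling the CUDA tensor slice to send it to the child process, and sharing CUDA storage across processes is not supported on Windows. One alternative that avoids inter-process tensor sharing entirely is to use threads, which all see the same CUDA context. Below is a minimal sketch of that pattern; `forw`, `wgts`, `errs`, `pop_size`, and `th_cnt` mirror the names from my code above, and the trivial per-row computation is a placeholder assumption, not the real function:

```python
# Thread-based sketch: threads share one process and one CUDA context,
# so no tensor needs to be pickled or shared via CUDA IPC.
from concurrent.futures import ThreadPoolExecutor

def forw(ind, row, errs):
    # Placeholder for the real per-row computation (assumption);
    # writes its result into a distinct slot of the shared list.
    errs[ind] = sum(x * x for x in row)

pop_size = 8
th_cnt = 4
# Stand-in for the rows of the CUDA tensor (assumption: plain lists here).
wgts = [[float(i)] * 3 for i in range(pop_size)]
errs = [0.0] * pop_size

with ThreadPoolExecutor(max_workers=th_cnt) as pool:
    futures = [pool.submit(forw, ind, wgts[ind], errs) for ind in range(pop_size)]
    for f in futures:
        f.result()  # re-raises any exception from a worker thread
```

Whether this actually runs rows concurrently depends on the real `forw`: pure-Python work is serialized by the GIL, but GPU kernels and most tensor ops release it, so this may or may not help in my case.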

I'm also running into the same problem. My code works on the small MNIST dataset, but not on my own dataset. I don't know what the problem is…