I want to use torch.multiprocessing to speed up my loop, but I am running into some errors. I don't fully understand how CUDA memory is shared with subprocesses. Could anyone explain this?
import time
import torch
from torch.multiprocessing import Pool

def use_gpu():
    t = []
    for i in range(5):
        time.sleep(1)
        a = torch.randn(1000, 1000).cuda(3)
        t.append(a)
    return t

if __name__ == "__main__":
    # torch.cuda.set_device(3)
    pool = Pool()
    result = []
    a = time.time()
    for i in range(10):
        result.append(pool.apply_async(use_gpu))
    pool.close()
    pool.join()
    print("cost time :", time.time() - a)
This snippet works fine. However, if I use torch.cuda.set_device(3) to select the GPU, I get some errors, as shown below.

Thanks a lot for the help so far. After adding torch.multiprocessing.set_start_method('spawn'), a new problem arises:
Traceback (most recent call last):
File "test9.py", line 45, in <module>
torch.multiprocessing.set_start_method('spawn')
File "/usr/local/lib/python3.5/multiprocessing/context.py", line 231, in set_start_method
raise RuntimeError('context has already been set')
RuntimeError: context has already been set
This usually happens when you have not properly wrapped your main code in an if __name__ == '__main__': guard. Another possibility is that your project has multiple files containing that construct, and each of them sets the context when it is imported.
One option is to have only a single entry point in your project that is properly wrapped in that guard.
Alternatively, you can call set_start_method with the argument force=True, as sketched below.
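For example, a minimal sketch of the second option (the surrounding code is just a placeholder):

import torch.multiprocessing as mp

if __name__ == "__main__":
    # Force the 'spawn' start method even if a context was already set,
    # e.g. by an imported module; call this once at the entry point,
    # before creating any Pool or Process.
    mp.set_start_method('spawn', force=True)

    # ... pool / training setup goes here ...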
I had a slightly different problem when training multiple models on multiple GPUs in parallel. This is the only answer I found helpful, so I hope it is alright to bring it up here.
It can be reproduced by modifying the example according to the suggestion above. Here I want to run the worker (use_gpu) on all devices in parallel.
import time
import torch
from torch.multiprocessing import Pool

torch.multiprocessing.set_start_method('spawn', force=True)

def use_gpu(ind):
    t = []
    for i in range(5):
        time.sleep(1)
        a = torch.randn(1000, 1000).cuda(ind)
        t.append(a)
    return t

if __name__ == "__main__":
    # torch.cuda.set_device(3)
    pool = Pool()
    result = []
    a = time.time()
    for i in range(torch.cuda.device_count()):
        result.append(pool.apply_async(use_gpu, (i,)))
    pool.close()
    pool.join()
    print("cost time :", time.time() - a)
However, I couldn't figure out why I got the following error message:
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1524590031827/work/aten/src/THC/THCTensorRandom.cu line=25 error=46 : all CUDA-capable devices are busy or unavailable
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1524590031827/work/aten/src/THC/THCTensorRandom.cu line=25 error=46 : all CUDA-capable devices are busy or unavailable
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1524590031827/work/aten/src/THC/THCTensorRandom.cu line=25 error=46 : all CUDA-capable devices are busy or unavailable
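In case it helps others who hit the same message: error 46 ("all CUDA-capable devices are busy or unavailable") usually means the worker processes could not create a CUDA context on the target device, for example because the GPUs are set to an exclusive compute mode or are already occupied by other processes. Below is a minimal sketch of an alternative worth trying, assuming a PyTorch version that provides torch.multiprocessing.spawn; it is not a confirmed fix for this exact trace, just a way to give each device exactly one dedicated child process that initializes CUDA itself:

import time
import torch
import torch.multiprocessing as mp

def use_gpu(ind):
    # Each spawned child builds its own CUDA context on its own device.
    t = []
    for i in range(5):
        time.sleep(1)
        a = torch.randn(1000, 1000).cuda(ind)
        t.append(a)
    # Note: mp.spawn does not collect return values; results would need
    # to be written to a queue or shared storage instead.

if __name__ == "__main__":
    start = time.time()
    # spawn() always uses the 'spawn' start method and passes the
    # process index as the first argument of use_gpu.
    mp.spawn(use_gpu, nprocs=torch.cuda.device_count())
    print("cost time :", time.time() - start)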