Hi, I'm a rookie at parallel programming.
Recently I've been trying to parallelize my code.
Because of Python's global interpreter lock (GIL), I chose process-based parallelism.
I first tried Python's native multiprocessing library, but it raised an error when importing torch.
So I switched to PyTorch's torch.multiprocessing, and it also failed with some weird errors.
For example, CUDA out of memory, which shouldn't happen.
It occupies tons of GPU memory, and I don't get why.
I also find the logic of multiprocessing weird.
By the way, when the number of subprocesses is low, it no longer fails.
Another tough problem is that the parallelized code is much slower.
Here's the comparison (seconds):
8.354568481445312   # parallel
0.00092315673828125 # serial
When I debug, it seems that every subprocess doesn't just run the target function; each one runs the whole script from the start.
```python
from multiprocessing import freeze_support  # was multiprocessing.dummy, whose freeze_support is a no-op
from time import time

import torch
from torch import multiprocessing as mp


def f(x):
    x *= torch.randn((3, 24, 24), device='cuda:0')


if __name__ == "__main__":
    freeze_support()  # for Windows support
    data = torch.randn(16, 3, 24, 24, device='cuda:0')
    mp.set_start_method('spawn')  # the failure log prompted me to add this

    pool = mp.Pool(mp.cpu_count())
    start = time()
    for i in data:
        pool.apply_async(f, (i,))  # args must be a tuple, not the tensor itself
    pool.close()
    pool.join()
    print(time() - start)

    start = time()
    for i in data:
        f(i)
    print(time() - start)
```
Full Error Log
My Running Environment:
i9-10940X
RTX3090
Ubuntu
Python3.8
torch 1.10.0+cu113
Thank you!