My raw GPU tensor is so big that I can't afford to copy it for each process when using multiprocessing.
I do need multiprocessing, because I have to run calculations on both the CPU and the GPU.
Is there any way to do that? Should I use libtorch, CuPy, or something else?
Here is my code:
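import torch


def task(o):
    # Address of the tensor as seen from inside the child process
    print(o.data_ptr())
    print(o, '034')
    # Run the addition on a separate CUDA stream
    stream = torch.cuda.Stream()
    with torch.cuda.stream(stream):
        print(o + o)


if __name__ == '__main__':
    # Choose the start method first; sharing CUDA tensors needs 'spawn' or 'forkserver'
    torch.multiprocessing.set_start_method('spawn')
    s_mp = []
    o = torch.randn((5000, 5000)).pin_memory().cuda()
    # Address of the tensor in the parent process
    print(o.data_ptr(), 3434)
    for i in range(4):
        s_mp.append(torch.multiprocessing.Process(target=task, args=(o,)))
    for mp in s_mp:
        mp.start()
    for mp in s_mp:
        mp.join()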
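To check whether the tensor is really being copied, I think a small test like the one below could work. It is only a sketch: `fill_in_child` and `child_write_is_visible` are names I made up, and I am assuming that `torch.multiprocessing` hands a CUDA tensor passed as a `Process` argument to the child through a CUDA IPC handle rather than copying the data. If the parent sees the value written by the child, the memory was shared. The script falls back to a CPU tensor in shared memory when no GPU is available.

```python
import torch
import torch.multiprocessing as mp


def fill_in_child(t, value):
    # Write into the tensor in place; no new tensor is allocated here.
    t.fill_(value)
    if t.is_cuda:
        # Make sure the write has finished before the child exits.
        torch.cuda.synchronize()


def child_write_is_visible(device):
    # Returns True if a write made by a child process shows up in the parent.
    t = torch.zeros(1024, device=device)
    if not t.is_cuda:
        # CPU tensors must live in shared memory to be visible across processes.
        t.share_memory_()
    # Use a local 'spawn' context so the global start method is left untouched.
    ctx = mp.get_context("spawn")
    p = ctx.Process(target=fill_in_child, args=(t, 7.0))
    p.start()
    p.join()
    return bool((t == 7.0).all().item())


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(device, child_write_is_visible(device))
```

As far as I understand, the parent has to keep the tensor alive for as long as the children use it, and the value of `data_ptr()` may differ between processes even when the underlying memory is the same, so comparing pointers alone would not prove a copy.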
Thanks for taking the time to read this.