Any news? Have you solved the problem? How? I think the heart of @bapi's answer is that you have to manually transfer each input array (either a fraction of it or the whole array, depending on your problem) to the right GPU.
I solved like this:
import time
import torch
from torch.multiprocessing import Pool

torch.multiprocessing.set_start_method('spawn', force=True)


def use_gpu(ind, arr):
    # Some arbitrary per-GPU work on the tensor it received
    return (arr.std() + arr.mean() / (1 + arr.abs())).sum()


def mysenddata(mydata):
    # Put one slice on each of the 4 GPUs (adjust the range to your GPU count)
    return [(ii, mydata[ii].cuda(ii)) for ii in range(4)]


if __name__ == "__main__":
    print('create big tensor')
    aa = 10 * torch.randn(4, 10000, 10000).double()
    print('send data')
    b = mysenddata(aa)

    for ii in range(10):
        a = time.time()
        print('start')
        with Pool(processes=4) as p:
            results = p.starmap(use_gpu, b)
        print('end')
        print("cost time :", time.time() - a)

    for ii, (rr, bb) in enumerate(zip(results, b)):
        print('idx:{}, inshape:{}, indevice:{}, intype:{}, outshape:{}, outdevice:{}, outtype:{}'.format(
            ii, bb[1].shape, bb[1].get_device(), bb[1].type(),
            rr.shape, rr.get_device(), rr.type()))
(Note that the original version also created an extra `Pool(processes=4)` inside the loop that was never used or closed, leaking worker processes each iteration; the `with Pool(...)` context manager handles cleanup.) This code seems fine for general GPU processing, but it will not work if the backward method has to be called. Does anyone have a simple tutorial on training done across multiple GPUs?