You can just call these functions one after the other and they will run in parallel. The CUDA API is asynchronous, so the kernels from both networks get queued and overlap automatically.
It seems that they actually run sequentially: GPU usage goes from high to low for one network, then the other.
I could confirm this more precisely if there is a profiling tool for PyTorch on the GPU.
In that case, it means that you have some synchronization points in your model.
You don’t want to do any CPU/GPU op, which means no copies, printing, or .item() calls on GPU data.
In particular, you should send all the data to the GPU, then forward both nets, then use the outputs of both nets.
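The ordering above can be sketched as follows (a minimal example, assuming PyTorch; the two `nn.Linear` nets and the tensor sizes are placeholders for your actual models and data):

```python
import torch
import torch.nn as nn

# Fall back to CPU so the sketch runs anywhere; on a CUDA machine this is "cuda".
device = "cuda" if torch.cuda.is_available() else "cpu"

net1 = nn.Linear(16, 8).to(device)
net2 = nn.Linear(16, 8).to(device)

data = torch.randn(4, 16)

# 1) Send all inputs to the GPU up front.
data = data.to(device)

# 2) Forward both nets back-to-back.
#    No print(), .item(), or .cpu() in between -- any of those would force a
#    sync after the first forward and serialize the two networks.
out1 = net1(data)
out2 = net2(data)

# 3) Only now consume the outputs (this is where a sync may happen).
combined = out1 + out2
print(combined.shape)
```

The key point is that steps 1 and 2 only enqueue work on the GPU; nothing blocks until step 3 actually reads the results.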
You can use the NVIDIA Visual Profiler to check this in more detail, but it is quite extensive and might be too complex for what you need.
A simple trick to check where the sync happens is to time how long a torch.cuda.synchronize() call takes to return: just after the forward, it should take a bit of time because all the network’s ops are still running. If you do a .item() on a GPU tensor and then time the synchronize, it will return instantly, because the .item() already forced a full sync.
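The timing trick looks roughly like this (a sketch that only does anything on a CUDA machine; the layer size is an arbitrary placeholder, chosen large enough that the forward takes measurable time):

```python
import time
import torch
import torch.nn as nn

if torch.cuda.is_available():
    net = nn.Linear(4096, 4096).cuda()
    x = torch.randn(64, 4096, device="cuda")
    torch.cuda.synchronize()  # start from an idle GPU

    # Case 1: synchronize right after the forward -- the queued kernels are
    # still running, so this call takes a measurable amount of time.
    out = net(x)
    t0 = time.perf_counter()
    torch.cuda.synchronize()
    print(f"sync after forward: {time.perf_counter() - t0:.6f}s")

    # Case 2: .item() already blocked until the GPU finished, so a
    # synchronize right after it returns almost instantly.
    out = net(x)
    _ = out.sum().item()
    t0 = time.perf_counter()
    torch.cuda.synchronize()
    print(f"sync after .item(): {time.perf_counter() - t0:.6f}s")
```

If the second number is near zero while the first is not, the .item() call is your hidden synchronization point.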