Z_Huang
(Z Huang)
April 29, 2022, 9:13pm
1
I want to train two models in parallel across multiple GPUs. The following is my code example, but I don't know how to make that happen. I tried looking into torch.multiprocessing, but no luck.
device = 'cuda:0'
net1 = nn.DataParallel(net1, device_ids=[0, 1, 2])
net1 = net1.to(device)
net2 = nn.DataParallel(net2, device_ids=[0, 1, 2])
net2 = net2.to(device)
for epoch in range(start_epoch, 20):
    train_model(epoch, net1, train_data1, test_data1, optimizer1, scheduler1)
    train_model(epoch, net2, train_data2, test_data2, optimizer2, scheduler2)
Any help?
You wrap net1 with nn.DataParallel, but then you reassign it with net1 = wide_net.to(device)? Since device is just one GPU, I guess that's why net1 (and net2) only run on one GPU.
Z_Huang
(Z Huang)
April 30, 2022, 2:50am
3
That was a typo: wide_net should be net1. I've fixed my post now.
Z_Huang
(Z Huang)
April 30, 2022, 3:03am
4
So do I only need the following code?
net1 = nn.DataParallel(net1, device_ids=[0, 1, 2])
net2 = nn.DataParallel(net2, device_ids=[0, 1, 2])
And is the following code unnecessary?
net1 = net1.to(device)
net2 = net2.to(device)
When I run the following code:
device = 'cuda:0'
net1 = nn.DataParallel(net1, device_ids=[0, 1, 2])
net1 = net1.to(device)
net2 = nn.DataParallel(net2, device_ids=[0, 1, 2])
net2 = net2.to(device)
They run on multiple GPUs, but not in parallel.
I am following this tutorial: Optional: Data Parallelism — PyTorch Tutorials 1.11.0+cu102 documentation
I think your code should work… Usually I just do:
net1 = net1.cuda()
net1 = nn.DataParallel(net1, device_ids=[0, 1, 2])
What do you mean by parallel? If they run on multiple GPUs, that means the batch data is split and a subset of it runs on each GPU. I think that is exactly what parallel means in PyTorch. Data parallelism in PyTorch is not the multiprocessing behavior we have in Python.
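For what it's worth, here is a minimal sketch (assuming a machine with 3 GPUs, as in this thread) that makes the batch splitting visible — each replica prints which device it runs on and how large a slice of the batch it received:

import torch
import torch.nn as nn

# Toy module that reports which device each replica runs on and
# how large a slice of the input batch it was given.
class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 10)

    def forward(self, x):
        print(f"replica on {x.device} got a batch of {x.size(0)}")
        return self.fc(x)

net = nn.DataParallel(Toy().cuda(), device_ids=[0, 1, 2])
out = net(torch.randn(30, 10).cuda())
# prints three lines (order may vary): each replica receives a batch
# of 10, and the outputs are gathered back on cuda:0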
Z_Huang
(Z Huang)
May 1, 2022, 12:33am
6
I mean multiple GPUs and multiple models — I want the two models training at the same time, not one after the other.
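One way to get that is a minimal sketch (not from the thread, and assuming train_model, net1/net2, the datasets, optimizers, and schedulers defined in the original post): launch each model's training loop from its own Python thread. CUDA kernels are launched asynchronously, so the two models' GPU work can overlap, though how much actually overlaps depends on the GIL and on how much CPU-side work train_model does. torch.multiprocessing with one process per model is the heavier-weight alternative.

import threading

# Hypothetical wrapper around the train_model calls from the original post.
def run(net, train_data, test_data, optimizer, scheduler):
    for epoch in range(start_epoch, 20):
        train_model(epoch, net, train_data, test_data, optimizer, scheduler)

# One thread per model; both submit work to the GPUs concurrently.
t1 = threading.Thread(target=run,
                      args=(net1, train_data1, test_data1, optimizer1, scheduler1))
t2 = threading.Thread(target=run,
                      args=(net2, train_data2, test_data2, optimizer2, scheduler2))
t1.start(); t2.start()
t1.join(); t2.join()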