Z_Huang
(Z Huang)
April 29, 2022, 9:13pm
1
I want to train two models in parallel across multiple GPUs. The following is my code example, but I don't know how to make that happen. I tried looking into torch.multiprocessing, but no luck.
device = 'cuda:0'
net1 = nn.DataParallel(net1, device_ids=[0, 1, 2])
net1 = net1.to(device)
net2 = nn.DataParallel(net2, device_ids=[0, 1, 2])
net2 = net2.to(device)
for epoch in range(start_epoch, 20):
    train_model(epoch, net1, train_data1, test_data1, optimizer1, scheduler1)
    train_model(epoch, net2, train_data2, test_data2, optimizer2, scheduler2)
Any help?
You wrap net1 with nn.DataParallel, but then you reassign it with net1 = wide_net.to(device)? Since device is just one GPU, I guess that's why net1 (and net2) only run on one GPU.
Z_Huang
(Z Huang)
April 30, 2022, 2:50am
3
That was a typo: wide_net should be net1. I've fixed my post now.
Z_Huang
(Z Huang)
April 30, 2022, 3:03am
4
So do I only need the following code?
net1 = nn.DataParallel(net1, device_ids=[0, 1, 2])
net2 = nn.DataParallel(net2, device_ids=[0, 1, 2])
And is the following code unnecessary?
net1 = net1.to(device)
net2 = net2.to(device)
When I run the following code:
device = 'cuda:0'
net1 = nn.DataParallel(net1, device_ids=[0, 1, 2])
net1 = net1.to(device)
net2 = nn.DataParallel(net2, device_ids=[0, 1, 2])
net2 = net2.to(device)
They run on multiple GPUs, but not in parallel.
I am following this tutorial: Optional: Data Parallelism — PyTorch Tutorials 1.11.0+cu102 documentation
I think your code should work… Usually I just do:
net1 = net1.cuda()
net1 = nn.DataParallel(net1, device_ids=[0, 1, 2])
What do you mean by parallel? If they run on multiple GPUs, that means the batch data is split and a subset of it runs on each GPU. I think that is exactly what parallel means in PyTorch. Data parallelism in PyTorch is not the multiprocessing behavior we have in Python.
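For what it's worth, here is a minimal sketch (assuming a machine with 3 GPUs, as in this thread) that makes the batch splitting visible — each replica prints which device it runs on and how large a slice of the batch it received:

import torch
import torch.nn as nn

# Toy module that reports which device each replica runs on and
# how large a slice of the input batch it was given.
class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 10)

    def forward(self, x):
        print(f"replica on {x.device} got a batch of {x.size(0)}")
        return self.fc(x)

net = nn.DataParallel(Toy().cuda(), device_ids=[0, 1, 2])
out = net(torch.randn(30, 10).cuda())
# prints three lines (order may vary): each replica receives a batch
# of 10, and the outputs are gathered back on cuda:0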
Z_Huang
(Z Huang)
May 1, 2022, 12:33am
6
I mean multiple GPUs and multiple models — I want the two models training at the same time, not one after the other.
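One way to get that is a minimal sketch (not from the thread, and assuming train_model, net1/net2, the datasets, optimizers, and schedulers defined in the original post): launch each model's training loop from its own Python thread. CUDA kernels are launched asynchronously, so the two models' GPU work can overlap, though how much actually overlaps depends on the GIL and on how much CPU-side work train_model does. torch.multiprocessing with one process per model is the heavier-weight alternative.

import threading

# Hypothetical wrapper around the train_model calls from the original post.
def run(net, train_data, test_data, optimizer, scheduler):
    for epoch in range(start_epoch, 20):
        train_model(epoch, net, train_data, test_data, optimizer, scheduler)

# One thread per model; both submit work to the GPUs concurrently.
t1 = threading.Thread(target=run,
                      args=(net1, train_data1, test_data1, optimizer1, scheduler1))
t2 = threading.Thread(target=run,
                      args=(net2, train_data2, test_data2, optimizer2, scheduler2))
t1.start(); t2.start()
t1.join(); t2.join()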