SU801T
(S)
May 15, 2020, 7:30pm
1
Hi,
I’m training a simple autoencoder across several GPUs (probably 4) with a batch size of 256 to 512, and I have millions of examples to train on. I want to make sure I’m doing the right thing.
Below I have defined the autoencoder, where I add self.encoder = nn.DataParallel(self.encoder) inside the class. Please see below:
class AutoEncoder(nn.Module):
    def __init__(self, n_embedded):
        super(AutoEncoder, self).__init__()
        self.encoder = nn.Sequential(nn.Linear(6144, n_embedded))
        self.encoder = nn.DataParallel(self.encoder)
        self.decoder = nn.Sequential(nn.Linear(n_embedded, 6144))

    def forward(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return encoded, decoded
I initiate my model before training by:
model = AutoEncoder(2048)
model = nn.DataParallel(model)
model.to(device)

criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), weight_decay=1e-5)
Would this get the most out of the GPUs?
mrshenli
(Shen Li)
May 15, 2020, 7:51pm
2
Hey @SU801T, I think you need to call model.to(device) before calling the DataParallel constructor.
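For reference, a minimal sketch of that ordering, reusing the AutoEncoder class and hyperparameters from your post (assuming the internal DataParallel around the encoder is removed); the cuda:0 device name is an assumption for a single-node setup:

import torch
import torch.nn as nn

device = torch.device("cuda:0")

model = AutoEncoder(2048)        # the class defined in the first post
model.to(device)                 # parameters land on cuda:0 first
model = nn.DataParallel(model)   # replicas are then created from cuda:0

criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), weight_decay=1e-5)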
Would this get the most out of the GPUs?
DistributedDataParallel is expected to be faster than DataParallel. See this example.
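For a bit more context, here is a rough DistributedDataParallel sketch (not the linked example), assuming one process per GPU on a single node and the AutoEncoder class from the first post without the internal DataParallel wrapper; the MASTER_ADDR/MASTER_PORT values and the random single-step input are purely illustrative, while the 6144-dim inputs, 2048-dim embedding, batch size of 256, and 4 GPUs come from the thread:

import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def run(rank, world_size):
    # Rendezvous settings for a single machine; the values are illustrative.
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    # One model replica per process, each on its own GPU.
    model = AutoEncoder(2048).to(rank)
    ddp_model = DDP(model, device_ids=[rank])

    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(ddp_model.parameters(), weight_decay=1e-5)

    # One illustrative step; a real run would iterate over a DataLoader
    # with a DistributedSampler so each process sees its own shard.
    inputs = torch.randn(256, 6144, device=rank)
    encoded, decoded = ddp_model(inputs)
    loss = criterion(decoded, inputs)

    optimizer.zero_grad()
    loss.backward()    # gradients are all-reduced across the processes
    optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = 4    # number of GPUs mentioned in the original post
    mp.spawn(run, args=(world_size,), nprocs=world_size, join=True)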
SU801T
(S)
May 16, 2020, 12:29am
3
Hi,
I have another issue.
So I have placed model.to(device) before calling nn.DataParallel. However, I get this error:
RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cuda:1
I’m assuming everything needs to be on cuda:0 before being split across the other devices. How do I fix that?
Cheers,
Taran
mrshenli
(Shen Li)
May 16, 2020, 2:19am
4
The code below works for me. Your original code has a DataParallel submodule within AutoEncoder (commented out below); is that intentional?
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, n_embedded):
        super(AutoEncoder, self).__init__()
        self.encoder = nn.Sequential(nn.Linear(n_embedded, n_embedded))
        # self.encoder = nn.DataParallel(self.encoder)
        self.decoder = nn.Sequential(nn.Linear(n_embedded, n_embedded))

    def forward(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return encoded, decoded

n_embedded = 20
model = AutoEncoder(n_embedded)
model.to(0)
model = nn.DataParallel(model)

criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), weight_decay=1e-5)

loss = model(torch.ones(n_embedded, n_embedded))[0].sum()
loss.backward()
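As a usage sketch on top of the snippet above, a hypothetical training loop with a random placeholder dataset; the dataset size, epoch count, and num_workers are made-up values, while the model, criterion, and optimizer are the ones defined above and the batch size of 256 comes from the original question:

from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset of n_embedded-dim vectors; real data would go here.
dataset = TensorDataset(torch.randn(10000, n_embedded))
loader = DataLoader(dataset, batch_size=256, shuffle=True, num_workers=4)

for epoch in range(10):                    # placeholder epoch count
    for (batch,) in loader:
        batch = batch.to(0)                # inputs start on cuda:0 ...
        encoded, decoded = model(batch)    # ... DataParallel scatters them
        loss = criterion(decoded, batch)   # reconstruction loss

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

The gathered outputs and the loss land on cuda:0, which is why the targets are moved there as well.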
SU801T
(S)
May 16, 2020, 1:06pm
5
Yes, apologies. I did uncomment that line in the autoencoder class; I thought perhaps it was also needed…