SU801T
(S)
May 15, 2020, 7:30pm
1
Hi,
I’m training a simple autoencoder across several GPUs (probably 4) with a batch size of 256 to 512, and I have millions of examples to train on. I want to make sure I’m doing the right thing.
Below I have defined the autoencoder, where I add self.encoder = nn.DataParallel(self.encoder) inside the class. Please see below:
class AutoEncoder(nn.Module):
    def __init__(self, n_embedded):
        super(AutoEncoder, self).__init__()
        self.encoder = nn.Sequential(nn.Linear(6144, n_embedded))
        self.encoder = nn.DataParallel(self.encoder)
        self.decoder = nn.Sequential(nn.Linear(n_embedded, 6144))

    def forward(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return encoded, decoded
I initiate my model before training by:
model = AutoEncoder(2048)
model = nn.DataParallel(model)
model.to(device)

criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), weight_decay=1e-5)
Would this get the most out of the GPUs?
mrshenli
(Shen Li)
May 15, 2020, 7:51pm
2
Hey @SU801T, I think you need to call model.to(device) before calling the DataParallel constructor.
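For reference, a minimal sketch of that ordering, reusing the AutoEncoder class and hyperparameters from your post (assuming the internal DataParallel around the encoder is removed); the cuda:0 device name is an assumption for a single-node setup:

import torch
import torch.nn as nn

device = torch.device("cuda:0")

model = AutoEncoder(2048)        # the class defined in the first post
model.to(device)                 # parameters land on cuda:0 first
model = nn.DataParallel(model)   # replicas are then created from cuda:0

criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), weight_decay=1e-5)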
Would this get the most out of the GPUs?
DistributedDataParallel is expected to be faster than DataParallel. See this example.
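For a bit more context, here is a rough DistributedDataParallel sketch (not the linked example), assuming one process per GPU on a single node and the AutoEncoder class from the first post without the internal DataParallel wrapper; the MASTER_ADDR/MASTER_PORT values and the random single-step input are purely illustrative, while the 6144-dim inputs, 2048-dim embedding, batch size of 256, and 4 GPUs come from the thread:

import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def run(rank, world_size):
    # Rendezvous settings for a single machine; the values are illustrative.
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    # One model replica per process, each on its own GPU.
    model = AutoEncoder(2048).to(rank)
    ddp_model = DDP(model, device_ids=[rank])

    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(ddp_model.parameters(), weight_decay=1e-5)

    # One illustrative step; a real run would iterate over a DataLoader
    # with a DistributedSampler so each process sees its own shard.
    inputs = torch.randn(256, 6144, device=rank)
    encoded, decoded = ddp_model(inputs)
    loss = criterion(decoded, inputs)

    optimizer.zero_grad()
    loss.backward()    # gradients are all-reduced across the processes
    optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = 4    # number of GPUs mentioned in the original post
    mp.spawn(run, args=(world_size,), nprocs=world_size, join=True)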
SU801T
(S)
May 16, 2020, 12:29am
3
Hi,
I have another issue.
So I have placed model.to(device) before calling nn.DataParallel. However, I get this error:
RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cuda:1
I’m assuming everything needs to be on cuda:0 before being split across the other devices. How do I fix that?
Cheers,
Taran
mrshenli
(Shen Li)
May 16, 2020, 2:19am
4
The code below works for me. Your original code has a DataParallel submodule within AutoEncoder (commented out below); is that intentional?
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, n_embedded):
        super(AutoEncoder, self).__init__()
        self.encoder = nn.Sequential(nn.Linear(n_embedded, n_embedded))
        # self.encoder = nn.DataParallel(self.encoder)
        self.decoder = nn.Sequential(nn.Linear(n_embedded, n_embedded))

    def forward(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return encoded, decoded

n_embedded = 20
model = AutoEncoder(n_embedded)
model.to(0)
model = nn.DataParallel(model)

criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), weight_decay=1e-5)

loss = model(torch.ones(n_embedded, n_embedded))[0].sum()
loss.backward()
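As a usage sketch on top of the snippet above, a hypothetical training loop with a random placeholder dataset; the dataset size, epoch count, and num_workers are made-up values, while the model, criterion, and optimizer are the ones defined above and the batch size of 256 comes from the original question:

from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset of n_embedded-dim vectors; real data would go here.
dataset = TensorDataset(torch.randn(10000, n_embedded))
loader = DataLoader(dataset, batch_size=256, shuffle=True, num_workers=4)

for epoch in range(10):                    # placeholder epoch count
    for (batch,) in loader:
        batch = batch.to(0)                # inputs start on cuda:0 ...
        encoded, decoded = model(batch)    # ... DataParallel scatters them
        loss = criterion(decoded, batch)   # reconstruction loss

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

The gathered outputs and the loss land on cuda:0, which is why the targets are moved there as well.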
SU801T
(S)
May 16, 2020, 1:06pm
5
Yes, apologies. I did uncomment that line in the autoencoder class; I thought perhaps it was also needed…