Adding nn.DataParallel to autoencoder

Hi,

I’m training a simple autoencoder over several GPUs (probably 4) with a batch size of 256 to 512. I have millions of examples to train on, and I want to make sure I am doing the right thing.

Below I have defined the autoencoder, where I add self.encoder = nn.DataParallel(self.encoder):

class AutoEncoder(nn.Module):
    def __init__(self, n_embedded):
        super(AutoEncoder, self).__init__()
        self.encoder = nn.Sequential(
            nn.Linear(6144, n_embedded))
        self.encoder = nn.DataParallel(self.encoder)
        self.decoder = nn.Sequential(nn.Linear(n_embedded, 6144))
       
    def forward(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return encoded, decoded

I initialize my model before training with:

model = AutoEncoder(2048)
model = nn.DataParallel(model)
model.to(device)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(),weight_decay=1e-5)

Would this get the most out of the GPUs?

Hey @TaranRai, I think you need to call model.to(device) before calling the DataParallel constructor.

Would this get the most out of the GPUs?

DistributedDataParallel is expected to be faster than DataParallel. See this example
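
Since the linked example is not reproduced here, the following is a minimal sketch of what one DistributedDataParallel training step could look like for this autoencoder. It is not the example the reply pointed to. The function name ddp_train_step, the file-based init_method, the random stand-in batch, and the small default layer sizes are all assumptions made for illustration; the backend falls back to gloo on CPU so the sketch also runs without GPUs.

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


class AutoEncoder(nn.Module):
    def __init__(self, n_features, n_embedded):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, n_embedded))
        self.decoder = nn.Sequential(nn.Linear(n_embedded, n_features))

    def forward(self, x):
        encoded = self.encoder(x)
        return encoded, self.decoder(encoded)


def ddp_train_step(rank, world_size, init_method, n_features=64, n_embedded=16):
    """Hypothetical helper: run one training step in the process with this rank."""
    use_cuda = torch.cuda.is_available()
    backend = "nccl" if use_cuda else "gloo"  # gloo lets the sketch run on CPU
    dist.init_process_group(
        backend, init_method=init_method, rank=rank, world_size=world_size
    )
    try:
        # One process per device: each rank owns exactly one GPU (or the CPU).
        device = torch.device(f"cuda:{rank}" if use_cuda else "cpu")
        model = AutoEncoder(n_features, n_embedded).to(device)
        ddp_model = DDP(model, device_ids=[rank] if use_cuda else None)

        criterion = nn.MSELoss()
        optimizer = torch.optim.Adam(ddp_model.parameters(), weight_decay=1e-5)

        # Random stand-in for this rank's share of the real training data.
        x = torch.randn(32, n_features, device=device)

        optimizer.zero_grad()
        _, decoded = ddp_model(x)
        loss = criterion(decoded, x)
        loss.backward()  # DDP averages gradients across processes during backward
        optimizer.step()
        return loss.item()
    finally:
        dist.destroy_process_group()
```

In practice each rank would be started as its own process, for instance with torch.multiprocessing.spawn or torchrun, and the data would typically be sharded across ranks (for example with torch.utils.data.distributed.DistributedSampler) rather than generated randomly.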


Hi,

I have another issue.

So I have placed model.to(device) before calling nn.DataParallel. However, I get this error:

RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cuda:1

I’m assuming everything needs to be on cuda:0 before splitting to the other devices. How do I fix that?

Cheers,

Taran

The code below works for me. Your original code has a DataParallel submodule within AutoEncoder (commented out below). Is that intentional?

import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, n_embedded):
        super(AutoEncoder, self).__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_embedded, n_embedded))
        #self.encoder= nn.DataParallel(self.encoder)
        self.decoder = nn.Sequential(nn.Linear(n_embedded, n_embedded))

    def forward(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return encoded, decoded


n_embedded = 20
model = AutoEncoder(n_embedded)
model.to(0)  # move parameters to cuda:0 (device_ids[0]) before wrapping
model = nn.DataParallel(model)  # wrap the whole model once
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(),weight_decay=1e-5)
loss = model(torch.ones(n_embedded, n_embedded))[0].sum()
loss.backward()

Yes, apologies. I did uncomment that line in the autoencoder class. I thought perhaps it was also needed…
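
For reference, here is a minimal sketch of the setup this thread converges on: no DataParallel inside AutoEncoder, the model moved to the first device, then wrapped once. A likely explanation for the earlier error is that the outer wrapper copies the model to each GPU, and the inner wrapper inside the cuda:1 copy still expects its parameters on cuda:0. The helper names build_model and train_step are hypothetical, and the CPU fallback is an assumption added so the sketch runs on machines without GPUs.

```python
import torch
import torch.nn as nn


class AutoEncoder(nn.Module):
    def __init__(self, n_features, n_embedded):
        super().__init__()
        # Plain submodules: no DataParallel inside the model itself.
        self.encoder = nn.Sequential(nn.Linear(n_features, n_embedded))
        self.decoder = nn.Sequential(nn.Linear(n_embedded, n_features))

    def forward(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return encoded, decoded


def build_model(n_features=6144, n_embedded=2048):
    """Hypothetical helper: move the model to the first device, then wrap once."""
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    model = AutoEncoder(n_features, n_embedded).to(device)
    if torch.cuda.device_count() > 1:
        # Single outer wrapper; it splits each batch across the visible GPUs.
        model = nn.DataParallel(model)
    return model, device


def train_step(model, device, batch, criterion, optimizer):
    """Hypothetical helper: one reconstruction step, returning the loss value."""
    batch = batch.to(device)  # inputs go to device_ids[0]; DataParallel scatters them
    optimizer.zero_grad()
    _, decoded = model(batch)
    loss = criterion(decoded, batch)  # reconstruction loss against the input
    loss.backward()
    optimizer.step()
    return loss.item()
```

A typical call would be model, device = build_model(), followed by creating nn.MSELoss() and torch.optim.Adam(model.parameters(), weight_decay=1e-5) as in the original post, and then calling train_step once per batch of shape (batch_size, 6144).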