Calling only encoder of autoencoder after training in parallel


I am new to GPU training, especially training in parallel on multiple GPUs. I sometimes get lost moving data between devices and keeping track of which model is where. Right now I am working with 4 V100 GPUs and training in parallel with nn.DataParallel.

My issue currently is using an autoencoder for inference (i.e. generating reduced dimensionality data) after training on multiple GPUs in parallel. I need to first train or load an autoencoder, then use the ‘encode’ method of this autoencoder to generate data to train a second model. The code for my autoencoder is here:

# General neural net class
class Net(nn.Module):
    """This class implements a decoder or encoder for the autoencoder class."""
    # initialize model                                                              
    def __init__(self, n_input, n_hidden_layer, n_hidden, n_output):                
        super(Net, self).__init__()                                                 
        # dense input layer                                                         
        self.input_layer = nn.Linear(n_input, n_hidden)                             
        # SELU internal activation, Tanh output activation
        self.internal_act = nn.SELU()                                               
        self.output_act = nn.Tanh()                                                 
        # number of hidden layer for looping operations                             
        self.n_hidden_layer = n_hidden_layer                                        
        # dropout layer                                                             
        self.drop = nn.Dropout(p=0.001)                                             
        # loop to generate uniform dense hidden layers                              
        for i in range(n_hidden_layer):                                             
            setattr(self, "h"+str(i), nn.Linear(n_hidden, n_hidden))                
        # output layer with specified shape                                         
        self.output_layer = nn.Linear(n_hidden, n_output)                           
    # feedforward calculation
    def forward(self, x):
        # take in input and output to hidden layer shape
        x = self.input_layer(x)

        # loop through the hidden layers, applying the SELU activation before each
        for i in range(self.n_hidden_layer):
            x = getattr(self,"h"+str(i))(self.internal_act(x))
            # x = self.drop(x)

        # pass through final output layer
        x = self.output_layer(self.internal_act(x))

        # return output normalized to (-1,1) using Tanh
        return self.output_act(x)

    # save generated model
    def save(self):
        torch.save(self.state_dict(), "ED_net.pkl")

    # load existing model from pickle file
    def load(self):
        self.load_state_dict(torch.load("ED_net.pkl"))

# autoencoder class, inherits from NN class
class AutoEncoder(nn.Module):
    """Implements an autoencoder using the above Net class."""
    # initialize autoencoder
    def __init__(self, n_input, n_hidden_layer, n_hidden, n_reduced):
        super(AutoEncoder, self).__init__()

        # generate two Nets, for decoder and encoder operations
        self.encoder = Net(n_input, n_hidden_layer, n_hidden, n_reduced)
        self.decoder = Net(n_reduced, n_hidden_layer, n_hidden, n_input)

        # pass to nn.Sequential object
        self.train_pipeline = nn.Sequential(self.encoder, self.decoder)

    # forward propagation (encode + decode)
    def forward(self, x):
        return self.train_pipeline(x)

    # save generated model
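    def save(self, fname):
        if fname is not None:
            torch.save(self.state_dict(), "Models/"+fname+".pkl")
        else:
            torch.save(self.state_dict(), "Models/Autoencoder.pkl")

    # load existing model from pickle file
    def load(self, fname):
        if fname is not None:
            self.load_state_dict(torch.load("Models/"+fname+".pkl"))
        else:
            self.load_state_dict(torch.load("Models/Autoencoder.pkl"))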

    # encode data to reduced dimensionality form
    def encoder_forward(self,x):
        return self.encoder(x)

    # decode reduced dimensionality data
    def decoder_forward(self,x):
        return self.decoder(x)

    DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # store feature number constant
    NUM_REDUCED = 20
    NUM_HIDDEN = 2 
    SIZE_HIDDEN = 40

    # instantiate model and send to device
    # use all GPUs if available
    if torch.cuda.device_count() > 1 and not args.loadauto:
        auto = nn.DataParallel(auto, device_ids=[0,1,2,3])

After I train this model, I try to use it for inference. I was having trouble using this parallel GPU model for inference, mainly because I need to call the encoder_forward method so that only the encoder runs, but I can’t access it because my model is wrapped in ‘DataParallel’. What I’ve been trying to do is save the model parameters and reinitialize the model by loading those saved parameters. It seemed the easiest solution since I need to do lots of inference later.
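As a minimal sketch of the wrapping issue described above: `nn.DataParallel` keeps the wrapped model in its `.module` attribute, so custom methods such as `encoder_forward` stay reachable through it. The `TinyAutoEncoder` class and the `unwrap` helper are hypothetical stand-ins for illustration, not the code from this post.

```python
import torch
import torch.nn as nn


class TinyAutoEncoder(nn.Module):
    """Hypothetical stand-in for the AutoEncoder in this post."""

    def __init__(self, n_input=8, n_reduced=2):
        super().__init__()
        self.encoder = nn.Linear(n_input, n_reduced)
        self.decoder = nn.Linear(n_reduced, n_input)

    def forward(self, x):
        return self.decoder(self.encoder(x))

    def encoder_forward(self, x):
        return self.encoder(x)


def unwrap(model):
    """Return the underlying model, whether or not it is wrapped in DataParallel."""
    return model.module if isinstance(model, nn.DataParallel) else model


# Wrap the model; DataParallel uses all visible GPUs, if there are any.
auto = nn.DataParallel(TinyAutoEncoder())
base = unwrap(auto)

# Custom methods live on the wrapped module, not on the DataParallel wrapper.
has_method_on_wrapper = hasattr(auto, "encoder_forward")
has_method_on_base = hasattr(base, "encoder_forward")

# The wrapper's state_dict keys carry a "module." prefix; the unwrapped
# model's keys do not, so a plain (non-parallel) model can load them directly.
wrapped_keys = list(auto.state_dict().keys())
state_keys = list(base.state_dict().keys())
```

Saving `base.state_dict()` rather than `auto.state_dict()` is one way to keep the checkpoint loadable by a model that was never wrapped.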

    # save autoencoder from CPU or GPU training
        if isinstance(auto, DataParallel):


    # regenerate model for inference
    print('\n### RELOADING MODEL FOR INFERENCE ###\n')
    print('### MODEL RELOADED ###\n')


    # generate array with all timestep data
    fname = os.path.join(os.getcwd(),'timestepdata_gri.npy')
    timestepdata = np.load(fname)

    # convert (selected) timestep data to tensor
    normeddata = torch.tensor(traindata.scale_extern(timestepdata[:,3:]), dtype=torch.float32)

    # infer reduced dimension data
    with torch.no_grad():
        reduceddata = autoinf.encoder_forward(normeddata).detach().numpy()

The issue arises when I try to use autoinf.encoder_forward method with this autoencoder. I get this error:

Traceback (most recent call last):
  File "/panfs/roc/groups/13/suo-yang/dikem003/DimensionReductionNLE/auto_ode/", line 475, in <module>
    reduceddata = autoinf.encoder_forward(normeddata).detach().numpy()
  File "/panfs/roc/groups/13/suo-yang/dikem003/DimensionReductionNLE/auto_ode/", line 127, in encoder_forward
    return self.encoder(x)
  File "/home/suo-yang/dikem003/.conda/envs/torchcombust/lib/python3.9/site-packages/torch/nn/modules/", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/panfs/roc/groups/13/suo-yang/dikem003/DimensionReductionNLE/auto_ode/", line 69, in forward
    x = self.input_layer(x)
  File "/home/suo-yang/dikem003/.conda/envs/torchcombust/lib/python3.9/site-packages/torch/nn/modules/", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/suo-yang/dikem003/.conda/envs/torchcombust/lib/python3.9/site-packages/torch/nn/modules/", line 94, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/suo-yang/dikem003/.conda/envs/torchcombust/lib/python3.9/site-packages/torch/nn/", line 1753, in linear
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: Tensor for argument #2 'mat1' is on CPU, but expected it to be on GPU (while checking arguments for addmm)

I think argument #2 means the weights, meaning the weights of my encoder model.
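The message alone may not make it obvious which operand is on the wrong device. A quick check, sketched here with a placeholder `nn.Linear` model and a random input rather than the model from this post, is to compare the device of the parameters with the device of the input tensor.

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Placeholder model; Module.to() moves its parameters in place.
model = nn.Linear(4, 2).to(device)

# Placeholder input; new tensors are created on the CPU by default.
x = torch.randn(3, 4)

param_device = next(model.parameters()).device
input_device = x.device
print("parameters on:", param_device)
print("input on:     ", input_device)

# On a GPU machine these differ until x is moved as well.
devices_match = param_device == input_device
```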

Here’s my thinking: I remake the autoencoder from scratch using generically saved parameters, I push this model to the GPU since my device is listed as ‘cuda’ and training worked fine anyways. Why would this not work? If my overall model is on the GPU, shouldn’t its submodules also be on the GPU? I’m having a hard time understanding why it isn’t working how I expect it to. Is this some weirdness with training a model in parallel or am I screwing up pushing the data to the GPU?

The to() operation is not an in-place operation on tensors, so you would need to reassign normeddata:

normeddata = normeddata.to(DEVICE)
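To make the difference concrete, here is a small sketch. The `nn.Linear` encoder and the random input are placeholders, not the model from this thread. `Tensor.to()` returns a new tensor, while `Module.to()` moves the module's parameters in place. The dtype conversion is included only because it shows the same non-in-place behaviour on a machine without a GPU.

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

encoder = nn.Linear(4, 2)   # placeholder for the trained encoder
encoder.to(device)          # nn.Module.to() modifies the module in place

x = torch.randn(3, 4)       # placeholder input, created on the CPU

# Tensor.to() returns a new tensor and leaves the original untouched.
x64 = x.to(torch.float64)
original_dtype = x.dtype    # still torch.float32

# So the result has to be reassigned before calling the model.
x = x.to(device)

with torch.no_grad():
    reduced = encoder(x)

# .numpy() only works on CPU tensors, so move the result back first.
reduced_np = reduced.cpu().numpy()
```

Note the `.cpu()` call before `.numpy()`: once the input lives on the GPU, the output does too, and it has to come back to the CPU before conversion to a NumPy array.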

Hi Peter,

Thanks a bunch, that does seem to be the problem! Much appreciated :)