Autoencoder and Classification inside the same model

Hello everyone,

I am new to PyTorch. I would like to train a simple autoencoder and use the encoded layer as the input to a classification task (ideally inside the same model). This is my implementation:

import torch.nn as nn
import torch.optim as optim
from torch.autograd import Variable

# X_train and train_loader are assumed to be defined elsewhere.
class Mixed(nn.Module):
    def __init__(self, n_embedded):
        super(Mixed, self).__init__()
        self.encoder = nn.Sequential(
            nn.Linear(X_train.shape[1], n_embedded),
        )
        self.decoder = nn.Sequential(
            nn.Linear(n_embedded, X_train.shape[1]),
        )
        self.classifier = nn.Sequential(
            nn.Linear(n_embedded, 1),
            nn.Sigmoid(),  # nn.BCELoss expects probabilities in [0, 1]
        )

    def forward(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)     # reconstruction branch
        out = self.classifier(encoded)      # classification branch
        return decoded, out

model = Mixed(40)
criterion1 = nn.MSELoss()   # reconstruction loss
criterion2 = nn.BCELoss()   # classification loss
optimizer = optim.Adam(model.parameters(), lr=0.01)

epochs = 150
for epoch in range(epochs):
    for inputs, labels in train_loader:
        inputs = Variable(inputs)
        labels = Variable(labels)
        optimizer.zero_grad()
        decoded, out = model(inputs)
        loss1 = criterion1(decoded, inputs)
        loss2 = criterion2(out, labels)
        loss = loss1 + loss2
        loss.backward()
        optimizer.step()

When I run the model, it generates a meaningful result but I just want to make sure that the architecture is correct.

Thank you for your time.


The code looks fine.
Just a small side note: Variables are deprecated since PyTorch 0.4.0 so you can just use tensors instead in newer versions. :wink:
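
As a rough sketch of the same training step without Variable (assuming PyTorch 0.4.0 or later): the data, layer sizes, and variable names below are made up for illustration. I have also swapped in nn.BCEWithLogitsLoss, which applies the sigmoid internally, so the classifier returns raw logits here; that is a substitution on my part, not something from the original post.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical stand-in data: 64 samples, 100 features, binary labels.
n_features, n_embedded = 100, 40
inputs = torch.randn(64, n_features)
labels = torch.randint(0, 2, (64, 1)).float()

encoder = nn.Linear(n_features, n_embedded)
decoder = nn.Linear(n_embedded, n_features)
classifier = nn.Linear(n_embedded, 1)  # returns raw logits, no sigmoid

params = (list(encoder.parameters())
          + list(decoder.parameters())
          + list(classifier.parameters()))
optimizer = optim.Adam(params, lr=0.01)
criterion1 = nn.MSELoss()
criterion2 = nn.BCEWithLogitsLoss()  # applies the sigmoid internally

# Plain tensors go straight into the layers; no Variable wrapper is needed.
optimizer.zero_grad()
encoded = encoder(inputs)
loss = criterion1(decoder(encoded), inputs) + criterion2(classifier(encoded), labels)
loss.backward()
optimizer.step()
```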

Thank you very much for your help!

Hi ptrblck,

I’m trying to implement a similar thing here, but

  1. use a pretrained network (e.g. mobilenetV2) as the classifier. So shall I just set self.classifier=models.mobilenet_v2(pretrained = pretrained)? Or is this not the correct way to do it?
  2. I have two image datasets. I wish to use dataset 1 in both the training of the AE and the classifier, but use dataset 2 only in the training of the AE (I don’t have the labels for dataset 2). Is it still possible to implement these inside one model?

Thank you in advance!

  1. A “classifier” in some CNNs such as VGG, ResNet etc. contains a few linear layers with activation functions between them. I’m not sure, if your self.classifier is supposed to work like that and to use the extracted features to output the predictions. If that’s the case, I wouldn’t use a complete pretrained model at this point.

  2. This should be possible and you could use e.g. a flag inside the forward method to switch between different branches. However, you should also consider the validation or deployment use case, where you might not have the information where the sample is coming from.
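
A minimal sketch of the flag idea, assuming flat feature vectors instead of images to keep it short. The class name AEWithClassifier, the classify argument, and all sizes are hypothetical:

```python
import torch
import torch.nn as nn

class AEWithClassifier(nn.Module):
    def __init__(self, n_features=32, n_embedded=8, n_classes=3):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, n_embedded), nn.ReLU())
        self.decoder = nn.Linear(n_embedded, n_features)
        self.classifier = nn.Linear(n_embedded, n_classes)

    def forward(self, x, classify=True):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        # Skip the classification branch for unlabeled batches.
        logits = self.classifier(encoded) if classify else None
        return decoded, logits

model = AEWithClassifier()
mse, ce = nn.MSELoss(), nn.CrossEntropyLoss()

x1, y1 = torch.randn(16, 32), torch.randint(0, 3, (16,))  # labeled (dataset 1)
x2 = torch.randn(16, 32)                                   # unlabeled (dataset 2)

# Dataset 1 contributes to both the reconstruction and classification losses.
decoded1, logits1 = model(x1, classify=True)
loss_labeled = mse(decoded1, x1) + ce(logits1, y1)

# Dataset 2 contributes to the reconstruction loss only.
decoded2, logits2 = model(x2, classify=False)
loss_unlabeled = mse(decoded2, x2)

(loss_labeled + loss_unlabeled).backward()
```

At validation or deployment time the caller has to decide the flag, which is the concern raised above.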

An SVM would also be a classifier, so I assume “instead of a classifier” means the model.classifier module.

It’s possible to feed your model outputs to other classifiers, and the overall workflow depends on what should be trained and which part of the model would be frozen.

E.g. if you do not want to train the PyTorch model anymore, but just use its features to train an SVM or another model, you could transform the outputs to numpy arrays and use e.g. scikit-learn or any other Python library.

However, if you want to train the PyTorch model, this won’t work out of the box, since Autograd won’t be able to track the numpy operations in the scikit-learn model and you would thus detach the computation graph by leaving PyTorch.

You could try to write the backward functions for all numpy operations manually via custom torch.autograd.Functions as described here or use MultiMarginLoss, which should be similar to the HingeLoss used in SVMs (I haven’t looked deeply into potential differences, but if I recall correctly, other users were using this criterion for the “SVM loss”).
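
A small sketch of both options, with made-up feature tensors and sizes: a linear layer trained with nn.MultiMarginLoss (a multi-class hinge-style loss) stays inside Autograd, while detach().cpu().numpy() hands frozen features to an external library. The name svm_head is hypothetical.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
features = torch.randn(8, 16)        # hypothetical extracted features
targets = torch.randint(0, 4, (8,))  # class indices for 4 classes

svm_head = nn.Linear(16, 4)          # linear scores, one per class
criterion = nn.MultiMarginLoss(p=1, margin=1.0)  # multi-class hinge-style loss

loss = criterion(svm_head(features), targets)
loss.backward()  # gradients flow through PyTorch ops as usual

# For a frozen model, the features can leave PyTorch as a NumPy array instead.
features_np = features.detach().cpu().numpy()
```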

Thank You so much :slight_smile:

Are you able to see accuracy in a supervised manner?

Hi @Petrucio & @ptrblck

Is it necessary to add the losses and then call backward? I was also wondering whether I could call backward on each loss separately to evaluate them separately.


You can separately call backward on different losses (and would need to use loss1.backward(retain_graph=True) to keep the intermediate tensors for the second backward call if necessary), which would accumulate the gradients in all used parameters.
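
As a quick check of the accumulation, here is a sketch with made-up layers and data. It compares the encoder gradient after two separate backward calls with the gradient from one backward call on the summed loss:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
encoder = nn.Linear(10, 4)
decoder = nn.Linear(4, 10)
classifier = nn.Linear(4, 1)
x = torch.randn(8, 10)
y = torch.randint(0, 2, (8, 1)).float()

def losses():
    encoded = torch.tanh(encoder(x))
    loss1 = F.mse_loss(decoder(encoded), x)
    loss2 = F.binary_cross_entropy_with_logits(classifier(encoded), y)
    return loss1, loss2

# Variant A: two backward calls; gradients accumulate in .grad.
loss1, loss2 = losses()
loss1.backward(retain_graph=True)  # keep activations for the second call
loss2.backward()
grad_separate = encoder.weight.grad.clone()

# Variant B: one backward call on the summed loss.
encoder.zero_grad()
loss1, loss2 = losses()
(loss1 + loss2).backward()
grad_summed = encoder.weight.grad.clone()

same = torch.allclose(grad_separate, grad_summed, atol=1e-5)
```

Both variants should give the same gradients up to floating-point error, so summing first is usually the simpler choice unless the losses need to be inspected separately.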

But setting retain_graph=True would connect loss1 and loss2, right? I should set that attribute to True only if I want to connect them, in which case summing them up before calling backward makes much more sense. Additionally, is retain_graph=True necessary for loss2 as well?


retain_graph doesn’t connect the losses, but keeps the intermediate activations after a backward call. This is necessary, if any other loss or output calling .backward() would need these activations from the forward pass to calculate the gradients.
The “connection” between the losses is defined by the forward pass.
E.g. if both losses were created using the same model (and thus parameters), the backward passes would calculate the gradients w.r.t. the same parameters as well.
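
A small sketch of what happens without retain_graph, using made-up layers: both losses share the encoder part of the graph, so the second backward call needs activations that the first call has already freed.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
encoder = nn.Linear(10, 4)
decoder = nn.Linear(4, 10)
classifier = nn.Linear(4, 1)
x = torch.randn(8, 10)

encoded = torch.tanh(encoder(x))  # shared part of the graph
loss1 = decoder(encoded).pow(2).mean()
loss2 = classifier(encoded).pow(2).mean()

loss1.backward()  # frees the activations saved for the shared encoder
try:
    loss2.backward()  # needs those activations again
    second_backward_failed = False
except RuntimeError:
    second_backward_failed = True
```

Passing retain_graph=True to the first call avoids the error; the last backward call does not need it.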

Hi @ptrblck sir
I am using a similar concept and found this question relevant, so I did not open a new one. I am trying to train two separate models using convolutional autoencoders and then use both encoded outputs for the final classification. There is a paper on this concept. One of my models is this:

class CNN1D(torch.nn.Module):
    def __init__(self, num_classes=7):
        super(CNN1D, self).__init__()
        # ...

    def forward(self, x):
        # ...
        out, ind1 = self.mp1(out)
        # ...
        return encoded, out

The second one is similar. I am stuck here and don’t know how to proceed: I can train both networks individually, but how do I extract the two encoded features and feed them to a classifier? Please guide me.

Your code looks correct and you could feed both outputs to a new classifier.
I’m not familiar with the intended workflow, but I assume you would like to stack these features somehow before processing them further?

Yes sir, exactly. I need to use the encoded features from both models. What should the flow of the model be? Do I need to train these models individually, store the features somehow, and then do the final classification with a third model? Or is there a way to do all of this in a single piece of code?

You could add the classifier into the same model or use it as a separate one and I think it depends more on your coding style or which approach would be more convenient to use in the long run.

Anyway, assuming you want to use these two features and a single classifier on top, you could use something like:

class CNN1D(torch.nn.Module):
    def __init__(self, in_features, num_classes=7):
        super(CNN1D, self).__init__()
        # ...
        # in_features: size of the concatenated feature vector
        self.classifier = nn.Linear(in_features, num_classes)

    def forward(self, x):
        # ...
        encoded = out
        # ...
        # flatten encoded
        encoded = encoded.view(encoded.size(0), -1)
        # concatenate both features in the feature dimension
        out = torch.cat((encoded, out), dim=1)
        out = self.classifier(out)
        return out

Note that I’ve added the flattening operation to encoded, as I assume it’s a 3-dimensional tensor.
Also, you would have to check the in_features e.g. by adding a print statement in the forward and use this number in the self.classifier in_features.
You could of course use a “better” classifier by e.g. using nn.Sequential with more layers.
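
Instead of a print statement, in_features can also be read off a dummy forward pass. A rough sketch, assuming a made-up 1D convolutional encoder and input length (none of these sizes come from your model), with a slightly larger nn.Sequential classifier on top of the encoded features only:

```python
import torch
import torch.nn as nn

# Hypothetical 1D conv encoder; layer sizes are placeholders.
encoder = nn.Sequential(
    nn.Conv1d(1, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool1d(2),
)

with torch.no_grad():
    dummy = torch.zeros(1, 1, 128)             # (batch, channels, length)
    encoded = encoder(dummy)                   # 3-dimensional: (1, 8, 64)
    in_features = encoded.view(1, -1).size(1)  # 8 * 64 = 512

# A slightly larger classifier built with nn.Sequential.
classifier = nn.Sequential(
    nn.Linear(in_features, 64),
    nn.ReLU(),
    nn.Linear(64, 7),
)
logits = classifier(encoder(torch.randn(4, 1, 128)).flatten(1))
```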

Thank you so much for this answer, sir.
Please correct me if I’m wrong. What I understand here is:
encoded is the bottleneck, i.e. the reduced representation of the input data,
out is the reconstructed representation of the data, right?
If yes, then why concatenate these two?

My actual question is this: I have two such models, each with different preprocessing. I need to train both models to obtain their bottleneck features and finally concatenate those features for the final classification. Please see the attached picture.

Looking forward to your guidance.

As both of your networks output a fully connected layer, I think the work is much simpler for you. You don’t even need to flatten the outputs; just concatenating them and passing the result through a linear layer should do the job.

class MergerModel(nn.Module):
    def __init__(self, model1, model2):
        super(MergerModel, self).__init__()
        self.model1 = model1
        self.model2 = model2

        # output_length_of_model1/2 and nb_classes are placeholders for your sizes
        self.fc2 = nn.Linear(output_length_of_model1 + output_length_of_model2, nb_classes)

    def forward(self, x):
        x1 = self.model1(x)
        x2 = self.model2(x)

        # concatenate both outputs in the feature dimension
        x = torch.cat((x1, x2), dim=1)

        final_output = F.softmax(self.fc2(x), dim=1)

        return final_output

Something like this should solve your problem
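
For the case with different preprocessing per branch, here is a rough, self-contained variant. The Encoder class, the Merger name, and all sizes are hypothetical stand-ins for your trained models. It returns raw logits rather than softmax outputs, since nn.CrossEntropyLoss applies log-softmax itself, and it optionally freezes the two encoders.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Hypothetical stand-in for one trained encoder."""
    def __init__(self, n_in, n_out):
        super().__init__()
        self.fc = nn.Linear(n_in, n_out)

    def forward(self, x):
        return torch.relu(self.fc(x))

class Merger(nn.Module):
    def __init__(self, model1, model2, n1, n2, nb_classes):
        super().__init__()
        self.model1, self.model2 = model1, model2
        self.fc = nn.Linear(n1 + n2, nb_classes)

    def forward(self, x1, x2):
        # Each branch receives its own preprocessed version of the input.
        features = torch.cat((self.model1(x1), self.model2(x2)), dim=1)
        return self.fc(features)  # raw logits for nn.CrossEntropyLoss

model1, model2 = Encoder(20, 6), Encoder(30, 10)

# Optionally freeze the trained encoders so only the classifier is updated.
for p in list(model1.parameters()) + list(model2.parameters()):
    p.requires_grad_(False)

merged = Merger(model1, model2, 6, 10, nb_classes=7)
logits = merged(torch.randn(4, 20), torch.randn(4, 30))
```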

Thank you @shivammehta007
But how to call these two models? I am using this function for training

Losses, Accuracies = fp.fit_sm(model=model, optimizer=optimizer, epochs=100,
                               trainloader=tr_loader, validloader=ts_loader,
                               device=device, verbose=True)

Here fp.fit_sm is a function for model training.
I now have three models (model 1, model 2, and the merged model), but which one should I pass in the snippet above? I am stuck here; please guide me.

I assume you are using a higher-level API, which provides this fit_sm method.
If so, it should internally execute the forward and backward passes, as well as the optimization via:

output = model(data)
loss = criterion(output, target)
loss.backward()
optimizer.step()

I don’t know what else is performed in this method, but these parts should be there at least.

Assuming you’ve followed the suggestions and added the classifier to the entire model, you should be able to use it directly.
Note that I don’t see how the data is passed to this fit_sm method, but I guess that it might be part of the fp object.
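
Since I don’t know the internals of fit_sm, here is only a hypothetical stand-in called fit that shows the steps such a method would need to run. The function, the toy data, and the nn.Linear placeholder model are all made up; in your case the merged model would be the one passed in.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def fit(model, optimizer, criterion, loader, epochs=1):
    """Hypothetical stand-in for fit_sm; returns the mean loss per epoch."""
    history = []
    for _ in range(epochs):
        running = 0.0
        for data, target in loader:
            optimizer.zero_grad()             # clear old gradients
            output = model(data)              # forward pass
            loss = criterion(output, target)
            loss.backward()                   # backward pass
            optimizer.step()                  # parameter update
            running += loss.item()
        history.append(running / len(loader))
    return history

torch.manual_seed(0)
dataset = TensorDataset(torch.randn(32, 10), torch.randint(0, 7, (32,)))
loader = DataLoader(dataset, batch_size=8)
model = nn.Linear(10, 7)  # stands in for the merged model
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
history = fit(model, optimizer, nn.CrossEntropyLoss(), loader, epochs=3)
```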
