Using decoder as part of loss function

Hi,

I have trained an autoencoder, e.g.:

class autoencoder(nn.Module):
    def __init__(self):
        super().__init__()

        self.encoder = nn.Sequential( )
        self.decoder = nn.Sequential( )

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x

and let’s say

AE_model = autoencoder() 

Now I am training another model, e.g.:

new_model = NewModel()

I tried to use the decoder part of the autoencoder as part of my loss function. First, I freeze the parameters:

for param in AE_model.parameters():
    param.requires_grad = False

Then, during training:

mseloss = nn.MSELoss()
optimizer = torch.optim.Adam(new_model.parameters())

optimizer.zero_grad()
prediction = new_model(batch_x)
decoded_prediction = AE_model.decoder(prediction)
Loss = mseloss(decoded_prediction, batch_y)
Loss.backward()
optimizer.step()

Problem: the loss doesn’t seem to decrease and is always stuck at one value… Can someone please help?

Thanks.
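
For reference, here is a minimal, self-contained version of the setup (the layer sizes and dummy tensors below are just placeholders, not my real architecture or data):

import torch
import torch.nn as nn

class autoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        # placeholder layers
        self.encoder = nn.Sequential(nn.Linear(32, 8), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(8, 32))

    def forward(self, x):
        return self.decoder(self.encoder(x))

class NewModel(nn.Module):
    def __init__(self):
        super().__init__()
        # placeholder: maps Domain A inputs to the 8-dim latent space
        self.block = nn.Sequential(nn.Linear(16, 8), nn.ReLU())

    def forward(self, x):
        return self.block(x)

AE_model = autoencoder()
AE_model.eval()
for param in AE_model.parameters():
    param.requires_grad = False

new_model = NewModel()
mseloss = nn.MSELoss()
optimizer = torch.optim.Adam(new_model.parameters())

batch_x = torch.randn(4, 16)   # dummy Domain A batch
batch_y = torch.randn(4, 32)   # dummy Domain B batch

optimizer.zero_grad()
prediction = new_model(batch_x)
decoded_prediction = AE_model.decoder(prediction)
Loss = mseloss(decoded_prediction, batch_y)
Loss.backward()
optimizer.step()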

I am not sure of the exact reason.
Are you setting AE_model to eval mode?
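
For example, something like this before training (freezing the parameters alone does not switch layers such as dropout or batch norm to inference behaviour; eval() does):

AE_model.eval()   # dropout / batchnorm layers now run in inference mode
for param in AE_model.parameters():
    param.requires_grad = False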

I did… Also, new_model trains fine when I don't use AE_model.decoder, i.e., when I just predict the encoded outputs directly.

Can you elaborate a bit more on this and the motivation for doing this?
What are the encoded outputs and how do you predict them directly?

For debugging purposes, I would try removing the encoder part of autoencoder(), attaching NewModel() to autoencoder() as the new encoder, training only the newly attached encoder part while the trained decoder stays frozen (as you did here), and seeing what changes.
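
Roughly like this (debug_model is just a placeholder name; the optimizer only gets the new encoder's parameters, so the frozen decoder cannot change):

import copy

debug_model = copy.deepcopy(AE_model)      # keep the original trained autoencoder untouched
debug_model.encoder = NewModel()           # swap the trained encoder for the new model

for param in debug_model.decoder.parameters():
    param.requires_grad = False            # the trained decoder stays frozen

optimizer = torch.optim.Adam(debug_model.encoder.parameters())
# debug_model(x) now runs NewModel followed by the frozen decoder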

So I trained an autoencoder (AE_model) on a dataset from Domain B. I want to train another model (new_model) that takes in data from Domain A and outputs predictions in Domain B.

First, I encode the Domain B dataset using AE_model.encoder.

Then, my idea was to have new_model predict the encoded Domain B data, and then have AE_model.decoder decode it back to the original space.
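
Roughly (Y_domainB here is a placeholder for my Domain B training data; no_grad just avoids building a graph for the fixed targets):

with torch.no_grad():
    encoded_GT = AE_model.encoder(Y_domainB)   # latent targets for new_model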

When I tried using a Loss function of:

encoded_prediction = new_model(X_valid)
Loss(encoded_prediction, encoded_GT)

it works fine. The training loss decreases nicely and so does my validation loss.

Then, when I tried incorporating the decoder as part of my Loss function:

encoded_prediction = new_model(X_valid)
decoded_prediction = AE_model.decoder(encoded_prediction)
Loss(decoded_prediction, decoded_GT)

the training and validation loss do not decrease at all.

I did exactly this:

class combined(nn.Module): 
    def __init__(self):
        super().__init__()
        self.AE_model = autoencoder()
        self.AE_model.load_state_dict(torch.load('some_weights.pth'))
        
        # Freeze all weights in the AE
        for param in self.AE_model.parameters():
            param.requires_grad = False

        # New model
        self.block = nn.Sequential( )
       
    def forward(self, x):
        x = self.block(x)
        x = self.AE_model.decoder(x)
        return x

The training and validation losses still do not change at all. :confused: This is so weird: when I compute the loss directly on the encoded prediction it works fine, and just adding the decoder makes it fail. :confused:
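
One thing I can still check is whether any gradient reaches the new block through the frozen decoder, e.g.:

model = combined()
out = model(batch_x)
loss = nn.MSELoss()(out, batch_y)
loss.backward()

for name, p in model.block.named_parameters():
    # should print non-zero values if gradients flow back through the frozen decoder
    print(name, None if p.grad is None else p.grad.abs().mean().item())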

Update: I may have found the problem and a solution. It turns out that my encoded_prediction only has positive values, while my new_model takes in both positive and negative values. So I guess the problem is very nonlinear, the network could not initialize its weights well for producing the encoded_prediction, and autoencoder.decoder therefore has a hard time finding a direction of steepest descent. I fixed it by changing my ReLU activations to Tanh.
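
Concretely, the kind of change I mean (the layer sizes here are placeholders, and I am only showing the activation swap):

in_dim, hidden_dim, latent_dim = 16, 64, 8     # placeholder sizes

block = nn.Sequential(
    nn.Linear(in_dim, hidden_dim),
    nn.Tanh(),                                 # was nn.ReLU()
    nn.Linear(hidden_dim, latent_dim),
    nn.Tanh(),                                 # was nn.ReLU(); outputs now cover (-1, 1) instead of [0, inf)
)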