Fixing weights locally in a neural network

I have trained a straightforward network and saved its weights. Then I wanted to train an autoencoder using the weights of that previously trained model, training only the decoder part. I have specified that I do not want gradients in the encoder part so that its parameters are not optimized, and I have also told the optimizer to optimize only the parameters of the decoder part. I do not get any error, but it is not working well: when I compare the state_dicts of the encoder and of the pretrained model, they are not the same. I have attached the models I am using: the pretrained model is `Class1`, and the autoencoder is `Class2` (the encoder part is `self.layers`). These two classes are in a file named models.py.

Please do not pay attention to the comments in the code; I have tried so many things. Thanks in advance!!!
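
In outline, my setup looks roughly like this (`Class1`/`Class2` stand in for the attached models, and `decoder` is an illustrative attribute name for the part I actually train):

```python
import torch
import models

pretrained = models.Class1()    # stands in for the already-trained model
autoencoder = models.Class2()   # its encoder is autoencoder.layers

# Copy the pretrained encoder weights into the autoencoder's encoder.
autoencoder.layers.load_state_dict(pretrained.layers.state_dict())

# Freeze the encoder so its parameters receive no gradients.
for p in autoencoder.layers.parameters():
    p.requires_grad_(False)

# Optimize only the decoder parameters.
optimizer = torch.optim.Adam(autoencoder.decoder.parameters(), lr=1e-3)
```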

Could you post a minimal, executable code snippet reproducing this issue, wrapped in three backticks (```), please?

My code is a bit long and I am using 70,000 images as the dataset, but the parts of the code in which I save and load the model are what I attached before. For example, if I print `model.layers.state_dict()['4.bias']` in `FFN()` I get `tensor([-0.0336, -0.0421])`, but if I print the same in `Auto_encod_trained` I get `tensor([0.0705, 0.0294])`, and they should be the same. I basically want to know if there is something more to take into account in order to fix the encoder weights during training to the weights of the pretrained model (in my case `FFN()`). Thanks!!
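
For reference, the way I compare the two state_dicts key by key is something like this (`state_dicts_match` is just a helper I wrote; `torch.equal` checks exact tensor equality):

```python
import torch

def state_dicts_match(sd_a, sd_b):
    # Same keys, and exactly equal tensors for every key.
    return sd_a.keys() == sd_b.keys() and all(
        torch.equal(sd_a[k], sd_b[k]) for k in sd_a
    )

# e.g. state_dicts_match(ffn.layers.state_dict(), autoencoder.layers.state_dict())
```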

No, nothing else should be needed and it works fine using:

```python
import torch
from torch import nn


class Auto_encod_trained(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(768, 100),
            nn.ReLU(),
            nn.Linear(100, 100),
            nn.ReLU(),
            nn.Linear(100, 2)
        )


class FFN(nn.Module):
    def __init__(self):
        super().__init__()
        self.olayer = nn.Linear(100, 2)
        self.layers = nn.Sequential(
            nn.Linear(768, 100),
            nn.ReLU(),
            nn.Linear(100, 100),
            nn.ReLU(),
            self.olayer
        )


model_a = Auto_encod_trained()
model_b = FFN()

# Freshly initialized models start with different random weights.
print((model_a.layers[0].weight == model_b.layers[0].weight).all())
# tensor(False)

# Copy model_a's encoder weights into model_b.
model_b.layers.load_state_dict(model_a.layers.state_dict())
# <All keys matched successfully>

print((model_a.layers[0].weight == model_b.layers[0].weight).all())
# tensor(True)
```

Okay, thanks!! I think the problem is when I read the state_dict() of model A. Because I have the two models in different scripts, I have saved model A using torch.save() like this:

```python
model_path = '/home/ben/Desktop/model_trained_encoder.pth'
torch.save(model.layers.state_dict(), model_path)
```

Then in the second script, when I load it like this:

```python
model = models.Auto_encod_trained()
model.layers.load_state_dict(torch.load('/home/ben/Desktop/model_trained_encoder.pth'))
```

it does not copy the weights correctly. Maybe it is the format of the file in which I am saving the model, I don't know. Anyway, thanks a lot!!

No, I don't think it's the format, and PyTorch should raise an error if keys are missing or mismatched while loading the state_dict. Let me know if you have a code snippet to reproduce the issue.
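
If you want to double-check, you can load with `strict=False`, which returns the mismatching keys instead of raising:

```python
# strict=True (the default) raises a RuntimeError on key mismatches;
# strict=False returns them instead so you can inspect what went wrong.
result = model.layers.load_state_dict(
    torch.load('/home/ben/Desktop/model_trained_encoder.pth'), strict=False
)
print(result.missing_keys, result.unexpected_keys)
```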

Okay, in your example, if you save model_a as model_a.pth with torch.save and then load it as follows:

```python
model = FFN()
model.layers.load_state_dict(torch.load('model_a.pth'))
```
does it work for you? I mean, do both dictionaries, model_b.state_dict() and model_a.state_dict(), match correctly? Thanks!!

Yes, my previous code snippet still works for me when I additionally add the torch.save and torch.load calls.
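
For completeness, that round trip in the same example is just:

```python
import torch

# Save model_a's encoder weights to disk, then reload them into model_b.
torch.save(model_a.layers.state_dict(), 'model_a.pth')
model_b.layers.load_state_dict(torch.load('model_a.pth'))

print((model_a.layers[0].weight == model_b.layers[0].weight).all())
# tensor(True)
```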