Create a new model from some layers of an already pre-trained model

Hi Anuj,

This definitely could work, but just wanted to point out a few things.

First, note that you did match the weights, but not the biases, which might cause an issue if you’re not aware of it:

print(
    torch.allclose(model.encoder[0].weight, hidden.layer1.weight),
    torch.allclose(model.encoder[0].bias,   hidden.layer1.bias)
)
Output:
True False
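
If you want the biases to match as well, and assuming the weights were linked by assigning the Parameter objects directly (I'm guessing at your exact code here), the bias can be carried over the same way. Note that, just like the weight, this shares the bias tensor between the two models:

# Hypothetical, mirroring how the weights appear to have been transferred;
# this shares the bias tensor between the two models:
hidden.layer1.bias = model.encoder[0].bias

print(
    torch.allclose(model.encoder[0].weight, hidden.layer1.weight),
    torch.allclose(model.encoder[0].bias,   hidden.layer1.bias)
)
Output:
True True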

Second, I think the two models are now sharing the same weight tensors in those layers, which means that updating the values in one model will change them in the other as well. This might be fine if it’s what you intended, but it’s important to be aware that it’s happening.
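
A quick way to check whether two parameters really are the same tensor (rather than just equal in value) is to compare their storage pointers:

# True means both modules point at the exact same tensor in memory,
# not merely at tensors that happen to hold equal values:
print(model.encoder[0].weight.data_ptr() == hidden.layer1.weight.data_ptr())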

To illustrate, let’s make a detached copy of the shared weight tensor and confirm that, right now, it matches the original:

weight_initial = model.encoder[0].weight.detach().clone()
print(torch.allclose(model.encoder[0].weight, weight_initial))
Output:
True

Then we train the “Latent” model for a few steps:

from torch import nn, optim

criterion = nn.MSELoss()
optimizer = optim.SGD(hidden.parameters(), lr=1e-3)
for _ in range(10):
    optimizer.zero_grad()
    x = torch.randn(1, 29)       # dummy input
    target = torch.randn(1, 7)   # dummy target
    output = hidden(x)           # explicitly updating the "latent" model only, not the "autoencoder"
    loss = criterion(output, target)
    loss.backward()
    optimizer.step()

Even though we never explicitly manipulated the Autoencoder, you can see its weights have changed, since they were shared with the Latent model:

print(torch.allclose(model.encoder[0].weight, weight_initial))  # "autoencoder" weights have changed
Output:
False
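
If that sharing isn’t what you want, one option (a sketch only, reusing the names from your code) is to give the Latent model independent copies of the pre-trained values rather than the Parameter objects themselves:

# Clone the pre-trained values into fresh Parameters so that training
# "hidden" no longer touches the autoencoder's tensors:
hidden.layer1.weight = nn.Parameter(model.encoder[0].weight.detach().clone())
hidden.layer1.bias   = nn.Parameter(model.encoder[0].bias.detach().clone())

print(hidden.layer1.weight.data_ptr() == model.encoder[0].weight.data_ptr())
Output:
False

Do this before constructing the optimizer, since the optimizer keeps references to whichever Parameter objects existed when it was created.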

Relatedly, keep in mind that when you train the Latent model, all of its parameters will be updated, including the ones you transferred over from the Autoencoder. If you only want to train the new parameters and keep the transferred ones fixed, you can “freeze” the parameters you don’t want to update, as described here.
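
As a minimal sketch (assuming layer1 is the transferred layer you want to keep fixed), you can set requires_grad = False on those parameters and pass only the still-trainable ones to the optimizer:

# Freeze the transferred layer so the optimizer never updates it:
for param in hidden.layer1.parameters():
    param.requires_grad = False

# Only hand the still-trainable parameters to the optimizer:
optimizer = optim.SGD(
    (p for p in hidden.parameters() if p.requires_grad), lr=1e-3
)

Note that if those parameters are still shared with the Autoencoder, freezing them in one model freezes them in the other too, since it’s the same tensor.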

Good luck!
Andrei
