Why are my models initialized differently?

Hi all,

I’ve searched the forum for my question but couldn’t find anything similar. Here’s my problem.

I have an architecture with four different nn.Sequential blocks (shared below). I'm trying to investigate the impact of latent_size for the values 4, 8, and 16. As you can see, self.lf_model does not depend on the input argument latent_size. However, the initial weights of self.lf_model change with latent_size even though I seed with torch.manual_seed(15451), random.seed(15451), and np.random.seed(15451).

import torch.nn as nn


class Model(nn.Module):
    def __init__(self, latent_size):
        super(Model, self).__init__()

        self.encoder   = nn.Sequential(nn.Linear(in_features=4096, out_features=1024),
                                       nn.Linear(in_features=1024, out_features=256),
                                       nn.Linear(in_features=256, out_features=64),
                                       nn.Linear(in_features=64, out_features=latent_size)
                                       )

        self.lf_model  = nn.Sequential(nn.Linear(in_features=4096, out_features=512), nn.ReLU(),
                                       nn.Linear(in_features=512, out_features=128), nn.ReLU(),
                                       nn.Linear(in_features=128, out_features=16), nn.ReLU(),
                                       nn.Linear(in_features=16, out_features=2)
                                       )

        self.lc_model  = nn.Sequential(nn.Linear(in_features=latent_size + 2, out_features=32),
                                       nn.Linear(in_features=32, out_features=2))

        self.nlc_model = nn.Sequential(nn.Linear(in_features=latent_size + 2, out_features=32), nn.ReLU(),
                                       nn.Linear(in_features=32, out_features=32), nn.ReLU(),
                                       nn.Linear(in_features=32, out_features=2))

Why do I get different initial weights for self.lf_model? How can I initialize them the same?

Thank you in advance!

In case you would like to see them, here are the initial self.lf_model weights for the different latent_size values.

latent_size = 4
model = Model(latent_size).cuda()
print(model.lf_model[0].weight)

Parameter containing:
tensor([[ 8.1891e-03, 1.4299e-02, -8.9244e-03, …, -1.4864e-02,
1.0010e-02, -2.7618e-03],
[ 5.1851e-03, -8.8850e-03, 1.0219e-02, …, 3.1422e-04,
-2.7850e-03, 3.5177e-03],
[-2.3300e-05, -1.4028e-02, -8.1398e-03, …, 6.1992e-03,
-1.8799e-03, 3.8405e-03],
…,
[ 2.7780e-03, -3.9676e-03, 1.3176e-02, …, -3.9793e-03,
1.0665e-02, -9.7567e-03],
[-1.3217e-02, 4.1098e-04, 1.0644e-02, …, -5.4149e-03,
-1.0527e-02, 1.3252e-02],
[ 8.9856e-03, 9.2430e-03, -6.2179e-04, …, 3.8803e-03,
-1.3446e-02, 2.1152e-03]], device='cuda:0', requires_grad=True)

latent_size = 8
model = Model(latent_size).cuda()
print(model.lf_model[0].weight)

Parameter containing:
tensor([[-0.0076, -0.0055, 0.0103, …, -0.0006, 0.0016, 0.0093],
[-0.0151, 0.0023, -0.0092, …, 0.0107, -0.0094, -0.0080],
[-0.0088, -0.0056, -0.0116, …, -0.0095, 0.0092, -0.0143],
…,
[ 0.0132, -0.0096, 0.0010, …, -0.0121, 0.0117, 0.0089],
[-0.0092, 0.0075, 0.0073, …, -0.0044, -0.0027, -0.0148],
[-0.0021, -0.0137, -0.0100, …, 0.0074, 0.0108, 0.0043]],
device='cuda:0', requires_grad=True)

latent_size = 16
model = Model(latent_size).cuda()
print(model.lf_model[0].weight)

Parameter containing:
tensor([[-3.9591e-03, -1.2846e-02, 9.7075e-03, …, 1.2239e-02,
4.1199e-03, -8.0544e-03],
[ 6.0928e-03, -6.3468e-03, 2.9797e-03, …, 1.6150e-03,
4.3071e-03, -6.5914e-03],
[-4.8567e-03, 2.6767e-03, 2.6474e-03, …, -6.6011e-03,
1.5258e-02, 1.0542e-02],
…,
[-5.6493e-03, -1.1037e-02, 1.4620e-03, …, -6.7706e-03,
3.3715e-03, 1.0564e-02],
[ 1.0363e-02, -6.8797e-04, -6.7189e-03, …, 1.3069e-03,
-9.9623e-03, -1.0964e-03],
[ 1.1568e-03, -8.8054e-03, -7.8261e-03, …, 1.1733e-02,
4.6302e-05, -1.3099e-02]], device='cuda:0', requires_grad=True)

self.encoder uses latent_size as the out_features of its last linear layer, so the number of parameters it samples from the pseudorandom number generator (PRNG) depends on latent_size. Seeding only guarantees that the PRNG produces the same sequence of values; since self.lf_model is created after self.encoder, it consumes that sequence from a different offset whenever latent_size changes, and therefore its initial weights differ.
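
To make this concrete, here is a minimal standalone sketch (the helper names first_lf_weight and first_lf_weight_fixed are made up for this example): the first linear layer of lf_model changes with latent_size when it is constructed after a latent_size-dependent layer, but not if you construct it first (or re-seed right before it).

import torch
import torch.nn as nn

def first_lf_weight(latent_size):
    torch.manual_seed(15451)
    # The encoder's last layer draws a latent_size-dependent number of random values ...
    encoder_tail = nn.Linear(in_features=64, out_features=latent_size)
    # ... so this layer starts at a different offset in the seeded sequence.
    lf_first = nn.Linear(in_features=4096, out_features=512)
    return lf_first.weight

def first_lf_weight_fixed(latent_size):
    torch.manual_seed(15451)
    # Constructed before anything that depends on latent_size (re-seeding here works too),
    # so it always consumes the same part of the PRNG stream.
    lf_first = nn.Linear(in_features=4096, out_features=512)
    encoder_tail = nn.Linear(in_features=64, out_features=latent_size)
    return lf_first.weight

print(torch.equal(first_lf_weight(4), first_lf_weight(8)))              # False
print(torch.equal(first_lf_weight_fixed(4), first_lf_weight_fixed(8)))  # True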

Thank you for the explanation.

I used a simple workaround: I initialized and saved a base model once, and for each latent_size I reused that base model with one layer edited.
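
Roughly, it looks like this (a minimal sketch, not the exact code I used; in the architecture shared above three layers actually depend on latent_size, and in practice I saved the base model to disk rather than keeping it in memory):

import copy
import torch
import torch.nn as nn

torch.manual_seed(15451)
base = Model(latent_size=4)    # the base model that gets initialized once and reused

def model_for(latent_size):
    # Copy the base so every block that does not depend on latent_size
    # (including lf_model) starts from identical weights, then replace
    # only the layers whose shapes depend on latent_size.
    model = copy.deepcopy(base)
    model.encoder[3]   = nn.Linear(in_features=64, out_features=latent_size)
    model.lc_model[0]  = nn.Linear(in_features=latent_size + 2, out_features=32)
    model.nlc_model[0] = nn.Linear(in_features=latent_size + 2, out_features=32)
    return model.cuda()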