Different Output for Multiple Linear Layer Output

For example in a class

def __init__():
  multi_output = [nn.Linear(50, 2) for i in range(10)]

def forward():
   outputs = [head(x) for head in multi_output]

The outputs keep changing in value on each run, even if I feed just a simple input tensor of 0’s.

No dropouts, running on eval().

Why would the outputs keep changing at each run when I’m using the same model’s weights (network.load_state_dict())?

That’s strange…

So, you are calling __init__ only once and then applying forward to the same input multiple times, without doing any optimization or changing the weights in any way, right?

Edit: I don’t know if this has something to do with your issue, but the class you declared has something wrong: the methods of the class should have self as their first argument. Moreover, you usually want to subclass nn.Module and call super in the constructor.


I was debugging and I realized that the way to get the same results is to use torch.manual_seed(10), which is likely affecting the weight initializations.

But this doesn’t make sense considering I’m loading weights via network.load_state_dict()? Very bizarre.
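To illustrate why the seed matters here, below is a minimal sketch (not the original network; the layer sizes are just borrowed from the snippet above) showing that torch.manual_seed pins down the random initialization of a freshly constructed nn.Linear. If seeding changes the outputs, some layer is presumably still running on its initial random weights.

```python
import torch
from torch import nn

# Same seed before construction -> identical initial weights.
torch.manual_seed(10)
a = nn.Linear(50, 2)
torch.manual_seed(10)
b = nn.Linear(50, 2)

# No reseed here, so this layer draws different random values.
c = nn.Linear(50, 2)

same_seed_equal = torch.equal(a.weight, b.weight)
unseeded_equal = torch.equal(b.weight, c.weight)
print(same_seed_equal, unseeded_equal)
```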

Yes I’ve done all that (the usual template). I omitted them in the code above so it’s easier to just see my intention of a multi linear layer at the end of the network having different results.

Have you checked manually if the weights are still the same after loading of the model? At this point they should be different because you are returning different results, even though I don’t know why they are changed.

Weights are staying the same at each run on checking.

I reproduced the error and if I use a nn.ModuleList instead of a simple list to contain the modules the error is fixed.

As you can see by running the following example, when I load the state_dict the weights are the same as the ones of the original module. If a simple list is used to hold the linear modules instead, they differ.

import torch

class M(torch.nn.Module):
  def __init__(self):
    super(M, self).__init__()
    # ModuleList registers each head, so its parameters appear in state_dict()
    self.multi_output = torch.nn.ModuleList([torch.nn.Linear(50, 2) for i in range(10)])

  def forward(self, x):
    outputs = [head(x) for head in self.multi_output]
    return outputs

m = M()
input = torch.zeros(10,50)
print( m(input)[0] )
print( m.multi_output[0].weight )

torch.save( m.state_dict(), 'm.pt' )

# Fresh instance with new random weights; load the saved state_dict into it
m = M()
m.load_state_dict( torch.load('m.pt') )
print( m(input)[0])
print( m.multi_output[0].weight )
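As a rough sketch of why the plain list fails: layers held in an ordinary Python list are not registered as submodules, so their parameters never show up in state_dict(). Saving and loading then silently skips them, and each new instance keeps its own random initialization. The class names below are made up for illustration.

```python
import torch
from torch import nn

class PlainListHeads(nn.Module):
    def __init__(self):
        super().__init__()
        # Plain Python list: these layers are NOT registered as submodules.
        self.multi_output = [nn.Linear(50, 2) for _ in range(10)]

class ModuleListHeads(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.ModuleList registers every layer it contains.
        self.multi_output = nn.ModuleList([nn.Linear(50, 2) for _ in range(10)])

plain_keys = list(PlainListHeads().state_dict().keys())
registered_keys = list(ModuleListHeads().state_dict().keys())

print(len(plain_keys))       # 0: nothing to save or load
print(len(registered_keys))  # 20: one weight and one bias per head
print(registered_keys[:2])   # ['multi_output.0.weight', 'multi_output.0.bias']
```

Since both the saved and the expected state_dict are empty in the plain-list case, load_state_dict() has no missing or unexpected keys to complain about, which is why no error is raised.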

Shucks you are right! Stupid mistake of mine. Thanks.

The only thing I don’t understand is: at first you have (50, 2) and then you have (50, 2) again. How is that even possible? There should be (2, x) on the second layer.
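A small sketch that may help with the shapes, assuming the heads are meant as in the forward above: the ten Linear(50, 2) layers are not chained one after another. Each one is applied to the same 50-feature input x, so they run in parallel and every head returns its own 2-feature output. The batch size of 4 is an arbitrary choice for the example.

```python
import torch
from torch import nn

heads = nn.ModuleList([nn.Linear(50, 2) for _ in range(10)])
x = torch.zeros(4, 50)  # arbitrary batch of 4 samples, 50 features each

# Every head sees the same x; no head consumes another head's output.
outputs = [head(x) for head in heads]

print(len(outputs))      # 10
print(outputs[0].shape)  # torch.Size([4, 2])
```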