Varying updated weights in simple model for same input

Arun_Vishwanathan · August 28, 2019, 10:13pm

I have a simple PyTorch model with a single dense layer followed by a ReLU.
I set the seeds so that both the initial weights and the input to the model are fixed,
as below.

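import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from collections import OrderedDict

torch.manual_seed(0)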
net = nn.Sequential(OrderedDict({"fc1": nn.Linear(20, 2, bias=False),
                                 "relu": nn.ReLU()}))

# obtain the initial weights set
dense_weights = np.array(net.fc1.weight.data.numpy())

print("initial_weight")
print(dense_weights)

# this is the input to be supplied to the model
# of size (20,)
np.random.seed(0)
arr = np.random.rand(20)
print("supplied input")
print(arr)

# these are the desired weights if the model were to converge
# we do this so that we know we can have something to achieve
# if the model were to be trained
np.random.seed(1)
rndw2 = np.random.rand(*net.fc1.weight.shape)
potential_final_weights = rndw2
# this is the target ("label", so to speak)
sample_output = np.dot(np.array(potential_final_weights), arr)

# SGD with momentum
optimizer = optim.SGD(net.parameters(), lr=1, momentum=0.9)
optimizer.zero_grad()

output = net(torch.Tensor(np.expand_dims(arr, axis=0)))
target = torch.Tensor(np.expand_dims(sample_output, axis=0))

# use MSE loss
criterion = nn.MSELoss()
loss = criterion(output, target.float())

print("loss obtained")
print(loss)

loss.backward()
optimizer.step()

# updated weights after one training input
updated_weights = np.array(net.fc1.weight.data.numpy())
print("updated_weights")
print(updated_weights)
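A quick sanity check (a minimal sketch, not part of the original post): re-seeding with `torch.manual_seed` right before building a `Linear` layer should reproduce the same initial weights every time, which is consistent with the identical `initial_weight` printouts in both runs shown below. `nn.ReLU` has no parameters and draws no random numbers, so it cannot affect the initial weights either. The helper `initial_weights` is hypothetical, written only for this illustration.

```python
import numpy as np
import torch
import torch.nn as nn


def initial_weights(seed: int) -> np.ndarray:
    """Seed PyTorch's global RNG, build a fresh Linear(20, 2), return a copy of its weight."""
    torch.manual_seed(seed)
    layer = nn.Linear(20, 2, bias=False)
    return layer.weight.detach().numpy().copy()


w_run1 = initial_weights(0)
w_run2 = initial_weights(0)
print(np.array_equal(w_run1, w_run2))  # True: same seed, same initial weights
```

This suggests the run-to-run difference comes from somewhere other than weight initialization.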

When I run this, I get one of two different results, and I'm confused about why.
Sometimes I get this output:

initial_weight
[[-0.0016741   0.11995244 -0.18403849 -0.16456097 -0.08612314  0.0599618
  -0.00443037  0.17729548 -0.01984377  0.05916917 -0.0675769  -0.04395336
  -0.21362242 -0.14809078 -0.09217589  0.0082832   0.08839965  0.13416916
  -0.15159222 -0.09737244]
 [ 0.08121783  0.18568039 -0.04601832  0.16732758 -0.03604169  0.02366075
   0.20247063 -0.2074334  -0.14076896 -0.05660947 -0.08716191  0.19319645
  -0.14493737 -0.10293356 -0.15622076 -0.2094214  -0.13052833  0.19221196
   0.09977746  0.10837609]]

supplied input
[0.5488135  0.71518937 0.60276338 0.54488318 0.4236548  0.64589411
 0.43758721 0.891773   0.96366276 0.38344152 0.79172504 0.52889492
 0.56804456 0.92559664 0.07103606 0.0871293  0.0202184  0.83261985
 0.77815675 0.87001215]

loss obtained
tensor(28.8879, grad_fn=<MseLossBackward>)

updated_weights
[[ 2.6501863   3.575739    2.728507    2.4683082   1.960972    3.1809146
   2.109986    4.4863324   4.636564    1.911954    3.7580292   2.5116606
   2.5311623   4.3243814   0.2510695   0.42929098  0.1860947   4.1573787
   3.608452    4.106516  ]
 [ 3.3013914   4.382067    3.490707    3.36444     2.4497607   3.8134565
   2.7700217   5.0250616   5.5135403   2.193241    4.5582995   3.2964973
   3.1880748   5.3280225   0.26058462  0.30181146 -0.01189651  5.077625
   4.6656265   5.2131886 ]]

But other times I get very different updated weights, even though, as you can see, the initial weights and the input are exactly the same! The loss is also different. Could someone please help me understand the reason for this difference?

initial_weight
[[-0.0016741   0.11995244 -0.18403849 -0.16456097 -0.08612314  0.0599618
  -0.00443037  0.17729548 -0.01984377  0.05916917 -0.0675769  -0.04395336
  -0.21362242 -0.14809078 -0.09217589  0.0082832   0.08839965  0.13416916
  -0.15159222 -0.09737244]
 [ 0.08121783  0.18568039 -0.04601832  0.16732758 -0.03604169  0.02366075
   0.20247063 -0.2074334  -0.14076896 -0.05660947 -0.08716191  0.19319645
  -0.14493737 -0.10293356 -0.15622076 -0.2094214  -0.13052833  0.19221196
   0.09977746  0.10837609]]

supplied input
[0.5488135  0.71518937 0.60276338 0.54488318 0.4236548  0.64589411
 0.43758721 0.891773   0.96366276 0.38344152 0.79172504 0.52889492
 0.56804456 0.92559664 0.07103606 0.0871293  0.0202184  0.83261985
 0.77815675 0.87001215]

loss obtained
tensor(27.1065, grad_fn=<MseLossBackward>)

updated_weights
[[-1.6741008e-03  1.1995244e-01 -1.8403849e-01 -1.6456097e-01
  -8.6123139e-02  5.9961796e-02 -4.4303685e-03  1.7729548e-01
  -1.9843772e-02  5.9169173e-02 -6.7576900e-02 -4.3953359e-02
  -2.1362242e-01 -1.4809078e-01 -9.2175886e-02  8.2831979e-03
   8.8399649e-02  1.3416916e-01 -1.5159222e-01 -9.7372442e-02]
 [ 3.3013914e+00  4.3820672e+00  3.4907069e+00  3.3644400e+00
   2.4497607e+00  3.8134565e+00  2.7700217e+00  5.0250616e+00
   5.5135403e+00  2.1932409e+00  4.5582995e+00  3.2964973e+00
   3.1880748e+00  5.3280225e+00  2.6058462e-01  3.0181146e-01
  -1.1896506e-02  5.0776248e+00  4.6656265e+00  5.2131886e+00]]
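To see what a single `optimizer.step()` does here, it can help to compute the update by hand. On the first step, the momentum buffer of `optim.SGD` is initialized to the gradient itself, so the update is simply `w - lr * grad` regardless of `momentum`. With `MSELoss` (mean over N = 2 outputs) and a `Linear -> ReLU` model, row i of the weight gradient is `2 / N * (out_i - t_i) * x`, masked to zero wherever the pre-activation `pre_i` is not positive. The sketch below checks this against PyTorch's optimizer; it uses random data and an arbitrary positive target (assumptions), not the thread's exact values.

```python
from collections import OrderedDict

import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
net = nn.Sequential(OrderedDict([("fc1", nn.Linear(20, 2, bias=False)),
                                 ("relu", nn.ReLU())]))
x = torch.rand(1, 20)           # non-negative input, like np.random.rand
target = 5 * torch.rand(1, 2)   # arbitrary positive target (illustrative assumption)
lr = 1.0

# Hand-computed update, starting from a copy of the initial weights.
w0 = net.fc1.weight.detach().clone()
pre = x @ w0.T                               # pre-activations, shape (1, 2)
out = torch.relu(pre)
d_out = 2.0 * (out - target) / out.numel()   # dL/d(out) for MSELoss(reduction="mean")
d_pre = d_out * (pre > 0).float()            # ReLU passes gradient only where pre > 0
grad_w = d_pre.T @ x                         # dL/dW, shape (2, 20)
# First step: the momentum buffer equals the gradient, so momentum has no effect yet.
expected = w0 - lr * grad_w

# The same step through PyTorch's optimizer.
optimizer = optim.SGD(net.parameters(), lr=lr, momentum=0.9)
optimizer.zero_grad()
loss = nn.MSELoss()(net(x), target)
loss.backward()
optimizer.step()

updated = net.fc1.weight.detach()
print(torch.allclose(updated, expected, atol=1e-5))
```

Any row whose pre-activation is not positive keeps its initial weights after this step.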

ptrblck · August 28, 2019, 11:42pm

Are you wondering about the reproducibility of the script?
I just executed it several times on my machine and always get your second output.

Arun_Vishwanathan · August 28, 2019, 11:53pm

Yes, I was wondering about the consistency of the results. Oh, that is weird! I wonder why I get both of those outputs when I re-run it multiple times.

Arun_Vishwanathan · August 29, 2019, 3:45am

I tried it again, and I do keep getting those two results. My NumPy and PyTorch versions are both the latest. I'm not sure what is going on. Any suggestions on how to debug this?
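One way to debug a run-to-run difference like this (a sketch, not from the thread) is to print the model and check the layer order right after building it. `named_children()` yields `(name, module)` pairs in the same order `nn.Sequential` calls them in `forward`. The helper `layer_order` and the expected list `["fc1", "relu"]` are assumptions matching the intended model.

```python
from collections import OrderedDict

import torch.nn as nn


def layer_order(net: nn.Sequential) -> list:
    """Return submodule names in the order nn.Sequential will execute them."""
    return [name for name, _ in net.named_children()]


net = nn.Sequential(OrderedDict([("fc1", nn.Linear(20, 2, bias=False)),
                                 ("relu", nn.ReLU())]))
print(net)  # shows the layers in execution order

# Fail loudly (even under python -O) if the layers ended up in an unexpected order.
if layer_order(net) != ["fc1", "relu"]:
    raise RuntimeError(f"unexpected layer order: {layer_order(net)}")
```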

Arun_Vishwanathan · August 29, 2019, 3:58am

@ptrblck I think I just found the source of the issue! I printed the PyTorch net object and noticed that the layers from the ordered dictionary did not always end up in the same order, which causes the different outputs.
When the output is the second case (the one you get), the printed net looks like this:

Sequential(
  (fc1): Linear(in_features=20, out_features=2, bias=False)
  (relu): ReLU()
)

But in the first case, the net looks like this!

Sequential(
  (relu): ReLU()
  (fc1): Linear(in_features=20, out_features=2, bias=False)
)
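In the flipped model, `relu` runs on the input before `fc1`. Since the input comes from `np.random.rand` and is therefore non-negative, that ReLU changes nothing, the network is purely linear, and both rows of `fc1.weight` receive a gradient. In the intended order, the ReLU sits after `fc1`, so an output unit whose pre-activation is negative passes back zero gradient and its weight row is left untouched. A rough hand calculation from the printed weights puts the first unit's pre-activation at about -0.38, which would account for the first row being unchanged in the second output, the second row being identical in both outputs, and the smaller loss in the second case (the clamped output sits closer to the non-negative target). The sketch below rebuilds both orderings from the same initial weights and compares one training step; `one_step` is a hypothetical helper written for this illustration.

```python
from collections import OrderedDict

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim


def one_step(order, weight, x, target):
    """Run one SGD(momentum) step and return (updated fc1 weight, loss).

    `order` is ("fc1", "relu") or ("relu", "fc1"); `weight` is copied into a
    fresh Linear so both orderings start from identical weights.
    """
    fc1 = nn.Linear(20, 2, bias=False)
    with torch.no_grad():
        fc1.weight.copy_(weight)
    modules = {"fc1": fc1, "relu": nn.ReLU()}
    # A list of (name, module) pairs fixes the layer order explicitly.
    net = nn.Sequential(OrderedDict([(name, modules[name]) for name in order]))
    optimizer = optim.SGD(net.parameters(), lr=1, momentum=0.9)
    optimizer.zero_grad()
    loss = nn.MSELoss()(net(x), target)
    loss.backward()
    optimizer.step()
    return fc1.weight.detach().clone(), loss.item()


# Same data-generation recipe as the question.
torch.manual_seed(0)
w_init = nn.Linear(20, 2, bias=False).weight.detach().clone()
np.random.seed(0)
arr = np.random.rand(20)
np.random.seed(1)
w_goal = np.random.rand(2, 20)
x = torch.tensor(arr, dtype=torch.float32).unsqueeze(0)
target = torch.tensor(w_goal @ arr, dtype=torch.float32).unsqueeze(0)

w_fc_first, loss_fc_first = one_step(("fc1", "relu"), w_init, x, target)
w_relu_first, loss_relu_first = one_step(("relu", "fc1"), w_init, x, target)

pre = (x @ w_init.T).squeeze(0)  # pre-activation of each output unit
print("losses (fc1 first, relu first):", loss_fc_first, loss_relu_first)
for row in range(2):
    same = torch.allclose(w_fc_first[row], w_relu_first[row])
    print(f"row {row}: pre={pre[row].item():+.4f}, updated rows match: {same}")
```

Rows with a positive pre-activation should update identically under both orderings; rows with a negative one only move when the ReLU comes first.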

The relu and fc1 layers are flipped! Looking at an example in the documentation, I think the ordered dictionary should have been specified like this:

OrderedDict([("fc1", nn.Linear(20, 2, bias=False)),
             ("relu", nn.ReLU())])

instead of:

OrderedDict({"fc1": nn.Linear(20, 2, bias=False),
             "relu": nn.ReLU()})

My bad for not realizing that. With the list-of-pairs version, I now get the same results as you.
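The underlying cause is presumably the Python version rather than PyTorch (the thread does not say which Python was used). `OrderedDict({...})` first builds a plain `dict` and then copies its items in that dict's iteration order. Plain dicts only guarantee insertion order from Python 3.7 (CPython 3.6 as an implementation detail); on older versions the order depended on key hashes, and string hashes are randomized per process by default (see `PYTHONHASHSEED`), so the layer order could change from run to run. A minimal sketch of order-safe ways to build the same model:

```python
from collections import OrderedDict

import torch.nn as nn

# 1) OrderedDict built from a list of (name, module) pairs: order is explicit.
net_a = nn.Sequential(OrderedDict([
    ("fc1", nn.Linear(20, 2, bias=False)),
    ("relu", nn.ReLU()),
]))

# 2) Positional arguments: modules are named "0", "1", ... in argument order.
net_b = nn.Sequential(nn.Linear(20, 2, bias=False), nn.ReLU())

# 3) add_module() registers named submodules one at a time, in call order.
net_c = nn.Sequential()
net_c.add_module("fc1", nn.Linear(20, 2, bias=False))
net_c.add_module("relu", nn.ReLU())

for net in (net_a, net_b, net_c):
    print([type(m).__name__ for m in net])  # ['Linear', 'ReLU']
```

Options 1 and 3 keep the `net.fc1` attribute used in the question; with option 2 the layer is reached as `net[0]` instead.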