[resolved] The sum of model.parameters() does not match the sum of the weights

This is probably just me missing some critical information. I am getting NaNs (loss -> inf) in my loss function, so I decided to investigate where these weights come from. In the process I tried to print out the sum of all the parameters in my model (along with their dims):

import numpy as np

def sum_params(model):
    # Collect (shape, sum of values) for every parameter tensor in the model
    s = []
    for p in model.parameters():
        dims = p.size()
        n = p.cpu().data.numpy()
        s.append((dims, np.sum(n)))
    return s
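As a sanity check on what a sum like this should look like, here is a minimal standalone sketch (a toy embedding mirroring the 20000 x 300 case, not my actual model) of the sum of a uniformly initialized embedding:

```python
import torch.nn as nn

# Toy embedding with the same shape as in the post
emb = nn.Embedding(20000, 300)
emb.weight.data.uniform_(-0.1, 0.1)

# Each weight is Uniform(-0.1, 0.1) with std 0.2/sqrt(12) ~= 0.0577, so the sum
# of the 6,000,000 entries has std ~= sqrt(6e6) * 0.0577 ~= 141. A sum around
# -48 is entirely plausible; -38459 is hundreds of standard deviations away.
total = float(emb.weight.sum())
print(total)
```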

For the embedding layer I got the tuple (torch.Size([20000, 300]), -38459.16).
However, I had previously initialized the embedding as follows (I tried both methods; not sure
which is "correct" in PyTorch):

init.uniform(self.embedding.weight, -0.1, 0.1)
self.embedding.weight.data.uniform_(-0.1, 0.1)
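For reference, both calls fill the weight in place with the same distribution; a quick standalone sketch (toy embedding, hypothetical sizes), noting that in recent PyTorch the underscore-suffixed `init.uniform_` is the canonical spelling:

```python
import torch.nn as nn
import torch.nn.init as init

emb = nn.Embedding(5, 3)

# Both lines do the same thing: fill the weight in place with Uniform(-0.1, 0.1).
# init.uniform (no underscore) is the deprecated alias of init.uniform_.
init.uniform_(emb.weight, -0.1, 0.1)
emb.weight.data.uniform_(-0.1, 0.1)

print(float(emb.weight.data.abs().max()))
```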

So the sum of the weights should be "close" to 0, since each weight's expected value is 0.
By explicitly calling sum() on self.embedding.weight I got the value -48.4358, which seems
more plausible. Order of calls: init(), then sum() -> correct magnitude, then model.parameters() -> wrong magnitude.
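What confuses me is that, as far as I can tell, model.parameters() hands back the very same tensor objects stored on the module, so the two sums should not be able to disagree. A minimal sketch (toy embedding, hypothetical sizes):

```python
import torch.nn as nn

emb = nn.Embedding(10, 4)
emb.weight.data.uniform_(-0.1, 0.1)

# parameters() iterates over the Parameter objects stored on the module itself,
# so summing via parameters() and via the attribute must give the same number,
# unless some other code mutates the weights between the two calls.
p = next(emb.parameters())
same_object = p is emb.weight
same_sum = float(p.sum()) == float(emb.weight.sum())
print(same_object, same_sum)
```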

When I then start training, sum() immediately matches the sum from model.parameters().
This also happens on the first forward pass, BEFORE any backprop is done.

So I am just wondering what I am missing. Why are my weights overwritten when I start
doing forward passes? The same thing happens when I remove all backprop/training code.

Again, I am probably just missing some info, but I have a hard time wrapping my head around this,
and it is annoying since I would like to migrate from TensorFlow to this excellent library :slight_smile:

Ok, got it working. It seems I messed up a git merge and re-inserted the code that
loads the pre-trained embeddings after initializing the model :slight_smile:
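In other words, the bad merge re-introduced a pattern roughly like this (`pretrained` here is a hypothetical tensor of word vectors, just to illustrate the overwrite):

```python
import torch
import torch.nn as nn

emb = nn.Embedding(20000, 300)
emb.weight.data.uniform_(-0.1, 0.1)     # the intended random init

# The stray line re-inserted by the merge: loading pre-trained vectors
# AFTER the init silently replaces the uniform weights, which is why the
# summed parameters no longer matched the freshly initialized ones.
pretrained = torch.randn(20000, 300)    # hypothetical pre-trained embeddings
emb.weight.data.copy_(pretrained)

overwritten = torch.equal(emb.weight.data, pretrained)
print(overwritten)
```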

For some reason, after struggling with this for days, writing a post here made
me find the bug :smiley: Now the difference is within floating-point accuracy.