Hi

This is probably just me missing some critical information. I am getting NaNs (loss -> inf) in my loss function, so I decided to investigate where these weights come from. In the process I tried to print out the sum of all the different parameters in my model (along with their dims):

```
import numpy as np

def sum_params(model):
    # Collect (shape, sum of entries) for every parameter tensor.
    s = []
    for p in model.parameters():
        dims = p.size()
        n = p.cpu().data.numpy()
        s.append((dims, np.sum(n)))
    return s
```
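For reference, here is a minimal, self-contained way to sanity-check this helper on a tiny stand-in model (the embedding size here is made up for illustration, not my actual model):

```python
import numpy as np
import torch
import torch.nn as nn

def sum_params(model):
    # Collect (shape, sum of entries) for every parameter tensor.
    s = []
    for p in model.parameters():
        n = p.cpu().data.numpy()
        s.append((p.size(), np.sum(n)))
    return s

# Tiny stand-in: a single embedding layer, initialized uniformly.
emb = nn.Embedding(1000, 100)
emb.weight.data.uniform_(-0.1, 0.1)

(dims, total), = sum_params(emb)
print(dims, total)  # the sum should be small relative to the 100000 entries
```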

Now for the embedding layer I got the tuple: (torch.Size([20000, 300]), -38459.16)

However, I have previously initialized the embedding as follows (I tried both methods; not sure which is "correct" in PyTorch):

```
# Functional form (older API; newer PyTorch versions use init.uniform_):
init.uniform(self.embedding.weight, -0.1, 0.1)
# Equivalent in-place call on the underlying tensor:
self.embedding.weight.data.uniform_(-0.1, 0.1)
```

So since each weight has expected value 0, the sum of the weights should be "close" to 0.
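To make "close to 0" concrete, here is a back-of-the-envelope check (my own arithmetic, not from any library): for i.i.d. Uniform(-0.1, 0.1) weights, the sum of n of them has mean 0 and standard deviation sqrt(n * Var).

```python
import math

# Each weight X ~ Uniform(-0.1, 0.1): E[X] = 0, Var[X] = (0.2 ** 2) / 12.
n = 20000 * 300                  # number of embedding weights
var = (0.2 ** 2) / 12
std_of_sum = math.sqrt(n * var)  # std of the sum of n i.i.d. weights, ~141
print(std_of_sum)
```

So a sum of -48 is well within one standard deviation, while -38459 is hundreds of standard deviations out and cannot plausibly come from this initialization.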

By explicitly calling sum() on self.embedding.weight I got the value -48.4358, which seems more plausible. Order of calls: init(), then sum() -> correct value, then model.parameters() -> wrong value.

When I then start training, sum() immediately matches the sum over model.parameters(). *This also happens on the first forward pass, BEFORE any backprop is done.*

So I am just wondering what I am missing. Why are my weights overwritten when I start doing forward passes? The same happens when I remove all backprop/training.

Again, I am probably just missing some info, but I have a hard time wrapping my head around this, and it is annoying since I would like to migrate from TensorFlow to this excellent library.