Hello, my question is about the output of the loss function (cross entropy) for models initialized with ones vs. randn. If I initialize with ones, the loss is a valid float (e.g. -107), but if I go with randn, the loss always comes out as NaN.
-> I have checked the gradients; they flow in both cases (the model gets updated as well). The only difference is that ones shows an actual loss while randn shows a NaN loss.
-> For extra info: my sequences are up to 40 timesteps long, but as I said, I don't suspect vanishing gradients since I have checked them manually at every step.
So what could be the problem? I'm out of ideas. Thank you…
edit: the code for the loss function is below
import torch

def custom_entropy(output_seq, label_seq):
    loss_all = []  # per-step losses for all timesteps
    for t in range(len(label_seq)):
        lbl = label_seq[t]
        pred = output_seq[t]
        loss = (-torch.log(pred) * lbl).mean()
        loss_all.append(loss)
    return torch.stack(loss_all).mean()
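For context, here is a minimal demo of how torch.log behaves on non-positive inputs. This is just an illustration of one way NaNs can show up in a loss like the one above if pred is the raw model output and isn't squashed into (0, 1) by softmax/sigmoid first (an assumption, since the forward pass isn't shown):

```python
import torch

# torch.log returns nan for negative inputs and -inf at exactly zero,
# so a single non-positive prediction poisons the whole mean.
print(torch.log(torch.tensor(-0.5)))  # tensor(nan)
print(torch.log(torch.tensor(0.0)))   # tensor(-inf)

# A hand-picked prediction vector with one negative entry, standing in
# for what a randn-initialized, unnormalized output could produce:
pred = torch.tensor([0.2, -0.3, 0.9])
print((-torch.log(pred)).mean())      # tensor(nan)
```

With ones initialization the outputs may happen to stay positive, which would explain why that case produces a finite loss while randn does not.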