In my model's forward function, I apply sigmoid to a tensor and then take the log. However, the sigmoid output can be very small. The forward function is:
def forward(dots, path, probs):
    apply_paths = dots * path
    apply_sigmoid = torch.sigmoid(apply_paths)
    apply_log = -1 * torch.log(apply_sigmoid)  # -inf when apply_sigmoid underflows to 0
    return torch.sum(probs * apply_log)
I often get inf as the return value. Since this return value is the loss, the gradients are computed from it, and I'm not sure whether that is okay. Should I do something about the inf values after taking the log, or handle the very small sigmoid outputs before it?
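One common way to avoid this (a sketch, not necessarily what your model requires) is to fuse the two operations: torch.nn.functional.logsigmoid computes log(sigmoid(x)) in a numerically stable way, so it stays finite even when sigmoid(x) underflows to 0 in float32:

```python
import torch
import torch.nn.functional as F

def forward(dots, path, probs):
    apply_paths = dots * path
    # F.logsigmoid(x) == log(sigmoid(x)), but computed stably:
    # for large negative x it returns approximately x instead of -inf.
    apply_log = -F.logsigmoid(apply_paths)
    return torch.sum(probs * apply_log)

# Comparison on a large negative logit, where sigmoid underflows:
x = torch.tensor([-200.0])
naive = torch.log(torch.sigmoid(x))   # -inf: sigmoid(-200) underflows to 0
stable = F.logsigmoid(x)              # finite, approximately -200
```

With the stable version the loss (and therefore the gradients) never becomes inf, which is generally preferable to masking inf values after the fact.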