Hi,
The error in the title is triggered by the following lines during training:
k = (y**2).sum(dim=2,keepdim=True)
r = k.sqrt()
So it appears that sqrt(0) can produce a nan derivative. How can I work around this in my case?
Thanks!
OK, I’ll avoid using sqrt() and formulate the problem in terms of r^2 instead.
You can replace
torch.sqrt(x)
with
torch.sqrt(x + 1e-8)
to solve this problem.
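To make the suggestion concrete, here is a minimal sketch applied to the snippet from the question. The helper name safe_norm, the all-zero test tensor, and the default eps of 1e-8 are assumptions for illustration, not something from the original post. As far as I understand, the nan arises because the backward pass of sqrt gives inf at 0, which is then multiplied by the zero gradient of y**2.

```python
import torch


def safe_norm(y: torch.Tensor, dim: int = 2, eps: float = 1e-8) -> torch.Tensor:
    """Norm along `dim` with a small epsilon inside the sqrt (hypothetical helper)."""
    k = (y ** 2).sum(dim=dim, keepdim=True)
    # Adding eps keeps the argument of sqrt strictly positive,
    # so the backward pass never divides by zero.
    return torch.sqrt(k + eps)


# Plain version from the question, evaluated at an all-zero input.
y_plain = torch.zeros(1, 1, 3, requires_grad=True)
r_plain = (y_plain ** 2).sum(dim=2, keepdim=True).sqrt()
r_plain.sum().backward()
plain_grad_has_nan = bool(torch.isnan(y_plain.grad).any())

# Same input with the epsilon added inside the sqrt.
y_safe = torch.zeros(1, 1, 3, requires_grad=True)
r_safe = safe_norm(y_safe)
r_safe.sum().backward()
safe_grad_is_finite = bool(torch.isfinite(y_safe.grad).all())

print("plain gradient contains nan:", plain_grad_has_nan)
print("safe gradient is finite:", safe_grad_is_finite)
```

Note that the epsilon slightly biases the result: at y = 0 the output is sqrt(eps) rather than 0, so eps may need tuning to the scale of your data.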
I ran into the same problem. However, I don’t recommend replacing the sqrt with the square, since it can make the values computed in your model grow larger and larger. I suspect this could cause problems if you stack several layers of this kind.
Thanks for the suggestion!
Thanks, that’s it!