Hi,

The error in the title is triggered by the following lines during training:

```
k = (y**2).sum(dim=2,keepdim=True)
r = k.sqrt()
```

So it appears we can get a NaN derivative for sqrt(0). How can I work around this in my case?

Thanks!

OK, I’ll avoid using sqrt() and formulate the problem in terms of r^2 instead.
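
For anyone landing here later, a minimal sketch of why dropping the sqrt helps. The all-zero tensor `y` is a hypothetical input chosen so that `k` is exactly 0; the names `grad_sqrt` and `grad_sq` are just for illustration and are not from the original code.

```python
import torch

# Hypothetical input: an all-zero slice along dim=2, so k is exactly 0.
y = torch.zeros(1, 1, 3, requires_grad=True)

# Original formulation: r = sqrt(sum(y**2)).
k = (y ** 2).sum(dim=2, keepdim=True)
r = k.sqrt()
r.sum().backward()
# d sqrt(k)/dk is inf at k = 0; the chain rule then multiplies it by 2*y = 0,
# and inf * 0 evaluates to NaN.
grad_sqrt = y.grad.clone()

# Reformulated in terms of r^2: no sqrt in the graph.
y.grad = None
k = (y ** 2).sum(dim=2, keepdim=True)
k.sum().backward()
# The gradient is simply 2*y, which is 0 here.
grad_sq = y.grad.clone()

print(torch.isnan(grad_sqrt).all().item())
print(grad_sq)
```

The first gradient is NaN everywhere, while the second is finite, so a loss expressed through `k` alone should not hit this error.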

You can use `torch.sqrt(x + 1e-8)` in place of `torch.sqrt(x)` to solve this problem.
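
Applied to the snippet from the question, this might look roughly like the sketch below. The all-zero `y` is a hypothetical worst case, and `eps` is an illustrative name for the constant suggested above; the right value likely depends on your dtype and the scale of your data.

```python
import torch

eps = 1e-8  # constant from the suggestion above; an assumption, tune as needed

# Hypothetical worst case: an all-zero slice, so k is exactly 0.
y = torch.zeros(1, 1, 3, requires_grad=True)

k = (y ** 2).sum(dim=2, keepdim=True)
r = torch.sqrt(k + eps)  # the argument of sqrt is now strictly positive
r.sum().backward()

# The derivative of sqrt is finite at eps, so the gradient w.r.t. y is finite.
print(torch.isfinite(y.grad).all().item())
print(r.item())
```

Note the trade-off: `r` is no longer exactly 0 for a zero input (it is about `1e-4` with this `eps`), which is usually negligible but worth keeping in mind.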

I ran into the same problem. However, I don’t recommend replacing the sqrt with the squared value, since it could make the numbers computed in your model grow larger and larger. I suspect this could become a problem if you stack several layers of this kind.

Thanks for the suggestion!

Thanks, that’s it!