Hi,

The error in the title is triggered by the following lines during training:

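```python
k = (y**2).sum(dim=2, keepdim=True)  # squared norm of each vector along dim 2
r = k.sqrt()                         # norm; backward of sqrt is 1 / (2 * sqrt(k)), infinite at k == 0
```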

So it appears the derivative of sqrt at 0 comes out as NaN. How can I get around this in my case?
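For reference, the NaN comes from the chain rule: the backward of sqrt is 1/(2·sqrt(k)), which is infinite at k = 0, and it gets multiplied by dk/dy = 2y = 0, giving inf · 0 = NaN. Below is a minimal sketch of two common workarounds, assuming `y` is a float tensor with the vector components along dim 2 as in the snippet above. The function names `norm_eps` and `norm_where` and the `eps` value are hypothetical choices, not part of any PyTorch API.

```python
import torch

def norm_eps(y: torch.Tensor, dim: int = 2, eps: float = 1e-12) -> torch.Tensor:
    # Option 1 (hypothetical helper): shift the sqrt argument away from 0.
    # At y == 0 the gradient is 2*y / (2*sqrt(eps)) = 0; r is biased by at most sqrt(eps).
    return ((y ** 2).sum(dim=dim, keepdim=True) + eps).sqrt()

def norm_where(y: torch.Tensor, dim: int = 2) -> torch.Tensor:
    # Option 2 (hypothetical helper), the "double where" trick:
    # sqrt is never evaluated at 0, in either the forward or the backward pass.
    k = (y ** 2).sum(dim=dim, keepdim=True)
    nonzero = k > 0
    k_safe = torch.where(nonzero, k, torch.ones_like(k))  # sqrt only sees values > 0
    return torch.where(nonzero, k_safe.sqrt(), torch.zeros_like(k))

# Usage: one vector of norm 5 and one zero vector, shape (1, 2, 2).
y = torch.tensor([[[3.0, 4.0], [0.0, 0.0]]], requires_grad=True)
r = norm_where(y)
r.sum().backward()
# y.grad is y / r = [0.6, 0.8] for the first vector and [0, 0] (not NaN) for the zero vector.
```

The `eps` version is shorter but slightly changes `r` everywhere; the double-`where` version keeps `r` exact and only replaces the undefined gradient at 0 with 0. The outer `where` alone is not enough: if `sqrt` were applied to the unmasked `k`, its infinite gradient would still be multiplied by the zero coming from `where`, which again gives NaN.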

Thanks!

OK, I’ll avoid using sqrt() and formulate the problem in terms of r^2 instead.
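A minimal sketch of what the r² reformulation can look like, using a hypothetical loss `outside_radius_penalty` that penalizes vectors (along dim 2) whose norm exceeds a radius `R`; the loss itself is only an illustration, so substitute whatever quantity your actual objective uses. Because r ≥ 0 and R ≥ 0, r > R holds exactly when r² > R², so the set of penalized vectors is the same as with `torch.relu(r - R)` (only the penalty's magnitude differs), and the gradient of r² is 2y, which is finite everywhere, including at y = 0.

```python
import torch

def outside_radius_penalty(y: torch.Tensor, R: float) -> torch.Tensor:
    # Hypothetical loss written with r^2 instead of r, so no sqrt appears in the graph.
    r2 = (y ** 2).sum(dim=2, keepdim=True)  # squared norm; d(r2)/dy = 2*y
    return torch.relu(r2 - R ** 2).sum()    # nonzero only where r > R

# Usage: a zero vector (inside the radius) and a vector of norm 2 (outside it).
y = torch.tensor([[[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]], requires_grad=True)
loss = outside_radius_penalty(y, R=1.0)
loss.backward()
# The zero vector contributes 0 to the loss and gets a zero (not NaN) gradient.
```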