Hi,

The following is a minimal working example (MWE) of my issue:

```
import torch
import torch.nn as nn

# Print enough digits to expose the exact float64 values.
torch.set_printoptions(precision=60)


class CosineDistance(nn.Module):
    """Cosine distance (1 - cosine similarity) with sum or mean reduction."""

    def __init__(self, weight=1, reduction="sum"):
        super().__init__()
        self.weight = weight
        self.reduction = reduction
        self.cos = nn.CosineSimilarity(dim=1, eps=0)

    def forward(self, pred, gt):
        cossim = self.cos(pred, gt)
        print(cossim)
        dist = 1 - cossim
        if self.reduction == "sum":
            dist = torch.sum(dist)
        elif self.reduction == "mean":
            dist = torch.mean(dist)
        else:
            raise ValueError("Please enter correct reduction.")
        return self.weight * dist


loss_fn = CosineDistance(weight=1, reduction="mean")
x = torch.randn((10, 512), device="cuda", dtype=torch.double)
y = x  # same tensor, so every row should have similarity exactly 1
print((x != y).sum())
# Row-wise manual version (torch.dot only accepts 1-D tensors):
# my_dot = (x * y).sum(dim=1) / (torch.linalg.norm(x, dim=1) * torch.linalg.norm(y, dim=1))
cos_dot = loss_fn(x, y)
print(cos_dot)
```

and I get the following output:

```
tensor(0, device='cuda:0')
tensor([0.999999999999999888977697537484345957636833190917968750000000,
        1.000000000000000000000000000000000000000000000000000000000000,
        0.999999999999999888977697537484345957636833190917968750000000,
        1.000000000000000000000000000000000000000000000000000000000000,
        1.000000000000000000000000000000000000000000000000000000000000,
        1.000000000000000000000000000000000000000000000000000000000000,
        1.000000000000000000000000000000000000000000000000000000000000,
        1.000000000000000000000000000000000000000000000000000000000000,
        1.000000000000000222044604925031308084726333618164062500000000,
        1.000000000000000000000000000000000000000000000000000000000000],
       device='cuda:0', dtype=torch.float64)
tensor(0., device='cuda:0', dtype=torch.float64)
```

I am guessing this is a numerical issue, but I want to confirm that this is indeed the case. Also, the numerical issue seems to happen only for some tensors and not for others. Is there any reason behind this?
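As a quick sanity check on the "numerical issue" guess, the two non-unit values in the output can be compared against the doubles adjacent to 1.0. This is only a sketch in plain Python floats (which are IEEE 754 doubles, the same format as `torch.double`); the variable names are mine and it does not touch PyTorch or the GPU at all.

```python
import sys

eps = sys.float_info.epsilon  # 2**-52 for IEEE 754 double precision

# Values copied verbatim from the printed tensor above.
below_one = 0.999999999999999888977697537484345957636833190917968750000000
above_one = 1.000000000000000222044604925031308084726333618164062500000000

# The largest double below 1.0 is 1 - eps/2; the smallest above is 1 + eps.
is_prev_double = below_one == 1.0 - eps / 2
is_next_double = above_one == 1.0 + eps

print(is_prev_double, is_next_double)
```

If both comparisons hold, the deviating entries are off by exactly one representable step from 1.0, which is what a single rounding error would look like.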

Related issues are "Why torch.nn.CosineSimilarity() gives different results for half and full tensor?" (#5 by ptrblck) and "nn.CosineSimilarity and custom cosine similarity using dot product giving different results" (#6 by ptrblck). Both have been answered, but I just want to make sure that the cause of this issue is the same as in those two threads. I suspect the probable cause is the floating-point division.
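To probe the division hypothesis, here is a rough sketch in plain Python doubles. It assumes the similarity is computed as `dot / (norm * norm)`; that formula and the helper `self_cosine` are my own illustration, and the actual PyTorch/CUDA kernel may order the operations differently. It suggests the square root and the multiplication round as well, and that whether the roundings cancel depends on the input values, which might explain why only some rows deviate.

```python
import math
import random


def self_cosine(v):
    """Cosine similarity of v with itself, computed as dot / (norm * norm)."""
    dot = sum(a * a for a in v)
    norm = math.sqrt(dot)  # rounded unless dot is a perfect square
    return dot / (norm * norm)


# dot = 4.0: sqrt is exact, so the result is exactly 1.0.
exact_case = self_cosine([2.0])
# dot = 2.0: sqrt(2) is rounded, norm * norm overshoots 2.0, result < 1.0.
rounded_case = self_cosine([1.0, 1.0])

# Ten random 512-dimensional vectors, mirroring the shape used above.
rng = random.Random(0)
samples = [self_cosine([rng.gauss(0.0, 1.0) for _ in range(512)]) for _ in range(10)]
max_dev = max(abs(s - 1.0) for s in samples)

print(exact_case, rounded_case, max_dev)
```

In this sketch the deviation from 1.0 never exceeds a few units in the last place, regardless of which of the three operations contributed the rounding.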

Thanks and please let me know if I am missing anything.

Megh