Thanks for the explanation, I understand. However, I have a follow-up question. When I instantiate `x` and `y` as follows, i.e. in double precision, I still get different answers when I compute the same quantity (the projection of `x` onto `y`) in two different ways: once directly from the dot product and once via `nn.CosineSimilarity`. The code is below; the seeds are set the same as above:
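```python
import torch
import torch.nn as nn
# (seeds set the same way as in my earlier post)
x = torch.randn(28, device="cuda", dtype=torch.double)
y = torch.randn(28, device="cuda", dtype=torch.double)
# Route 1: projection of x onto y, directly from the dot product
my_dot = torch.dot(x, y) / torch.linalg.norm(y)
# Route 2: the same quantity as ||x|| * cos(x, y)
cos = nn.CosineSimilarity(dim=0, eps=0)
cos_dot = torch.linalg.norm(x) * cos(x, y)
print(my_dot.item())
print(cos_dot.item())
```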
The output of the above snippet is:
```
-0.139646650365121
-0.13964665036512092
```
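To check whether this is ordinary rounding rather than a bug, here is a pure-Python sketch (standard library only, no PyTorch) that follows the same two algebraic routes in double precision. It is only an approximation of my setup: the 28 random values come from `random.gauss` with an arbitrary seed (an assumption, not my real seed), and it does not reproduce how `torch.dot` or `nn.CosineSimilarity` actually compute things on the GPU.

```python
import math
import random
import sys

# Hypothetical stand-in for torch.randn(28, dtype=torch.double):
# 28 standard-normal Python floats (which are IEEE 754 doubles).
random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(28)]
y = [random.gauss(0.0, 1.0) for _ in range(28)]


def dot(a, b):
    # Each product is rounded to double before summing.
    return sum(ai * bi for ai, bi in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


d = dot(x, y)

# Route 1 (mirrors my_dot): x . y / ||y||
route1 = d / norm(y)

# Route 2 (mirrors cos_dot): ||x|| * cos(x, y), where cos = x . y / (||x|| * ||y||)
route2 = norm(x) * (d / (norm(x) * norm(y)))

rel_diff = abs(route1 - route2) / abs(route1)
print(route1, route2)
print("relative difference:", rel_diff)
print("machine epsilon:    ", sys.float_info.epsilon)
```

Route 2 goes through an extra product, division, and multiplication, each rounded to the nearest double, so although the two expressions are algebraically identical their results can differ in the last few bits; with doubles the relative gap stays on the order of `sys.float_info.epsilon` (about 2.2e-16).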
I know this is a small difference, but it nonetheless makes my gradients (in my original code) nonzero, which causes my backpropagation to diverge.
Please let me know whether this is expected or whether I am missing something.
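To put a number on "small", the check below (standard library only; `math.ulp` needs Python 3.9 or later) measures how far apart the two printed values are in units in the last place (ulps), i.e. in steps between adjacent doubles.

```python
import math

my_dot = -0.139646650365121      # first printed value
cos_dot = -0.13964665036512092   # second printed value

gap = abs(my_dot - cos_dot)
ulp = math.ulp(my_dot)           # spacing of doubles near 0.1396, i.e. 2**-55

print("gap:", gap)
print("ulp:", ulp)
print("gap in ulps:", round(gap / ulp))
```

Both values lie in the same power-of-two range, so the gap is a whole number of ulps, and it comes out to only a few, which is the size of disagreement a handful of extra rounding steps can produce.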
Thanks again!