nn.CosineSimilarity and custom cosine similarity using dot product giving different results

The following is a minimal working example of the problem I have come across:

import torch
import torch.nn as nn

seed = 0
torch.manual_seed(seed)  # seed the RNG so x and y are reproducible
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True

x = torch.randn(28, device="cuda", dtype=torch.float)
y = torch.randn(28, device="cuda", dtype=torch.float)

# Computed directly: (x . y) / ||y||
my_dot = torch.dot(x, y) / torch.linalg.norm(y)

# The same quantity via cosine similarity: ||x|| * cos(x, y)
cos = nn.CosineSimilarity(dim=0, eps=0)
cos_dot = torch.linalg.norm(x) * cos(x, y)

print(my_dot, cos_dot)


The output to this snippet on my system is the following -


The two values differ in the later decimal places, but ideally they should be identical.

When I instead cast x and y to double, replacing the declarations above with the following lines:

x = torch.randn(28, device = "cuda", dtype=torch.float).double()
y = torch.randn(28, device = "cuda", dtype=torch.float).double()

I get the following outputs, which are identical, as expected.


Why are they the same in double precision but different in single (float) precision?


The difference is ~1e-8 and is expected for float32 due to the limited floating point precision and a potentially different order of operations.
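To make the "order of operations" point concrete, here is a minimal sketch in plain Python that emulates float32 rounding with the standard `struct` module. The helper name `f32` and the chosen values are illustrative assumptions and have nothing to do with PyTorch internals; the only point is that float32 addition is not associative, so two algebraically identical formulas can round differently.

```python
import struct


def f32(value: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", value))[0]


a, b, c = f32(1e8), f32(-1e8), f32(1.0)

# Round after every operation, as a float32 computation would.
left = f32(f32(a + b) + c)   # (a + b) + c
right = f32(a + f32(b + c))  # a + (b + c)

print(left, right)  # 1.0 0.0
```

The values here are deliberately extreme. In the snippet from the question the operands have similar magnitudes, so the discrepancy is far smaller, but the mechanism is the same.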

Thanks for the explanation, I understand. However, I have a follow-up question. When I instantiate x and y directly in double precision, as below, I still get different answers from the two ways of computing the cosine similarity. The seeds are set the same as above:

x = torch.randn(28, device = "cuda", dtype=torch.double)
y = torch.randn(28, device = "cuda", dtype=torch.double)

my_dot = torch.dot(x, y)/torch.linalg.norm(y)
cos = nn.CosineSimilarity(dim = 0, eps = 0)

cos_dot = torch.linalg.norm(x) * cos(x,y)


The output of the above snippet is -


I know that this is a small difference, but nevertheless it is causing my gradients (in my original code) to be nonzero which is causing my back propagation to diverge.

Please let me know if this is expected and if I am missing something.

Thanks again!

Increasing the bits in the numerical format will give you more precision (the new error is at ~1e-17) but will still be limited.
I would suggest checking your actual requirement (negative gradients) and perhaps applying a small eps value in your calculation. You should not expect more precision than the current numerical format can provide.
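One way to read the "small eps" suggestion is to compare values with a tolerance instead of testing for exact equality. The sketch below uses Python's standard `math.isclose` on float64 values; the tolerance of 1e-12 is an arbitrary assumption and would need tuning for real code. For tensors, `torch.isclose` and `torch.allclose` play a similar role.

```python
import math

# Two algebraically identical sums evaluated in a different order (float64).
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

print(left == right)  # False: the results differ by a rounding error
print(math.isclose(left, right, rel_tol=1e-12))  # True

# Treat differences below a chosen threshold as zero (threshold is an assumption).
eps = 1e-12
diff = left - right
diff = 0.0 if abs(diff) < eps else diff
print(diff)  # 0.0
```

Whether clamping like this is appropriate depends on what the downstream computation actually requires, so treat it as a starting point only.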

What exactly does "numerical format" mean in the context of PyTorch?

PyTorch uses float32 (i.e. floating point numbers stored in 32 bits) as its default. It also lets users choose wider types with more bits (and thus more range and precision), such as float64, as well as narrower types such as float16 or bfloat16.
For more general information about this format and its precision limitations, take a look at the Wikipedia article on float32, which is also called “single-precision” floating point.

The “precision” section might be interesting for you, and you can experiment with the rounding behavior of this numerical format that it describes.

Precision limitations on integer values - Integers between 2**24 and 2**25 round to a multiple of 2 (even number)

can be seen as:

x = torch.tensor(2**24, dtype=torch.float32)
print(x)
# tensor(16777216.)
print(x + 1)
# tensor(16777216.)
print(x + 2)
# tensor(16777218.)

As you can see, 16777217 is not representable in float32: the spacing between adjacent representable values grows with their magnitude, and between 2**24 and 2**25 that spacing is 2.
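The same kind of spacing can be measured just above 1.0, where it is known as machine epsilon. The sketch below is plain Python with a hypothetical `f32` helper that emulates float32 rounding via the standard `struct` module; it is an illustration under that assumption, not a description of how PyTorch computes anything. PyTorch reports the corresponding values through `torch.finfo(dtype).eps`.

```python
import struct
import sys


def f32(value: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", value))[0]


def machine_epsilon(round_fn) -> float:
    """Halve eps until 1.0 + eps/2 rounds back to exactly 1.0."""
    eps = 1.0
    while round_fn(1.0 + eps / 2) != 1.0:
        eps /= 2
    return eps


eps32 = machine_epsilon(f32)           # float32 spacing just above 1.0
eps64 = machine_epsilon(lambda v: v)   # float64: Python floats need no extra rounding

print(eps32 == 2.0 ** -23)              # True
print(eps64 == sys.float_info.epsilon)  # True
```

These two values correspond to roughly 7 significant decimal digits for float32 and roughly 16 for float64, which is why switching to double shrinks the discrepancy without removing it.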

The round-off errors you are seeing are explained e.g. in this article with a few examples.
