nn.CosineSimilarity and custom cosine similarity using dot product giving different results

Megh_Bhalerao · October 3, 2022, 1:49am

Hello,
The following is a minimal working example of the problem I have come across:

import os
import random

import numpy as np
import torch
import torch.nn as nn

# cuBLAS needs this for deterministic behavior; set it before the first CUDA call.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
torch.use_deterministic_algorithms(True)

# Seed every random number generator once.
seed = 0
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True

x = torch.randn(28, device = "cuda", dtype=torch.float)
y = torch.randn(28, device = "cuda", dtype=torch.float)


my_dot = torch.dot(x, y)/torch.linalg.norm(y)
cos = nn.CosineSimilarity(dim = 0, eps = 0)

cos_dot = torch.linalg.norm(x) * cos(x,y)

print(my_dot.item())
print(cos_dot.item())

The output to this snippet on my system is the following -

0.15492278337478638
0.15492276847362518

They differ in the later decimal places, but ideally they should be identical.

When I instead cast x and y to double, replacing the declarations above with the following lines:

x = torch.randn(28, device = "cuda", dtype=torch.float).double()
y = torch.randn(28, device = "cuda", dtype=torch.float).double()

I get the following outputs, which are identical, as expected.

0.15492288182677755
0.15492288182677755

Why are they the same in double precision but different in single precision?

Thanks!

ptrblck · October 3, 2022, 2:33am

The difference is ~1e-8 and is expected for float32 due to the limited floating point precision and a potentially different order of operations.
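
To make this concrete, here is a small sketch that is not part of the original thread and runs on the CPU. It first checks that the two float32 outputs reported in the question are neighbouring float32 values, i.e. that they differ by one unit in the last place, and then shows that regrouping a float32 sum can change its result. The values a, b and c are illustrative assumptions chosen to make the effect obvious, and torch.nextafter is used only to step to the adjacent representable value.

```python
import torch

# The two float32 results reported in the question.
v_dot = torch.tensor(0.15492278337478638, dtype=torch.float32)
v_cos = torch.tensor(0.15492276847362518, dtype=torch.float32)

# Step from the smaller value to the next representable float32 above it.
inf = torch.tensor(float("inf"), dtype=torch.float32)
neighbor = torch.nextafter(v_cos, inf)
adjacent = bool(neighbor == v_dot)
print("outputs are adjacent float32 values:", adjacent)

# Illustrative values (an assumption, not taken from the thread).
a = torch.tensor(1e8, dtype=torch.float32)
b = torch.tensor(-1e8, dtype=torch.float32)
c = torch.tensor(1.0, dtype=torch.float32)

left = ((a + b) + c).item()   # a + b is exactly 0, so the sum is 1.0
right = (a + (b + c)).item()  # b + c rounds back to -1e8, so the sum is 0.0
print(left, right)
```

Both groupings are mathematically identical, yet they disagree because each intermediate result is rounded to float32. A library kernel and a hand-written formula may group operations differently in the same way.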

Megh_Bhalerao · October 4, 2022, 3:52am

Thanks for the explanation, I understand. However, I have a follow-up question. When I create x and y directly in double precision, as shown below, the two ways of computing the cosine similarity still give different answers. The seeds are set the same as above.

x = torch.randn(28, device = "cuda", dtype=torch.double)
y = torch.randn(28, device = "cuda", dtype=torch.double)

my_dot = torch.dot(x, y)/torch.linalg.norm(y)
cos = nn.CosineSimilarity(dim = 0, eps = 0)

cos_dot = torch.linalg.norm(x) * cos(x,y)

print(my_dot.item())
print(cos_dot.item())

The output of the above snippet is -

-0.139646650365121
-0.13964665036512092

I know that this is a small difference, but nevertheless it is causing my gradients (in my original code) to be nonzero which is causing my back propagation to diverge.

Please let me know if this is expected and if I am missing something.

Thanks again!

ptrblck · October 4, 2022, 4:07am

Increasing the number of bits in the numerical format gives you more precision (the new difference is ~1e-16), but the precision is still limited.
I would suggest checking your actual requirement (negative gradients) and maybe applying a small eps value to your calculation. You should not expect more precision than what's possible in the current numerical format.
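
One possible reading of this suggestion is to compare the two results with a tolerance instead of expecting bitwise equality. The sketch below is an illustration under stated assumptions, not the original poster's code: it runs on the CPU, uses the default eps of nn.CosineSimilarity, and the rtol and atol values are arbitrary example choices that would need to be adapted to the actual use case.

```python
import math

import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(28, dtype=torch.double)
y = torch.randn(28, dtype=torch.double)

# Two mathematically equivalent ways to project x onto the direction of y.
my_dot = torch.dot(x, y) / torch.linalg.norm(y)
cos = nn.CosineSimilarity(dim=0)  # default eps (an assumption for this sketch)
cos_dot = torch.linalg.norm(x) * cos(x, y)

# Absolute difference: expected to be tiny, but not necessarily zero.
diff = (my_dot - cos_dot).abs().item()

# Tolerance-based comparison; rtol and atol are example values only.
close = bool(torch.isclose(my_dot, cos_dot, rtol=1e-9, atol=1e-12))

# Spacing of float64 values near the magnitude reported in the thread, for scale.
ulp = math.ulp(0.139646650365121)
print(diff, close, ulp)
```

Comparing diff against ulp gives a sense of how many representable steps apart the two results are, which is usually more informative than the raw decimal digits.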

Megh_Bhalerao · October 4, 2022, 4:10am

What does numerical format exactly mean in the context of pytorch?

ptrblck · October 4, 2022, 4:23am

PyTorch uses float32 (i.e. floating point numbers stored in 32 bits) as its default. It also lets you use wider types with more bits (and thus more range and precision) such as float64, as well as narrower types such as float16 or bfloat16.
E.g. take a look at this Wikipedia article about float32 which is also called “single-precision” float for more general information about this format and the precision limitations.

The “precision” section might be interesting for you and you could play around with some information about the rounding behavior of this numerical format.
E.g.:

Precision limitations on integer values - Integers between 2**24 and 2**25 round to a multiple of 2 (even number)

can be seen as:

x = torch.tensor(2**24, dtype=torch.float32)
print(x)
# tensor(16777216.)
print(x + 1)
# tensor(16777216.)
print(x + 2)
# tensor(16777218.)

As you can see, 16777217 is not representable in float32: the gap between adjacent representable values grows with the magnitude of the number, and between 2**24 and 2**25 that gap is 2.
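
The gap between neighbouring representable values can also be queried programmatically. The following sketch is an addition to the thread: it uses torch.nextafter to measure the distance to the next representable value at a few magnitudes, and torch.finfo to read the size and machine epsilon of several dtypes. The helper name spacing is hypothetical.

```python
import torch


def spacing(value, dtype):
    """Gap between `value` and the next representable number above it."""
    x = torch.tensor(value, dtype=dtype)
    inf = torch.tensor(float("inf"), dtype=dtype)
    return (torch.nextafter(x, inf) - x).item()


gap_at_1 = spacing(1.0, torch.float32)             # equals float32 machine epsilon
gap_at_2_24 = spacing(2.0**24, torch.float32)      # odd integers are skipped here
gap_at_2_24_f64 = spacing(2.0**24, torch.float64)  # float64 is far finer at this size
print(gap_at_1, gap_at_2_24, gap_at_2_24_f64)

# Storage size in bits and machine epsilon for each floating point dtype.
for dtype in (torch.float16, torch.bfloat16, torch.float32, torch.float64):
    info = torch.finfo(dtype)
    print(dtype, info.bits, info.eps)
```

The machine epsilon is the gap at 1.0; at larger magnitudes the gap scales up with the exponent, which is why x + 1 above leaves the float32 value unchanged.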

The round-off errors you are seeing are explained e.g. in this article with a few examples.