Hello,

The following is a minimum working example of the problem that I have come across:

```
import os
import random

import numpy as np
import torch
import torch.nn as nn

# cuBLAS needs this workspace setting for deterministic results on CUDA.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
torch.use_deterministic_algorithms(True)
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True

# Seed every RNG once.
seed = 0
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
x = torch.randn(28, device = "cuda", dtype=torch.float)
y = torch.randn(28, device = "cuda", dtype=torch.float)
my_dot = torch.dot(x, y)/torch.linalg.norm(y)
cos = nn.CosineSimilarity(dim = 0, eps = 0)
cos_dot = torch.linalg.norm(x) * cos(x,y)
print(my_dot.item())
print(cos_dot.item())
```

The output of this snippet on my system is the following -

```
0.15492278337478638
0.15492276847362518
```

They differ in the later decimal places, but ideally both should be the same.

When I cast `x` and `y` to double, using the following lines instead of the declarations above -

```
x = torch.randn(28, device = "cuda", dtype=torch.float).double()
y = torch.randn(28, device = "cuda", dtype=torch.float).double()
```

I get the following outputs, which are identical, as expected.

```
0.15492288182677755
0.15492288182677755
```

Why are they the same in `double` precision but different in `float` precision?

Thanks!

The difference is ~1e-8 and is expected for `float32` due to the limited floating point precision and a potentially different order of operations.
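To make the `float32` case concrete, here is a minimal sketch in plain Python (no GPU needed). It takes the two values printed above, reinterprets them as `float32` bit patterns, and counts how many representable steps separate them. The helper name `float32_bits` is made up for this illustration. The last line shows, in ordinary double precision, that merely regrouping an addition can change the final bit.

```python
import struct


def float32_bits(value: float) -> int:
    """Hypothetical helper: IEEE 754 binary32 bit pattern of `value` as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


# The two float32 results quoted in the question.
my_dot = 0.15492278337478638
cos_dot = 0.15492276847362518

# For floats of the same sign, the difference of the bit patterns is the
# number of representable float32 steps (ULPs) between them.
ulps_apart = abs(float32_bits(my_dot) - float32_bits(cos_dot))
print(ulps_apart)  # 1

# The gap is exactly the float32 spacing in [0.125, 0.25), i.e. 2**-26.
gap_is_one_ulp = (my_dot - cos_dot) == 2.0 ** -26
print(gap_is_one_ulp)  # True

# Floating point addition is not associative: regrouping changes the result.
regrouped_equal = (0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3)
print(regrouped_equal)  # False
```

So the two `float32` results are adjacent representable values. They disagree by one unit in the last place, which is the smallest nonzero difference this format can express at that magnitude.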

Thanks for the explanation, I understand. However, I have a follow-up question: when I instantiate `x` and `y` directly in `double`, as shown below, I still get different answers for the cosine similarity calculated in 2 different ways. The code is the following, with the seeds set the same as above -

```
x = torch.randn(28, device = "cuda", dtype=torch.double)
y = torch.randn(28, device = "cuda", dtype=torch.double)
my_dot = torch.dot(x, y)/torch.linalg.norm(y)
cos = nn.CosineSimilarity(dim = 0, eps = 0)
cos_dot = torch.linalg.norm(x) * cos(x,y)
print(my_dot.item())
print(cos_dot.item())
```

The output of the above snippet is -

```
-0.139646650365121
-0.13964665036512092
```

I know that this is a small difference, but it nevertheless makes the gradients in my original code nonzero, which causes my backpropagation to diverge.

Please let me know if this is expected and if I am missing something.

Thanks again!

Increasing the bits in the numerical format will give you more precision (the new error is at ~1e-17) but will still be limited.

I would suggest checking your actual requirement (negative gradients) and maybe applying a small `eps` value to your calculation. You should not expect to get more precision than what's possible in the current numerical format.
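As a rough sketch of what a small tolerance could look like, the snippet below uses the two `double` outputs quoted above. The tolerance `1e-12` and the helper `clamp_residual` are illustrative assumptions, not recommended values or an existing PyTorch API. For tensors, `torch.isclose` and `torch.allclose` offer the same kind of tolerance-based comparison.

```python
import math

# Values copied from the float64 outputs quoted above.
my_dot = -0.139646650365121
cos_dot = -0.13964665036512092

# Exact comparison fails because the results differ in the last few bits.
exactly_equal = my_dot == cos_dot
print(exactly_equal)  # False

# A tolerance-based comparison treats them as equal.
# rel_tol=1e-12 is an arbitrary choice for this illustration.
close_enough = math.isclose(my_dot, cos_dot, rel_tol=1e-12)
print(close_enough)  # True


def clamp_residual(residual: float, eps: float = 1e-12) -> float:
    """Hypothetical helper: treat residuals smaller than eps as exactly zero."""
    return 0.0 if abs(residual) < eps else residual


clamped = clamp_residual(my_dot - cos_dot)
print(clamped)  # 0.0
```

Whether snapping small residuals to zero is appropriate depends on the actual computation. The tolerance should sit well above the round-off level of the dtype in use and well below any difference that is meaningful for the model.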

What exactly does "numerical format" mean in the context of PyTorch?

PyTorch uses `float32` (i.e. floating point numbers stored in 32 bits) as its default and also allows users to use wider types with more bits (and thus more range and precision) such as `float64`, as well as smaller types such as `float16` or `bfloat16`.
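A quick way to inspect these formats is `torch.finfo`, which reports the bit width and machine epsilon (the gap between 1.0 and the next representable value) of a floating point dtype. The snippet below is a small sketch that runs on the CPU and loops over the four dtypes mentioned above.

```python
import torch

# Print bit width, machine epsilon, and largest finite value per dtype.
for dtype in (torch.float16, torch.bfloat16, torch.float32, torch.float64):
    info = torch.finfo(dtype)
    print(f"{str(dtype):15s} bits={info.bits:2d} eps={info.eps:.3e} max={info.max:.3e}")

# The dtype used for new floating point tensors when none is specified.
default_dtype = torch.get_default_dtype()
print(default_dtype)
```

A smaller `eps` means more precision: `float64` resolves relative differences around `2**-52`, while `float32` stops at `2**-23`.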

E.g. take a look at this Wikipedia article about `float32`, which is also called "single-precision" float, for more general information about this format and the precision limitations.

The "precision" section might be interesting for you and you could play around with some information about the rounding behavior of this numerical format.

E.g. the statement under "Precision limitations on integer values" that integers between `2**24` and `2**25` round to a multiple of 2 (even number) can be seen as:

```
import torch

x = torch.tensor(2**24, dtype=torch.float32)
print(x)
# tensor(16777216.)
print(x + 1)
# tensor(16777216.)
print(x + 2)
# tensor(16777218.)
```

As you can see, `16777217` is not representable in `float32`, since the gap between adjacent representable values grows as the magnitude of the numbers increases.

The round-off errors you are seeing are explained e.g. in this article with a few examples.
