What is the machine precision of pytorch with CPUs (or GPUs)?

What is the machine precision when working with pytorch? I know from (https://www.cfd-online.com/Wiki/Machine_precision) that for a 32-bit computer it’s:

2^-52 ~ 10^-16 for double
2^-23 ~ 10^-7 for single

Does that mean that for torch.FloatTensor we have 2^-23 ~ 10^-7, and for 64 bit we get 2^-104 ~ 10^-32?

I’m not sure how to actually check this, but in numpy/python one can do it as it’s done here: https://stackoverflow.com/questions/19141432/python-numpy-machine-epsilon

but if you throw a torch type in, it doesn’t work:

>>> np.finfo(torch.FloatTensor)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/user/miniconda3/envs/hbf_env/lib/python3.6/site-packages/numpy/core/getlimits.py", line 392, in __new__
    raise ValueError("data type %r not inexact" % (dtype))
ValueError: data type <class 'numpy.object_'> not inexact

AFAIK, the PyTorch dtypes have the same representation as the Numpy dtypes. They should all follow the IEEE 754 standard. So you’re right about the precision being 2^-23 for FloatTensor.

I’m not sure how to check it with code though…

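If it helps, one way to check this directly in code (assuming a PyTorch version that provides torch.finfo, which mirrors numpy.finfo) is:

import torch

# machine epsilon (relative spacing at 1.0) per floating-point dtype
print(torch.finfo(torch.float32).eps)  # ~1.1921e-07, i.e. 2^-23
print(torch.finfo(torch.float64).eps)  # ~2.2204e-16, i.e. 2^-52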

fyi if you want to see the datatype of your tensor you can access its .dtype attribute (assuming pytorch versions don’t change etc.):

# %%

import torch

x = torch.randn(3)

print(x)
print(x.dtype)

output:

tensor([-0.8643, -0.6282,  1.3406])
torch.float32

fyi:

recall machine precision:

Machine precision is the smallest number ε such that the difference between 1 and 1 + ε is nonzero, i.e., it is the smallest difference between two numbers that the computer recognizes. On a 32-bit computer, single precision is 2^-23 (approximately 10^-7) while double precision is 2^-52 (approximately 10^-16).
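
A minimal sketch of that definition in float32 (default dtype assumed; just to make it concrete):

import torch

one = torch.tensor(1.)
eps = torch.tensor(2**-23)   # machine epsilon for float32

print((one + eps) - one)     # nonzero: 1 + 2^-23 is representable
print((one + eps/2) - one)   # zero: 1 + 2^-24 rounds back down to 1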

I am trying to figure out if the precision I have is enough to resolve my current error, which is about 2.00e-7.

I am running experiments on synthetic data (e.g. fitting a sine curve) and I get errors in pytorch that are really small. One is about 2.00e-7. I was reading about machine precision and it seems really close to the machine precision. How do I know if this is going to cause problems (or if perhaps it already has, e.g. I can’t differentiate between the different errors since they are “machine zero”)?

errors:

import numpy as np

p = np.array([2.3078539778125768e-07,
              1.9997889411762922e-07,
              2.729681222011256e-07,
              3.2532371115080884e-07])

m = np.array([3.309504692539563e-07,
              4.1058904888091606e-06,
              6.8326703386053605e-06,
              7.4616147721799645e-06])

what confuses me is that I tried adding a number I thought was too small for it to make a difference, but it did return a difference (i.e. I expected a + eps == a when eps is smaller than machine precision):

import torch

x1 = torch.tensor(1e-6)
x2 = torch.tensor(1e-7)
x3 = torch.tensor(1e-8)
x4 = torch.tensor(1e-9)

eps = torch.tensor(1e-11)

print(x1.dtype)
print(x1)
print(x1+eps)

print(x2)
print(x2+eps)

print(x3)
print(x3+eps)

print(x4)
print(x4+eps)

output:

torch.float32
tensor(1.0000e-06)
tensor(1.0000e-06)
tensor(1.0000e-07)
tensor(1.0001e-07)
tensor(1.0000e-08)
tensor(1.0010e-08)
tensor(1.0000e-09)
tensor(1.0100e-09)

I expected every addition to leave the value unchanged but it didn’t. Can someone explain to me what is going on? If I am getting losses close to 1e-7, should I use double rather than float? From googling, it seems that float means single precision, afaik.


@ptrblck sorry for tagging you. Can someone route me to a person that might know about numerical issues? I am getting a loss of 4.650724394150382e-16, but from what I read in the docs (https://pytorch.org/docs/stable/tensors.html) it seems that pytorch is using single precision by default and not double…so why is my loss on the scale of 1e-16 if that’s much lower than what it should handle, as far as I understand? In a different place I have 1e-7…so have I hit zero or not? Would I be safer if the loss was at 1e-5 or larger, with no confusion about whether it’s zero or not?

1e-6 is not the absolute minimal value before the value is rounded to zero as explained e.g. here.
As you can see in the Precision limitation on decimal values section, the fixed interval between “small integer values” is approx. 1e-7, which is why this can be used as the minimal step size between these values.
The interpretation is of course depending on your use case and e.g. in the ML domain absolute errors of ~1e-6 are commonly accepted.

Here is a small example using the minimal denormal values as well as larger values with a fixed interval of 1:

import torch

torch.set_printoptions(precision=30)

a = torch.tensor(2**(-149)) # smallest denormal number
print(a)

b = torch.tensor(2**(-150)) # not representable
print(b)

x = torch.tensor(1.)
y = x + torch.tensor(2**(-23)) # addition representable
print(y)
print(x - y)

z = x + torch.tensor(2**(-24)) # not representable
print(z)

c = torch.tensor(2**23) # step size is 1
d = c + torch.tensor(0.5) # rounded
print(d - c)
e = c + torch.tensor(1.) # representable
print(e - c)

I’ve read the article you sent sort of carefully plus this one (https://www.doc.ic.ac.uk/~eedwards/compsys/float).

What I am trying to understand is when adding two numbers results in errors. My understanding from the article is that about 7.2 decimal places is what is “remembered” in the mantissa. But it is also tracking the floating point location with the exponent.

So my current understanding is that if both numbers are in the same exponent range then adding them is fine (as long as the difference isn’t smaller than about 7 decimal digits…and if a number has parts below that range, the computer wouldn’t be able to store that part anyway).

So what I am struggling to really articulate is when we get into issues. If one number is in the (1.M) * 2^-7 range and the other is in the (1.M) * 2^4 range, would we get into problems?

I guess what I’m trying to understand is when underflows and overflows occur (and hoping that is the only type of error).

My loss is really small (from my synthetic data set), so it’s not a standard scenario.
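
As a concrete illustration of the mixed-exponent question above (a minimal float32 sketch; the specific exponents are just for illustration): two values add cleanly as long as they are within roughly 24 binary orders of magnitude (about 7 decimal digits) of each other, otherwise the smaller one is absorbed.

import torch

big  = torch.tensor(2.0**4)    # ~16, the (1.M) * 2^4 case
ok   = torch.tensor(2.0**-7)   # within 24 binary orders of magnitude of big
lost = torch.tensor(2.0**-21)  # more than 24 binary orders below big

print((big + ok) - big)    # nonzero (2^-7): the small addend survives
print((big + lost) - big)  # 0.: below half an ulp of 16, the addend is absorbed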

Ok, let’s see if this is a good summary of what I think is correct (modulo ignoring some details of floats that I don’t fully understand right now, like the bias).

But I’ve concluded that the best thing for me is to make sure my errors/numbers have two properties:

    1. they are within 7 decimals of each other (due to the mantissa being 24 bits; as you pointed out, log_10(2^24) ≈ 7.225)
    2. they are far enough from the edges. For this I take the mantissa to be 23 bits away from the lower edge (exponent around -126 + 23) and the same for the largest edge but 127 - 23.

As long as we satisfy that, more or less, we avoid adding two numbers that are too small for the machine to distinguish (condition 1) and avoid overflows/underflows (condition 2).

Perhaps there is a small detail I might be missing with the bias or some other float detail (like representing infinity, NaN). But I believe that is correct.

If anyone can correct the details, that would be fantastic.
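
For checking those “edges” programmatically, torch.finfo also exposes the representable range (a small sketch, using float32 here):

import torch

fi = torch.finfo(torch.float32)
print(fi.tiny)  # smallest positive normal number, ~1.18e-38
print(fi.max)   # largest finite value, ~3.40e+38
print(fi.eps)   # relative spacing at 1.0, ~1.19e-07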


my current conclusion is that because most learning rates are around 1e-1, I will keep everything within 7 decimals of 1.0. That seems the safest. As long as every number is in the range of 1.0 ± 7 decimals it should be fine. That roughly guarantees there won’t be weird machine-precision issues, as long as we keep everything on a similar scale.

“in the ML domain absolute errors of ~1e-6 are commonly accepted.” Could you provide a reference paper, so that I can cite it in my paper to justify 1e-6? Thank you.

No, I don’t have a reference paper at hand.