Why does torch.sub produce different results for the same data when the shape is 1 vs. 32?
import torch

def fn(shape):
    alpha = -2.278090e-04
    x = torch.empty(shape, dtype=torch.float16).fill_(0.004044)
    y = torch.empty(shape, dtype=torch.float16).fill_(-17.39)
    print(torch.sub(x, y, alpha=alpha)[0])

fn(32)
fn(1)
which prints:
tensor(8.1837e-05, dtype=torch.float16)
tensor(8.0109e-05, dtype=torch.float16)
ptrblck
February 19, 2025, 4:35am
This is caused by the limited floating-point precision and a different order of operations depending on the algorithm used, as can also be seen in this simple example:
import torch

x = torch.randn(100, 100)
s1 = x.sum()          # single reduction over all elements
s2 = x.sum(0).sum(0)  # reduce over rows first, then over columns
print(s1 - s2)
# tensor(1.5259e-05)
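For the torch.sub case itself, intermediate precision is one plausible source of the gap. The sketch below is an assumption about what the two paths might effectively do (float16 intermediates vs. float32 intermediates rounded once at the end), not a confirmed description of the kernels, but it reproduces both printed values:

import torch

x = torch.tensor(0.004044, dtype=torch.float16)
y = torch.tensor(-17.39, dtype=torch.float16)
alpha = -2.278090e-04

# Hypothetical scalar path: round the intermediate product alpha * y
# to float16 before the subtraction
r_half = x - torch.tensor(alpha, dtype=torch.float16) * y
# Hypothetical vectorized path: keep intermediates in float32 and round
# the final result back to float16 once
r_float = (x.float() - alpha * y.float()).half()

print(r_half)   # tensor(8.0109e-05, dtype=torch.float16), like fn(1)
print(r_float)  # tensor(8.1837e-05, dtype=torch.float16), like fn(32)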
For elementwise ops, could you elaborate on "different order of operations"? I was wondering if this is due to vectorized instructions being used for shape > 1.
ptrblck
February 19, 2025, 2:05pm
Yes, maybe. You could check exactly which kernel is called for your use case via a profiler.
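A minimal starting point with torch.profiler, mirroring the setup from the question (note the operator-level table only shows ops such as aten::sub; confirming whether vectorized instructions are actually used would need a lower-level tool such as perf):

import torch
from torch.profiler import profile, ProfilerActivity

alpha = -2.278090e-04
x = torch.empty(32, dtype=torch.float16).fill_(0.004044)
y = torch.empty(32, dtype=torch.float16).fill_(-17.39)

with profile(activities=[ProfilerActivity.CPU]) as prof:
    torch.sub(x, y, alpha=alpha)

# Summarize the recorded ops (e.g. aten::sub) and their CPU timings
print(prof.key_averages().table(sort_by="cpu_time_total"))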