The difference is most likely due to floating point precision, which for torch.float32 is around 1e-7. Although the operations are mathematically identical, the underlying floating point operations can differ (e.g. in the order of accumulation), and so will the results.
You can print out the difference between the terms and see that they have elements which differ by about 1e-8. When checking whether two tensors are approximately equal, use torch.allclose rather than torch.equal.
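As a minimal sketch (the layer size and input shapes here are made up, not taken from your code), this compares the same nn.Linear applied to a 3D input and to its flattened 2D view:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

linear = nn.Linear(16, 16)

# 3D (batched) input and its flattened 2D view
x3d = torch.randn(4, 8, 16)
x2d = x3d.reshape(-1, 16)

out3d = linear(x3d).reshape(-1, 16)
out2d = linear(x2d)

# Mathematically identical, but the float32 results can differ in the last bits
print(torch.equal(out2d, out3d))                 # may be False
print(torch.allclose(out2d, out3d, atol=1e-6))   # True within float32 tolerance
print((out2d - out3d).abs().max())               # on the order of 1e-8 to 1e-7
```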
Thank you for your response.
In my situation, these differences will accumulate as the layers go deeper. I wonder why the operation orders of 2D and 3D linear are different? Are there any documents about the operation orders of 2D and 3D linear?
The bulk of machine learning is a minimization problem, so small discrepancies in rounding can act as a (minor) form of regularization and help prevent overfitting. For this reason, it's better to use float16 or bfloat16.
This is expected, and neither result is “more correct” in such a case; both should show a similar error when compared against a wider dtype. If these expected numerical errors, caused by the limited floating point precision, are causing issues for your use case, you might want to use e.g. float64.
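Continuing the hypothetical sketch from above (the shapes and layer size are still assumptions), you can check this by comparing both float32 paths against a float64 reference and seeing that their errors are of a similar magnitude:

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)

linear = nn.Linear(16, 16)
linear64 = copy.deepcopy(linear).double()  # higher-precision reference copy

x3d = torch.randn(4, 8, 16)
x2d = x3d.reshape(-1, 16)

ref = linear64(x2d.double())  # float64 reference output

err_2d = (linear(x2d) - ref).abs().max()
err_3d = (linear(x3d).reshape(-1, 16) - ref).abs().max()

# Both float32 paths show a comparable error vs. the float64 reference,
# so neither result is "more correct" than the other
print(err_2d, err_3d)
```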