Matmul broadcasting inconsistent results

Hi there!

I have two tensors proj_mat and cam_coords with shapes (B, 4, 4) and (B, 4, N). The tensor cam_coords is just B copies of a matrix of shape (4, N), i.e., cam_coords[0] == cam_coords[1] == … == cam_coords[B-1]. So according to the broadcasting rules I think we should have

proj_mat @ cam_coords == proj_mat @ cam_coords[0].

And this holds true if I run it on the CPU. However, when I switch to the GPU the two results no longer match. Does this happen because of the internal CUDA algorithm? And how could I fix this problem?
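Here is a minimal sketch of what I mean (B, N and the random data are just placeholders for illustration, not my real values):

import torch

B, N = 8, 1000  # placeholder sizes
proj_mat = torch.randn(B, 4, 4)
cam_coords = torch.randn(4, N).unsqueeze(0).repeat(B, 1, 1)  # B identical copies

# On the CPU the batched and the broadcast results match exactly for me
print(torch.equal(proj_mat @ cam_coords, proj_mat @ cam_coords[0]))

# On the GPU they only agree approximately
if torch.cuda.is_available():
    pm, cc = proj_mat.cuda(), cam_coords.cuda()
    print(torch.equal(pm @ cc, pm @ cc[0]))
    print((pm @ cc - pm @ cc[0]).abs().max())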

Likely the results are equal up to numerical precision.

However, the results can differ by about ±5, and this has a real impact on my model training.

Yes, you typically see a roughly fixed relative error (for given tensor shapes). If you had a relative error around 1e-5 with operands around 1e0, and your operands are now around 1e5, the absolute error can reach around 1e0.
You could compare both results to the same computation done in .double() to see which one is closer, but in the end there isn’t that much you can do.
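Something along these lines (a rough sketch, reusing the proj_mat / cam_coords names from your post):

import torch

def max_error_vs_double(proj_mat, cam_coords):
    # float64 reference for the same computation
    ref = proj_mat.double() @ cam_coords.double()
    # compare both float32 variants against the reference
    err_batched = ((proj_mat @ cam_coords).double() - ref).abs().max()
    err_single = ((proj_mat @ cam_coords[0]).double() - ref).abs().max()
    return err_batched, err_single

Whichever variant has the smaller error is the more accurate one for your data, but both are within what float32 can deliver.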

Best regards

Thomas
