I have two tensors, proj_mat and cam_coords, with shapes (B, 4, 4) and (B, 4, N). cam_coords is just B copies of a single (4, N) matrix, i.e., cam_coords[0] == cam_coords[1] == … == cam_coords[B-1]. So, according to the broadcasting rules, I think multiplying proj_mat by the full cam_coords should give the same result as multiplying it by a single (4, N) copy.
And this holds true when I run it on the CPU. However, when I switch to the GPU, the two results no longer match. Does this happen because of the internal algorithm used by CUDA? And how can I fix this problem?
Yes, you typically have an approximately fixed relative error (for given tensor shapes). If you have 1e-5ish relative error with operands that are 1e0ish, and your operands are now 1e5ish, then the absolute error can get to 1e0ish.
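To make the "fixed relative error" point concrete, here is a small sketch using only the Python standard library. It rounds one value near 1e0 and one near 1e5 to float32 and compares the errors. The helper names and the two sample values are made up for illustration, and this only shows the rounding of a single stored number, not the accumulated error of a full matrix product (which is larger).

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def rounding_errors(x: float) -> tuple[float, float]:
    """Return (absolute error, relative error) of storing x as float32."""
    abs_err = abs(to_float32(x) - x)
    return abs_err, abs_err / abs(x)


# Illustrative values: same digits, magnitudes roughly 1e0 and 1e5.
small_abs, small_rel = rounding_errors(1.2345678)
large_abs, large_rel = rounding_errors(123456.78)

print(f"~1e0 operand: abs err {small_abs:.3e}, rel err {small_rel:.3e}")
print(f"~1e5 operand: abs err {large_abs:.3e}, rel err {large_rel:.3e}")
```

The relative errors stay in the same range for both values, while the absolute error of the larger value is several orders of magnitude bigger. That is the same effect that makes two differently ordered GPU computations disagree visibly once the operands are large.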
You could compare to the same computation with .double() to see which is closer, but in the end there isn’t that much you can do.
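A minimal sketch of that comparison, assuming the two results come from torch.matmul with the batched and the broadcast operand (the original code isn't shown here, so the sizes, the seed, and the names batched, broadcast, and reference are just for illustration):

```python
import torch

torch.manual_seed(0)
device = "cuda" if torch.cuda.is_available() else "cpu"

B, N = 8, 1000  # illustrative sizes, not taken from the question
proj_mat = torch.randn(B, 4, 4, device=device)
coords = torch.randn(4, N, device=device)
# B identical copies of the same (4, N) matrix, as described in the question.
cam_coords = coords.expand(B, 4, N).contiguous()

# The two float32 computations that are expected to agree.
batched = torch.matmul(proj_mat, cam_coords)  # (B, 4, 4) @ (B, 4, N)
broadcast = torch.matmul(proj_mat, coords)    # (B, 4, 4) @ (4, N)

# Same computation in float64, used as the reference.
reference = torch.matmul(proj_mat.double(), coords.double())

err_batched = (batched.double() - reference).abs().max().item()
err_broadcast = (broadcast.double() - reference).abs().max().item()
diff = (batched - broadcast).abs().max().item()

print(f"device: {device}")
print(f"max |batched - reference|:   {err_batched:.3e}")
print(f"max |broadcast - reference|: {err_broadcast:.3e}")
print(f"max |batched - broadcast|:   {diff:.3e}")
```

If both errors against the float64 reference are of the same order as the difference between the two float32 results, the mismatch is ordinary floating-point noise and not a bug. In that case, comparing with torch.allclose and a tolerance suited to the magnitude of your values is usually more meaningful than an exact == check.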