I have two matrices, proj_mat and cam_coords, with shapes (B, 4, 4) and (B, 4, N). cam_coords is just B copies of a single (4, N) matrix, i.e., cam_coords[0] == cam_coords[1] == … == cam_coords[B-1]. So, according to the broadcasting rules, I think we should have
proj_mat @ cam_coords == proj_mat @ cam_coords[0].
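A minimal sketch of the setup described above, assuming PyTorch tensors. The helper `make_inputs`, the sizes `B=3` and `N=5`, and the random data are hypothetical stand-ins for the real projection matrices and camera coordinates. It shows that both expressions produce a (B, 4, N) result:

```python
import torch

def make_inputs(B=3, N=5, device="cpu", dtype=torch.float32):
    # Hypothetical random data standing in for the real matrices
    torch.manual_seed(0)
    proj_mat = torch.randn(B, 4, 4, device=device, dtype=dtype)
    base = torch.randn(4, N, device=device, dtype=dtype)
    # expand() returns a (B, 4, N) view in which every batch entry is `base`
    cam_coords = base.unsqueeze(0).expand(B, 4, N)
    return proj_mat, cam_coords

proj_mat, cam_coords = make_inputs()
batched = proj_mat @ cam_coords       # (B, 4, 4) @ (B, 4, N) -> (B, 4, N)
broadcast = proj_mat @ cam_coords[0]  # (B, 4, 4) @ (4, N) broadcasts to (B, 4, N)
print(batched.shape, broadcast.shape)
```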
This holds when I run it on the CPU. However, when I switch to the GPU, the two results no longer match. Is this caused by the internal algorithm CUDA uses? And how can I fix it?
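One hedged way to check and work around the mismatch, assuming PyTorch: the two expressions may be dispatched to different matmul kernels (for example a batched kernel for the 3-D @ 3-D case and a single folded matrix multiply for the 3-D @ 2-D case), which can accumulate in a different order and therefore round differently in float32. If that is the cause, the results should agree to within floating-point tolerance rather than bit for bit, so the sketch compares them with `torch.allclose` and also tries float64. It additionally disables TF32 via `torch.backends.cuda.matmul.allow_tf32`, which only matters on GPUs that support TF32. The helper `compare`, the sizes, and the tolerances are hypothetical choices:

```python
import torch

def compare(device, dtype=torch.float32, B=8, N=1000):
    # Hypothetical helper: builds the same inputs as in the question and
    # compares the batched and the broadcast matmul results.
    torch.manual_seed(0)
    proj_mat = torch.randn(B, 4, 4, device=device, dtype=dtype)
    base = torch.randn(4, N, device=device, dtype=dtype)
    cam_coords = base.unsqueeze(0).expand(B, 4, N)
    batched = proj_mat @ cam_coords
    broadcast = proj_mat @ base
    max_diff = (batched - broadcast).abs().max().item()
    # Exact == is too strict for floating point; compare with a tolerance
    close = torch.allclose(batched, broadcast, rtol=1e-4, atol=1e-5)
    return max_diff, close

device = "cuda" if torch.cuda.is_available() else "cpu"
# Assumption: TF32 matmuls (on supporting GPUs) would widen the gap, so turn them off
torch.backends.cuda.matmul.allow_tf32 = False
for dtype in (torch.float32, torch.float64):
    max_diff, close = compare(device, dtype)
    print(device, dtype, max_diff, close)
```

If the maximum difference is on the order of float32 rounding error, the two results are equal for practical purposes and a tolerance-based comparison is the appropriate fix.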