This is not directly a PyTorch question, but I’m going to post it here in hopes that someone smarter than I might be able to help anyway.
I am working on a CUDA kernel that calls some simple helper methods for linear algebra. One of these helpers computes the dot product, and it always returns NaN. Why?
The Kernel:
__global__
void find_intersection_kernel(
    int n_rays, int n_faces,
    at::PackedTensorAccessor32<float_t, 2> vertex_acc,
    at::PackedTensorAccessor32<int32_t, 2> face_acc,
    at::PackedTensorAccessor32<float_t, 2> origin_acc,
    at::PackedTensorAccessor32<float_t, 2> direct_acc,
    at::PackedTensorAccessor32<float_t, 2> results) {
    int ray_ix = blockIdx.x * blockDim.x + threadIdx.x;
    int face_ix = blockIdx.y * blockDim.y + threadIdx.y;
    if (ray_ix < n_rays) {
        if (face_ix < n_faces) {
            // vectors: ray origin and direction
            float_t *ray_direction_ptr = &direct_acc[ray_ix][0];
            float_t *ray_origin_ptr = &origin_acc[ray_ix][0];
            // vectors: vertices of current face
            float_t *v0_ptr = &vertex_acc[face_acc[face_ix][0]][0];
            float_t *v1_ptr = &vertex_acc[face_acc[face_ix][1]][0];
            float_t *v2_ptr = &vertex_acc[face_acc[face_ix][2]][0];
            // get edge vectors of current face
            float_t edge1[3];
            float_t edge2[3];
            sub3d(v1_ptr, v0_ptr, edge1);
            sub3d(v2_ptr, v0_ptr, edge2);
            // determinant of matrix A for eqn Ax = b
            float_t h[3];
            cross3d(ray_direction_ptr, edge2, h);
            printf("(%d, %d)  value of e1 vector: [%f, %f, %f]\n", ray_ix, face_ix, edge1[0], edge1[1], edge1[2]);
            printf("(%d, %d)  value of h vector: [%f, %f, %f]\n", ray_ix, face_ix, h[0], h[1], h[2]);
            float_t determinant = dot3d(edge1, h);
            results[ray_ix][face_ix] = 0;
The troublesome helper:
__device__
float_t dot3d(float_t v1[3], float_t v2[3]) {
    printf("(%d, %d)  v1: [%f, %f, %f], v2: [%f, %f, %f]\n", threadIdx.x, threadIdx.y, v1[0], v1[1], v1[2], v2[0], v2[1], v2[2]);
    float_t result = 0;
    for (int r = 0; r < 3; r++) {
        float_t prod = v1[r] * v2[r];
        printf("(%d, %d, %d)  product: %f\n", r, threadIdx.x, threadIdx.y, prod);
        result = result + prod;
    }
    printf("(%d, %d)  result: %f\n", threadIdx.x, threadIdx.y, result);
    return result;
}
What I’ve tried on my own:

Using printf inside the helper method, I have verified that, for all threads and for all r values, v1[r] and v2[r] are correct.
I have also verified that the intermediate product in each iteration of the dot product calculation is correct, except for the last one (r = 2), which is NaN.

I have verified that the arguments are correct both inside and outside the scope of the function.

I have also worked out the math on paper and verified that the determinant is 4.
The output:
(0, 0)  value of e1 vector: [2.000000, 2.000000, 0.000000]
(1, 0)  value of e1 vector: [2.000000, 2.000000, 0.000000]
(0, 0)  value of h vector: [2.000000, 0.000000, 0.000000]
(1, 0)  value of h vector: [2.000000, 0.000000, 0.000000]
(0, 0)  v1: [2.000000, 2.000000, 0.000000], v2: [2.000000, 0.000000, 0.000000]
(1, 0)  v1: [2.000000, 2.000000, 0.000000], v2: [2.000000, 0.000000, 0.000000]
(0, 0, 0)  product: 4.000000
(0, 1, 0)  product: 4.000000
(1, 0, 0)  product: 0.000000
(1, 1, 0)  product: 0.000000
(2, 0, 0)  product: nan
(2, 1, 0)  product: nan
(0, 0)  result: nan
(1, 0)  result: nan
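For comparison, the paper calculation can be reproduced on the host with a minimal sketch like the one below. The names dot3d_ref and reference_determinant are hypothetical and not part of my kernel; the input values are copied from the printed e1 and h vectors, and plain float is assumed to match float_t.

```cpp
#include <cassert>
#include <cmath>

// Host-side reference for the dot product of two 3-vectors.
// dot3d_ref is a hypothetical name used only for this check.
float dot3d_ref(const float v1[3], const float v2[3]) {
    float result = 0.0f;
    for (int r = 0; r < 3; r++) {
        result += v1[r] * v2[r];
    }
    return result;
}

// Recomputes the determinant from the values printed by the kernel.
float reference_determinant() {
    const float edge1[3] = {2.0f, 2.0f, 0.0f};  // printed e1 vector
    const float h[3] = {2.0f, 0.0f, 0.0f};      // printed h vector
    float determinant = dot3d_ref(edge1, h);
    // With these inputs no term can produce NaN on the host.
    assert(!std::isnan(determinant));
    return determinant;
}
```

Calling reference_determinant() from a small main should give 4, the same as the paper result, which suggests the arithmetic itself is fine for the values as printed.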
Summary/Question
Why are the first two products correct but the third always NaN? How do I troubleshoot issues like this in the future?