I have a CUDA kernel that takes the argument: torch::PackedTensorAccessor32<scalar_t, 4, torch::RestrictPtrTraits> input
How do I perform an atomic add on the elements of input?
For example, input[n][c][y][x] += (scalar_t) 1
works, but atomicAdd(&input[n][c][y][x], 1)
does not: it adds 256 instead of 1 to the tensor at the specified position, so I assume there is a problem with type conversions.
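For reference, here is a stripped-down sketch of the kind of kernel I mean. The kernel name `increment_kernel` and the way n, c, y, x are derived from the block and thread indices are placeholders for illustration, not my actual code; only the accessor type and the two update statements are taken from above.

```cuda
#include <torch/extension.h>
#include <cuda.h>
#include <cuda_runtime.h>

// Hypothetical kernel name and index mapping, for illustration only.
template <typename scalar_t>
__global__ void increment_kernel(
    torch::PackedTensorAccessor32<scalar_t, 4, torch::RestrictPtrTraits> input) {
  // Assumed launch layout: one block per (n, c) pair, one thread per (y, x).
  const int n = blockIdx.x;
  const int c = blockIdx.y;
  const int y = threadIdx.y;
  const int x = threadIdx.x;

  // Skip threads that fall outside the tensor bounds.
  if (n >= input.size(0) || c >= input.size(1) ||
      y >= input.size(2) || x >= input.size(3)) {
    return;
  }

  // Non-atomic update: increments the element by 1 as expected.
  // input[n][c][y][x] += (scalar_t) 1;

  // Atomic update: this is the call that adds 256 instead of 1 for me.
  atomicAdd(&input[n][c][y][x], 1);
}
```

Note that the second argument here is the plain int literal 1, whereas the non-atomic version casts it to scalar_t explicitly.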
How can I do atomicAdd properly?