I have a CUDA kernel that takes the following argument:
torch::PackedTensorAccessor32<scalar_t, 4, torch::RestrictPtrTraits> input
How do I perform an atomic add on the elements of input?
input[n][c][y][x] += (scalar_t) 1 works, but
atomicAdd(&input[n][c][y][x], 1) does not. The problem is that it adds 256 instead of 1 to the tensor at the specified position, so I assume there is a problem with type conversion.
How can I use atomicAdd properly here?