I have a CUDA kernel that takes the argument `torch::PackedTensorAccessor32<scalar_t, 4, torch::RestrictPtrTraits> input`.

How do I perform an atomic add on the elements of `input`?

For example, `input[n][c][y][x] += (scalar_t) 1` works, but `atomicAdd(&input[n][c][y][x], 1)` does not: it adds 256 instead of 1 to the tensor at the specified position. I therefore assume there is a problem with type conversions.
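
A minimal sketch of the setup described above, to make the two variants concrete. The kernel name `increment_kernel` and the block/thread-to-index mapping are hypothetical and not taken from the original code, and the sketch assumes `scalar_t` is a type for which CUDA provides a native `atomicAdd` overload (for example `float`):

```cuda
#include <torch/extension.h>

// Hypothetical kernel, for illustration only.
template <typename scalar_t>
__global__ void increment_kernel(
    torch::PackedTensorAccessor32<scalar_t, 4, torch::RestrictPtrTraits> input) {
  // Assumed mapping: one block per (n, c) pair, one thread per (y, x) element.
  const int n = blockIdx.x;
  const int c = blockIdx.y;
  const int y = threadIdx.y;
  const int x = threadIdx.x;

  // Skip threads that fall outside the tensor bounds.
  if (n >= input.size(0) || c >= input.size(1) ||
      y >= input.size(2) || x >= input.size(3)) {
    return;
  }

  // Variant 1 (non-atomic), which gives the expected result:
  // input[n][c][y][x] += (scalar_t) 1;

  // Variant 2 (atomic), written exactly as in the question. Indexing all
  // four dimensions yields a scalar_t&, so the address is a scalar_t*,
  // while the increment is passed as an int literal.
  atomicAdd(&input[n][c][y][x], 1);
}
```

Which `atomicAdd` overload is selected here depends on what `scalar_t` is instantiated to, since the pointer type is `scalar_t*` and the second argument is an `int`.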

How can I use `atomicAdd` properly here?